Most JPEG files have EXIF metadata in them.
For Perl, there is the handy-dandy Image::ExifTool library that encapsulates ExifTool.
With this, one can whip up quick programs to do all kinds of things. For example, I have lots of photos scattered across hard disks and iPhoto libraries. Lots of these are duplicates, and some have broken metadata. Reorganizing by hand is painful, but reorganizing with a program is only painful for the 15 minutes it takes to write one.
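As a rough illustration of the duplicate-hunting part, here is a minimal sketch that groups files by the MD5 digest of their contents using the core Digest::MD5 module. The name `find_duplicates` is made up for this example. It hashes whole files, so two copies of the same picture with different metadata would not be reported as duplicates.

```perl
use strict;
use warnings;
use Digest::MD5;

# Group files by the MD5 digest of their full contents.
# Returns a hash ref of digest => [paths], keeping only digests seen more than once.
# find_duplicates is a hypothetical helper name, not part of any library.
sub find_duplicates
{
    my @paths = @_;
    my %by_digest;
    foreach my $path (@paths)
    {
        open(my $fh, '<', $path) or next;    # skip files we cannot read
        binmode($fh);
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
        close($fh);
        push @{$by_digest{$digest}}, $path;
    }
    my %dups;
    foreach my $digest (keys %by_digest)
    {
        $dups{$digest} = $by_digest{$digest} if @{$by_digest{$digest}} > 1;
    }
    return \%dups;
}
```

Calling `find_duplicates(@paths)` with a list of JPEG paths gives back only the groups that contain two or more files, which is a reasonable starting point for deciding what to delete or merge.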
Some cameras and most smartphones even have GPS nowadays, so if you want to find all the pictures you took in England, you could do a quick search across the metadata.
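The search itself can be as simple as a bounding-box test. The sketch below assumes you already have signed decimal-degree coordinates for each photo; the `in_box` helper, the sample records, and the box for England are all made up for illustration, and the box is deliberately crude (it also covers Wales and a bit of sea).

```perl
use strict;
use warnings;

# Very rough bounding box for England, chosen for illustration only.
my %ENGLAND = (south => 49.9, north => 55.8, west => -6.0, east => 1.8);

# True (1) if signed decimal-degree coordinates fall inside the box, else 0.
# Photos without GPS data (undefined coordinates) never match.
sub in_box
{
    my ($lat, $lon, $box) = @_;
    return 0 unless defined $lat && defined $lon;
    return ($lat >= $box->{south} && $lat <= $box->{north}
         && $lon >= $box->{west}  && $lon <= $box->{east}) ? 1 : 0;
}

# Hypothetical records standing in for coordinates pulled from photo metadata.
my @photos = (
    { path => 'london.jpg',  lat => 51.5074, lon => -0.1278 },
    { path => 'seattle.jpg', lat => 47.6062, lon => -122.3321 },
    { path => 'no_gps.jpg' },
);

my @in_england = grep { in_box($_->{lat}, $_->{lon}, \%ENGLAND) } @photos;
print "$_->{path}\n" for @in_england;
```

A real search would want a proper country outline rather than a rectangle, but a box is usually good enough for a first pass over a photo collection.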
Also, since actually scanning hard disks is slow, another thing would be to extract all the metadata and put it in one place. For this, someday I hope we’ll have FUSE-level capabilities for Windows and Mac OS X, so I could write a FUSE filesystem to automatically maintain the metadata database. If you’re unfamiliar with FUSE, here’s a teaser article: http://pramode.net/articles/lfy/fuse/pramode.html.
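Short of a FUSE filesystem, a flat index file already avoids rescanning the disks. This is only a sketch of one possible layout: one JSON object per line, written with the core JSON::PP module. The `save_index` and `search_index` names and the record shape are assumptions for the example, and the tag values are expected to be plain strings (binary values such as thumbnails would have to be skipped or encoded first).

```perl
use strict;
use warnings;
use JSON::PP;

# Write one JSON object per line: {"path": ..., "tags": {...}}.
# $records is a hash ref of path => hash ref of tag => plain string value.
sub save_index
{
    my ($index_file, $records) = @_;
    my $json = JSON::PP->new->utf8->canonical(1);
    open(my $out, '>', $index_file) or die "Cannot write $index_file: $!";
    foreach my $path (sort keys %$records)
    {
        print {$out} $json->encode({ path => $path, tags => $records->{$path} }), "\n";
    }
    close($out) or die "Cannot close $index_file: $!";
}

# Return the paths whose tag equals the wanted value, reading only the index.
sub search_index
{
    my ($index_file, $tag, $wanted) = @_;
    my $json = JSON::PP->new->utf8;
    my @hits;
    open(my $in, '<', $index_file) or die "Cannot read $index_file: $!";
    while (my $line = <$in>)
    {
        my $rec = $json->decode($line);
        my $value = $rec->{tags}{$tag};
        push @hits, $rec->{path} if defined $value && $value eq $wanted;
    }
    close($in);
    return @hits;
}
```

With an index like this, a query such as `search_index('photos.jsonl', 'Make', 'Apple')` only reads one small file instead of opening every image. Keeping the index up to date is the hard part, which is where a filesystem hook would help.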
Here’s an example of what you have access to. This is the metadata that ExifTool pulled out of one of my photos.
---- ExifTool ----
ExifTool Version Number : 9.01
---- File ----
File Name : IMG_0890.JPG
Directory : Z:\Media\Images\bfitz\Pictures\iPhoto Library/Modified/2008/Jul 20, 2008
File Size : 742 kB
File Modification Date/Time : 2008:07:30 12:01:32-07:00
File Permissions : rw-rw-rw-
File Type : JPEG
MIME Type : image/jpeg
Exif Byte Order : Big-endian (Motorola, MM)
Comment : AppleMark\x0A
Image Width : 1200
Image Height : 1600
Encoding Process : Baseline DCT, Huffman coding
Bits Per Sample : 8
Color Components : 3
Y Cb Cr Sub Sampling : YCbCr4:2:2 (2 1)
---- EXIF ----
Make : Apple
Camera Model Name : iPhone
Orientation : Horizontal (normal)
Orientation : Horizontal (normal)
X Resolution : 72
Y Resolution : 72
Resolution Unit : inches
Software : QuickTime 7.5
Modify Date : 2008:07:30 12:01:32
Host Computer : Mac OS X 10.4.9
Y Cb Cr Positioning : Centered
F Number : 2.8
Exif Version : 0220
Date/Time Original : 2008:07:20 14:51:42
Create Date : 2008:07:20 14:51:42
Color Space : sRGB
Compression : JPEG (old-style)
X Resolution : 72
Y Resolution : 72
Resolution Unit : inches
Thumbnail Offset : 470
Thumbnail Length : 5639
Y Cb Cr Positioning : Centered
---- ICC_Profile ----
Profile CMM Type : appl
Profile Version : 2.2.0
Profile Class : Input Device Profile
Color Space Data : RGB
Profile Connection Space : XYZ
Profile Date Time : 2003:07:01 00:00:00
Profile File Signature : acsp
Primary Platform : Apple Computer Inc.
CMM Flags : Not Embedded, Independent
Device Manufacturer : appl
Device Model :
Device Attributes : Reflective, Glossy, Positive, Color
Rendering Intent : Perceptual
Connection Space Illuminant : 0.9642 1 0.82491
Profile Creator : appl
Profile ID : 0
Red Matrix Column : 0.45427 0.24263 0.01482
Green Matrix Column : 0.35332 0.67441 0.09042
Blue Matrix Column : 0.15662 0.08336 0.71953
Media White Point : 0.95047 1 1.0891
Chromatic Adaptation : 1.04788 0.02292 -0.0502 0.02957 0.99049 -0.01706 -0.00923 0.01508 0.75165
Red Tone Reproduction Curve : curv\x00\x00\x00\x00\x00\x00\x00\x01\x023
Green Tone Reproduction Curve : curv\x00\x00\x00\x00\x00\x00\x00\x01\x023
Blue Tone Reproduction Curve : curv\x00\x00\x00\x00\x00\x00\x00\x01\x023
Profile Description : Camera RGB Profile
Profile Copyright : Copyright 2003 Apple Computer Inc., all rights reserved.
Profile Description ML : Camera RGB Profile
Profile Description ML (es-ES) : Perfil RGB para Cámara
Profile Description ML (da-DK) : RGB-beskrivelse til Kamera
Profile Description ML (de-DE) : RGB-Profil für Kameras
Profile Description ML (fi-FI) : Kameran RGB-profiili
Profile Description ML (fr-FU) : Profil RVB de l’appareil-photo
Profile Description ML (it-IT) : Profilo RGB Fotocamera
Profile Description ML (nl-NL) : RGB-profiel Camera
Profile Description ML (no-NO) : RGB-kameraprofil
Profile Description ML (pt-BR) : Perfil RGB de Câmera
Profile Description ML (sv-SE) : RGB-profil för Kamera
Profile Description ML (ja-JP) : カメラ RGB プロファイル
Profile Description ML (ko-KR) : 카메라 RGB 프로파일
Profile Description ML (zh-TW) : 數位相機 RGB 色彩描述
Profile Description ML (zh-CN) : 相机 RGB 描述文件
---- Composite ----
Aperture : 2.8
Image Size : 1200x1600
Thumbnail Image : <thumbnail image data>
It’s pretty daunting. But you can see that I took this with an iPhone, that the orientation is horizontal, and that it’s a 1200×1600 image at 72 DPI. Also note that I modified this photo: you can see when it was modified, and that it was likely modified on a Mac (the “Host Computer” key).
What are some things you would want to fix?
Sometimes a photo’s orientation is recorded incorrectly. Rather than mess with rotating the image, you could just fix the metadata. Obviously, it’s an AI task for a program to look at a photo and say “hey, the orientation is wrong”, but once you’ve determined this, my personal preference would be to fix the original data. Another thing you can do is fix dates: sometimes my photos claim to have been taken in 2000, or 2015, or some other crazy date, and again I would prefer to fix the metadata in the photo itself.
Also note that JPEGs can have embedded thumbnails. You might want to strip those out. Easy enough to do. Or you can insert new thumbnails. You can even add your own metadata. Or you could fix the GPS location if you have better information than what was stamped in the photo (my iPhone 3G has been spectacularly off for indoor pictures, sometimes a mile off).
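To make the fixes above concrete, here is a hedged sketch using Image::ExifTool's `SetNewValue` and `WriteInfo` methods. The helper names `plausible_date` and `fix_photo` are made up, and the year range is whatever you decide is believable for your own collection. Image::ExifTool is loaded lazily so the date check can be used on its own. `WriteInfo` with a single argument rewrites the file in place, so try it on copies first.

```perl
use strict;
use warnings;

# True (1) if an EXIF-style timestamp ("YYYY:MM:DD HH:MM:SS") has a year
# inside the range in which the photo could plausibly have been taken.
sub plausible_date
{
    my ($exif_date, $min_year, $max_year) = @_;
    return 0 unless defined $exif_date
        && $exif_date =~ /^(\d{4}):\d{2}:\d{2} \d{2}:\d{2}:\d{2}/;
    return ($1 >= $min_year && $1 <= $max_year) ? 1 : 0;
}

# Rewrite a photo in place: set the capture date, reset the orientation,
# and drop the embedded thumbnail. Returns the WriteInfo status.
sub fix_photo
{
    my ($path, $new_date) = @_;
    require Image::ExifTool;                       # loaded only when needed
    my $exif = Image::ExifTool->new;
    $exif->SetNewValue(DateTimeOriginal => $new_date);
    $exif->SetNewValue(Orientation => 'Horizontal (normal)');
    $exif->SetNewValue('ThumbnailImage');          # no value: delete the tag
    my $status = $exif->WriteInfo($path);          # 1 written, 2 unchanged, 0 error
    warn "Could not update $path: " . ($exif->GetValue('Error') // 'unknown error') . "\n"
        unless $status;
    return $status;
}
```

A typical use would be to read `DateTimeOriginal` for each file, flag the ones where `plausible_date` returns 0, and only then call `fix_photo` with a date you trust (for instance one recovered from the file name or a neighbouring photo).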
Here is the start of a program I wrote to find and manipulate metadata in photos. I guess I should start putting this on GitHub rather than pasting it into blog postings.
#!/usr/bin/perl
use 5.014; # implies use strict
use warnings;
use feature ':5.14';
use Digest::MD5;
use Data::Dumper;
use File::Find;
use Getopt::Long qw();
use Image::ExifTool;
use Time::HiRes qw();
Fix->new()->run();
# =======================================================================================
# =======================================================================================
package Fix;
use parent -norequire, 'Object';
sub run
{
my ($self) = @_;
$self->{'option'} = Option->new()->read_options();
$self->{'diag'} = Diag->new()->init($self->{'option'}->{'verbose'});
$self->do_commands();
}
sub do_commands
{
my ($self) = @_;
Scan->new()->scan($self->{'option'}, $self->{'diag'});
}
# =======================================================================================
# =======================================================================================
package Scan;
use parent -norequire, 'Object';
sub scan
{
my $self = shift;
$self->{'option'} = shift;
$self->{'diag'} = shift;
$self->{'start_time'} = Time::HiRes::time();
$self->{'file_count'} = 0;
$self->{'bytes_processed'} = 0;
$self->{'exif_tool'} = Image::ExifTool->new();
$self->{'exif_tool'}->Options(Unknown => 1);
my $process = sub { $self->process($_); };
no warnings 'File::Find'; # don't want noise about dirs we can't iterate
&File::Find::find({wanted => $process}, $self->{'option'}->{'base'});
}
sub process
{
my ($self, $file) = @_;
@{$self->{'stat'}} = stat($file);
return unless -f _;
return unless $file =~ /\.(JPG|JPEG|jpg|jpeg)$/;
return if $file =~ /\.\_/; # skip ._ AppleDouble files
$self->{'file'} = $file;
$self->{'path'} = $File::Find::name;
$self->{'file_size'} = -s _;
return if $self->{'path'} =~ /\.AppleDouble/; # skip fake JPG inside AppleDouble folders
$self->{'file_count'} += 1;
$self->{'bytes_processed'} += $self->{'file_size'};
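# File::Find has already chdir'd into the file's directory, so open it by its
# local name; $File::Find::name only resolves when --base is an absolute path.
my $info = $self->{'exif_tool'}->ImageInfo($file);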
print sprintf("%10s %s\n", main::commify($self->{'file_size'}), $self->{'path'});
my $group = "";
foreach my $tag ($self->{'exif_tool'}->GetFoundTags('Group0'))
{
if ($group ne $self->{'exif_tool'}->GetGroup($tag))
{
$group = $self->{'exif_tool'}->GetGroup($tag);
print "---- $group ----\n";
}
my $value = $info->{$tag};
if (ref($value) eq 'SCALAR')
{
$value = $$value;
}
$value =~ s/([\x00-\x1F])/sprintf("\\x%02X", unpack("C", $1))/eg;
print sprintf("%-32s : %s\n", $self->{'exif_tool'}->GetDescription($tag), $value);
}
print "\n";
$self->{'diag'}->progress(
sub {
my $dur = Time::HiRes::time() - $self->{'start_time'};
$dur = 1 if $dur < 1; # avoid dividing by a near-zero duration
my $proc = main::commify($self->{'file_count'});
my $procMB = main::bytes_str($self->{'bytes_processed'});
my $mbsec = main::bytes_str($self->{'bytes_processed'} / $dur);
sprintf("%s files, %s (%s/sec) processed",
$proc, $procMB, $mbsec);
},
$self);
}
# =======================================================================================
# Diag - diagnostic output
#
# char, line: print if verbose >= 3 (-v -v)
# status: print if verbose >= 2 (-v)
# progress: print if verbose > 0 (no option)
# =======================================================================================
package Diag;
use parent -norequire, 'Object';
sub init
{
my $self = shift;
$self->{'diag'} = $_[0] // 0;
$self->{'need_newline'} = 0;
$self->{'last_progress'} = 0;
return $self;
}
sub progress
{
my ($self, $callback, $param) = @_;
return unless $self->{'diag'} > 0;
return if time() == $self->{'last_progress'};
$self->{'last_progress'} = time();
print STDERR "\n" and $self->{'need_newline'} = 0 if $self->{'need_newline'};
print STDERR sprintf("%-78s\r", $callback->($param));
}
sub progress_clear
{
my ($self) = @_;
return unless $self->{'diag'} > 0;
print STDERR sprintf("%-78s\r", "");
}
# =======================================================================================
# Option - parse command-line options
# =======================================================================================
package Option;
use parent -norequire, 'Object';
sub read_options
{
my ($self) = @_;
my @opts = ('verbose|v+', 'quiet|q', 'base=s');
my $status = Getopt::Long::GetOptions($self, @opts);
$self->{'verbose'} = 0 if $self->{'quiet'};
# get from ARGV once I add separate commands
return $self;
}
# =======================================================================================
# Object - base class
# =======================================================================================
package Object;
sub new
{
my $self = shift;
my ($class) = ref($self) || $self; # allow both virtual (member) and static (class)
$self = {};
bless $self, $class;
return $self;
}
# =======================================================================================
# Not in a class yet
# =======================================================================================
package main;
sub bytes_str
{
my $val = shift;
if ($val < 10_000) { return sprintf("%.2f KB", $val / 1_000); }
elsif ($val < 100_000) { return sprintf("%.1f KB", $val / 1_000); }
elsif ($val < 1_000_000) { return sprintf("%d KB", int($val / 1_000)); }
elsif ($val < 10_000_000) { return sprintf("%.2f MB", $val / 1_000_000); }
elsif ($val < 100_000_000) { return sprintf("%.1f MB", $val / 1_000_000); }
elsif ($val < 1_000_000_000) { return sprintf("%d MB", int($val / 1_000_000)); }
elsif ($val < 10_000_000_000) { return sprintf("%.3f GB", $val / 1_000_000_000); }
elsif ($val < 100_000_000_000) { return sprintf("%.2f GB", $val / 1_000_000_000); }
elsif ($val < 1_000_000_000_000) { return sprintf("%.1f GB", $val / 1_000_000_000); }
else { return sprintf("%.3f TB", $val / 1_000_000_000_000); }
}
sub commify
{
local $_ = shift;
1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
return $_;
}
# =======================================================================================
=head1 NAME
fix-jpeg: Clean up JPEG files and EXIF information
=head1 SYNOPSIS
fix-jpeg [options] [args]
fix-jpeg --base BASEDIR -v
=head1 DESCRIPTION
Fix file dates to match EXIF dates, or EXIF dates to match file dates. Rename files.
Find duplicates. Fix iPhoto libraries. Other magic stuff.
=cut
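The "fix file dates to match EXIF dates" item is not implemented in the listing above. A minimal sketch of how it might look, using only core modules, follows. The helper names are hypothetical, and since the EXIF timestamp shown earlier carries no time zone, the sketch assumes it is in the machine's local time.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Convert an EXIF-style timestamp ("YYYY:MM:DD HH:MM:SS") to epoch seconds,
# interpreting it as local time. Returns undef if it does not parse or is
# not a valid calendar date.
sub exif_date_to_epoch
{
    my ($exif_date) = @_;
    return undef unless defined $exif_date;
    my ($y, $mo, $d, $h, $mi, $s) =
        $exif_date =~ /^(\d{4}):(\d{2}):(\d{2}) (\d{2}):(\d{2}):(\d{2})/
        or return undef;
    # timelocal dies on out-of-range fields, so trap that and return undef
    my $epoch = eval { timelocal($s, $mi, $h, $d, $mo - 1, $y) };
    return $epoch;
}

# Set a file's access and modification times from an EXIF timestamp.
# Returns 1 on success, 0 if the date is unusable or utime fails.
sub set_file_time
{
    my ($path, $exif_date) = @_;
    my $epoch = exif_date_to_epoch($exif_date);
    return 0 unless defined $epoch;
    return utime($epoch, $epoch, $path) ? 1 : 0;
}
```

In the scanner, this could be called with the `DateTimeOriginal` value of each photo, for example `set_file_time($file, $info->{'DateTimeOriginal'})`, once the date has been checked for sanity.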