Most JPEG files have EXIF metadata in them.
For Perl, there is the handy-dandy Image::ExifTool library, the module that the exiftool command-line tool is built on.
With this, one can whip up quick programs to do all kinds of things. For example, I have lots of photos scattered across hard disks and iPhoto libraries. Lots of these are duplicates, and some have broken metadata. Reorganizing by hand is painful, but reorganizing with a program is only painful for the 15 minutes it takes to write one.
Some cameras and most smartphones even have GPS nowadays, so if you want to find all the pictures you took in England, you could do a quick search across the metadata.
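Here's a rough sketch of what such a search could look like. The bounding box numbers are my own approximation of England (a rectangle that also clips Wales and a bit of Scotland), and the helper names in_bbox, signed_coord and read_position are made up for this example. It reads the GPS values with GetValue and applies the N/S/E/W reference itself, so it doesn't matter whether ExifTool hands back a signed or an unsigned coordinate.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find ();

# Approximate bounding box for England (assumed values, deliberately generous).
my %ENGLAND = (south => 49.8, north => 55.9, west => -6.5, east => 1.8);

# True if a signed decimal latitude/longitude falls inside the box.
sub in_bbox {
    my ($lat, $lon, $box) = @_;
    return 0 unless defined $lat && defined $lon;
    return $lat >= $box->{south} && $lat <= $box->{north}
        && $lon >= $box->{west}  && $lon <= $box->{east} ? 1 : 0;
}

# Combine a coordinate with its reference (N/S/E/W) into a signed value.
sub signed_coord {
    my ($value, $ref) = @_;
    return undef unless defined $value && defined $ref;
    return $ref =~ /^[SW]/i ? -abs($value) : abs($value);
}

# Read the position of one file; returns an empty list if it has none.
sub read_position {
    my ($et, $path) = @_;
    $et->ExtractInfo($path) or return;
    my $lat = signed_coord($et->GetValue('GPSLatitude',  'ValueConv'),
                           $et->GetValue('GPSLatitudeRef',  'ValueConv'));
    my $lon = signed_coord($et->GetValue('GPSLongitude', 'ValueConv'),
                           $et->GetValue('GPSLongitudeRef', 'ValueConv'));
    return unless defined $lat && defined $lon;
    return ($lat, $lon);
}

# Usage: find-england.pl DIR [DIR...]
if (@ARGV) {
    require Image::ExifTool;    # loaded only when we actually scan
    my $et = Image::ExifTool->new;
    File::Find::find({ no_chdir => 1, wanted => sub {
        return unless -f $_ && /\.jpe?g$/i;
        my ($lat, $lon) = read_position($et, $_);
        print "$_\n" if in_bbox($lat, $lon, \%ENGLAND);
    }}, @ARGV);
}
```

A rectangle is crude, of course. For anything more precise you would want a real boundary polygon, but for "roughly where was I" it is probably good enough.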
Also, since actually scanning hard disks is slow, another thing would be to extract all the metadata and put it in one place. For this, someday I hope we’ll have FUSE-level capabilities for Windows and Mac OS X, so I could write a FUSE filesystem to automatically maintain the metadata database. If you’re unfamiliar with FUSE, here’s a teaser article: http://pramode.net/articles/lfy/fuse/pramode.html.
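Short of a FUSE filesystem, a one-off index is easy to sketch: walk the tree once and write one JSON object per photo. The list of fields, the JSON-lines format and the names index_record and index_line are my own choices for illustration, not anything ExifTool prescribes. JSON::PP ships with Perl 5.14 and later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find ();
use JSON::PP ();

# Tags worth keeping in the index (an arbitrary selection, adjust to taste).
my @FIELDS = qw(Make Model DateTimeOriginal ImageSize GPSPosition);

# Reduce the full ExifTool hash to a small record for one file.
sub index_record {
    my ($path, $info) = @_;
    my %rec = (path => $path);
    for my $tag (@FIELDS) {
        my $value = $info->{$tag};
        next if !defined $value || ref $value;    # skip missing and binary values
        $rec{$tag} = $value;
    }
    return \%rec;
}

# Encode one record as a single JSON line with sorted keys.
sub index_line {
    my ($record) = @_;
    return JSON::PP->new->canonical->encode($record);
}

# Usage: index-photos.pl DIR [DIR...] > photos.jsonl
if (@ARGV) {
    require Image::ExifTool;    # loaded only when we actually scan
    my $et = Image::ExifTool->new;
    File::Find::find({ no_chdir => 1, wanted => sub {
        return unless -f $_ && /\.jpe?g$/i;
        my $info = $et->ImageInfo($_);
        print index_line(index_record($_, $info)), "\n";
    }}, @ARGV);
}
```

Searching the resulting file is then a grep away, and the slow disk scan only has to happen when photos change.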
Here’s an example of what you have access to. This is the metadata that ExifTool pulled out of one of my photos.
---- ExifTool ----
ExifTool Version Number : 9.01
---- File ----
File Name : IMG_0890.JPG
Directory : Z:\Media\Images\bfitz\Pictures\iPhoto Library/Modified/2008/Jul 20, 2008
File Size : 742 kB
File Modification Date/Time : 2008:07:30 12:01:32-07:00
File Permissions : rw-rw-rw-
File Type : JPEG
MIME Type : image/jpeg
Exif Byte Order : Big-endian (Motorola, MM)
Comment : AppleMark\x0A
Image Width : 1200
Image Height : 1600
Encoding Process : Baseline DCT, Huffman coding
Bits Per Sample : 8
Color Components : 3
Y Cb Cr Sub Sampling : YCbCr4:2:2 (2 1)
---- EXIF ----
Make : Apple
Camera Model Name : iPhone
Orientation : Horizontal (normal)
Orientation : Horizontal (normal)
X Resolution : 72
Y Resolution : 72
Resolution Unit : inches
Software : QuickTime 7.5
Modify Date : 2008:07:30 12:01:32
Host Computer : Mac OS X 10.4.9
Y Cb Cr Positioning : Centered
F Number : 2.8
Exif Version : 0220
Date/Time Original : 2008:07:20 14:51:42
Create Date : 2008:07:20 14:51:42
Color Space : sRGB
Compression : JPEG (old-style)
X Resolution : 72
Y Resolution : 72
Resolution Unit : inches
Thumbnail Offset : 470
Thumbnail Length : 5639
Y Cb Cr Positioning : Centered
---- ICC_Profile ----
Profile CMM Type : appl
Profile Version : 2.2.0
Profile Class : Input Device Profile
Color Space Data : RGB
Profile Connection Space : XYZ
Profile Date Time : 2003:07:01 00:00:00
Profile File Signature : acsp
Primary Platform : Apple Computer Inc.
CMM Flags : Not Embedded, Independent
Device Manufacturer : appl
Device Model :
Device Attributes : Reflective, Glossy, Positive, Color
Rendering Intent : Perceptual
Connection Space Illuminant : 0.9642 1 0.82491
Profile Creator : appl
Profile ID : 0
Red Matrix Column : 0.45427 0.24263 0.01482
Green Matrix Column : 0.35332 0.67441 0.09042
Blue Matrix Column : 0.15662 0.08336 0.71953
Media White Point : 0.95047 1 1.0891
Chromatic Adaptation : 1.04788 0.02292 -0.0502 0.02957 0.99049 -0.01706 -0.00923 0.01508 0.75165
Red Tone Reproduction Curve : curv\x00\x00\x00\x00\x00\x00\x00\x01\x023
Green Tone Reproduction Curve : curv\x00\x00\x00\x00\x00\x00\x00\x01\x023
Blue Tone Reproduction Curve : curv\x00\x00\x00\x00\x00\x00\x00\x01\x023
Profile Description : Camera RGB Profile
Profile Copyright : Copyright 2003 Apple Computer Inc., all rights reserved.
Profile Description ML : Camera RGB Profile
Profile Description ML (es-ES) : Perfil RGB para Cámara
Profile Description ML (da-DK) : RGB-beskrivelse til Kamera
Profile Description ML (de-DE) : RGB-Profil für Kameras
Profile Description ML (fi-FI) : Kameran RGB-profiili
Profile Description ML (fr-FU) : Profil RVB de l’appareil-photo
Profile Description ML (it-IT) : Profilo RGB Fotocamera
Profile Description ML (nl-NL) : RGB-profiel Camera
Profile Description ML (no-NO) : RGB-kameraprofil
Profile Description ML (pt-BR) : Perfil RGB de Câmera
Profile Description ML (sv-SE) : RGB-profil för Kamera
Profile Description ML (ja-JP) : カメラ RGB プロファイル
Profile Description ML (ko-KR) : 카메라 RGB 프로파일
Profile Description ML (zh-TW) : 數位相機 RGB 色彩描述
Profile Description ML (zh-CN) : 相机 RGB 描述文件
---- Composite ----
Aperture : 2.8
Image Size : 1200x1600
Thumbnail Image : <thumbnail image data>
It’s pretty daunting. But you can see I took this with an iPhone, the orientation is horizontal, it’s a 1200×1600 image at 72 DPI. Also note that I modified this photo, and you can see when it was modified, and that it was likely modified on a Mac (the “Host Computer” key).
What are some things you would want to fix?
Sometimes a photo orientation is recorded incorrectly. Rather than mess with rotating it, you could just fix the metadata. Obviously, it’s an AI task for a program to look at a photo and say “hey, the orientation is wrong”, but once you’ve determined this, my personal preference would be to fix the original data. Another thing you can do is fix dates – sometimes my photos claim to have been taken in 2000, or 2015, or some other crazy date. So, I would prefer to fix the metadata in the photo.
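As a sketch of the date and orientation fixes: the script below writes a corrected copy rather than touching the original. The plausible year range (2001 to 2012), the choice to fall back to the file's modification time, and the helper names are all assumptions made for this example. SetNewValue and WriteInfo are the Image::ExifTool calls that queue and apply the changes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Pull the year out of an EXIF date such as "2008:07:20 14:51:42".
sub year_of {
    my ($date) = @_;
    return defined $date && $date =~ /^(\d{4}):\d{2}:\d{2}/ ? $1 : undef;
}

# True if the date parses and its year lies within [$min_year, $max_year].
sub date_is_plausible {
    my ($date, $min_year, $max_year) = @_;
    my $year = year_of($date);
    return defined $year && $year >= $min_year && $year <= $max_year ? 1 : 0;
}

# Format a file's modification time the way EXIF writes dates.
sub exif_date_from_mtime {
    my ($path) = @_;
    my $mtime = (stat $path)[9];
    return defined $mtime ? strftime('%Y:%m:%d %H:%M:%S', localtime $mtime) : undef;
}

# Usage: fix-meta.pl SOURCE.jpg FIXED.jpg
if (@ARGV == 2) {
    my ($src, $dst) = @ARGV;
    require Image::ExifTool;    # loaded only when we actually write
    my $et   = Image::ExifTool->new;
    my $info = $et->ImageInfo($src, 'DateTimeOriginal');

    # Replace a crazy capture date with the file's modification time.
    unless (date_is_plausible($info->{DateTimeOriginal}, 2001, 2012)) {
        $et->SetNewValue(DateTimeOriginal => exif_date_from_mtime($src));
    }

    # 1 is the EXIF code for "Horizontal (normal)".
    $et->SetNewValue(Orientation => 1, Type => 'ValueConv');

    # WriteInfo returns 1 on success, 2 if nothing changed, 0 on error.
    my $ok = $et->WriteInfo($src, $dst);
    warn 'write failed: ' . ($et->GetValue('Error') // 'unknown error') . "\n" unless $ok;
}
```

Deciding which orientation is actually correct is still up to you (or to that AI task); this only covers writing the answer back.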
Also note that JPEGs can have embedded thumbnails. You might want to strip those out. Easy enough to do. Or you can insert new thumbnails. You can even add your own metadata. Or you could fix the GPS location if you have better information than what was stamped in the photo (my iPhone 3G has been spectacularly off for indoor pictures, sometimes a mile off).
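Both of those are a few lines each. In this sketch, calling SetNewValue with a tag name and no value queues the tag for deletion, which is how the thumbnail gets stripped. The gps_tags helper is hypothetical: it just splits signed decimal coordinates into the unsigned value plus N/S/E/W reference that EXIF stores. The command-line arguments are likewise my own invention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn signed decimal coordinates into the four EXIF GPS tags.
sub gps_tags {
    my ($lat, $lon) = @_;
    return (
        GPSLatitude     => abs($lat),
        GPSLatitudeRef  => ($lat < 0 ? 'S' : 'N'),
        GPSLongitude    => abs($lon),
        GPSLongitudeRef => ($lon < 0 ? 'W' : 'E'),
    );
}

# Usage: fix-gps.pl SOURCE.jpg FIXED.jpg LATITUDE LONGITUDE
if (@ARGV == 4) {
    my ($src, $dst, $lat, $lon) = @ARGV;
    require Image::ExifTool;    # loaded only when we actually write
    my $et = Image::ExifTool->new;

    # No value means "delete this tag": drop the embedded thumbnail.
    $et->SetNewValue('ThumbnailImage');

    # Overwrite the stamped position with the better one.
    my %gps = gps_tags($lat, $lon);
    $et->SetNewValue($_ => $gps{$_}) for sort keys %gps;

    # Write a corrected copy; the original file is left alone.
    my $ok = $et->WriteInfo($src, $dst);
    warn 'write failed: ' . ($et->GetValue('Error') // 'unknown error') . "\n" unless $ok;
}
```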
Here is the start of a program I wrote to find and manipulate metadata in photos. I guess I should start putting this on GitHub rather than pasting it into blog postings.
#!/usr/bin/perl

use 5.014; # implies use strict
use warnings;
use feature ':5.14';

use Digest::MD5;
use Data::Dumper;
use File::Find;
use Getopt::Long qw();
use Image::ExifTool;
use Time::HiRes qw();

Fix->new()->run();

# =======================================================================================
# =======================================================================================

package Fix;
use parent -norequire, 'Object';

sub run {
    my ($self) = @_;

    $self->{'option'} = Option->new()->read_options();
    $self->{'diag'} = Diag->new()->init($self->{'option'}->{'verbose'});
    $self->do_commands();
}

sub do_commands {
    my ($self) = @_;

    Scan->new()->scan($self->{'option'}, $self->{'diag'});
}

# =======================================================================================
# =======================================================================================

package Scan;
use parent -norequire, 'Object';

sub scan {
    my $self = shift;
    $self->{'option'} = shift;
    $self->{'diag'} = shift;

    $self->{'start_time'} = Time::HiRes::time();
    $self->{'file_count'} = 0;
    $self->{'bytes_processed'} = 0;

    $self->{'exif_tool'} = Image::ExifTool->new;
    $self->{'exif_tool'}->Options(Unknown => 1);

    my $process = sub { $self->process($_); };
    no warnings 'File::Find'; # don't want noise about dirs we can't iterate
    File::Find::find({wanted => $process}, $self->{'option'}->{'base'});
}

sub process {
    my ($self, $file) = @_;

    @{$self->{'stat'}} = stat($file);
    return unless -f _;
    return unless $file =~ /\.(JPG|JPEG|jpg|jpeg)$/;
    return if $file =~ /\.\_/; # skip ._ AppleDouble files

    $self->{'file'} = $file;
    $self->{'path'} = $File::Find::name;
    $self->{'file_size'} = -s _;
    return if $self->{'path'} =~ /\.AppleDouble/; # skip fake JPG inside AppleDouble folders

    $self->{'file_count'} += 1;
    $self->{'bytes_processed'} += $self->{'file_size'};

    # File::Find has already chdir'ed into the file's directory, so open it by
    # name; $File::Find::name breaks here when --base is a relative path
    my $info = $self->{'exif_tool'}->ImageInfo($file);

    printf("%10s %s\n", main::commify($self->{'file_size'}), $self->{'path'});

    # print every tag that was found, with a header whenever the group changes
    my $group = "";
    foreach my $tag ($self->{'exif_tool'}->GetFoundTags('Group0')) {
        if ($group ne $self->{'exif_tool'}->GetGroup($tag)) {
            $group = $self->{'exif_tool'}->GetGroup($tag);
            print "---- $group ----\n";
        }
        my $value = $info->{$tag};
        if (ref($value) eq 'SCALAR') {
            $value = $$value; # binary data comes back as a scalar reference
        }
        # escape control characters so they don't mangle the terminal
        $value =~ s/([\x00-\x1F])/sprintf("\\x%02X", unpack("C", $1))/eg;
        printf("%-32s : %s\n", $self->{'exif_tool'}->GetDescription($tag), $value);
    }
    print "\n";

    $self->{'diag'}->progress(
        sub {
            my $dur = Time::HiRes::time() - $self->{'start_time'};
            $dur = 1 if $dur < 1; # avoid dividing by a near-zero duration
            my $proc = main::commify($self->{'file_count'});
            my $procMB = main::bytes_str($self->{'bytes_processed'});
            my $mbsec = main::bytes_str($self->{'bytes_processed'} / $dur);
            sprintf("%s files, %s (%s/sec) processed", $proc, $procMB, $mbsec);
        },
        $self);
}

# =======================================================================================
# Diag - diagnostic output
#
# char, line: print if verbose >= 3 (-v -v)
# status: print if verbose >= 2 (-v)
# progress: print if verbose > 0 (no option)
# =======================================================================================

package Diag;
use parent -norequire, 'Object';

sub init {
    my $self = shift;

    $self->{'diag'} = $_[0] // 0;
    $self->{'need_newline'} = 0;
    $self->{'last_progress'} = 0;
    return $self;
}

sub progress {
    my ($self, $callback, $param) = @_;

    return unless $self->{'diag'} > 0;
    return if time() == $self->{'last_progress'}; # at most one update per second
    $self->{'last_progress'} = time();

    if ($self->{'need_newline'}) {
        print STDERR "\n";
        $self->{'need_newline'} = 0;
    }
    printf STDERR "%-78s\r", $callback->($param);
}

sub progress_clear {
    my ($self) = @_;

    return unless $self->{'diag'} > 0;
    printf STDERR "%-78s\r", "";
}

# =======================================================================================
# Option - parse command-line options
# =======================================================================================

package Option;
use parent -norequire, 'Object';

sub read_options {
    my ($self) = @_;

    my @opts = ('verbose|v+', 'quiet|q', 'base=s');
    my $status = Getopt::Long::GetOptions($self, @opts);
    $self->{'verbose'} = 0 if $self->{'quiet'};

    # get from ARGV once I add separate commands

    return $self;
}

# =======================================================================================
# Object - base class
# =======================================================================================

package Object;

sub new {
    my $self = shift;
    my ($class) = ref($self) || $self; # allow both virtual (member) and static (class)
    $self = {};
    bless $self, $class;
    return $self;
}

# =======================================================================================
# Not in a class yet
# =======================================================================================

package main;

sub bytes_str {
    my $val = shift;

    if    ($val < 10_000)            { return sprintf("%.2f KB", $val / 1_000); }
    elsif ($val < 100_000)           { return sprintf("%.1f KB", $val / 1_000); }
    elsif ($val < 1_000_000)         { return sprintf("%d KB", int($val / 1_000)); }
    elsif ($val < 10_000_000)        { return sprintf("%.2f MB", $val / 1_000_000); }
    elsif ($val < 100_000_000)       { return sprintf("%.1f MB", $val / 1_000_000); }
    elsif ($val < 1_000_000_000)     { return sprintf("%d MB", int($val / 1_000_000)); }
    elsif ($val < 10_000_000_000)    { return sprintf("%.3f GB", $val / 1_000_000_000); }
    elsif ($val < 100_000_000_000)   { return sprintf("%.2f GB", $val / 1_000_000_000); }
    elsif ($val < 1_000_000_000_000) { return sprintf("%.1f GB", $val / 1_000_000_000); }
    else                             { return sprintf("%.3f TB", $val / 1_000_000_000_000); }
}

sub commify {
    local $_ = shift;
    1 while s/^([-+]?\d+)(\d{3})/$1,$2/;
    return $_;
}

# =======================================================================================

=head1 NAME

fix-jpeg: Clean up JPEG files and EXIF information

=head1 SYNOPSIS

fix-jpeg [options] [args]

  fix-jpeg --base BASEDIR -v

=head1 DESCRIPTION

Fix file dates to match EXIF dates, or EXIF dates to match file dates. Rename files. Find duplicates. Fix iPhoto libraries. Other magic stuff.

=cut
What does the “profile copyright” refer to (vs “copyright”)? Does this mean that the image is copyright protected? I don’t want to violate anyone’s copyright restrictions so when I see this, should I stay away from the image?