The Exif file format is, shall we say, not robust and not documented well. And yet it’s at the heart of hundreds of billions of data items – the photos that people take with digital cameras in ever-increasing numbers.
Strictly speaking, Exif data is not needed; it’s metadata about your image, not the image itself. And yet, sans that metadata, many photographers and their tools are worse off – the metadata stores timestamps, GPS locations, camera information, and more.
Where things go wrong is in importing or editing. For example, Apple’s iPhoto adds its own metadata when importing photos from cameras; so do most other importing tools. Photoshop needs to copy or remove metadata when you modify a photo. Social media sites are currently in a tiny bit of hot water for actively removing metadata from photos uploaded to their sites.
Changing byte order
One of the worst things you can do is to change the byte order of Exif metadata. Or at least, it is fraught with peril: there are a lot of specialized formats buried inside Exif, and if you fix up some but not all of the offsets, you’ll break metadata. Not for you, because you’re skipping the parts you don’t understand, but for some other tool more savvy than yours (or knowledgeable in a different way).
For example, I took some photos with a Nikon E990 in 2007. Looking at them recently (while trying to develop a more comprehensive JPEG file parser), I found that iTunes had decided to change an Exif chunk from little-endian to big-endian. It fixed up all the data it knew about, but evidently this older version of iTunes did not know about the Nikon MakerNote. So it left the MakerNote alone, which meant that its offsets were still little-endian. Photos I’d imported previously with Nikon’s custom software were fine, because Nikon of course knows how to handle its own metadata.
I don’t actually know if the Nikon E990 writes Exif as little-endian or big-endian, because I only have photos that were imported with a tool, not copied directly from its media. That is an interesting experiment I need to run at some point.
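To make the failure concrete, here is a minimal Python sketch of what a reader sees after such a conversion. It assumes only the standard TIFF header layout (a two-byte `II` or `MM` marker, the number 42, then the offset of the first IFD); the function name `read_tiff_header` and the sample offset `0x031A` are made up for illustration and do not come from any real file.

```python
import struct

def read_tiff_header(data: bytes):
    """Return (struct byte-order prefix, first IFD offset) for a TIFF/Exif header."""
    marker = data[:2]
    if marker == b"II":
        prefix = "<"   # little-endian ("Intel")
    elif marker == b"MM":
        prefix = ">"   # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF header")
    magic, first_ifd = struct.unpack(prefix + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return prefix, first_ifd

# Hypothetical offset that was written little-endian inside a block
# the converting tool did not understand and therefore left untouched.
stale = struct.pack("<I", 0x031A)

# The rewritten header now declares big-endian for the whole file.
header = b"MM" + struct.pack(">HI", 42, 8)
prefix, first_ifd = read_tiff_header(header)

# A reader that trusts the header decodes the untouched bytes backwards.
misread = struct.unpack(prefix + "I", stale)[0]
print(hex(misread))
```

The four bytes are unchanged, but the value they appear to hold is now nowhere near the original offset, so anything that follows it lands in the wrong place.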
Maker Note – no specific format
The MakerNote data itself is very problematic, because there is no one format for it. About half the cameras in existence seem to write their MakerNote data in IFD format, and half have a custom format. This makes it very hard to do anything with the whole file, because MakerNote data offsets are absolute offsets, not relative to the MakerNote chunk, and there is actually nothing forcing MakerNote data to be stored inside the declared MakerNote chunk (although one suspects it usually is). So if you edit an Exif file in a way that causes the location of the MakerNote data to shift, there is a high likelihood of breaking the MakerNote data, since its embedded offsets now point to the wrong place. Microsoft attempted to fix this, but their fix was useless.
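The shifting problem can be reproduced in a few lines of Python. This is a toy layout, not a real MakerNote: the block here is just a four-byte little-endian absolute offset followed by the value it points to, and `read_value`, the padding sizes, and the payload are all assumptions chosen for the demonstration.

```python
import struct

def read_value(buf: bytes, entry_pos: int, length: int) -> bytes:
    """Follow a 4-byte little-endian absolute offset stored at entry_pos."""
    (offset,) = struct.unpack_from("<I", buf, entry_pos)
    return buf[offset:offset + length]

payload = b"NIKON"
leading = b"\x00" * 16                 # stand-in for everything before the block
entry_pos = len(leading)               # where the toy block starts
value_pos = entry_pos + 4              # the value sits right after the offset field
block = struct.pack("<I", value_pos) + payload
original = leading + block

# An editor inserts 10 bytes ahead of the block but, not understanding
# the block's contents, leaves the offset inside it unpatched.
edited = leading + b"\xff" * 10 + block
new_entry_pos = entry_pos + 10

before = read_value(original, entry_pos, len(payload))
after = read_value(edited, new_entry_pos, len(payload))
print(before, after)
```

Before the edit the offset resolves to the payload; afterwards the same offset resolves into the inserted bytes, even though the block itself was copied byte for byte.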
What other people have said
Problems with current Metadata Standards, by Phil Harvey. He is the author of ExifTool, the most comprehensive (if a bit too ad-hoc) Exif-type metadata tool around. The ExifTool FAQ also has tidbits of information about Exif-format issues.
TIFF, Tag Image File Format, FAQ from Aware Systems has the defining line about the TIFF file format (which Exif piggy-backed onto).
Except for one more thing: the TIFF specification explicitly forbids tag data to include any offsets into the file or into any other TIFF data block, except for the documented special cases (TileOffsets, StripOffsets,…). This enables total relocatability of all structures and data blocks. That is a major corner stone of the format. It means it is actually easy to, e.g. unlink an IFD or concatenate multiple single-page TIFF to a single multi-page TIFF, or vice versa. If any tag’s data could contain offsets pointing anywhere in the file, then software doing this or otherwise relocating data blocks should be aware of the exact nature of every tag’s data in order to find all data blocks, and know what pointers ought to be changed. That is unfeasible, on the one hand, due to the number of defined tags and, on the other hand, it inhibits extendability and private tags.
Instead, the specification says that all tag data needs to be ‘self-contained’, and that only a selected few special tags are allowed to point to other locations in the file. Thus, all blocks become freely relocatable, can be read and written out in any order, and any software can quite simply joggle around all this TIFF data, with only inbuilt knowledge of these highest level structures, and of the selected few special tags.
If Exif had followed this, we wouldn’t have the problems we face today. I presume that the original spec authors were thinking of Exif files as write-once magic that would never be edited (very short-sighted). On the other hand, TIFF files are a flat structure, basically an array of independent IFD elements, and sometimes nested structure is desired.
The (much later) suggestion was to have an IFD TIFF type that pointed to a self-contained IFD data block, which would basically be its own embedded TIFF file (with its own byte-orientation, and block-relative offsets).
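A rough Python sketch of why block-relative offsets help, using the same kind of toy block as a stand-in for a self-contained IFD. The helper `read_relative` and the layout are hypothetical, and the sketch ignores the per-block byte order that the suggestion also allows.

```python
import struct

def read_relative(buf: bytes, block_start: int, length: int) -> bytes:
    """Read a value whose 4-byte offset is relative to block_start."""
    (rel,) = struct.unpack_from("<I", buf, block_start)
    start = block_start + rel
    return buf[start:start + length]

payload = b"NIKON"
block = struct.pack("<I", 4) + payload   # value begins 4 bytes into the block

# The identical block placed at two different positions in a file.
at_16 = b"\x00" * 16 + block
at_26 = b"\x00" * 26 + block

first = read_relative(at_16, 16, len(payload))
second = read_relative(at_26, 26, len(payload))
print(first, second)
```

Because the offset is measured from the start of the block, a tool only needs to know where the block now begins; it can move the block without understanding anything inside it.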
The actual answer is a new file format that is backwards-compatible and forwards-compatible. This is a pretty daunting thing to imagine, but given the ever-increasing number of Exif files generated each year, it is something that we should do.