JFIF and EXIF share a heritage: both wrap JPEG-compressed images in a file format by storing their own data in JPEG application (APPn) segments.
As a side note (and I’ll find a place to put this somewhere), iPhoto changes the data when importing photos from a camera. At the least it rearranges chunks of data, and it probably adds or alters metadata. This shouldn’t have surprised me, I suppose. I noticed it when looking at photos that had been imported multiple times: the JPEG data itself was the same, but the files differed because the chunks were in different orders. Perhaps successive versions of iPhoto have changed how import works. In any case, it’s ever so slightly distressing; I’d prefer the original to be truly original. I’ll investigate at some point by comparing files on the camera (copied directly from flash) with the same photos imported by various programs.
Catalog of application segment types
As far as I know, one of these needs to be the second chunk in the file (the first chunk is always SOI = FF D8). It’s undefined behavior if several of these exist in the same file.
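To make the chunk layout concrete, here is a minimal Python sketch that checks for SOI and then walks the marker segments up to the start of scan (SOS, FF DA), where the compressed image data begins. It assumes every marker between SOI and SOS carries a 2-byte big-endian length that counts itself but not the marker; `iter_segments` and `app_markers` are hypothetical helper names, not a library API.

```python
import struct
from typing import Iterator, List, Tuple

def iter_segments(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (marker, offset, payload) for each segment after SOI, through SOS.

    `payload` excludes the 2-byte marker and the 2-byte length field.
    Assumes no standalone (length-less) markers appear before SOS.
    """
    if data[:2] != b"\xFF\xD8":
        raise ValueError("not a JPEG: missing SOI (FF D8)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding FF before the real marker byte
            pos += 1
            continue
        # Length is big-endian and counts its own 2 bytes, not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, pos, data[pos + 4:pos + 2 + length]
        if marker == 0xDA:  # SOS: compressed scan data follows; stop here
            return
        pos += 2 + length

def app_markers(data: bytes) -> List[str]:
    """Names of the APPn segments in file order, e.g. ['APP0', 'APP1']."""
    return [f"APP{m - 0xE0}" for m, _, _ in iter_segments(data) if 0xE0 <= m <= 0xEF]
```

For example, `next(iter_segments(data))[0]` gives the marker of the second chunk, which should fall in the APPn range E0 to EF. Comparing `app_markers` output between two copies of a photo would also show the kind of chunk reordering mentioned above.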
I show the identifier strings as upper-case, but the strings are case-insensitive; Apple, for example, stores the EXIF tag as ‘Exif\x00\x00’.
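A sketch of identifying a segment from its marker byte plus identifier prefix, comparing case-insensitively as described above. The identifier table is transcribed from the catalog in these notes and is not exhaustive; `classify` and `KNOWN_IDS` are hypothetical names.

```python
from typing import Dict, List, Optional

# Identifier prefixes transcribed from the catalog in these notes, keyed by marker byte.
# Not exhaustive; other software writes other APPn identifiers.
KNOWN_IDS: Dict[int, List[bytes]] = {
    0xE0: [b"JFIF\x00", b"JFXX\x00"],
    0xE1: [b"EXIF\x00", b"http://ns.adobe.com/xap/1.0/\x00"],
    0xE2: [b"ICC_PROFILE\x00"],
    0xE3: [b"META\x00\x00"],
    0xEC: [b"PictureInfo\x00", b"Ducky"],
    0xED: [b"Photoshop 3.0\x00"],
    0xEE: [b"Adobe"],
}

def classify(marker: int, payload: bytes) -> Optional[str]:
    """Return the matching identifier (without trailing NULs), or None."""
    for ident in KNOWN_IDS.get(marker, []):
        # Case-insensitive prefix match on the bytes right after the length field.
        if payload[:len(ident)].upper() == ident.upper():
            return ident.rstrip(b"\x00").decode("ascii")
    return None
```

Only the first five bytes are checked for EXIF, so both the `\x00\x00` and `\x00\xFF` endings match.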
I need to write these as an actual grammar soon, but for now, I’ll illustrate them with hex dumps.
The JFIF specification can be found at http://www.w3.org/Graphics/JPEG/jfif3.pdf. It’s pretty sparse, all things considered. This document introduced the idea that the APP0 marker had to be right after the SOI marker. All values in JFIF are big-endian. JFIF image orientation is always top-down (JPEG allows bottom-up).
FF E0                  ; APP0
nn nn                  ; length (counts these 2 bytes, not the marker)
4A 46 49 46 00         ; JFIF\x00
01 02                  ; version 1.02
xx                     ; units: 0 = no units, 1 = px/in, 2 = px/cm
xx xx                  ; horizontal pixel density
xx xx                  ; vertical pixel density
xx                     ; thumbnail pixel width
xx                     ; thumbnail pixel height
xx xx xx yy yy yy ...  ; 3n bytes of 24-bit RGB thumbnail (n = width * height)
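A minimal parser for the JFIF APP0 layout above, assuming `payload` is the segment body after the 2-byte length (so it starts at the identifier). Every multi-byte field is big-endian, hence the `>` in the struct format; `parse_jfif` and `JfifHeader` are illustrative names.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

@dataclass
class JfifHeader:
    version: Tuple[int, int]  # (major, minor), e.g. (1, 2)
    units: int                # 0 = no units (aspect ratio only), 1 = px/in, 2 = px/cm
    x_density: int
    y_density: int
    thumb_width: int
    thumb_height: int
    thumb_rgb: bytes          # 3 * width * height bytes, packed R, G, B

def parse_jfif(payload: bytes) -> JfifHeader:
    """Parse an APP0 JFIF body; `payload` starts at the identifier."""
    if payload[:5].upper() != b"JFIF\x00":
        raise ValueError("not a JFIF APP0 segment")
    # Big-endian: 3 single bytes, two 16-bit densities, two thumbnail dimension bytes.
    major, minor, units, xd, yd, tw, th = struct.unpack(">BBBHHBB", payload[5:14])
    n = tw * th
    thumb = payload[14:14 + 3 * n]
    if len(thumb) != 3 * n:
        raise ValueError("segment too short for the declared thumbnail")
    return JfifHeader((major, minor), units, xd, yd, tw, th, thumb)
```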
JFXX is actually an extension segment to the APP0 JFIF marker. It can only appear in files with JFIF version 1.02 and above. The first byte after the identifier is an extension code; while in theory there can be different kinds of extensions, the only ones defined to date are for different kinds of thumbnails. Presumably if a thumbnail is stored in a JFXX extension segment, it would not also be stored in the JFIF main segment. And I’m betting that this extension is mostly used for JPEG thumbnails. EXIF also does JPEG thumbnails.
FF E0                  ; APP0
nn nn                  ; length
4A 46 58 58 00         ; JFXX\x00
xx                     ; 10 = thumbnail, JPEG
                       ; 11 = thumbnail, 8-bit
                       ; 13 = thumbnail, 24-bit
xx xx ...              ; extension data
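A sketch for reading the JFXX extension code, again assuming the payload starts at the identifier; `parse_jfxx` is a hypothetical helper. The codes in the dump are hex values, so they appear as `0x10`, `0x11` and `0x13` here.

```python
from typing import Tuple

JFXX_CODES = {
    0x10: "JPEG thumbnail",
    0x11: "8-bit thumbnail",
    0x13: "24-bit thumbnail",
}

def parse_jfxx(payload: bytes) -> Tuple[str, bytes]:
    """Return (extension kind, extension data); `payload` starts at the identifier."""
    if payload[:5].upper() != b"JFXX\x00":
        raise ValueError("not a JFXX APP0 segment")
    code = payload[5]
    kind = JFXX_CODES.get(code, f"unknown extension 0x{code:02X}")
    # For 0x10 the data should be a complete JPEG stream starting with SOI (FF D8).
    return kind, payload[6:]
```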
As alluded to above, EXIF and JFIF are competing file formats, so in principle a file shouldn’t contain both an EXIF and a JFIF chunk. Also, EXIF is loosely based on, and somewhat subsumes, TIFF (you can store TIFF data in EXIF files as well as JPEG, and many RAW formats are EXIF or EXIF-like files). In practice, though, I found a file that had an Exif chunk followed by a JFIF chunk, which is confusing (it was a photo sent in a text message; perhaps that’s why). And I found another file that had a JFIF chunk followed by an Exif chunk. See http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf for TIFF format info.
FF E1                  ; APP1
nn nn                  ; length
45 58 49 46 00 00      ; EXIF\x00\x00 (or \xFF at end)
49 49                  ; byte order: II = little-endian (4D 4D = MM, big-endian)
2A 00                  ; TIFF magic number 42
08 00 00 00            ; offset to IFD0 (main image), from start of TIFF header
nn nn                  ; IFD0: count of directory entries
nn nn ...              ; entry 0: 12 bytes
...
nn nn ...              ; entry N-1: 12 bytes
xx xx xx xx            ; offset to IFD1 (thumbnail image), 0 if none
...                    ; IFD1, same layout as IFD0
00 00 00 00            ; offset to next IFD: 0 ends the IFD list
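A sketch of walking the TIFF structure inside an EXIF APP1 segment: skip the 6-byte identifier, read the byte-order mark, check the magic number 42, then follow the IFD chain until a next-IFD offset of 0. Offsets are taken relative to the start of the TIFF header (the II/MM bytes). Entries come back raw as (tag, format, count, 4-byte value field); the helper names are hypothetical, and the `seen` guard against revisiting an offset is my own defensive addition for malformed files.

```python
import struct
from typing import List, Tuple

Entry = Tuple[int, int, int, bytes]  # (tag, format, count, raw 4-byte value/offset field)

def exif_tiff_block(payload: bytes) -> bytes:
    """Return the TIFF data inside an APP1 EXIF body (payload starts at 'Exif')."""
    if payload[:4].upper() != b"EXIF":
        raise ValueError("not an EXIF APP1 segment")
    return payload[6:]  # skip 'Exif' plus the two pad bytes

def read_tiff_header(tiff: bytes) -> Tuple[str, int]:
    """Return (struct byte-order prefix, offset of IFD0)."""
    if tiff[:2] == b"II":
        bo = "<"  # little-endian
    elif tiff[:2] == b"MM":
        bo = ">"  # big-endian
    else:
        raise ValueError("bad TIFF byte-order mark")
    magic, ifd0 = struct.unpack(bo + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    return bo, ifd0

def read_ifd_chain(tiff: bytes) -> List[List[Entry]]:
    """Follow IFD0 -> IFD1 -> ... until a next-IFD offset of 0."""
    bo, offset = read_tiff_header(tiff)
    ifds: List[List[Entry]] = []
    seen = set()
    while offset != 0 and offset not in seen:  # `seen` guards against offset loops
        seen.add(offset)
        (count,) = struct.unpack(bo + "H", tiff[offset:offset + 2])
        entries: List[Entry] = []
        for i in range(count):
            start = offset + 2 + 12 * i
            tag, fmt, n = struct.unpack(bo + "HHI", tiff[start:start + 8])
            entries.append((tag, fmt, n, tiff[start + 8:start + 12]))
        ifds.append(entries)
        next_pos = offset + 2 + 12 * count
        (offset,) = struct.unpack(bo + "I", tiff[next_pos:next_pos + 4])
    return ifds
```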
Each 12-byte entry is formatted as follows:
nn nn                  ; EXIF tag
nn nn                  ; data format
nn nn nn nn            ; number of components
nn nn nn nn            ; data, or offset to data
The data format field is a value from 1 to 12 that determines the data type of the components. The total data length is the component size multiplied by the number of components. If the total length is 4 bytes or less, then the data is stored in the last field itself; otherwise that field holds an offset to the data, measured from the start of the TIFF header (the II/MM byte-order mark), not from the start of the APP1 segment.
1 = unsigned byte (1 byte/component)
2 = ASCII char (1 byte/component)
3 = unsigned short (2 bytes/component)
4 = unsigned long (4 bytes/component)
5 = unsigned rational (8 bytes/component)
6 = signed byte (1 byte/component)
7 = undefined (1 byte/component)
8 = signed short (2 bytes/component)
9 = signed long (4 bytes/component)
10 = signed rational (8 bytes/component)
11 = single-precision float (4 bytes/component)
12 = double-precision float (8 bytes/component)
The types are all as they would be in the C language (with “long” meaning 32 bits), with the exception of rationals: an unsigned rational is two 4-byte unsigned longs stored in sequence, the first the numerator and the second the denominator, and a signed rational is the same pair stored as signed longs.
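Decoding an entry’s value follows directly from the table: component size times count decides whether the data sits inline in the 4-byte field or at an offset. This sketch assumes `bo` is the struct byte-order prefix (`<` for II, `>` for MM) and that `tiff` starts at the TIFF header. ASCII and undefined data come back as raw bytes, and rationals as (numerator, denominator) pairs so a zero denominator doesn’t raise. `entry_value` and `FORMATS` are hypothetical names.

```python
import struct
from typing import Dict, Optional, Tuple

# format code -> (bytes per component, struct code); None means "leave as raw bytes"
FORMATS: Dict[int, Tuple[int, Optional[str]]] = {
    1: (1, "B"), 2: (1, None), 3: (2, "H"), 4: (4, "I"),
    5: (8, "II"), 6: (1, "b"), 7: (1, None), 8: (2, "h"),
    9: (4, "i"), 10: (8, "ii"), 11: (4, "f"), 12: (8, "d"),
}

def entry_value(tiff: bytes, bo: str, fmt: int, count: int, field: bytes):
    """Decode one IFD entry; `field` is its last 4 bytes, `bo` is '<' or '>'."""
    size, code = FORMATS[fmt]
    total = size * count
    if total <= 4:
        raw = field[:total]  # small values are stored inline, left-justified
    else:
        (offset,) = struct.unpack(bo + "I", field)
        raw = tiff[offset:offset + total]  # offset counts from the TIFF header
    if code is None:
        return raw  # ASCII (2) and undefined (7): caller interprets the bytes
    values = struct.unpack(bo + code * count, raw)
    if fmt in (5, 10):  # rationals: pair up (numerator, denominator)
        return tuple(zip(values[0::2], values[1::2]))
    return values
```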
The XMP APP1 segment is used to embed XMP data into JPEG files. See http://www.adobe.com/devnet/xmp.html for details.
FF E1                     ; APP1
nn nn                     ; length
68 74 74 70 3A 2F 2F 6E   ; http://ns.adobe.com/xap/1.0/\x00
73 2E 61 64 6F 62 65 2E
63 6F 6D 2F 78 61 70 2F
31 2E 30 2F 00
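A sketch of pulling the XMP packet out of the segment, assuming `payload` starts right after the length field. The packet after the identifier is UTF-8 XML; this version compares the 29-byte identifier exactly rather than case-insensitively. `extract_xmp` is a hypothetical name.

```python
from typing import Optional

XMP_ID = b"http://ns.adobe.com/xap/1.0/\x00"  # 29 bytes including the NUL

def extract_xmp(payload: bytes) -> Optional[str]:
    """Return the XMP packet text from an APP1 body, or None if it isn't XMP."""
    if not payload.startswith(XMP_ID):
        return None
    return payload[len(XMP_ID):].decode("utf-8")
```

Because EXIF and XMP share the APP1 marker, a reader has to look at the identifier to tell them apart.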
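FF E2                                 ; APP2
nn nn                                 ; length
49 43 43 5F 50 52 4F 46 49 4C 45 00   ; ICC_PROFILE\x00
xx                                    ; chunk sequence number (starts at 1)
xx                                    ; total number of chunks
xx xx ...                             ; this chunk's portion of the ICC profile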
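Large profiles don’t fit in one APP2 segment (the 2-byte length limits a segment to 65,533 bytes of payload), so the ICC specification splits them across several segments: after the 12-byte identifier come a 1-based sequence-number byte and a total-count byte, then that chunk’s slice of the profile. A sketch of reassembly, with `assemble_icc` as a hypothetical name and `payloads` assumed to be the APP2 bodies in any order:

```python
from typing import Dict, Iterable, Optional

ICC_ID = b"ICC_PROFILE\x00"  # 12 bytes

def assemble_icc(payloads: Iterable[bytes]) -> bytes:
    """Join the ICC_PROFILE chunks from a file's APP2 bodies, in sequence order."""
    chunks: Dict[int, bytes] = {}
    total: Optional[int] = None
    for p in payloads:
        if not p.startswith(ICC_ID):
            continue  # some other APP2 segment
        seq, count = p[12], p[13]  # 1-based chunk number, total chunk count
        total = count
        chunks[seq] = p[14:]
    if total is None:
        return b""  # no embedded profile
    if sorted(chunks) != list(range(1, total + 1)):
        raise ValueError("incomplete or inconsistent set of ICC_PROFILE chunks")
    return b"".join(chunks[i] for i in range(1, total + 1))
```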
FF E3                  ; APP3
nn nn                  ; length
4D 45 54 41 00 00      ; META\x00\x00
The JPEG APP12 “Picture Info” segment was used by some older cameras, and contains ASCII-based meta information.
FF EC                                 ; APP12
nn nn                                 ; length
50 69 63 74 75 72 65 49 6E 66 6F 00   ; PictureInfo\x00
xx xx xx ...                          ; ASCII meta information
Photoshop uses the JPEG APP12 “Ducky” segment to store some information in “Save for Web” images.
FF EC                  ; APP12
nn nn                  ; length
44 75 63 6B 79 00      ; Ducky\x00
xx xx xx xx            ; quality
xx xx xx ...           ; comment string
xx xx xx ...           ; copyright string
Adobe IRB data. The spec I could find is “Adobe Photoshop 6.0, File Formats Specification, Version 6.0, Release 2, November 2000”: http://oldschoolprg.x10.mx/downloads/ps6ffspecsv2.pdf. It describes the old Mac format, which stored lots of metadata in ‘8BIM’ resources. There is an updated version on Adobe’s site titled “Adobe Photoshop, File Formats, Specification, June 2012”: http://www.adobe.com/devnet-apps/photoshop/fileformatashtml/. It looks like IRB stands for “Image Resource Block”, so the IRB segment is used for tunneling Photoshop data inside JFIF/EXIF files.
FF ED                                        ; APP13
nn nn                                        ; length
50 68 6F 74 6F 73 68 6F 70 20 33 2E 30 00    ; Photoshop 3.0\x00
38 42 49 4D ...                              ; ‘8BIM’ image resource blocks follow
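The APP13 payload after the identifier is a sequence of Photoshop image resource blocks. As I read the Photoshop file format spec, each block is the signature ‘8BIM’, a 2-byte resource ID, a Pascal-string name padded to an even length, a 4-byte data size, and the data padded to an even length, all big-endian. A sketch that lists them; `parse_8bim` is a hypothetical name, and only ‘8BIM’ signatures are handled. For example, resource ID 0x0404 holds IPTC data.

```python
import struct
from typing import List, Tuple

PS_ID = b"Photoshop 3.0\x00"

def parse_8bim(payload: bytes) -> List[Tuple[int, bytes, bytes]]:
    """Return (resource id, name, data) for each 8BIM block in an APP13 body."""
    if not payload.startswith(PS_ID):
        raise ValueError("not a Photoshop APP13 segment")
    pos = len(PS_ID)
    blocks = []
    while pos + 4 <= len(payload) and payload[pos:pos + 4] == b"8BIM":
        (res_id,) = struct.unpack(">H", payload[pos + 4:pos + 6])
        name_len = payload[pos + 6]
        name = payload[pos + 7:pos + 7 + name_len]
        pos += 7 + name_len
        pos += (1 + name_len) % 2  # Pascal string (length byte + chars) padded to even
        (size,) = struct.unpack(">I", payload[pos:pos + 4])
        data = payload[pos + 4:pos + 4 + size]
        pos += 4 + size + (size % 2)  # resource data padded to even size
        blocks.append((res_id, name, data))
    return blocks
```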
FF EE                  ; APP14
nn nn                  ; length
41 64 6F 62 65 00      ; Adobe\x00
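There’s no description of the APP14 “Adobe” segment above, so this sketch assumes the commonly documented layout (the one libjpeg’s decoder reads, as far as I know): after the 5-byte “Adobe” identifier come a 2-byte version, two 2-byte flag words, and a 1-byte color transform code, all big-endian. Under that assumption, the 00 shown after “Adobe” in the dump is the high byte of the version. `parse_adobe_app14` and `AdobeApp14` are hypothetical names.

```python
import struct
from typing import NamedTuple

class AdobeApp14(NamedTuple):
    version: int
    flags0: int
    flags1: int
    transform: int  # assumed: 0 = none (RGB or CMYK), 1 = YCbCr, 2 = YCCK

def parse_adobe_app14(payload: bytes) -> AdobeApp14:
    """Parse an APP14 body (starting at 'Adobe') under the assumed 12-byte layout."""
    if len(payload) < 12 or payload[:5] != b"Adobe":
        raise ValueError("not an Adobe APP14 segment")
    version, flags0, flags1, transform = struct.unpack(">HHHB", payload[5:12])
    return AdobeApp14(version, flags0, flags1, transform)
```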
Jeffrey Friedl’s Exif Viewer: http://regex.info/exif.cgi
TIFF file format: http://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf