Timestamps

This is an incomplete rant. I’ll return to this at some point, when I have a more concrete proposal for better handling of time values in programs and data.

Notes on using time

ZIP archives

The base ZIP format stores a file entry’s last-modified date-time as 4 bytes in MS-DOS format: 2 bytes for time and 2 bytes for date, with an epoch of 1 Jan 1980 and FAT’s 2-second resolution. Basic ZIP files have no notion of time zone, so timestamps are ambiguous; you must know how the creator of the archive stored time values. Some ZIP programs store time as UTC, but this is not mandated anywhere. The original PKWARE ZIP program used local time.

There are several extensions that record time in a more meaningful way (7-Zip stores NTFS timestamps by default, for example).

  • NTFS Extra Field (0x000a), which stores NTFS-compatible mtime, atime and ctime (8-byte values, UTC, epoch is 1 Jan 1601, 100-nanosecond tick).
  • UNIX Extra Field (0x000d), which stores POSIX-compatible mtime and atime (4-byte values, UTC, epoch is 1 Jan 1970, 1-second tick).
  • third-party extended timestamp (0x5455), which stores POSIX-compatible mtime, atime and ctime.
  • Info-ZIP Unix extra field (0x5855) – obsolete, superseded by 0x5455. (0x0001 is the Zip64 extended information field, not a timestamp field.)

Without one of these extended fields that mandate UTC, you have to guess at what the timestamps mean. It’s probably best to assume UTC by default anyway, and then have some way to manually tweak times as needed.
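Here is a rough sketch of that policy in Python: prefer an extended field when one is present, otherwise fall back to the DOS fields and treat them as UTC unless told otherwise. The function names and the assumed_offset parameter are my own inventions for illustration. The field layouts follow my reading of the extra-field notes linked below, and only the 0x5455 and 0x000a fields are handled.

```python
import struct
from datetime import datetime, timedelta, timezone

NTFS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iter_extra_fields(extra: bytes):
    """Yield (header_id, data) pairs from a ZIP extra-field blob."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        yield header_id, extra[pos:pos + size]
        pos += size


def mtime_from_extra(extra: bytes):
    """Return a UTC mtime from field 0x5455 or 0x000a, or None if absent."""
    for header_id, data in iter_extra_fields(extra):
        if header_id == 0x5455 and len(data) >= 5 and data[0] & 1:
            # One flags byte, then a 4-byte signed mtime (seconds since 1970).
            (secs,) = struct.unpack_from("<i", data, 1)
            return UNIX_EPOCH + timedelta(seconds=secs)
        if header_id == 0x000A and len(data) >= 4:
            # 4 reserved bytes, then (tag, size) attributes; tag 1 = times.
            pos = 4
            while pos + 4 <= len(data):
                tag, size = struct.unpack_from("<HH", data, pos)
                pos += 4
                if tag == 0x0001 and size >= 8 and pos + 8 <= len(data):
                    # First 8-byte value is mtime, in 100 ns ticks since 1601.
                    (ticks,) = struct.unpack_from("<Q", data, pos)
                    return NTFS_EPOCH + timedelta(microseconds=ticks // 10)
                pos += size
    return None


def entry_mtime(date_time, extra=b"", assumed_offset=timedelta(0)):
    """Best-guess UTC mtime for one ZIP entry.

    date_time is a (year, month, day, hour, minute, second) tuple taken
    from the DOS fields. assumed_offset is the manual tweak: the UTC
    offset the archive creator is assumed to have used (UTC by default).
    """
    found = mtime_from_extra(extra)
    if found is not None:
        return found
    naive = datetime(*date_time)
    return (naive - assumed_offset).replace(tzinfo=timezone.utc)
```

With the standard zipfile module this would be called as entry_mtime(info.date_time, info.extra) for each ZipInfo entry. Archives with invalid DOS dates (month or day of zero) will make datetime() raise ValueError, so real code would need to decide what to do with those.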

See

  • http://www.opensource.apple.com/source/zip/zip-6/unzip/unzip/proginfo/extra.fld
  • http://en.wikipedia.org/wiki/Zip_(file_format)
  • http://www.pkware.com/documents/casestudies/APPNOTE.TXT
  • https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html

MS-DOS time format

The main remaining relics of this are ZIP files and FAT file systems.

The epoch starts at 1 Jan 1980.

Both date and time are 16-bit unsigned values, packed as follows:

  • date: YYYYYYYM MMMDDDDD (7 bits for year, 4 bits for month, 5 bits for day)
  • time: HHHHHMMM MMMSSSSS (5 bits for hour, 6 bits for minute, 5 bits for second)

The day of the month is in the range 1-31. Month is in the range 1-12. Year is in the range 0..127 (1980 to 2107).

Seconds are stored as 0-29 and multiplied by two when decoded, i.e. 0-58 (this is where the 2-second time resolution comes from). Minutes are in the range 0-59. Hours are in the range 0-23.
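The packing above is easy to get wrong by a bit, so here is a small Python sketch of both directions. The function names are mine, and the result is a naive datetime because the format itself carries no time zone.

```python
from datetime import datetime


def decode_dos_datetime(dos_date: int, dos_time: int) -> datetime:
    """Unpack 16-bit MS-DOS date and time words into a naive datetime."""
    year = 1980 + ((dos_date >> 9) & 0x7F)   # 7 bits, offset from 1980
    month = (dos_date >> 5) & 0x0F           # 4 bits
    day = dos_date & 0x1F                    # 5 bits
    hour = (dos_time >> 11) & 0x1F           # 5 bits
    minute = (dos_time >> 5) & 0x3F          # 6 bits
    second = (dos_time & 0x1F) * 2           # 5 bits, stored in 2 s units
    return datetime(year, month, day, hour, minute, second)


def encode_dos_datetime(dt: datetime) -> tuple:
    """Pack a datetime into (dos_date, dos_time); odd seconds round down."""
    if not 1980 <= dt.year <= 2107:
        raise ValueError("year outside the MS-DOS range 1980..2107")
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time
```

Round-tripping a time with an odd second, such as 03:14:37, comes back as 03:14:36, which is the 2-second resolution in action. Decoding a word with a month or day of zero raises ValueError, since datetime() rejects it.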

A rant

I hope there is a special circle in hell for all the engineers who designed and wrote code that deals with timestamps – not elapsed time, but “absolute” time. The APIs all suck, there is mass confusion for something that really is quite simple, and users pay for it (Windows users for many years suffered when daylight savings changes happened, because file times recorded around that point jump forwards or backwards in time).

In one respect, it’s really simple. There is a global timescale, it runs linearly, and you can do simple math on it.

But it’s complicated by the fact that we don’t work with global time, we work with local time.

There’s the complexity of relativity (two observers see their local time as the global time, and see each other’s local time differently). But while that’s important in some kinds of time measurement, it’s not the pain most of us deal with.

I’m referring to the idea of local offsets. Because we inherited ways of recording time values, our idea of a time value has some sort of offset in it. In fact, it has multiple offsets, and it can even run non-linearly.

The most common offsets are known as time zone and calendar. Instead of working with a timestamp like 1269400476, we call it “Wed Mar 24 03:14:36 2010”. The time zone affects the value by some number of hours (perhaps fractional), and the calendar affects what the timestamp actually means.
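To make that concrete, here is a tiny Python sketch that renders the same timestamp under a few different offsets. The render function is just a name I made up, and it assumes the timestamp counts seconds since 1 Jan 1970 UTC.

```python
from datetime import datetime, timedelta, timezone


def render(ts: int, offset_hours: float) -> str:
    """Format an epoch timestamp as ISO 8601 at a fixed UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(ts, tz=tz).isoformat()


print(render(1269400476, 0))     # 2010-03-24T03:14:36+00:00
print(render(1269400476, 5.5))   # 2010-03-24T08:44:36+05:30
print(render(1269400476, -7))    # 2010-03-23T20:14:36-07:00
```

All three strings name the same instant. Only the first matches the calendar string quoted above, which is exactly the information that gets lost when the offset is not recorded.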

That still doesn’t complicate things all that much. We could trivially convert between absolute time T and some local time t, or some vector of local time components t1..tn. It’s just math:

T = t1*a1 + t2*a2 + … + tn*an

What makes it complicated is that we don’t actually attach unambiguous meanings to our recorded time values. We neglect to say whether we have a global time or a local time, and we neglect to say which local time we are referring to.

Now, there’s definitely a circle of hell for the people who mess with the linearity of the timescale, by adding leap seconds. This was a “clever” idea to keep the local time in sync with the rotation of the Earth – we decided that the Earth rotates once every 24 hours, and since the rotation is slowing down, we need to add a stray second here and there so that our local clock doesn’t shift compared to the Earth’s rotation. Of course, that makes it much more complicated to do math on time values. It doesn’t introduce discontinuities per se, it’s simply that some day has 86401 seconds in it instead of 86400 seconds. But we stopped doing that – for now.
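A quick Python illustration of why this hurts the math: the usual epoch-seconds arithmetic pretends every day has 86400 seconds, so you need a side table to get the real elapsed time. The table here is a one-entry stand-in for illustration (the leap second inserted at the end of 31 Dec 2016), not a complete list, and the function names are mine.

```python
from datetime import datetime, timezone

# Illustrative table only: the instant just after each inserted leap second.
# A real table would have an entry for every leap second ever inserted.
LEAP_SECOND_ENDS = [datetime(2017, 1, 1, tzinfo=timezone.utc)]


def posix_elapsed(start: datetime, end: datetime) -> int:
    """Elapsed seconds as epoch arithmetic sees it (leap seconds ignored)."""
    return int((end - start).total_seconds())


def si_elapsed(start: datetime, end: datetime) -> int:
    """Elapsed seconds including leap seconds listed in the table."""
    leaps = sum(1 for t in LEAP_SECOND_ENDS if start < t <= end)
    return posix_elapsed(start, end) + leaps


day_start = datetime(2016, 12, 31, tzinfo=timezone.utc)
day_end = datetime(2017, 1, 1, tzinfo=timezone.utc)
print(posix_elapsed(day_start, day_end))  # 86400
print(si_elapsed(day_start, day_end))     # 86401
```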

The way back to sanity is to convert all your input time values into the global time, and exclusively work in global time, and then only convert back to some local time for display. Never store local time values.
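As a sketch of that discipline in Python, the two functions below sit at the edges of a program: one converts incoming local values on the way in, the other converts for display on the way out. I am using Unix epoch seconds as a stand-in for “global time”, which is an assumption on my part, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_global(local_wall_time: datetime, utc_offset: timedelta) -> int:
    """Convert a naive local wall-clock time plus its offset to epoch seconds."""
    aware = local_wall_time.replace(tzinfo=timezone(utc_offset))
    return int(aware.timestamp())


def to_display(epoch_seconds: int, utc_offset: timedelta) -> str:
    """Convert stored epoch seconds back to a local string, for display only."""
    tz = timezone(utc_offset)
    return datetime.fromtimestamp(epoch_seconds, tz=tz).isoformat()


# Two inputs recorded in different local times turn out to be the same instant.
a = to_global(datetime(2010, 3, 23, 20, 14, 36), timedelta(hours=-7))
b = to_global(datetime(2010, 3, 24, 8, 44, 36), timedelta(hours=5, minutes=30))
print(a == b)                                # True
print(to_display(a, timedelta(hours=1)))     # 2010-03-24T04:14:36+01:00
```

Everything between those two calls (storage, comparison, subtraction) works on plain integers and never needs to know about time zones.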

Of course, there is currently no good global time. UT isn’t a global time.

