Out-of-order parsing

More and more, I think parsing needs to be modernized. Parsing was the hot subject in the 1960s, and may very well have been synonymous with computer science up through the early 1970s, but it has stayed in much the same form ever since.

We can do better, and we have to relax the restrictions imposed by thinking of parsed texts as strings.

Yes, Tomita was a big step at the time (http://pdf.aminer.org/000/211/233/recognition_of_context_free_and_stack_languages.pdf), but that was 30 years ago (!!) and, in hindsight, it was a pretty small step.

Using CMake

Like most build systems, CMake is not clearly documented enough for me. I’m going to use libgit2 as an example of something real that is available to everyone. I’m doing this because there’s still not a single build system that’s good enough for general use, at least not when it comes to working on multi-platform projects.


On Windows, you’ll almost always be using CMake to generate and use Visual Studio projects, although you have a choice of:

  • Makefiles: MinGW, MSYS, NMake
  • Visual Studio projects (6-12)

Let’s start with Visual Studio projects, since that’s the common case.

Visual Studio generation

Grab the libgit2 source. Since I’m going to build for PyGit, I want a specific tag for compatibility. I definitely don’t want the development branch :)

> git clone git@github.com:libgit2/libgit2.git
> cd libgit2
> git checkout -b local-v0.20.0 v0.20.0

You’re meant to run CMake from the output folder. This is weird, but whatever. So here’s the naive way to use CMake.

> mkdir build
> cd build
> cmake ..
-- Building for: Visual Studio 12
-- The C compiler identification is MSVC 18.0.21005.1
-- Check for working C compiler using: Visual Studio 12
-- Check for working C compiler using: Visual Studio 12 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
zlib was not found; using bundled 3rd-party sources.
-- Found PythonInterp: C:/Python27/python.exe (found version "2.7.6")
-- Configuring done
-- Generating done
-- Build files have been written to: C:/projects/git/github/libgit2/build

Of course, this auto-picks a Visual Studio toolchain. Since it’s Windows, CMake won’t use the toolchain found in my path (that I very carefully put there), because it’s not common for the Visual Studio toolchain to be in the path. Instead, CMake defaults to the newest version it finds. That’s a reasonable thing to do, but I need to be specific, so you need to tell CMake which generator to use.

> mkdir build
> cd build
> cmake -G "Visual Studio 11" ..
-- The C compiler identification is MSVC 17.0.61030.0
-- Check for working C compiler using: Visual Studio 11
-- Check for working C compiler using: Visual Studio 11 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
zlib was not found; using bundled 3rd-party sources.
-- Found PythonInterp: C:/Python27/python.exe (found version "2.7.6")
-- Configuring done
-- Generating done
-- Build files have been written to: C:/projects/git/github/libgit2/build

It doesn’t look like it’s possible to hard-code a generator in the CMakeLists.txt file. A kickstarter CMakeCache.txt file can go into the target folder, but there’s a chicken-and-egg issue there.

In CMake, there is a distinction between “generator” and “toolchain”. E.g. you can use the “Visual Studio 12” generator but have it create projects that use the “Visual Studio 11” toolchain. The -T option takes the platform toolset name (v110 for Visual Studio 11), not the generator name.

> cmake -G "Visual Studio 12" -T v110 ..

Up until now, all we’ve done is create a Visual Studio project file. While that’s useful, we actually want some built libraries and binaries.

You can build from the command-line like so:

> cmake --build .

(assuming you are in the build directory). With a Visual Studio generator, this builds the Debug configuration by default. It is tempting to ask for a release build like this:

> cmake --build . --target Release

but you’ll get an error: Release is a configuration, not a target. With Visual Studio generators the configuration is selected with --config Release rather than --target. You’ll also find out that cmake (at least in this version) is using devenv, when it should now be using msbuild to be a good Windows citizen. CMake is great for creating cross-platform projects, but less good as an actual build tool, so you may prefer to use MSBuild directly.

> msbuild libgit2.sln /t:Build /p:Configuration=Release;Platform=Win32

And now I have libraries and binaries in libgit2/build/Release. If you really want to use devenv (against Microsoft’s desires, but what the heck), then

> devenv libgit2.sln /build Release /project ALL_BUILD

Nothing mandates that the output folder be named build; it’s merely a convention.





General comments

Once you’ve generated build files with a specific generator, you can’t change the generator for that folder. You need to wipe the build folder, or pick a new build folder. So for doing cross-platform builds on a single machine, you’ll want some consistent naming for multiple build folders.

CMake likes to generate projects for a single architecture.

> cmake -G "Visual Studio 12" ..
> cmake -G "Visual Studio 12 Win64" ..

I don’t know how to generate a multi-architecture project. The CMake philosophy is to use multiple build directories with a single source tree, and from the architecture of CMake it sounds like a single multi-architecture project just won’t be possible.
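One way to get consistent names for multiple build folders is to derive the folder name from the generator name. The following is only a sketch: the helper names build_dir_for and configure_command are made up for illustration, and the naming scheme is an arbitrary choice. The only CMake feature it relies on is the -G option already used above.

```python
import re
from typing import List


def build_dir_for(generator: str) -> str:
    """Derive a folder name from a CMake generator name (hypothetical scheme)."""
    # Lowercase the name and collapse anything that is not a letter or digit
    # into a single dash, so each generator gets a predictable folder.
    slug = re.sub(r"[^a-z0-9]+", "-", generator.lower()).strip("-")
    return "build-" + slug


def configure_command(generator: str, source_dir: str = "..") -> List[str]:
    """Return the configure command to run from inside the build folder."""
    return ["cmake", "-G", generator, source_dir]


if __name__ == "__main__":
    for gen in ("Visual Studio 12", "Visual Studio 12 Win64", "MinGW Makefiles"):
        print(build_dir_for(gen), " ".join(configure_command(gen)))
```

Each generator then lives in its own folder next to the source tree, and wiping one of them never affects the others.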


CMake documentation


Specific platforms




The origins of life are still mysterious. There are efforts at the biological level to recreate the conditions where biological life presumably started – the idea of a primordial soup. Write a program that does the equivalent – not a simulation of biological life, but a primordial equivalent for A-life. Real A-life.

From Copland to Mac OS X and Swift

This is an insightful 2005 article by John Siracusa on the lessons from Copland, Apple’s attempt to turn Classic Mac OS into something modern, and predictions/advice for Mac OS X on future-proofing:

Avoiding Copland 2010

This is a follow-up article where the author re-visited his predictions:

Copland 2010 revisited: Apple’s language and API future

And now we have Apple introducing a new language: Swift


Introducing a new language is interesting. We have Dylan (failed), Dart (maybe), Go (doing quite well), and Rust as “modern” mainstream languages aimed at replacing some previous language. Objective-C is only successful because the iPhone and iPad are successful.

Disable base internationalization (Xcode/Mac)

From the xcode-users mailing list:

> My app built with Xcode on 10.9 won’t launch on 10.7, but I don’t use any new OS features.

The most likely cause of your problem is base internationalization. Newly created Xcode projects have base internationalization turned on, but base internationalization works only on OS X 10.8 and later.

To support 10.7, turn off base internationalization. Select your project from the project navigator to open the project editor. Select your project from the left side of the project editor. Deselect the Use Base Internationalization checkbox.


This is an incomplete rant. I’ll return to this at some point, when I have a more concrete proposal for better handling of time values in programs and data.

Notes on using time

ZIP archives

The base ZIP format stores a file entry last-modified date-time as 4 bytes: epoch is 1 Jan 1980, 2 bytes for time, 2 bytes for date, in MS-DOS format, including FAT’s 2-second resolution. There is no idea of timezone in basic ZIP files, so timestamps are ambiguous; you must know how the creator of the archive stored time values. Some ZIP programs store time as UTC, but this is not mandated anywhere. The original PKWARE ZIP program used local time.

There are several extensions that record time in a more meaningful way (7-Zip stores NTFS timestamps by default, for example).

  • NTFS Extra Field (0x000a), which stores NTFS-compatible mtime, atime and ctime (8-byte values, UTC, epoch is 1 Jan 1601, 100-nanosecond tick).
  • UNIX Extra Field (0x000d), which stores Posix-compatible mtime and atime (4-byte values, UTC, epoch is 1 Jan 1970, 1 second tick).
  • third-party extended timestamp (0x5455), Posix-compatible mtime, atime and ctime.
  • Info-ZIP Unix extra field (0x5855) – obsolete, superseded by 0x5455.

Without one of these extended fields that mandate UTC, you have to guess at what the timestamps mean. It’s probably best to assume UTC by default anyway, and then have some way to manually tweak times as needed.
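As a rough sketch of how the UTC-bearing extra fields can be read, the code below walks the raw extra-field bytes and pulls out a modification time from either the extended timestamp (0x5455) or the NTFS field (0x000a). The function names split_extra and utc_mtime are hypothetical, and the layouts assumed are the ones described in the references below: a flags byte followed by 4-byte times for 0x5455, and 4 reserved bytes followed by tagged attributes for 0x000a.

```python
import struct
from typing import Dict, Optional

# Seconds between the NTFS epoch (1 Jan 1601) and the Unix epoch (1 Jan 1970).
NTFS_EPOCH_OFFSET = 11_644_473_600


def split_extra(extra: bytes) -> Dict[int, bytes]:
    """Split raw extra-field bytes into {header_id: payload}."""
    fields = {}
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        fields[header_id] = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
    return fields


def utc_mtime(extra: bytes) -> Optional[float]:
    """Return mtime as Unix seconds (UTC), or None if no UTC field is present."""
    fields = split_extra(extra)
    data = fields.get(0x5455)  # extended timestamp
    if data is not None and len(data) >= 5 and data[0] & 0x01:
        # Bit 0 of the flags byte says the mtime follows.
        return float(struct.unpack_from("<i", data, 1)[0])
    data = fields.get(0x000A)  # NTFS: 4 reserved bytes, then tagged attributes
    if data is not None:
        pos = 4
        while pos + 4 <= len(data):
            tag, size = struct.unpack_from("<HH", data, pos)
            if tag == 0x0001 and size >= 8 and pos + 12 <= len(data):
                # Attribute 1 starts with mtime in 100-nanosecond ticks.
                ticks = struct.unpack_from("<Q", data, pos + 4)[0]
                return ticks / 10_000_000 - NTFS_EPOCH_OFFSET
            pos += 4 + size
    return None
```

With Python’s zipfile module, the raw bytes should be available as ZipInfo.extra for each entry. When utc_mtime returns None, only the ambiguous MS-DOS timestamp is left.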


  • http://www.opensource.apple.com/source/zip/zip-6/unzip/unzip/proginfo/extra.fld
  • http://en.wikipedia.org/wiki/Zip_(file_format)
  • http://www.pkware.com/documents/casestudies/APPNOTE.TXT
  • https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html

MS-DOS time format

The main remaining relics of this are ZIP files and the FAT file system it came from.

The epoch starts at 1 Jan 1980.

Both date and time are 16-bit unsigned values, packed as follows:

  • date: YYYYYYYM MMMDDDDD (7 bits for year, 4 bits for month, 5 bits for day)
  • time: HHHHHMMM MMMSSSSS (5 bits for hour, 6 bits for minute, 5 bits for second)

The day of the month is in the range 1-31. Month is in the range 1-12. Year is in the range 0..127 (1980 to 2107).

Seconds are stored as 0-29 and multiplied by two, i.e. 0-58 (this is where the 2-second time resolution comes from). Minutes are in the range 0-59. Hours are in the range 0-23.
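The bit layout above can be sketched in a few lines of Python. The function names are made up for illustration, and the result is a naive datetime, since the format itself carries no timezone.

```python
from datetime import datetime
from typing import Tuple


def decode_dos_datetime(dos_date: int, dos_time: int) -> datetime:
    """Unpack the 16-bit MS-DOS date and time words into a naive datetime."""
    year = ((dos_date >> 9) & 0x7F) + 1980  # 7 bits, years since 1980
    month = (dos_date >> 5) & 0x0F          # 4 bits, 1-12
    day = dos_date & 0x1F                   # 5 bits, 1-31
    hour = (dos_time >> 11) & 0x1F          # 5 bits, 0-23
    minute = (dos_time >> 5) & 0x3F         # 6 bits, 0-59
    second = (dos_time & 0x1F) * 2          # 5 bits, stored as seconds / 2
    return datetime(year, month, day, hour, minute, second)


def encode_dos_datetime(dt: datetime) -> Tuple[int, int]:
    """Pack a datetime into (dos_date, dos_time); odd seconds round down."""
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time


if __name__ == "__main__":
    packed = encode_dos_datetime(datetime(2010, 3, 24, 3, 14, 37))
    print([hex(word) for word in packed])  # ['0x3c78', '0x19d2']
    print(decode_dos_datetime(*packed))    # 2010-03-24 03:14:36
```

Note how the odd second (37) comes back as 36: that is the 2-second resolution at work.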

A rant

I hope there is a special circle in hell for all the engineers who designed and wrote code that deals with timestamps – not elapsed time, but “absolute” time. The APIs all suck, there is mass confusion about something that really is quite simple, and users pay for it (Windows users suffered for many years when daylight saving changes happened, because file times recorded around that point jump forwards or backwards in time).

In one respect, it’s really simple. There is a global timescale, it runs linearly, and you can do simple math on it.

But it’s complicated by the fact that we don’t work with global time, we work with local time.

There’s the complexity of relativity (two observers see their local time as the global time, and see each other’s local time differently). But while that’s important in some kinds of time measurement, it’s not the pain most of us deal with.

I’m referring to the idea of local offsets. Because we inherited ways of recording time values, our idea of a time value has some sort of offset in it. In fact, it has multiple offsets, and it can even run non-linearly.

The most common offsets are known as time zone and calendar. Instead of working with a timestamp like 1269400476, we call it “Wed Mar 24 03:14:36 2010”. The time zone affects the value by some number of hours (perhaps fractional), and the calendar affects what the timestamp actually means.
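To make the offset idea concrete, here is a small sketch that renders that same timestamp at three different offsets. The render helper and the chosen offsets are arbitrary illustrations; fixed offsets are used instead of named time zones to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

STAMP = 1269400476  # seconds since the Unix epoch
FMT = "%a %b %d %H:%M:%S %Y"


def render(stamp: int, offset_hours: float) -> str:
    """Format one global timestamp as wall-clock time at a fixed UTC offset."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(stamp, tz).strftime(FMT)


if __name__ == "__main__":
    print(render(STAMP, 0))    # Wed Mar 24 03:14:36 2010
    print(render(STAMP, -7))   # Tue Mar 23 20:14:36 2010
    print(render(STAMP, 5.5))  # Wed Mar 24 08:44:36 2010
```

Three different strings, one instant. The string alone does not say which offset produced it.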

That still doesn’t complicate things all that much. We could trivially convert between absolute time T and some local time t, or some vector of local time components t1..tn. It’s just math:

T = t1*a1 + t2*a2 + … + tn*an

What makes it complicated is that we don’t actually attach unambiguous meanings to our recorded time values. We neglect to say whether we have a global time or a local time, and we neglect to say which local time we are referring to.

Now, there’s definitely a circle of hell for the people who mess with the linearity of the timescale, by adding leap seconds. This was a “clever” idea to keep the local time in sync with the rotation of the Earth – we decided that the Earth rotates once every 24 hours, and since the rotation is slowing down, we need to add a stray second here and there so that our local clock doesn’t shift compared to the Earth’s rotation. Of course, that makes it much more complicated to do math on time values. It doesn’t introduce discontinuities per se, it’s simply that some day has 86401 seconds in it instead of 86400 seconds. But we stopped doing that – for now.

The way back to sanity is to convert all your input time values into the global time, and exclusively work in global time, and then only convert back to some local time for display. Never store local time values.
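A minimal sketch of that workflow, assuming the offset of each input is known: convert on the way in, store and compare plain epoch seconds, and convert again only for display. The names to_global and to_display are hypothetical, and Unix epoch seconds stand in for the “global time” here, with the caveats that follow.

```python
from datetime import datetime, timedelta, timezone


def to_global(local_text: str, offset_hours: float) -> int:
    """Parse a local wall-clock string with a known offset into epoch seconds."""
    tz = timezone(timedelta(hours=offset_hours))
    local = datetime.strptime(local_text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(local.timestamp())


def to_display(stamp: int, offset_hours: float) -> str:
    """Convert stored epoch seconds back to a local time, for display only."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(stamp, tz).isoformat()


if __name__ == "__main__":
    # Two inputs recorded in different local times...
    a = to_global("2010-03-23 20:14:36", -7)
    b = to_global("2010-03-24 08:44:36", 5.5)
    # ...turn out to be the same instant once converted.
    print(a, b, a - b)
    print(to_display(a, 0))
```

The stored values are plain integers, so comparing or subtracting them needs no knowledge of where they were recorded.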

Of course, there is currently no good global time. UT isn’t a global time.