Category Archives: Software Development

SCons Environment in depth, part 3

I’m going to focus on the Microsoft toolchain, with the aim of being able to put a Microsoft toolchain into a package that can be loaded at build time. The plus side to this is that you don’t need toolchains installed on each system, but it requires a little finagling of SCons. And to do that, we need to understand what it’s doing. I covered individual Microsoft-specific tools in the previous part, but in isolation, and with less understanding than I have now. So, onwards.

Note – this is super-sketchy and should be filled in. I started keeping notes for myself as I was working on Visual-C++-in-a-package, and after the initial exploration, I started working. I need to circle back and update this.

How does SCons configure Microsoft Visual C++?

There is a debugging environment variable, SCONS_MSCOMMON_DEBUG, that you can set to enable some SCons spew from Tool/MSCommon/. If you do that with a simple SConstruct

env = Environment(tools=[], platform='win32', MSVC_VERSION='11.0')

then you’ll get some output that will guide you. Since we’re trying to use specific Microsoft products, there are well-known registry keys pointing to each version. Visual Studio 2012 has a registry key pointing to the on-disk location for Visual C++:

Software\Wow6432Node\Microsoft\VisualStudio\11.0\Setup\VC\ProductDir = C:\dev\VC11\VC\

If you don’t specify a Visual C++ version, SCons will enumerate every possible version of Visual Studio going back to the dawn of time, and then pick the first one it finds – since the list it searches is ordered from newest to oldest, this will find the most recent Visual C++ that you have installed.

If you do this while specifying a specific Visual C++ version, you’ll see that it skips the registry scanning and goes straight to enumerating the hard disk. However, something later forgets this, and it scans anyway. This is because vc.msvc_exists() is defective – it uses the (cached) list of versions as proof that Visual C++ exists, but nothing set it up for the case where you bypass it. This is an easy fix. I’ll add it to the list of things I want to patch.
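The selection logic from the last two paragraphs can be sketched roughly as follows. This is a simplified illustration, not SCons's code: the version list and the `is_installed` callback are hypothetical stand-ins for SCons's own table and its registry and disk probes.

```python
from typing import Callable, Iterable, Optional

# Hypothetical version list, ordered newest to oldest. SCons keeps its own
# (longer) table; these values are for illustration only.
KNOWN_VERSIONS = ["11.0", "10.0", "9.0", "8.0"]


def pick_version(is_installed: Callable[[str], bool],
                 requested: Optional[str] = None,
                 known: Iterable[str] = KNOWN_VERSIONS) -> Optional[str]:
    """Return `requested` if it is installed (None if it is not).

    With no request, return the newest installed version, or None.
    """
    if requested is not None:
        # An explicit request skips the enumeration entirely.
        return requested if is_installed(requested) else None
    for version in known:
        if is_installed(version):
            return version  # first hit is the newest because of the ordering
    return None


installed = {"10.0", "9.0"}
newest = pick_version(installed.__contains__)
explicit = pick_version(installed.__contains__, requested="9.0")
```

Caching the result of the probe (for example with `functools.lru_cache`) would be one way to avoid repeating the search when it is asked for more than once.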

Another nit is that find_vc_pdir is not memoized – it’s called at least three times during setup. The only reason I care is that SCons on Linux (even in a VM) is about 0.5 sec faster at startup than on Windows – this might be Python overhead on the two systems, or it could be the Microsoft tools init. I’ll profile it at some point.

Then it finds the magic BAT file that Microsoft supplies for command-line use, which sets up all the environment variables that the toolchains need to run. There is a boolean construction variable, MSVC_USE_SCRIPT, that lets you disable the use of the Microsoft script – if this is set to False (it defaults to True), then SCons assumes you have done all the setup yourself.
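A common way to capture what such a batch file does is to run it in a child cmd.exe, dump the resulting environment with `set`, and parse the output. Here is a minimal sketch of the parsing half, assuming output in NAME=value form. The function name, the `keep` list, and the sample paths are illustrative, not taken from SCons.

```python
def parse_set_output(text, keep=("INCLUDE", "LIB", "LIBPATH", "PATH")):
    """Parse `set`-style NAME=value lines into a dict of path lists."""
    wanted = {name.upper() for name in keep}
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep or name.upper() not in wanted:
            continue  # not NAME=value, or a variable we do not care about
        # Windows variable names are case-insensitive, so normalize to upper
        # case. Split on ';' and drop empty entries left by trailing ';'.
        result[name.upper()] = [p for p in value.split(";") if p]
    return result


# Made-up sample of what `set` might print after the batch file has run.
sample = (
    "Path=C:\\dev\\VC11\\VC\\bin\\amd64;C:\\Windows;\n"
    "PROMPT=$P$G\n"
    "LIB=C:\\dev\\VC11\\VC\\lib\\amd64\n"
)
parsed = parse_set_output(sample)
```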

And it scans for installed SDKs. This part is missing a preconfigure step to let you select a specific SDK. In general, SDKs are loosely coupled with the Visual Studio install, but only very loosely.

Visual C++ vcvarsall.bat

This is a batch file that Microsoft has been supplying for a while, as a convenience for configuring an environment for building with Visual C++. It takes an optional architecture parameter that, if not supplied, defaults to ‘x86’. And if you’re curious, for amd64 this just runs a different batch file at \bin\amd64\vcvars64.bat, which makes registry queries and calls another batch file, Common7\Tools\VCVarsQueryRegistry.bat, which does most of the real work.

If you run it like this

"C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\vcvarsall.bat" amd64

then it will set the following environment variables:

ExtensionSdkDir=C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0\ExtensionSDKs
FSHARPINSTALLDIR=C:\Program Files (x86)\Microsoft SDKs\F#\3.0\Framework\v4.0\
  C:\Program Files (x86)\Windows Kits\8.0\include\shared;
  C:\Program Files (x86)\Windows Kits\8.0\include\um;
  C:\Program Files (x86)\Windows Kits\8.0\include\winrt;
  C:\Program Files (x86)\Windows Kits\8.0\lib\win8\um\x64;
  C:\Program Files (x86)\Windows Kits\8.0\References\CommonConfiguration\Neutral;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0\ExtensionSDKs\Microsoft.VCLibs\11.0\References\CommonConfiguration\neutral;
  C:\Program Files (x86)\HTML Help Workshop;
  C:\dev\VC11\Team Tools\Performance Tools\x64;
  C:\dev\VC11\Team Tools\Performance Tools;
  C:\Program Files (x86)\Windows Kits\8.0\bin\x64;
  C:\Program Files (x86)\Windows Kits\8.0\bin\x86;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools\x64;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\x64;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools;
  C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\
WindowsSdkDir=C:\Program Files (x86)\Windows Kits\8.0\
WindowsSdkDir_35=C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\
WindowsSdkDir_old=C:\Program Files (x86)\Microsoft SDKs\Windows\v8.0A\

If environment variables already exist, it prepends to them.
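The prepend behaviour matters because the first matching entry wins when a tool or header is looked up. A minimal sketch of it, using a plain dict in place of the real process environment. Dropping duplicates is an addition of this sketch, not something I have verified the batch file does.

```python
def prepend_paths(env, name, new_entries, sep=";"):
    """Prepend new_entries to env[name], keeping existing entries after them."""
    existing = [p for p in env.get(name, "").split(sep) if p]
    # Skip existing entries that are being re-added, so nothing appears twice.
    merged = list(new_entries) + [p for p in existing if p not in new_entries]
    env[name] = sep.join(merged)
    return env[name]


env = {"PATH": "C:\\Windows;C:\\Tools"}
result = prepend_paths(
    env, "PATH",
    ["C:\\Program Files (x86)\\Windows Kits\\8.0\\bin\\x64", "C:\\Tools"],
)
```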

Now, this may not be entirely accurate, because I had a few environment variables already set for some reason (I’m assuming the Visual Studio installer did this):

VS100COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Tools\
VS110COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\Tools\

I removed these from an environment and ran vcvars64.bat for VC11, and got the VS110COMNTOOLS environment variable. I think this comes from the “Visual Studio Tools” folder which contains Spy++ and other top-level tools that you would run from the IDE, not as part of the build environment.

This may be a sidelight to you, but I want to package Visual C++ into a downloadable tool that is used by the build system to allow builds on arbitrary machines. Yes, we’ll have to make sure we only do this where we’re appropriately licensed.

HKLM\SOFTWARE\Microsoft\Microsoft SDKs\Windows\v8.0\InstallationFolder

Path to the installed Windows SDK, put into environment variable WindowsSdkDir. The default is C:\Program Files (x86)\Windows Kits\8.0\

Alternate locations

  • HKCU\SOFTWARE\Microsoft\Microsoft SDKs\Windows\v8.0\InstallationFolder
  • HKLM\SOFTWARE\Wow6432Node\Microsoft\Microsoft SDKs\Windows\v8.0\InstallationFolder
  • HKCU\SOFTWARE\Wow6432Node\Microsoft\Microsoft SDKs\Windows\v8.0\InstallationFolder

HKLM\SOFTWARE\Microsoft\Microsoft SDKs\Windows\v8.0A\InstallationFolder

Path to an older Windows SDK (for Visual Studio 2012), put into environment variable WindowsSdkDir_old.

Alternate locations

  • HKCU\SOFTWARE\Microsoft\Microsoft SDKs\Windows\v8.0A\InstallationFolder
  • HKLM\SOFTWARE\Wow6432Node\Microsoft\Microsoft SDKs\Windows\v8.0A\InstallationFolder
  • HKCU\SOFTWARE\Wow6432Node\Microsoft\Microsoft SDKs\Windows\v8.0A\InstallationFolder
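The lookup order in the two lists above can be generated rather than hard-coded. A minimal sketch: `read_value` is a hypothetical callback (on Windows it could wrap the standard `winreg` module), which keeps the search logic testable on any platform.

```python
def sdk_registry_candidates(sdk_version):
    """Return candidate registry paths for an SDK's InstallationFolder value."""
    template = (r"{hive}\SOFTWARE\{node}Microsoft\Microsoft SDKs"
                r"\Windows\{ver}\InstallationFolder")
    candidates = []
    # Native view first, then the 32-bit (Wow6432Node) view; within each view,
    # machine-wide (HKLM) before per-user (HKCU), matching the lists above.
    for node in ("", "Wow6432Node\\"):
        for hive in ("HKLM", "HKCU"):
            candidates.append(
                template.format(hive=hive, node=node, ver=sdk_version))
    return candidates


def first_value(candidates, read_value):
    """Return the first non-empty value that read_value() finds, or None."""
    for path in candidates:
        value = read_value(path)
        if value:
            return value
    return None


paths = sdk_registry_candidates("v8.0")
# Fake registry for illustration: only the third candidate is present.
fake_registry = {paths[2]: "C:\\Program Files (x86)\\Windows Kits\\8.0\\"}
windows_sdk_dir = first_value(paths, fake_registry.get)
```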

Environment variables

Microsoft build tools need to have some environment variables set up.


PATH needs to contain the paths to the various tools that will be invoked. For example, it might look something like this (edited a tiny bit for clarity), where C:\dev\VC11 is the installation folder for Visual Studio 2012 (typically C:\Program Files (x86)\Microsoft Visual Studio 11.0), and C:\dev\SDKs is the installation folder for Microsoft SDKs (typically C:\Program Files (x86)\Microsoft SDKs).

  C:\dev\VC11\Team Tools\Performance Tools\x64
  C:\dev\VC11\Team Tools\Performance Tools
  C:\Program Files (x86)\Windows Kits\8.0\bin\x64
  C:\Program Files (x86)\Windows Kits\8.0\bin\x86
  C:\dev\SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools\x64
  C:\dev\SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools

As mentioned above, the paths come from executing vcvarsall.bat.
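If you skip the Microsoft script (MSVC_USE_SCRIPT set to False), you have to supply these values yourself. A rough sketch of assembling such an execution environment from the directories listed above. The helper name is made up, and a real setup would also need INCLUDE, LIB and friends.

```python
def make_exec_env(path_dirs, extra=None):
    """Build a dict that could be passed as ENV= to an SCons Environment."""
    exec_env = {"PATH": ";".join(path_dirs)}  # ';' is the Windows PATH separator
    exec_env.update(extra or {})
    return exec_env


TOOL_DIRS = [
    r"C:\dev\VC11\Team Tools\Performance Tools\x64",
    r"C:\dev\VC11\Team Tools\Performance Tools",
    r"C:\Program Files (x86)\Windows Kits\8.0\bin\x64",
    r"C:\Program Files (x86)\Windows Kits\8.0\bin\x86",
    r"C:\dev\SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools\x64",
    r"C:\dev\SDKs\Windows\v8.0A\bin\NETFX 4.0 Tools",
]

exec_env = make_exec_env(
    TOOL_DIRS,
    {"WindowsSdkDir": "C:\\Program Files (x86)\\Windows Kits\\8.0\\"},
)

# In an SConstruct this could then be used along these lines (not run here):
#   env = Environment(MSVC_USE_SCRIPT=False, ENV=exec_env)
```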

Two programming challenges

First off, these aren’t programming challenges in the classic sense. Second, you do real work. So, get to work!

Matasano Crypto Challenges has you write attacks on cryptographic algorithms and systems. It requires no pre-existing crypto knowledge, and a very moderate level of math (about 9th grade level).

The Eudyptula Challenge has you work on the Linux kernel, including having patches accepted. Think of it as a guide to “how can I contribute to the Linux kernel?”, or, really, any large open-source project. It does require you to be a fairly strong C programmer.


Software that always works (part 1)

This is part one of a long series. This is a preamble explaining the background.

Many of the advances that humans have made in the past 50 years have been made possible through the use of computer programs, or software. Many of the problems we’ve faced in the past 50 years have been through software that fails to work as intended. I think the focus in remedying that has been too narrow. We’ve been focusing on writing programs without bugs, or writing programs that meet all the specifications. This is both impossible and setting our goals too low. We will never be able to write large programs without bugs, and even if we could write programs that correctly met all the specifications, we are unable to actually list all the specifications out in advance. And even if we could do that, hardware fails in the real world, and that should not be an excuse for software to fail to function. Our actual goal is to write software that accomplishes its tasks regardless of any internal or external problems.

This is a very hard task. But it’s the actual meaningful task.

Imagine that you hired someone to run your accounting department. You give them the task of making payroll happen. Let’s say you go to this person four weeks later, to find out that nothing has happened for the past two payroll periods, and the response was “the printer ran out of paper, so I couldn’t mail out checks.” You would not think “man, I failed, I should have made sure paper was available”. You’d fire that person. Or let’s say that every few hours this person stops working and calls you because something is blocking the work: “the N key on the keyboard doesn’t work”, “the cleaning person turned off the power to my computer”, “I typed the wrong name into the system and I need you to fix it”, “IT wants to upgrade all our systems to Windows Insanity and that will take 3 weeks”, and so on. You’d also fire that person.

And yet we treat our software like the above. If something happens outside the domain, we congratulate ourselves by saying: “look, we caught this exceptional case of failing to read from a file, we clearly printed an error message and exited.” Really? That’s what the user of the software wanted, an error message? No, directly or indirectly, the user needed the information from that file, and while your error handling probably prevented greater harm, at best you can say your program made the user less unhappy.

Let’s phrase it a little more rigorously, but not too much so. We have a goal G we want accomplished. We have a program P to get us to that goal. And we have an environment E that, alas, we don’t get to control.

G = P(E)

That looks simple. And if the environment were completely known to us, then creating program P is a math problem. It might be a very hard math problem, but it is a math problem.

However, the difference between theory and practice is that, in theory, theory and practice are the same thing. We don’t get to specify our environment. I want to stop using the word control because, to some extent, we can affect the environment, we can provide inputs to the environment, we just can’t determine the environment. Here’s our challenge – if we supply a different environment to our program than we predicted, we will get a different result. Note that I’m only talking about E containing the parts of the universe that are relevant to the operation of P.

G’ = P(E’)

If G = G’, then we’re good, and this could happen either through luck, or through conscious effort on our part to write multiple programs that each handle a different environment, then concatenate them together into a final program.
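As a toy illustration of "multiple programs concatenated together", assume each sub-program comes with a test for the environment it was written for. All names here are invented for the example.

```python
def make_program(handlers):
    """Combine (recognizes, handle) pairs into one program P."""
    def program(env):
        for recognizes, handle in handlers:
            if recognizes(env):
                return handle(env)
        # No sub-program anticipated this environment: this is where G' != G.
        raise LookupError("unanticipated environment: %r" % (env,))
    return program


# Two sub-programs that reach the same goal (a loaded config) from
# two different environments.
load_config = make_program([
    (lambda e: "file" in e, lambda e: e["file"]),
    (lambda e: "backup" in e, lambda e: e["backup"]),
])

g = load_config({"file": "cfg"})          # G  = P(E)
g_prime = load_config({"backup": "cfg"})  # G' = P(E')
```

Here G = G’ for the two environments we thought of. Anything else falls through to the error, which is exactly the limitation the next paragraph describes.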

Of course, we are pretty finite individuals, and our ability to write programs for each possible environment is limited, much less our ability to predict possible environments. In point of fact, we can’t. Our ability to predict relates to our ability to use models to extrapolate, and we don’t have all the models yet, and probably never will.

There has been some progress recently through the emulation of mechanisms we see in nature – through evolution, organisms have found very clever ways to handle an unpredictable world. However, this process is very slow, and involves lots and lots of individual organisms failing. Since we can’t make a credible simulation of the world (that modeling issue again), our software agents need to learn in the real world, and that can be very expensive for many kinds of problems. Imagine rocket software that learned through trying different things to see what happened; I don’t think we’d be happy with the outcome.

Intelligence would also help greatly in this, but that’s also a little (or a lot) out of our reach.

So the question is, what can we do to write non-sentient software that can still accomplish the goal even when the environment is stacked against it? I’ll explore some ideas in the next article in this series. We’re going to start very small but with something meaningful – file I/O – and see if we can apply a technique I call “programming with expectations”.

Easy way to create .mailmap for git log

The git log features have a way to remap author and committer names and email addresses into canonical names, via the creation of a .mailmap file. For more details, see the help for git shortlog.

So, you know you need one of these – how do you create it? Well, use the output from git shortlog to do this, and feed it into perl or sed to remove the counts at the beginning of each line. For example

git shortlog -se | perl -ple 's/^\s*\d+\t//' > .mailmap.txt

(on Windows, you’ll need to use double quotes "" instead of single quotes '' to contain the string).

This leaves you with the start of the .mailmap file. Now, you just edit it; pick the canonical name you want, and create any mappings needed to turn the non-canonical entries into your canonical ones.
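If you don't have perl or sed handy, the same count-stripping can be sketched in Python. This assumes the usual `git shortlog -se` layout of padded count, tab, then name and email. The sample data is made up.

```python
import re

# Leading whitespace, the commit count, and the tab that follows it.
_COUNT_PREFIX = re.compile(r"^\s*\d+\t")


def strip_counts(shortlog_text):
    """Remove the leading commit count from each `git shortlog -se` line."""
    return [
        _COUNT_PREFIX.sub("", line)
        for line in shortlog_text.splitlines()
        if line.strip()  # ignore blank lines
    ]


sample = (
    "    12\tJane Doe <jane@example.com>\n"
    "     3\tjdoe <jane@example.com>\n"
)
entries = strip_counts(sample)
```

From there, a mapping line puts the canonical identity first and the one to be replaced second, for example `Jane Doe <jane@example.com> jdoe <jane@example.com>`.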

Note that this can be very tedious for large repos with hundreds or even thousands of contributors (the git.git repo has just over 1200 contributors as of 2014). You might want to focus on the oddballs. In that case, add -n to the shortlog invocation – this will sort by occurrence instead of alphabetically. This way, you can go from the bottom of the list until it’s “good enough”. Once you’ve edited it, you probably want to sort it alphabetically, to make future life easier.

You can commit the .mailmap file to your repository so that others can benefit from what you’ve done, or you can leave it for just your use.

As for updating it in the future, I’d suggest diffing the output of shortlog “now” and back at the point in time where you last updated the .mailmap, to see if you have any new authors that need remapping.
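A small sketch of that comparison, assuming you have captured two `git shortlog -se` outputs as text (one for the commit where you last updated the .mailmap, one for the current head). The function name and sample data are made up.

```python
def new_authors(old_shortlog, new_shortlog):
    """Return author entries present in new_shortlog but not in old_shortlog."""
    def authors(text):
        # Each line is "<count>\t<Name> <email>"; keep the part after the tab,
        # so that changed commit counts do not show up as differences.
        return {
            line.split("\t", 1)[1].strip()
            for line in text.splitlines()
            if "\t" in line
        }
    return sorted(authors(new_shortlog) - authors(old_shortlog))


old = "     5\tJane Doe <jane@example.com>\n"
new = (
    "     7\tJane Doe <jane@example.com>\n"
    "     1\tSam Roe <sam@example.org>\n"
)
added = new_authors(old, new)
```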

Note that mailmap files aren’t automatically used by anything other than shortlog. For example, for git log, you either need to enable it in the config (log.mailmap), or add it to the git log command line

git log --use-mailmap [more options]


SCons tidbits

Here are a few things I learned that don’t appear to be in the documentation.

Passing variables to SConscripts

The documentation states two ways to make variables available for import:

SConscript('build/src/SConscript', exports = 'env')

which exports a variable for just this SConscript to import, or

Export('env')

which adds to a global export list that all SConscripts can import from. They can be combined, and variables exported in the SConscript line take precedence over ones in the global list.

However, there is a third way to do this that is undocumented as far as I can tell

SConscript('build/src/SConscript', 'env')

which is the same as the first method, just sans the exports keyword. And of course, it can be a list of variables, and they can be remapped.


Detect() is not documented, but it is useful. Detect (or env.Detect) is the call that is used to find an executable. It’s called Detect() because it’s used by the tool config system to see if a tool is installed. It searches through the directories in the PATH of the construction environment’s execution environment (env['ENV']['PATH']).

The scalar version returns the name you passed in if the executable can be found in the system, and None otherwise (use env.WhereIs() if you need the full path):

prog = env.Detect('protoc')

The list version returns the first name in the list that can be found. This is useful if there’s a single conceptual tool that might have multiple names:

prog = env.Detect(['protoc', 'protoc9', 'protoc10'])

The latter would only be used for cases where each variant is equivalent in functionality, because it will just pick the first one found. Alternatively, there could be a case for listing from superset to subset tool, if your usage code can detect and handle fallbacks, but it’s probably better to do that with individual Detect() calls.

You can pass in the file with extension, if you only want to find that exact name. Otherwise, SCons will add the extension appropriate to the operating system (on Windows, it will iterate through PATHEXT).
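To make the search behaviour concrete, here is a standalone approximation built on Python's `shutil.which` rather than on SCons itself. As I read the SCons source, Detect hands back the matching name rather than the full path, so the sketch does the same. The `detect` function and its `search_path` parameter are mine.

```python
import shutil


def detect(progs, search_path):
    """Return the first name in progs found on search_path, else None."""
    if isinstance(progs, str):
        progs = [progs]  # accept a single name as well as a list
    for prog in progs:
        # shutil.which tries the PATHEXT extensions on Windows when the
        # name is given without one.
        if shutil.which(prog, path=search_path):
            return prog
    return None
```

For example, `detect(['protoc', 'protoc9'], os.environ['PATH'])` would return whichever of the two names is found first, or None if neither is installed.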

The idiom is to do this in your tool’s exists() function, if your tool is a wrapper around an installed program.
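A minimal sketch of that idiom, written as a tool module you might drop into site_scons/site_tools. The module contents and the PROTOC construction variable name are hypothetical. Only `generate(env)` and `exists(env)` are the entry points SCons expects from a tool.

```python
# Hypothetical tool module, e.g. site_scons/site_tools/protoc.py

# Names under which the wrapped program might be installed (illustrative).
_PROTOC_NAMES = ['protoc', 'protoc9', 'protoc10']


def generate(env):
    """Add the construction variables this tool needs to env."""
    # Fall back to the plain name so the command line is still readable
    # in error messages if nothing was detected.
    env['PROTOC'] = env.Detect(_PROTOC_NAMES) or 'protoc'


def exists(env):
    """Return a true value if the wrapped program can be found."""
    return env.Detect(_PROTOC_NAMES)
```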