Vagrant for Windows images

On Linux, Vagrant (https://docs.vagrantup.com/v2) is nice – you have a simple vagrant script, run it, and out pops a completely fresh Linux image. This makes maintaining systems easy – as long as you keep per-machine data largely on servers or off the root image, you can always be using brand-new fit-to-purpose systems. And to that purpose, there are a large number of base boxes you can build your Vagrant-customized VM on top of:

http://www.vagrantbox.es/

I’d like the same for Windows, for a large number of reasons. Vagrant is ALMOST that tool.

http://kamalim.github.io/blogs/how-to-create-you-own-vagrant-base-boxes/

http://www.thomasvjames.com/2013/09/create-a-windows-base-box-for-vagrant/

http://serverfault.com/questions/331953/using-vagrant-and-chef-to-setup-a-windows-vm-in-ubuntu

http://www.haidongji.com/2013/04/01/setting-up-windows-development-environment-with-virtualbox/

http://tech.wonga.com/blog/blog-view/create-windows-virtual-machines-with-vagrant-and-virtualb

Vagrant 1.6 or later supports Windows fairly well; if you use an older version of Vagrant, there are workarounds:

https://github.com/WinRb/vagrant-windows

You can boot Windows from a VHD disk image, meaning you can run Vagrant to create your image, then actually boot and run from it, rather than running it in a VM.

http://blogs.msdn.com/b/knom/archive/2009/04/07/windows-7-vhd-boot-setup-guideline.aspx

http://blogs.technet.com/b/haroldwong/archive/2012/08/18/how-to-create-windows-8-vhd-for-boot-to-vhd-using-simple-easy-to-follow-steps.aspx

http://technet.microsoft.com/en-us/library/hh825691.aspx

http://www.zdnet.com/blog/bott/how-to-use-a-vhd-to-dual-boot-windows-8-on-a-windows-7-pc/4847

http://www.hanselman.com/blog/LessVirtualMoreMachineWindows7AndTheMagicOfBootToVHD.aspx

Your host OS has to be Windows 7 Ultimate/Enterprise, Windows 8, or Windows Server 2008/2012. I think.

 

Two programming challenges

First off, these aren’t programming challenges in the classic sense. Second, you do real work. So, get to work!

The Matasano Crypto Challenges have you write attacks on cryptographic algorithms and systems. They require no pre-existing crypto knowledge, and only a moderate level of math (about 9th grade level).

The Eudyptula Challenge has you work on the Linux kernel, including having patches accepted. Think of it as a guide to “how can I contribute to the Linux kernel?”, or, really, any large open-source project. It does require you to be a fairly strong C programmer.

 

Python 2.7 end-of-life extended to 2020

Guido van Rossum evidently announced at PyCon 2014 that Python 2.7 would be supported through 2020 (the previous cut-off date was 2015).

http://www.i-programmer.info/news/216-python/7179-python-27-to-be-maintained-until-2020.html

A Hacker News thread started by intimating that this was partly in response to Red Hat needing long-term support for the version of Python in RHEL 7, and that version of Python will almost certainly be Python 2.7.

https://news.ycombinator.com/item?id=7581434

I doubt that it was anything more than a very minor contributing factor.

Software that always works (part 1)

This is part one of a long series. This is a preamble explaining the background.

Many of the advances that humans have made in the past 50 years have been made possible through the use of computer programs, or software. Many of the problems we’ve faced in the past 50 years have come from software that fails to work as intended. I think the focus in remedying that has been too narrow. We’ve been focusing on writing programs without bugs, or writing programs that meet all the specifications. This is both impossible and setting our goals too low. We will never be able to write large programs without bugs, and even if we could write programs that correctly met all the specifications, we are unable to actually list all the specifications out in advance. And even if we could do that, hardware fails in the real world, and that should not be an excuse for software to fail to function. Our actual goal is to write software that accomplishes its tasks regardless of any internal or external problems.

This is a very hard task. But it’s the actual meaningful task.

Imagine that you hired someone to run your accounting department. You give them the task of making payroll happen. Let’s say you go to this person four weeks later, to find out that nothing has happened for the past two payroll periods, and the response was “the printer ran out of paper, so I couldn’t mail out checks.” You would not think “man, I failed, I should have made sure paper was available”. You’d fire that person. Or let’s say that every few hours this person stops working and calls you because something is blocking the work: “the N key on the keyboard doesn’t work”, “the cleaning person turned off the power to my computer”, “I typed the wrong name into the system and I need you to fix it”, “IT wants to upgrade all our systems to Windows Insanity and that will take 3 weeks”, and so on. You’d also fire that person.

And yet we treat our software like the above. If something happens outside the domain, we congratulate ourselves by saying: “look, we caught this exceptional case of failing to read from a file, we clearly printed an error message and exited.” Really? That’s what the user of the software wanted, an error message? No, directly or indirectly, the user needed the information from that file, and while your error handling probably prevented greater harm, at best you can say your program made the user less unhappy.

Let’s phrase it a little more rigorously, but not too much so. We have a goal G we want accomplished. We have a program P to get us to that goal. And we have an environment E that, alas, we don’t get to control.

G = P(E)

That looks simple. And if the environment were completely known to us, then creating program P is a math problem. It might be a very hard math problem, but it is a math problem.

However, the difference between theory and practice is that, in theory, theory and practice are the same thing. We don’t get to specify our environment. I want to stop using the word control because, to some extent, we can affect the environment, we can provide inputs to the environment, we just can’t determine the environment. Here’s our challenge – if we supply a different environment to our program than we predicted, we will get a different result. Note that I’m only talking about E containing the parts of the universe that are relevant to the operation of P.

G’ = P(E’)

If G = G’, then we’re good, and this could happen either through luck, or through conscious effort on our part to write multiple programs that each handle a different environment, then concatenate them together into a final program.

Of course, we are pretty finite individuals, and our ability to write programs for each possible environment is limited, to say nothing of our ability to predict possible environments. In point of fact, we can’t. Our ability to predict relates to our ability to use models to extrapolate, and we don’t have all the models yet, and probably never will.

There has been some progress recently through the emulation of mechanisms we see in nature – through evolution, organisms have found very clever ways to handle an unpredictable world. However, this process is very slow, and involves lots and lots of individual organisms failing. Since we can’t make a credible simulation of the world (that modeling issue again), our software agents need to learn in the real world, and that can be very expensive for many kinds of problems. Imagine rocket software that learned through trying different things to see what happened; I don’t think we’d be happy with the outcome.

Intelligence would also help greatly in this, but that’s also a little (or a lot) out of our reach.

So the question is, what can we do to write non-sentient software that can still accomplish the goal even when the environment is stacked against it? I’ll explore some ideas in the next article in this series. We’re going to start very small but with something meaningful – file I/O – and see if we can apply a technique I call “programming with expectations”.

Easy way to create .mailmap for git log

The git log features have a way to remap author and committer names and email addresses into canonical names, via the creation of a .mailmap file. For more details, see the help for git shortlog.

So, you know you need one of these – how do you create it? Well, take the output from git shortlog and feed it into perl or sed to remove the counts at the beginning of each line. For example:

git shortlog -se | perl -ple "s/^\s*\d+\t//"  > .mailmap.txt

(on Windows, you’ll need to use double quotes ("") instead of single quotes ('') to contain the string).

This leaves you with the start of the .mailmap file (the example writes to .mailmap.txt, so rename it to .mailmap when you’re done). Now, you just edit it; pick the canonical name you want, and create any mappings needed to turn the non-canonical entries into your canonical ones.

Note that this can be very tedious for large repos with hundreds or even thousands of contributors (the git.git repo has just over 1200 contributors as of 2014). You might want to focus on the oddballs. In that case, add -n to the shortlog invocation – this will sort by occurrence instead of alphabetically. This way, you can go from the bottom of the list until it’s “good enough”. Once you’ve edited it, you probably want to sort it alphabetically, to make future life easier.

You can commit the .mailmap file to your repository so that others can benefit from what you’ve done, or you can leave it for just your use.

As for updating it in the future, I’d suggest diffing the output of shortlog “now” and back at the point in time where you last updated the .mailmap, to see if you have any new authors that need remapping.

Note that mailmap files aren’t automatically used by anything other than shortlog. For example, for git log, you either need to enable it in the config (the log.mailmap setting), or add it to the git log command line:

git log --use-mailmap [more options]

 

Assignment operator could not be generated

What does this warning mean, and how do you fix it?

warning C4512: '<some type>' : assignment operator could not be generated

The compiler will auto-generate some class members for you:

  • default constructor (if no other constructor is explicitly declared)
  • destructor
  • copy constructor (if no move constructor or move assignment operator is explicitly declared)
  • copy assignment operator (if no move constructor or move assignment operator is explicitly declared)

C++11 added two new auto-generated class members (and it added “if destructor then copy constructor and copy assignment operator generation is deprecated”):

  • move constructor (if no copy constructor, copy assignment operator, move assignment operator or destructor is explicitly declared)
  • move assignment operator (if no copy constructor, copy assignment operator, move constructor or destructor is explicitly declared)

Compiler-generated functions are public and non-virtual. As a reminder, here are the signatures of all of these functions:

class Object {
    Object();                               // default constructor
    Object(const Object& other);            // copy constructor
    Object(Object&& other);                 // move constructor
    Object& operator=(const Object& other); // copy assignment operator
    Object& operator=(Object&& other);      // move assignment operator
    ~Object();                              // destructor
};
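The generation rules above are easy to misremember, so here is a small sketch that makes one of them visible at run time. It is my own illustration, assuming a C++11 compiler; the names Probe, Origin, Plain and WithDestructor are hypothetical. Probe records whether it was copied or moved, and the two wrapper structs differ only in whether a destructor is declared.

```cpp
#include <cassert>
#include <utility>

// Records how a Probe object came into being.
enum class Origin { Default, Copied, Moved };

struct Probe
{
  Origin origin;
  Probe() : origin(Origin::Default) {}
  Probe(const Probe&) : origin(Origin::Copied) {}
  Probe(Probe&&) : origin(Origin::Moved) {}
};

// No special members declared: the compiler generates copy and move.
struct Plain
{
  Probe probe;
};

// A user-declared destructor suppresses the implicit move constructor,
// so "moving" falls back to the still-generated copy constructor.
struct WithDestructor
{
  Probe probe;
  ~WithDestructor() {}
};

inline Origin move_plain()
{
  Plain a;
  Plain b(std::move(a));
  return b.probe.origin;
}

inline Origin move_with_destructor()
{
  WithDestructor a;
  WithDestructor b(std::move(a));
  return b.probe.origin;
}

inline void check_generated_moves()
{
  assert(move_plain() == Origin::Moved);
  assert(move_with_destructor() == Origin::Copied);
}
```

Calling check_generated_moves() from your own main() should pass silently: the struct without a destructor really moves, while the one with a destructor quietly copies instead.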

So, what if you can’t actually create a meaningful copy assignment operator? For example, if you have const data, you can’t assign to it. Remember that the auto-generated copy assignment operator just generates assignment operator code for each member of the class, recursively, and you can’t assign to const int, you can only construct it.

struct ConstantOne
{
  ConstantOne() : value(1) {}
  const int value;
};

int main(int /*argc*/, char ** /*argv*/)
{
  ConstantOne b;

  return 0;
}

This will give you a warning when you compile, because the auto-generated assignment operator would have to assign to the const member, which is illegal, and so the compiler won’t generate it. It’s a warning, because your code probably doesn’t need an assignment operator. For Visual C++, you’ll see something like this:

warning C4512: 'ConstantOne' : assignment operator could not be generated
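If your compiler doesn’t emit this warning, you can ask the type traits in <type_traits> what was generated. This is a minimal sketch, assuming a C++11 compiler; ConstantOne is the struct from above, while the static_asserts and the helper copy_by_construction() are my additions for illustration.

```cpp
#include <cassert>
#include <type_traits>

struct ConstantOne
{
  ConstantOne() : value(1) {}
  const int value;
};

// A const member can still be initialized from another object...
static_assert(std::is_copy_constructible<ConstantOne>::value,
              "copy constructor is still generated");
// ...but it cannot be assigned to, so there is no usable copy assignment.
static_assert(!std::is_copy_assignable<ConstantOne>::value,
              "copy assignment operator is not available");

// Copying through construction keeps working.
inline int copy_by_construction()
{
  ConstantOne b;
  ConstantOne c(b);   // uses the generated copy constructor
  assert(c.value == b.value);
  return c.value;
}
```

The point of the sketch is that only assignment is lost; construction from an existing object is unaffected by the const member.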

You have several options. The easiest is just to declare an assignment operator without a body. As long as you never actually try to use the assignment operator, you’ll be fine. And, the contract for this object says that assignment would be illegal anyway, so you’ll get a valid-to-you compile error if you accidentally try to use it.

struct ConstantOne
{
  ConstantOne() : value(1) {}
  const int value;
private:
  ConstantOne& operator=(const ConstantOne&);
};

int main(int /*argc*/, char ** /*argv*/)
{
  ConstantOne b;
  ConstantOne c;
  c = b;

  return 0;
}

The standard practice is to make these private, to reinforce that they are not meant to be used. If you compile code that uses the assignment operator, you’ll get a compile-time error.

error C2248: 'ConstantOne::operator =' : cannot access private member declared in class 'ConstantOne'

And in C++11, there’s even a keyword to add here to declare that it indeed should not be allowed:

struct ConstantOne
{
  ConstantOne() : value(1) {}
  const int value;
  ConstantOne& operator=(const ConstantOne&) = delete;
};

Note that you don’t need the trickery of making it private, and you get a nicer compile-time error if you try to use the assignment operator.

This happens in big real-world projects quite often. In fact, it happens enough that the = delete syntax was added in C++11. Visual Studio 2013 and up, GCC 4.7 and up, and Clang 2.9 and up support = delete and = default.

Now, there is another approach to the assignment operator when you have const data – generate an assignment operator that can write to const data with const_cast. You’re almost always doing the wrong thing if you do this, but sometimes it has to be done anyway. It looks horrible, and it is horrible. But maybe it’s necessary.

struct ConstantOne
{
  ConstantOne() : value(1) {}
  const int value;
  ConstantOne& operator=(const ConstantOne& that)
  {
    if (this != &that)
    {
      *const_cast<int*>(&this->value) = that.value;
    }
    return *this;
  }
};

int main(int /*argc*/, char ** /*argv*/)
{
  ConstantOne b;
  ConstantOne c;
  c = b;

  return 0;
}

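The reason this is horrible is that you are violating the contract – you’re altering a constant value in the LHS of the equation. Worse, writing to an object that was declared const is undefined behavior in standard C++, so the compiler is allowed to assume value never changes even if the assignment appears to work. Depending on your circumstance, it may still be the least bad option, but treat it as a last resort.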
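If what you really want is “nobody outside the class may change the value, but whole objects may be assigned”, one alternative worth considering is to drop the const on the member and enforce read-only access through the interface instead. This is a sketch of that idea, not a variation of the code above; the class name AssignableOne, the accessor get() and the constructor parameter are all made up for illustration.

```cpp
#include <cassert>

class AssignableOne
{
public:
  explicit AssignableOne(int v = 1) : value_(v) {}

  // Read-only access: callers can look at the value but not change it.
  int get() const { return value_; }

  // No hand-written operator= is needed. The compiler-generated copy
  // assignment works because value_ itself is not const.

private:
  int value_;
};

inline int assign_and_read()
{
  AssignableOne b(1);
  AssignableOne c(2);
  assert(c.get() == 2);
  c = b;               // well-defined: uses the generated copy assignment
  return c.get();
}
```

The trade-off is that member functions of the class can now modify value_, so the “constant” guarantee is enforced by the interface rather than by the compiler.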

Weekend tidbits

The spiped secure pipe daemon

Spiped (pronounced “ess-pipe-dee”) is a utility for creating symmetrically encrypted and authenticated pipes between socket addresses, so that one may connect to one address (e.g., a UNIX socket on localhost) and transparently have a connection established to another address (e.g., a UNIX socket on a different system). This is similar to ‘ssh -L’ functionality, but does not use SSH and requires a pre-shared symmetric key.

Spaced Repetition

One of the most fruitful areas of computing is making up for human frailties… My current favorite prosthesis is the class of software that exploits the spacing effect, a centuries-old observation in cognitive psychology…The spacing effect essentially says that if you have a question (“What is the fifth letter in this random sequence you learned?”), and you can only study it, say, 5 times, then your memory of the answer (‘e’) will be strongest if you spread your 5 tries out over a long period of time – days, weeks, and months.

Google Now

Need to set it up on iOS or Android first, even if you mostly access it from a desktop computer.

Alan Kay talk: The Future Doesn’t Have To Be Incremental. The stuff we have now was invented by a few dozen people over 5 years.

Alan Kay’s Reading List.

Understanding the Linux Virtual Memory Manager, Mel Gorman, 2007.

Build your own Lisp. Write a Lisp interpreter in 1000 lines of C.

America’s Young Adults at 27: Labor Market Activity, Education, and Household Composition: Results From a Longitudinal Survey Summary

GitBook

 

SCons tidbits

Here are a few things I learned that don’t appear to be in the documentation.

Passing variables to SConscripts

The documentation states two ways to make variables available for import:

SConscript('build/src/SConscript', exports = 'env')

which exports a variable for just this SConscript to import, or

Export('env')
SConscript('build/src/SConscript')

which adds to a global export list that all SConscripts can import from. They can be combined, and variables exported in the SConscript line take precedence over ones in the global list.

However, there is a third way to do this that is undocumented as far as I can tell:

SConscript('build/src/SConscript', 'env')

which is the same as the first method, just sans the exports keyword. And of course, it can be a list of variables, and they can be remapped.

Detect()

This is not documented, but it is useful. Detect (or env.Detect) is the call that is used to find an executable. It’s called Detect() because it’s used by the tool config system to see if a tool is installed. It searches through the paths in the construction environment’s PATH (env['ENV']['PATH']).

The scalar version returns the name you passed in if the executable can be found in the system, and None otherwise (use env.WhereIs() if you need the full path):

found = env.Detect('protoc')

The list version returns the first name in the list that can be found. This is useful if there’s a single conceptual tool that might have multiple names:

found = env.Detect(['protoc', 'protoc9', 'protoc10'])

The latter would only be used for cases where each variant is equivalent in functionality, because it will just pick the first one found. Alternatively, there could be a case for listing from superset to subset tool, if your usage code can detect and handle fallbacks, but it’s probably better to do that with individual Detect() calls.

You can pass in the file with extension, if you only want to find that exact name. Otherwise, SCons will add the extension appropriate to the operating system (on Windows, it will iterate through PATHEXT).

The idiom is to do this in your tool’s exists() function, if your tool is a wrapper around an installed program.