To-do: make ‘git rebase’ a first-class citizen

git rebase is awesome, because it lets you fix your history after the fact. It’s not about rewriting history, it’s about improving it so that it makes more sense, or so that it’s composable.

But git rebase also breaks workflows. If you rebase work that others have built on, you create problems for them, because Git will get confused about how to merge. That’s bad, because a lot of Git’s power is based on the idea of constant branching and merging.
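To make the problem concrete, here is a toy model in Python. It is only a sketch: I’m assuming a commit id is nothing more than a hash of the parent id plus the message (real Git hashes much more than that), and the helper names `commit_id` and `build` are made up for the illustration. The point is that replaying the same changes on a new parent produces entirely new ids, so anyone who built on the old commits no longer shares them with you.

```python
import hashlib

def commit_id(parent, message):
    """Toy commit id: a hash of the parent id and the message (real Git hashes more)."""
    data = ("%s\n%s" % (parent, message)).encode("utf-8")
    return hashlib.sha1(data).hexdigest()[:12]

def build(parent, messages):
    """Stack commits on top of `parent`; return the new ids, oldest first."""
    ids = []
    for message in messages:
        parent = commit_id(parent, message)
        ids.append(parent)
    return ids

root = commit_id(None, "initial")
main = build(root, ["main: fix typo"])                        # upstream moved on
feature = build(root, ["feature: part 1", "feature: part 2"])
colleague = build(feature[-1], ["colleague: builds on feature"])

# "Rebase" feature onto the new tip of main: same messages, new parent, new ids.
rebased = build(main[-1], ["feature: part 1", "feature: part 2"])

# The colleague's history still contains the old feature commits only.
colleague_history = set(feature + colleague)
print(sorted(colleague_history & set(feature)))   # the two old ids
print(sorted(colleague_history & set(rebased)))   # nothing in common
```

In this model the colleague’s branch and the rebased branch share nothing past the root, which is roughly why a later merge sees the same changes twice.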

One of my pet projects is to figure out how to allow both constant branch/remerge and rebase to co-exist happily. I have no clear idea how to do this, just some vague ones. At some point, I’ll work on this. But if someone else were to do it before me, I’d be just as happy.

Hint, hint.

Patent discussion from 1869

TechDirt has an article on historical arguments about patent law that’s worth reading:

Discussions On The Abolition Of Patents In The UK, France, Germany And The Netherlands, From 1869

The article excerpts a book written in 1869 that sounds like it was written today. I particularly liked this bon mot:

We acknowledge that the man who first constructed a hut was perfectly right in making good his claim against those who would have deprived him of it, and that he was justified in vindicating his claim by force. He had employed his time and strength in building this hut; it was undoubtedly his, and his neighbours acted up to their natural rights and in their own interests in helping him to oppose the intruder. But there ended both the right of the individual and that of the community.

If this first man, not content with claiming his hut had pretended that the idea of building it belonged exclusively to him, and that consequently no other human being had a right to build a similar one, the neighbours would have revolted against so monstrous a pretension, and never would have allowed so mischievous an extension of the right which he had in the produce of his labour….

And if, in our day, imitation of an invention is not generally considered as guilty an act as robbery of tangible property, it is because every one understands the difference between an idea and a thing made or done.

I am pretty anti-patent, and not really pro-copyright. I think that copyright is OK only if the period for copyright is really limited, to 30 years or less (maybe 15 years with one extension that can be filed). And for patents, we probably shouldn’t have them. But if we feel we have to, then patent lifetimes need to be pretty short. I don’t see the need for software patents at all.

os.link and os.symlink for Win32 Python

For some reason, Win32 Python doesn’t implement hardlinks and symlinks, even though the operating system supports both as of Windows Vista. Here’s a simple version as a demonstration (a real version should memoize the lookup for the two functions, handle Unicode, and probably do more error checking).

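import os
import platform
import sys

source = sys.argv[1]
dest = sys.argv[2]
link_type = sys.argv[3]  # 'link' or 'symlink'

# The ANSI ("A") entry points take byte strings, i.e. plain str on Python 2.

def CreateHardLink(src, dst):
  import ctypes
  # The third argument (lpSecurityAttributes) is reserved and must be NULL.
  if not ctypes.windll.kernel32.CreateHardLinkA(dst, src, None):
    raise OSError('CreateHardLink failed: %s -> %s' % (dst, src))

def CreateSymbolicLink(src, dst):
  import ctypes
  csl = ctypes.windll.kernel32.CreateSymbolicLinkA
  csl.restype = ctypes.c_ubyte  # returns BOOLEAN (one byte), not int
  # dwFlags: 1 (SYMBOLIC_LINK_FLAG_DIRECTORY) when the target is a directory
  flags = 1 if src is not None and os.path.isdir(src) else 0
  if not csl(dst, src, flags):
    raise OSError('CreateSymbolicLink failed: %s -> %s' % (dst, src))

# Only patch the os module where the native functions are missing.
if platform.system() == 'Windows':
  os.link = CreateHardLink
  os.symlink = CreateSymbolicLink

if link_type == 'link':
  os.link(source, dest)
elif link_type == 'symlink':
  os.symlink(source, dest)
else:
  raise Exception('what?')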
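
For the memoization and Unicode handling mentioned above, here is a minimal sketch of what I have in mind. The names `_get_create_symbolic_link` and `portable_symlink` are hypothetical, and I’m assuming paths are passed as text (unicode) strings so that the wide-character `CreateSymbolicLinkW` entry point can be used. On anything other than Windows it simply defers to the stock `os.symlink`.

```python
import os
import platform

_cache = {}  # holds the configured ctypes function after the first lookup

def _get_create_symbolic_link():
    """Look up and configure CreateSymbolicLinkW once, then reuse it."""
    if 'csl' not in _cache:
        import ctypes  # imported lazily so non-Windows platforms never touch windll
        csl = ctypes.windll.kernel32.CreateSymbolicLinkW
        csl.argtypes = (ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32)
        csl.restype = ctypes.c_ubyte  # BOOLEAN is a single byte
        _cache['csl'] = csl
    return _cache['csl']

def portable_symlink(src, dst):
    """Create a symlink at dst pointing to src (hypothetical helper)."""
    if platform.system() != 'Windows':
        return os.symlink(src, dst)
    # dwFlags: 1 (SYMBOLIC_LINK_FLAG_DIRECTORY) when the target is a directory
    flags = 1 if os.path.isdir(src) else 0
    if not _get_create_symbolic_link()(dst, src, flags):
        raise OSError('CreateSymbolicLinkW failed: %r -> %r' % (dst, src))
```

Note that creating symlinks on Windows may also require a privilege the current user doesn’t have, which this sketch does not try to detect.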

The Consensus Monte Carlo Algorithm

A paper was released recently:

Bayes and Big Data: The Consensus Monte Carlo Algorithm

This describes an approach to doing Monte Carlo on big data sets that requires little communication between machines. This is important if you want a problem to scale. Sounds cool, but also not new, and evidently not as universal as it seems.
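
Here is how I understand the basic idea, as a small Python sketch rather than anything taken from the paper’s code. The assumptions are mine: a Gaussian mean with known `sigma`, a `N(0, prior_sd^2)` prior split evenly across shards, and a combination step that averages the workers’ draws weighted by the inverse of each worker’s sample variance. The function names `worker_draws` and `consensus` are made up. Each worker only sees its own shard, and the only communication is shipping the draws back at the end.

```python
import random
import statistics

def worker_draws(shard, n_shards, n_draws, sigma, prior_sd, rng):
    """Draw from one shard's sub-posterior for a Gaussian mean with known sigma.

    The prior is raised to the power 1/n_shards, which for a Gaussian
    just multiplies its variance by n_shards.
    """
    precision = len(shard) / sigma ** 2 + 1.0 / (n_shards * prior_sd ** 2)
    mean = (sum(shard) / sigma ** 2) / precision
    sd = precision ** -0.5
    return [rng.gauss(mean, sd) for _ in range(n_draws)]

def consensus(draws_per_worker):
    """Combine the g-th draw of every worker, weighted by inverse sample variance."""
    weights = [1.0 / statistics.variance(d) for d in draws_per_worker]
    total = sum(weights)
    n_draws = len(draws_per_worker[0])
    return [sum(w * d[g] for w, d in zip(weights, draws_per_worker)) / total
            for g in range(n_draws)]

rng = random.Random(0)
sigma, prior_sd, n_shards = 1.0, 10.0, 4
data = [rng.gauss(3.0, sigma) for _ in range(4000)]
shards = [data[i::n_shards] for i in range(n_shards)]

draws = [worker_draws(s, n_shards, 2000, sigma, prior_sd, rng) for s in shards]
combined = consensus(draws)

# Exact full-data posterior mean, for comparison.
full_precision = len(data) / sigma ** 2 + 1.0 / prior_sd ** 2
full_mean = (sum(data) / sigma ** 2) / full_precision
print(statistics.mean(combined), full_mean)
```

The Gaussian case is the friendly one, since a weighted average of Gaussian draws is again Gaussian. How well the same trick holds up for less well-behaved posteriors is the part I can’t judge.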

Hacker News has a thread on this at:

which had some posts by what appear to be experts in the field. What I found unique about this thread was that the snark was reined in really quickly and the comments were useful. I’m not an expert by any means. What I see is some validity to the complaints, but not enough attention paid to the practical versus the theoretical; perhaps what these people discovered wasn’t breaking ground in a theoretical sense, but can it be used in a practical fashion?

I found the paper more readable than many other papers of its ilk.

There’s another complaint leveled against the paper, and that is that our scale of data may make Bayesian inference computationally intractable:

It’s not really a complaint against the paper as much as a fear that the technique may become irrelevant in the near future.

Mercurial versus Git part 999

Here’s an interesting blog post by one of the Mercurial developers, in response to some questions from Git partisans grumbling about having to use Mercurial.!msg/codereview-discuss/ilUffSph68I/NCldEt2Ii-4J

I still think Git is better at the foundation and in usage, but there are things I would steal from Mercurial. One thing Git really needs is a way to make rebase palatable even after you’ve pushed to others. Mercurial has a feature called changeset evolution that might be what I want. Another interesting feature in Mercurial is that you can mark commits as “secret” (hg phase -fs) to prevent them from being pushed by default.

I need to do some timing tests to see what happens to Mercurial when you have repositories with lots of files in them (1 million+), since their manifest is flat, as opposed to using the direntry-style that Git uses (tree objects). But on the other hand, Git has a lot of tree objects due to this.
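
To show the trade-off I have in mind, here is a toy comparison in Python. It is not the real Mercurial or Git storage format, just a sketch under my own assumptions: the “flat manifest” is a single sorted list with one entry per file, and the “tree” model has one object per directory. The function names are hypothetical.

```python
def flat_manifest(paths):
    """Toy flat manifest: one sorted list with an entry per file."""
    return sorted(paths)

def tree_objects(paths):
    """Toy tree model: one object per directory, keyed by directory path."""
    trees = {"": set()}
    for path in paths:
        parts = path.split("/")
        for depth in range(len(parts)):
            parent = "/".join(parts[:depth])
            trees.setdefault(parent, set()).add(parts[depth])
    return trees

def trees_touched_by(path):
    """Directories whose tree object changes when `path` changes (root included)."""
    parts = path.split("/")
    return ["/".join(parts[:depth]) for depth in range(len(parts))]

# 10 directories x 10 subdirectories x 10 files.
paths = ["dir%d/sub%d/file%d.txt" % (d, s, f)
         for d in range(10) for s in range(10) for f in range(10)]

manifest = flat_manifest(paths)
trees = tree_objects(paths)
print(len(manifest))   # entries in the single flat listing
print(len(trees))      # number of directory objects
print(trees_touched_by("dir3/sub4/file5.txt"))
```

In this model a change to one file touches only the few trees on its path, while the flat listing stays a single object that grows with the number of files. The price is the extra directory objects, which is the “lots of tree objects” cost mentioned above.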