Fun with Git tags

I use Git tags in several ways.

There’s the classic use of “here’s something we released in the past”. It doesn’t need a branch, because it’s no longer under development, but you may need to refer to it at some point. Presumably you have some regular pattern for naming tags, and perhaps you use annotated tags to hold release information. It’s good release practice to keep branches for active development only, because you can always create a branch from a tag if you need to start working on it again.

There’s another use of “I have some dead/obsolete development work, but I’d still like it to stick around in the permanent record”. I prefer this to spelunking in the reflog, because sooner or later you’ll garbage-collect, and if there are no live references to commits, those commits will go away. Obviously, you should not keep actual garbage, but a historical record can be a valuable thing. And when you tire of that history, you can delete it just by removing the tags. I switched to this instead of keeping branches around, and it makes my repos feel a bit cleaner.
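As a rough sketch of that habit, the following turns a dead branch into a tag and then deletes the branch. The throwaway repository, the branch name old-experiment and the archive/ tag prefix are all made up for illustration; any naming pattern would do.

```shell
set -e
# Throwaway repository so the sketch can be run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# A hypothetical abandoned branch with one commit of its own.
git checkout -q -b old-experiment
git commit -q --allow-empty -m "abandoned experiment"
git checkout -q "$main_branch"

# Keep the work reachable through an annotated tag, then drop the branch.
git tag -a archive/old-experiment -m "Abandoned experiment, kept for the record" old-experiment
git branch -q -D old-experiment

# The commit is still reachable through the tag, so it survives garbage collection.
archived_subject=$(git log -1 --format=%s archive/old-experiment)
echo "$archived_subject"
```

Deleting the tag later (git tag -d archive/old-experiment) is what finally lets the commits be garbage-collected.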

Lightweight tags have the advantage of not being separate objects in the repository; they simply associate a name with a commit. Annotated tags are stored as their own tag objects, which lets you add extra information in the form of a message, and there are other benefits as well (you can sign tags, for example). I see both kinds as valuable for both uses. Some projects only use annotated tags; for example, looking through the Git source itself, it seems like all the tags are annotated. My preference is to use only annotated tags.
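One way to see the difference is to ask Git what kind of object each tag name resolves to. This is a small sketch in a throwaway repository; the tag names light-1.0 and annotated-1.0 are invented for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "initial commit"

# Lightweight: only a ref that points straight at the commit.
git tag light-1.0
# Annotated: creates a tag object that carries the message and points at the commit.
git tag -a annotated-1.0 -m "Release 1.0"

light_type=$(git cat-file -t light-1.0)
annotated_type=$(git cat-file -t annotated-1.0)
echo "light-1.0 is a $light_type, annotated-1.0 is a $annotated_type"
```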

There’s one trouble spot when it comes to sharing tags: tags live in a single namespace, unlike branch refs, which are kept separately per remote. Since people rarely share tags, this usually isn’t an issue. But if you fetch tags from a remote repository, they go into the same .git/refs/tags location as your local tags. One interesting suggestion I saw was to name tags based on their remote, so that you can keep your own tags separate from pulled-in remote tags. It’s not automatic, though; you have to set it up manually. There aren’t common workflows yet around sharing tags, as far as I know.
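One possible way to do this by hand (not necessarily the approach from the suggestion mentioned above) is to fetch with an explicit refspec that maps each remote tag under a per-remote prefix. The repository layout, the remote name upstream and the tag v1.0 below are assumptions made for the sketch.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A stand-in "remote" repository with a single tag.
git init -q upstream
git -C upstream config user.name "Example User"
git -C upstream config user.email "user@example.com"
git -C upstream commit -q --allow-empty -m "initial commit"
git -C upstream tag v1.0

# A local repository that knows it as the remote "upstream".
git init -q local
cd local
git remote add upstream "$work/upstream"

# --no-tags turns off the normal tag following; the refspec stores every
# remote tag as refs/tags/upstream/<name> instead of refs/tags/<name>.
git fetch -q --no-tags upstream '+refs/tags/*:refs/tags/upstream/*'

fetched_tags=$(git tag --list)
echo "$fetched_tags"
```

The refspec could also be stored in the remote's fetch configuration so that a plain git fetch upstream applies it every time.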

While tags are normally stored in .git/refs/tags, if you look in that directory, you might only see a few tag files. Refs (tags and branches) can be packed into a single .git/packed-refs file for efficiency’s sake, and this works very well for tags, since tag refs normally never change. A ref gets unpacked if it needs to change. Packing can be done manually with git pack-refs, and git gc also does it, including when it runs automatically.
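The packing can be watched in a scratch repository. This sketch assumes the default files-based ref storage and uses the tag name release-1.5.1 from the examples further down.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "initial commit"
git tag release-1.5.1

# The new tag starts out as a loose file under .git/refs/tags.
ls .git/refs/tags

# Pack all refs (tags and branches); the loose files are pruned afterwards.
git pack-refs --all

# The tag now lives as one line in .git/packed-refs: "<commit id> refs/tags/release-1.5.1".
packed_line=$(grep 'refs/tags/release-1.5.1' .git/packed-refs)
echo "$packed_line"
```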

As of Git 1.9.0, git fetch --tags fetches both branches and tags. By itself, git fetch will only get tags that point at commits being brought down, but it won’t bring down new tags pointing to commits that you already have. One downside to git fetch --tags is that it will fetch and replace all tags. Normally this is fine, but it may be dangerous if you have multiple remotes attached to a single repository, especially if those remotes are disjoint. Keep in mind that you may need to explicitly pull tags in some cases.

See a separate post I have yet to write about git log/git rev-list and proper use of --all, --branches, --tags and --remotes.

Examples

Create an annotated tag (assumes that the tag message is in the file <tagmessage>):

git tag -a release-1.5.1 -F <tagmessage>

Show the tag and/or related commit (for annotated tag, will show the annotated tag and then the commit; for lightweight tag, will show just the commit):

git show release-1.5.1

Show tags in <remote> repository, where <remote> is the name of a remote attached to your local repository:

git ls-remote --tags <remote>

Describe the current commit relative to the most recent annotated tag reachable from it:

git describe

Push a specific tag (and related objects) to a remote repository:

git push <remote> release-1.5.1

Push all tags not already in the remote repository:

git push <remote> --tags

Delete a tag in the local repository:

git tag -d release-1.5.1

Delete a tag in a remote repository (note: this has the same perils as rebasing, since others could be depending on this tag, but it’s not bad in and of itself):

git push <remote> :refs/tags/release-1.5.1


Easy way to create .mailmap for git log

The git log features have a way to remap author and committer names and email addresses into canonical names, via the creation of a .mailmap file. For more details, see the help for git shortlog.

So, you know you need one of these. How do you create it? Use the output from git shortlog, and feed it into perl or sed to remove the counts at the beginning of each line. For example:

git shortlog -se | perl -ple "s/^\s*\d+\t//"  > .mailmap.txt

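(The example uses double quotes around the Perl expression so that it also works in the Windows command prompt, which does not treat single quotes as quoting characters; in a Unix shell, single quotes work just as well.)

This leaves you with the start of a .mailmap file (written to .mailmap.txt in the example above, so rename it to .mailmap when you are done). Now, you just edit it: pick the canonical name you want, and create any mappings needed to turn the non-canonical entries into your canonical ones.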
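To make the editing step concrete, here is a small sketch with one mapping. The two identities (Jane Doe and jdoe) are hypothetical; the entry puts the canonical name and email first, followed by the name and email as they appear in the commits.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Two commits by the same (hypothetical) person under different identities.
git -c user.name="Jane Doe" -c user.email="jane@example.com" \
    commit -q --allow-empty -m "first"
git -c user.name="jdoe" -c user.email="jdoe@laptop.local" \
    commit -q --allow-empty -m "second"

# Canonical identity first, then the identity found in the commits.
cat > .mailmap <<'EOF'
Jane Doe <jane@example.com> jdoe <jdoe@laptop.local>
EOF

# HEAD is named explicitly: without a revision, shortlog reads the log from
# standard input whenever standard input is not a terminal.
authors=$(git shortlog -se HEAD)
echo "$authors"
```

With the mapping in place, both commits are counted under the single canonical entry.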

Note that this can be very tedious for large repos with hundreds or even thousands of contributors (the git.git repo has just over 1200 contributors as of 2014). You might want to focus on the oddballs. In that case, add -n to the shortlog invocation; this sorts by number of commits per author instead of alphabetically. This way, you can work up from the bottom of the list until it’s “good enough”. Once you’ve edited it, you probably want to sort it alphabetically, to make future life easier.

You can commit the .mailmap file to your repository so that others can benefit from what you’ve done, or you can leave it for just your use.

As for updating it in the future, I’d suggest diffing the output of shortlog “now” and back at the point in time where you last updated the .mailmap, to see if you have any new authors that need remapping.
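One way to sketch that comparison: find the commit that last touched .mailmap, list the authors as of that commit and as of HEAD with the counts stripped, and keep only the lines that are new. The contributors and the temporary file names are invented for the example, and sed stands in for the perl one-liner above.

```shell
set -e
repo=$(mktemp -d)
tmp=$(mktemp -d)
cd "$repo"
git init -q

# History up to the last .mailmap update: one known author.
git -c user.name="Jane Doe" -c user.email="jane@example.com" \
    commit -q --allow-empty -m "first"
echo "Jane Doe <jane@example.com>" > .mailmap
git add .mailmap
git -c user.name="Jane Doe" -c user.email="jane@example.com" \
    commit -q -m "add .mailmap"

# Later, a new (hypothetical) contributor shows up.
git -c user.name="New Person" -c user.email="new@example.com" \
    commit -q --allow-empty -m "third"

# Commit that last changed .mailmap.
last_update=$(git log -1 --format=%H -- .mailmap)

# Author lists then and now; the counts are stripped because they change over time.
git shortlog -se "$last_update" | sed 's/^[[:space:]]*[0-9]*[[:space:]]*//' | sort > "$tmp/authors.then"
git shortlog -se HEAD | sed 's/^[[:space:]]*[0-9]*[[:space:]]*//' | sort > "$tmp/authors.now"

# Lines only in the second file are authors who appeared since the last update.
new_authors=$(comm -13 "$tmp/authors.then" "$tmp/authors.now")
echo "$new_authors"
```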

Note that mailmap files aren’t automatically used by anything other than shortlog. For git log, for example, you either need to enable it in the config (log.mailmap) or add the option to the git log command line:

git log --use-mailmap [more options]
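The config route might look like the following sketch, which sets log.mailmap for a single throwaway repository and checks the Author line that git log prints. The identities are hypothetical; add --global to the git config call to apply the setting everywhere.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# One commit under a non-canonical (hypothetical) identity.
git -c user.name="jdoe" -c user.email="jdoe@laptop.local" \
    commit -q --allow-empty -m "first"
echo "Jane Doe <jane@example.com> jdoe <jdoe@laptop.local>" > .mailmap

# Make git log behave as if --use-mailmap were always given.
git config log.mailmap true

author_line=$(git log -1 | grep '^Author:')
echo "$author_line"
```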

 

msysgit

I love Git. But msysgit, while not horrible, could be a lot better.

For example, there’s an extension called git-cheetah. When you install msysgit, that’s one of the questions for you to answer, and it’s asked in a nicely slanted way:

  • Simple context menu (Registry based)
  • Advanced context menu (git-cheetah plugin)

I’ve found people who say “avoid this like the plague”, and others who use it. It actually took a little while to find out that this is shell integration a la TortoiseSVN. Shell integration is wonderful, except when it breaks, and it’s just complex enough that most people can’t troubleshoot it to fix the problem.

And it’s really something that you could ignore. Do you use context menus while developing? If not, then this choice is irrelevant. If you do use context menus, then (1) they need to always work, and (2) you should be able to use either or both.

Another issue with msysgit is the dreadful out-of-the-box performance with SSH connections. This is particularly bad because SSH is a very common way to connect to remote Git repositories; it’s quite nearly the default for pushing to remote repositories. So you’d think that, of all the things to make sure worked well, it would be SSH. But if you want good SSH, you need to drag PuTTY in and use Pageant and plink.

There’s little ever written about msysgit; I don’t know much about how it’s developed or what choices are made. On the other hand, the main Git development is about as visible as you can be.

Another reason I feel awkward about msysgit is that there’s a lot of whinging on the part of the main developer; at least as of several years ago, he clashed with the Git mainline developers, and was pretty rude and insulting to boot. I mean, irrespective of who’s right and wrong, you don’t cut off your oxygen. If you’re dependent on someone, you want them to look favorably on you.

I really wish that the mainline Git developers would take a more cross-platform approach and mindset, but they are Linux developers first and foremost. It’s not surprising, given that Git’s parent was Linus Torvalds; I definitely wouldn’t expect him to care about Windows or Mac. However, Git is becoming the best revision control system on the planet, and a lot of people use Windows and Mac machines, and they aren’t going to switch just because of Git.

We need a first-class Windows client for Git, but it has to be part of the mainline development; it can’t be some parallel development process. Those always start out well, and then die after a year or so.

To-do: make ‘git rebase’ a first-class citizen

git rebase is awesome, because it lets you fix your history after the fact. It’s not about rewriting history, it’s about improving it so that it makes more sense, or so that it’s composable.

But git rebase also breaks workflows. If you rebase work that others have built on, you create problems for them, because Git will get confused about how to merge. That’s bad, because a lot of Git’s power is based on the idea of constant branching and merging.

One of my pet projects is to figure out how to allow both constant branch/remerge and rebase to co-exist happily. I have no clear idea how to do this, just some vague ones. At some point, I’ll work on this. But if someone else were to do it before me, I’d be just as happy.

Hint, hint.

Mercurial versus Git part 999

Here’s an interesting blog post by one of the Mercurial developers, in response to some questions from Git partisans grumbling about having to use Mercurial.

https://groups.google.com/forum/#!msg/codereview-discuss/ilUffSph68I/NCldEt2Ii-4J

I still think Git is better at the foundation and in usage, but there are things I would steal from Mercurial. One thing Git really needs is a way to make rebase palatable even after you’ve pushed to others. Mercurial has a feature called changeset evolution that might be what I want: https://air.mozilla.org/changesets-evolution-with-mercurial/. Another interesting feature in Mercurial is that you can mark commits as “secret” (hg phase -fs) to prevent them from being pushed by default.

I need to do some timing tests to see what happens to Mercurial when you have repositories with lots of files in them (1 million+), since their manifest is flat, as opposed to using the direntry-style that Git uses (tree objects). But on the other hand, Git has a lot of tree objects due to this.