Aug 1 2012

Get your house in order with Mercurial

At some point in the last few years, I migrated all my code over to Mercurial, and basically never looked back.

In addition to code, I've also always maintained separate machine repositories, housing configuration files for miscellaneous colocation servers and home systems -- basically, anything that would consume a bunch of my time to rework from scratch. Make a change to something in /etc? Copy and commit it to the repo, with a comment on what warranted the change. Poor man's configuration management. (Side note: If you have > 1 machines and aren't at least doing this much with config files in ANY version control system, why not?)

This was obviously pretty far from foolproof, and horribly inefficient. What if I forgot to copy a modified file? If a machine needed rebuilding, how best to deploy the configs in the repo back to the system?

The problem grew when I started tracking my home directory. Far more files to manage, and I wanted a "base" set of config files that were slightly different on each of the machines I work on. Branching per workstation seemed the best method to initially deal with this, but I soon found myself spending far too much time in branch merges, rsyncing from repo to homedir and back, and writing helper scripts. There's gotta be a better way!

Yeah, there totally is.

Machine Configuration

The trick with both machine configuration and homedir management lies in exploiting Mercurial's "directory walking" to find a repository. If you're currently in /usr/local/etc and run an hg command, Mercurial checks for a repository in /usr/local/etc, then /usr/local, then /usr, then /. It stops as soon as it finds one.

This means that if you initialize a new repository at the root of the filesystem, Mercurial will always find a repository, no matter where you're at. If you're actually working in a "regular" code repository elsewhere in the filesystem, Mercurial will see that first, so it behaves exactly as you'd expect it to.
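To make the lookup concrete, here's a rough shell imitation of that upward walk. It's only a sketch of the behavior described above, not Mercurial's actual code, and the function name find_repo_root is made up for illustration.

```shell
# Hypothetical helper: print the nearest enclosing directory that contains
# a .hg directory, starting from $1 (default: the current directory).
find_repo_root() {
    dir=$(cd "${1:-.}" && pwd -P) || return 1
    while :; do
        if [ -d "$dir/.hg" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Stop once the filesystem root itself has been checked.
        [ "$dir" = "/" ] && return 1
        dir=$(dirname "$dir")
    done
}
```

The nearest repository wins, which is why a code checkout under your home directory is found before a repo at /. For the real thing, 'hg root' prints the repository Mercurial settled on.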

Doing something like an 'hg stat' is of course very slow when walking an entire machine, so there's a second trick -- an .hgignore file that tells Mercurial to ignore everything by default. With this file in place, Mercurial just steps aside -- hg commands only operate on explicitly added files, and it's super quick.

A new machine repository is set up like so:

$ cd /
$ sudo hg init
$ echo '.*' | sudo tee .hgignore
$ sudo hg add .hgignore
$ sudo hg commit -u Mahlon -m "Initial commit of `hostname` repository"

And that's it. You'll have to be root to add files to or edit the repo. When there's something you want to track, just hg add it and commit. Even better, a backup of the important config files on a machine just becomes a simple:

$ cd /tmp; sudo hg bundle -a `hostname -s`.bndl

Or from a remote machine:

$ hg clone ssh://root@example.com// example.com
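For the day-to-day "add it and commit" step, a tiny wrapper can save some typing. This is a minimal sketch; the name track is my own invention, and it assumes you're already running as root since the repository in / is root-owned.

```shell
# Hypothetical helper: add one file to the machine repo and commit just
# that file with the given message. Intended to be run as root.
track() {
    [ "$#" -eq 2 ] || { echo "usage: track FILE MESSAGE" >&2; return 2; }
    hg add "$1" && hg commit -m "$2" "$1"
}
```

Something like track /etc/fstab "Mount the new data disk" would then record the file and the reason for the change in one go.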

Redeploying configs to a fresh system has a few small but arguably unintuitive additional steps. Mercurial is careful to not stomp on existing files, and cloning to / will give you a 'destination is not empty' error by default. This is a good thing, but we have to work around it by cloning elsewhere, then moving the .hg directory manually to /. There's no need to populate the resulting checkout directory with the files, so we pass the '-U' flag to clone.

$ hg clone -U /tmp/example.bndl /tmp/confs
$ sudo mv /tmp/confs/.hg /
$ rm -rf /tmp/confs

The repository state now needs to be resynced before we can use it, late binding the repository to the machine. (Thanks to Michael Granger for pointing this out.) Since Mercurial will find the repo in /, you can run these commands from anywhere on the system (that isn't within another Mercurial repo):

$ hg debugsetparent tip
$ hg debugrebuildstate

Finally, extract the files from the repo over the existing system files. (Do this as root, since it writes to system paths.)

$ hg revert --all
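Since the revert overwrites live system files, it may be worth looking before leaping. A small sketch of a preview step, with preview_redeploy being a made-up name; both commands only read the working directory.

```shell
# Hypothetical helper: show what 'hg revert --all' would touch, without
# changing anything on disk.
preview_redeploy() {
    hg status --modified --deleted   # tracked files that differ or are missing
    hg revert --all --dry-run        # the actions revert would take
}
```

By default, revert also saves a modified file's previous contents alongside it with a .orig suffix, so the fresh system's versions aren't lost outright.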

Fwa-tow! Okay, moving onwards.

Homedir files

With per-machine config repos, you very rarely have a need for branching, or pulling changes from one place to another. It's possible, of course, but I like to think of it as a bonus, instead of a regular way to work with them.

My homedir config, on the other hand, is used between no fewer than six different systems. Different operating systems, different versions of applications, and different environments. I really don't want separate repositories for each permutation -- I want to check my homedir out and get to work.

The idea here is to have a foundation -- base configurations that are identical between all workstations, and then layer on changes needed for each workstation that can be tracked separately, but still carried with the primary repository. (This method works equally well for two workstations or twenty.)

Enter MQ.

MQ has the concept of "guards" -- tags you can apply to individual patches, that alter the default patch stack. If you have a patch (or patches) that change the base configurations for a workstation, you can guard it with an arbitrary label, then select/set that label for the workstations that it should be valid for. Machine "A"'s configurations are completely ignored when on machine "B", and vice versa.

I keep one primary repo that all machines with my homedir environment sync from (and to). Initial setup looks like this -- very similar to the machine repo setup, but with the addition of MQ:

$ hg init repo/homedir
$ hg init --mq repo/homedir
$ cd repo/homedir
$ echo '.*' > .hgignore
$ hg add .hgignore
$ hg commit -m "Initial commit of homedir repository"

Before continuing, I recommend setting up a shell alias for working with MQ, as mentioned in the hgbook.
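The idea is to point hg at the patch repository nested in .hg/patches. Here's a minimal sketch, written as a shell function rather than an alias so it also works inside scripts; the hgbook's version is an alias along the same lines, and the name mq matches how it's used later in this post.

```shell
# Run any hg command against the MQ patch repository of the current repo.
# 'hg root' prints the top of the enclosing repository.
mq() {
    hg -R "$(hg root)/.hg/patches" "$@"
}
```

With that in your shell startup file, 'mq status', 'mq commit' and 'mq push' operate on the patches instead of the homedir files themselves.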

For each workstation, perform the same initial trick as above with the machine repos. You only have to perform this 'bootstrapping' once per system. Note the use of qclone instead of clone, so we snag the MQ patch repo too.

$ cd ~
$ hg qclone -U repo/homedir /tmp/homedir
$ mv /tmp/homedir/.hg .
$ rm -rf /tmp/homedir
$ hg debugsetparent tip
$ hg debugrebuildstate

Add your dotfiles and whatever else you want as part of your base homedir foundation.

$ hg add .i3/*
$ hg add .bashrc .vimrc ...
$ ...
$ hg commit -m 'Added some files for all environments'
$ hg push

Ok. Now's where it gets interesting. Let's say you're on a machine called "hotsoup", and you want specific configurations that ONLY apply to it. Create a new MQ patch file and guard! You can call it whatever you want, but I like to name both the patches and the guards after the hostname they are supposed to apply to.

$ hg qnew hotsoup -m 'Configs specific to hotsoup'
$ hg qpop
$ hg qguard hotsoup +hotsoup
$ mq commit -m 'Initial commit of hotsoup patch'
$ mq push

Now that the patch is guarded, it won't apply to any machine that doesn't have a matching 'hotsoup' guard. So on the real 'hotsoup' machine, select the guard. You should see the following:

$ hg qselect hotsoup
number of guarded, applied patches has changed from 1 to 0

You only need to do this once, after the initial homedir repo qcloning. Repeat the process on any number of machines. One guard per workstation, and each workstation will only "see" its specific patch. If you want to commit new home directory files for all workstations, just hg qpop the machine patch and commit as normal. Changes made while the patch is applied (hg qpush) belong to that particular workstation only; record them in the patch with hg qrefresh.
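The two kinds of change can be sketched as a few helper functions. The function names are hypothetical, and this is one reasonable way to string the steps together rather than the only one. Machine-specific edits get folded into the applied patch with hg qrefresh; the --mq flag tells commit and push to work on the patch repository.

```shell
# Hypothetical helpers; the names are illustrative, not part of Mercurial.

# Machine-specific change: with the patch applied, fold the working
# directory edits into it, then commit and push the patch repository.
save_host_change() {
    hg qrefresh &&
    hg commit --mq -m "$1" &&
    hg push --mq
}

# Change for every workstation, step 1: pop all patches BEFORE editing,
# so the edits land in the base repository.
begin_base_change() {
    hg qpop --all
}

# Step 2, after editing: commit, push, then reapply this machine's patch.
finish_base_change() {
    hg commit -m "$1" &&
    hg push &&
    hg qpush --all
}
```

The base change is split in two because the pop has to happen before you touch any files, and the push has to happen while no patches are applied.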

Pretty. Dang. Nice.

Gotchas?

Mercurial doesn't track permissions, other than the executable bit. If some of your tracked files require specific perms, you'll need to save them on checkin and reapply them on checkout. There are lots of tools to do this, depending on your operating system. My preference is mtree for FreeBSD.

You can go a step further and add hooks to the checked out repository's .hg/hgrc file, so retaining and reapplying permissions is automatic. This StackOverflow question outlines that method nicely!
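If mtree isn't available, a bare-bones version of the save and reapply steps can be done in plain shell. This is a rough sketch, not a replacement for a real tool: the function names are made up, it only handles permission bits (not ownership), and it assumes either GNU or BSD stat is installed.

```shell
# Hypothetical helpers: record and restore octal modes for a list of paths.
# Paths are read one per line, so names containing newlines are not handled.

# Print the octal mode of one file (try GNU stat first, then BSD stat).
file_mode() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Read paths on stdin, write "MODE PATH" lines to the file named by $1.
save_perms() {
    while IFS= read -r path; do
        printf '%s %s\n' "$(file_mode "$path")" "$path"
    done > "$1"
}

# Reapply the modes recorded in the file named by $1.
restore_perms() {
    while read -r mode path; do
        chmod "$mode" "$path"
    done < "$1"
}
```

Run from the top of the repository, 'hg manifest | save_perms .perms' would record the modes of every tracked file, and 'restore_perms .perms' puts them back after a checkout. The .perms filename is just an example; track it in the repo like any other file.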