Downsize and rebuild repos with Mercurial

The situation: you have a project with a lot of binary assets (Photoshop files in my case) and they change often. You think, "It would be awesome to keep them all in my repo, because I'd always know where the latest version is!"

Then, two months later, you have used mv a few times, changed a single layer on every PSD four or five times, and written some awesome commit messages about it all. That is, until you realise your repo is 140 MB. For a few HTML/CSS templates. Whoops.

We can fix this, given two assumptions:

  1. You know where the files you don't want were stored within your repo (relative to the root) throughout the whole history.
  2. You have the ability to make every user of the repo delete their current clone and clone a brand new one.

Neither is particularly painful (though number 2 might be for larger teams), so we should usually be okay. What we're going to do is use Mercurial's convert extension (hg convert) to rebuild the repo, sans the files we don't want. The reason for assumption number two is that each commit hash is derived from the contents of that commit and the hashes of its parents. Remove a file (or more) from a commit and its hash changes, and so does the hash of every commit that comes after it. This is why the new repo will be completely incompatible with the existing one, even if all the files (bar exclusions) are the same and have been through the same history of changes.

What the extension does is create a brand new repo which starts at the first commit of your existing repo and runs through each commit applying the changes to all files, except those we want to exclude.

The first step is to make sure the convert extension is enabled. The easiest way to do this is to edit your .hgrc file and add the following lines:

[extensions]
hgext.convert=

Next we're going to create an exclude list (a filemap), which specifies what to leave out of the new repository. This file can be called anything and just needs one line like the following for each file or directory to exclude:

exclude "src/"

This will exclude every file (and directory) within the src/ directory, relative to the root of the repo. You can also exclude a single file:

exclude ".DS_Store"
exclude "src/.DS_Store"

There is no support for globs, so don't bother trying exclude "*.psd" as it doesn't work. Been there. This made the job a little difficult, as the directories I needed to remove had been moved around a lot. Luckily there were only 20 or so commits, so I went through them and added every location where there had been a .psd file to my exclude list. For a large repository you could build this list in no time with something like hg log -v | grep '\.psd' (the -v is needed because plain hg log doesn't list the files in each commit).
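If going through the history by hand isn't practical, the list-building idea can be scripted. Here is a minimal sketch, not something I ran on the original repo: make_excludes is a made-up helper name, and it assumes none of your paths contain double quotes or backslashes. It reads one path per line and prints a filemap line for every .psd it sees.

```shell
# Hypothetical helper: read paths on stdin (one per line) and print an
# 'exclude "path"' filemap line for each unique path ending in .psd.
make_excludes() {
    grep -i '\.psd$' | sort -u | sed 's/.*/exclude "&"/'
}

# Assumed usage, run from inside the old repo. The template prints every
# file touched by every commit, so files that were moved show up under
# each path they have ever lived at:
#
#   hg log --template '{join(files, "\n")}\n' | make_excludes > exclude_list
```

Because it lists each file individually rather than whole directories, the result can be long, but it doesn't matter how often things were moved around. Have a read through the generated file before using it.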

Once you have the file ready we run the command to build a new repo with our exclusions.

hg convert old-repo new-repo --filemap exclude_list

This process may take a while, depending on the size of your repo and the length of its history, as it needs to apply every changeset. When it completes you have a brand new repo, exactly how you need it. I managed to take my 140 MB repo all the way down to 7.5 MB by removing all the PSD files. I also took the opportunity to remove all the .DS_Store files as well!
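Before telling everyone to re-clone, it's worth checking that nothing slipped through. A rough sketch of one way to do that, assuming the same old-repo and new-repo names as above (no_psd_left is a made-up helper name):

```shell
# Hypothetical helper: succeeds only if no path on stdin ends in .psd.
no_psd_left() {
    ! grep -qi '\.psd$'
}

# Assumed usage: scan every file touched anywhere in the new history,
# then compare the size of the two stores.
#
#   hg -R new-repo log --template '{join(files, "\n")}\n' | no_psd_left \
#       && echo "no PSDs in history"
#   du -sh old-repo/.hg new-repo/.hg
```

As far as I know, the converted repo also starts out with no files checked out, so run hg update inside new-repo before you go looking for your working copy.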