24 January 2013

Migrate a SVN repository to Git preserving tags

I've used SVN for code versioning for as long as I can remember. On both Windows and Linux, it has been perfect for managing and storing source code and documents, especially in a context where few users commit changes and public access is only considered at the later stages of a project. In the MUSIC project the context has been quite different: there's a multitude of code projects that compose a larger system, and public access to the code has been a requirement early on. One of my colleagues suggested that we start using Git for the purpose. I remember reading, when I first looked into it, that Git was like colour TV: once you've seen it you'd never want to go back. Although Git is far more complex than SVN, making it easier to mess up, it is indeed considerably more powerful. Beyond that, the GitHub web/social repository gives coding a whole new meaning.

So here's a use case: migrate a local SVN repository to Git, correctly keeping the tags that identify releases, and then push it to GitHub. This turned out to be not so easy, hence this log entry for future reference.

After the first internet searches I learnt about a Ruby tool that supposedly does exactly what I needed: svn2git. Unfortunately this tool doesn't run on my system, Ubuntu 12.04 64bit, and so I had to try a more artisanal path.

Git itself has functionality to interact with SVN repositories, but if you happen to have branches or tags things are not straightforward. If you follow the recommendations in the SVN book, your SVN repository may have a structure like this:

/trunk
/releases
  /v0.1
  /v0.2


Here the trunk folder holds the working version of the code, and releases stores the tags created with svn copy. Applying the plain git svn clone command to a repository like this will simply reproduce this folder structure, replicating the code in each tag or branch, without creating any Git tags or branches.

But there are ways of correctly cloning an SVN repository using Git. It starts with a checkout of the SVN project into a temporary folder, in case it isn't already on the system (note the dummy names for the server and project; replace them with those that apply):

$ cd ~/temp

$ svn checkout http://myserver/svn/myproject ./myproject

According to some sources it is useful to extract author names before cloning the repository:

$ cd myproject

$ svn log -q | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' | sort -u > ../authors-transform.txt

$ cd ..

Edit the authors-transform.txt file and change the author names; my advice is to make them match the user names at GitHub. Jump to John Albin's blog if you need more details.
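To see what the awk filter produces before running it against a real repository, here is a small sketch that feeds it a made-up svn log -q listing. The user names jdoe and asmith, the revisions and the dates are all hypothetical; only the filter itself is the one used above.

```shell
# Made-up sample of what `svn log -q` prints (hypothetical users and revisions).
sample_log='------------------------------------------------------------------------
r67 | jdoe | 2013-01-20 10:15:02 +0000 (Sun, 20 Jan 2013)
------------------------------------------------------------------------
r51 | asmith | 2012-11-03 16:40:11 +0000 (Sat, 03 Nov 2012)
------------------------------------------------------------------------
r50 | jdoe | 2012-11-01 09:05:47 +0000 (Thu, 01 Nov 2012)
------------------------------------------------------------------------'

# Same awk filter as above: take the second |-separated field of each
# revision line, trim the surrounding spaces, and emit "user = user <user>".
authors=$(printf '%s\n' "$sample_log" |
  awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2">"}' |
  sort -u)

printf '%s\n' "$authors"
# asmith = asmith <asmith>
# jdoe = jdoe <jdoe>
```

Each line maps an SVN user name to a Git identity of the form Name <email>. After editing, an entry might read jdoe = John Doe <jdoe@example.com> (again, a made-up identity).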

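Now for the cloning. The trick here is to include the tags flag, specifying the folder containing the SVN tags, alongside the trunk flag, so that master is sure to follow trunk; with it, trunk later shows up as one more remote branch. The authors file also comes into play at this time:

$ git svn clone http://myserver/svn/myproject --trunk=trunk --tags=releases -A authors-transform.txt myproject.git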

If you move into the new repository you'll see that now, instead of the trunk and releases folders, you only have the code itself, the original contents of trunk.

$ cd myproject.git

$ ls -la

But this isn't everything. While git svn clone was able to identify the SVN tags, it recreates them as remote branches and can duplicate each one of them, adding a suffix with the '@' character followed by an SVN revision number. This is what you might have:

$ git branch -r
  tags/v0.1
  tags/v0.1@51
  tags/v0.2
  tags/v0.2@67

Before going on let us get rid of these extra branches:

$ for t in $(git branch -r | grep @); do
    git branch -r -d "$t"
  done
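Since the deletion is not easily undone, it may be worth previewing what the grep filter selects first. The sketch below runs the same filter over a hypothetical listing, in the form git branch -r prints for refs under refs/remotes/tags/, and only echoes the delete commands instead of running them.

```shell
# Hypothetical `git branch -r` listing; in a real repository, replace the
# printf below with the command itself.
sample_branches='  tags/v0.1
  tags/v0.1@51
  tags/v0.2
  tags/v0.2@67'

# Same filter as the loop above, but echo each command rather than run it.
preview=$(for t in $(printf '%s\n' "$sample_branches" | grep @); do
  echo "git branch -r -d $t"
done)

printf '%s\n' "$preview"
# git branch -r -d tags/v0.1@51
# git branch -r -d tags/v0.2@67
```

Only the refs carrying the '@' suffix are selected; the plain ones are left alone for the conversion into tags.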

Now the remaining remote branches can be converted into tags, as they should be. For this little trick I must thank Thomas Rast:

$ git for-each-ref --format="%(refname)" refs/remotes/tags/ |
 while read tag; do
  GIT_COMMITTER_DATE="$(git log -1 --pretty=format:"%ad" "$tag")" \
  GIT_COMMITTER_EMAIL="$(git log -1 --pretty=format:"%ce" "$tag")" \
  GIT_COMMITTER_NAME="$(git log -1 --pretty=format:"%cn" "$tag")" \
  git tag -m "$(git for-each-ref --format="%(contents)" "$tag")" \
   ${tag#refs/remotes/tags/} "$tag"
done
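The tag name in the loop above comes from the ${tag#refs/remotes/tags/} expansion, which strips the ref prefix and leaves the bare name. A minimal sketch with hard-coded ref names (assumed here, matching the example releases) shows the effect without touching a repository:

```shell
# Hypothetical ref names, in the form git for-each-ref prints them.
names=$(for tag in refs/remotes/tags/v0.1 refs/remotes/tags/v0.2; do
  # "#pattern" removes the shortest match of pattern from the start of $tag
  echo "${tag#refs/remotes/tags/}"
done)

printf '%s\n' "$names"
# v0.1
# v0.2
```

Note that newer versions of git svn (Git 2.0 onwards) place these refs under refs/remotes/origin/ by default, so on such a version the prefix would presumably need to be refs/remotes/origin/tags/ in both places where it appears in the loop.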

Check that the tags were successfully created and, if so, delete all the remote branches:

$ git tag -l
v0.1
v0.2

$ for b in $(git branch -r); do
    git branch -r -d "$b"
  done

Everything is now set to push this project to GitHub, assuming a repository has already been created at the website:

$ git remote add origin https://github.com/myuser/myproject.git

$ git push -u origin master

One final step: by default git push doesn't send tags upstream; this has to be done explicitly:

$ git push origin --tags
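The difference between the two pushes can be tried out safely in a scratch setup. In the sketch below a local bare repository stands in for GitHub, so nothing touches the network; the directory names, the demo identity and the single commit are all made up for illustration.

```shell
# Scratch demonstration: a local bare repository plays the role of GitHub.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

# Identity for the scratch commit and tag (hypothetical).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > README
git add README
git commit -q -m "first commit"
git tag -a -m "release 0.1" v0.1

git remote add origin "$work/remote.git"
git push -q -u origin master

# Tags known to the remote after pushing only the branch (none expected).
before=$(git ls-remote --tags origin)

git push -q origin --tags

# Tag refs known to the remote after the explicit tag push.
after=$(git ls-remote --tags origin | awk '{print $2}')
printf '%s\n' "$after"
```

After the first push the remote has the master branch but no tags; v0.1 only appears once --tags is given. More recent Git versions also offer git push --follow-tags, which sends annotated tags reachable from the pushed commits along with the branch.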

And that's about it. I hope it is useful.