Fernando J. Pereda’s blag

June 11, 2008

How I migrated Paludis to Git

Filed under: blag | Fernando J. Pereda @ 7:41 pm

Paludis has been using (and currently still uses) Subversion to manage its source. We’ve been using git-svn for some months now, and ciaranm recently agreed to migrate fully to Git.

To migrate the repository I used my old git-svn clone. This made some stuff a bit trickier, but it was both faster and nicer with pioto’s server. Things that had to be done:

  • Remove ChangeLog and ChangeLog.old.bz2
  • Remove metadata added by git-svn
  • Rewrite authors and emails since I didn’t use an authors-file
  • Remove empty commits (these are commits that only touched ChangeLog or ChangeLog.old.bz2)

This looks like a good task for git-filter-branch. I probably could have done everything in one go, but I decided to do it one step at a time.

Since filter-branch is mostly IO-bound, we’ll try to speed it up as much as possible by putting its temporary directory (.git-rewrite by default) on a tmpfs:

$ sudo mount -t tmpfs -o size=100M none paludis/.git-rewrite
$ myrefs="0.4 0.6 0.8 0.20 0.24 0.26 ....."

The first task is easy:

$ git filter-branch -f --index-filter 'git update-index --remove ChangeLog' $myrefs
$ git filter-branch -f --index-filter 'git update-index --remove ChangeLog.old.bz2' $myrefs
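To convince yourself of what this does before touching the real repository, something like the following can be run in a throwaway repository. It is only a sketch: the repository, the file main.c and the commit messages are made up, and FILTER_BRANCH_SQUELCH_WARNING is a switch from git versions much newer than this post that silences the deprecation warning. It uses --index-filter, where update-index --remove drops the path because no file is checked out.

```shell
# Throwaway repository; every name and file here is invented for the demo.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1
git config user.name "Demo User"
git config user.email "demo@example.org"

echo 'int main() { return 0; }' > main.c
echo 'initial import' > ChangeLog
git add main.c ChangeLog
git commit -q -m 'Initial import'

echo 'another entry' >> ChangeLog
git commit -q -a -m 'Update ChangeLog only'

# Newer git versions warn about filter-branch and pause; this skips that.
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --index-filter 'git update-index --remove ChangeLog' HEAD >/dev/null

git ls-tree -r --name-only HEAD   # ChangeLog no longer shows up
git log --pretty=oneline          # both commits survive; the second is now empty
```

Note that the commit which only touched ChangeLog is still there afterwards, just with no changes in it. That is why the empty commits need a pass of their own later on.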

To remove metadata created by git-svn I came up with:

tac | sed -n -e '1d' -e '/[^[:blank:]]/,$p' | tac

but dleverton came up with this perl one-liner, and I used it instead:

perl -ne 'print @blanks, $last and undef @blanks if defined $last; if (m/\S/) { $last = $_ } else { undef $last; push(@blanks, $_) }'

I put it in a file and ran:

$ git filter-branch -f --msg-filter ~/munge-commit-message $myrefs
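A message filter reads the commit message on stdin and writes the new one to stdout, so it is easy to try on a sample message first. The sketch below wraps my tac/sed version in a function and feeds it a made-up message; the function name, the message and the git-svn-id URL are invented for illustration, and it assumes GNU tac and sed.

```shell
# Drop the last line (the git-svn-id trailer) and any blank lines before it.
strip_git_svn_id() {
    tac | sed -n -e '1d' -e '/[^[:blank:]]/,$p' | tac
}

# A made-up commit message in the shape git-svn produces.
sample_message() {
    printf '%s\n' \
        'Fix the frobnicator' \
        '' \
        'More detail here.' \
        '' \
        'git-svn-id: https://svn.example.org/trunk@1234 00000000-0000-0000-0000-000000000000'
}

sample_message | strip_git_svn_id
```

The blank line between the subject and the body is kept; only the trailer and the blank line right above it go away.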

Changing authors needed a script like the following (with proper mail-addresses):

case ${GIT_AUTHOR_NAME} in
        ciaranm)   n="Ciaran McCreesh"      ; m="foo@bar.com" ;;
        spb)       n="Stephen P. Bennett"   ; m="foo@bar.com" ;;
        halcyon)   n="Mark Loeser"          ; m="foo@bar.com" ;;
        allanonjl) n="John N. Laliberte"    ; m="foo@bar.com" ;;
        steev)     n="Stephen Klimaszewski" ; m="foo@bar.com" ;;
        kugelfang) n="Danny van Dyk"        ; m="foo@bar.com" ;;
        ferdy)     n="Fernando J. Pereda"   ; m="foo@bar.com" ;;
        arachnist) n="Robert S. Gerus"      ; m="foo@bar.com" ;;
        drizzt)    n="Timothy Redaelli"     ; m="foo@bar.com" ;;
        djm)       n="David Morgan"         ; m="foo@bar.com" ;;
        pioto)     n="Mike Kelly"           ; m="foo@bar.com" ;;
        piotr)     n="Piotr Rak"            ; m="foo@bar.com" ;;
        rbrown)    n="Richard Brown"        ; m="foo@bar.com" ;;
        baptux)    n="Baptiste Daroussin"   ; m="foo@bar.com" ;;
        eroyf)     n="Alexander Færøy"      ; m="foo@bar.com" ;;
        compnerd)  n="Saleem Abdulrasool"   ; m="foo@bar.com" ;;
        omp)       n="David Shakaryan"      ; m="foo@bar.com" ;;
        dleverton) n="David Leverton"       ; m="foo@bar.com" ;;
        peper)     n="Piotr Jaroszyński"    ; m="foo@bar.com" ;;
        dev-zero)  n="Tiziano Müller"       ; m="foo@bar.com" ;;
        zlin)      n="Bo Ørsted Andresen"   ; m="foo@bar.com" ;;
        buildtest) n="Nightly Buildtest"    ; m="foo@bar.com" ;;
        flameeyes) n="Diego Pettenò"        ; m="foo@bar.com" ;;
        iluxa)     n="Ilya Volynets"        ; m="foo@bar.com" ;;
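        dercorny)  n="Stefan Cornelius"     ; m="foo@bar.com" ;;
        # Anything not listed keeps the identity it already had
        *)         n=${GIT_AUTHOR_NAME}     ; m=${GIT_AUTHOR_EMAIL} ;;
esac

# Use the mapped identity for the author; setting the committer as well
# is an assumption (git-svn records the same person for both)
export GIT_AUTHOR_NAME="${n}" GIT_AUTHOR_EMAIL="${m}"
export GIT_COMMITTER_NAME="${n}" GIT_COMMITTER_EMAIL="${m}"

git commit-tree "$@"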

and ran:

$ git filter-branch -f --commit-filter ~/rewrite-authors $myrefs
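Before running the rewrite it is worth checking that every username in the history has an entry in the table. A rough sketch of such a check follows; lookup_author and unmapped_authors are hypothetical helpers, the table is cut down to two entries, and the addresses are the same placeholders as above.

```shell
# Cut-down copy of the mapping; m stays empty for unknown usernames.
lookup_author() {
    case $1 in
        ciaranm) n="Ciaran McCreesh"    ; m="foo@bar.com" ;;
        ferdy)   n="Fernando J. Pereda" ; m="foo@bar.com" ;;
        *)       n=$1                   ; m=""            ;;
    esac
}

# Read one username per line and print those without a mapping.
# In a real repository the input could come from:
#   git log --all --format=%an | sort -u
unmapped_authors() {
    while read -r user ; do
        lookup_author "${user}"
        [ -n "${m}" ] || printf '%s\n' "${user}"
    done
}

printf '%s\n' ciaranm ferdy nobody | unmapped_authors
```

If this prints anything, the table needs another line before filter-branch is started.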

Removing empty commits requires a bit more foo:

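# Print the rewritten ids of all parents (same as filter-branch's own skip_commit)
skip_commit() {
        shift
        while [[ -n $1 ]] ; do
                shift
                map "$1"
                shift
        done
}
# The commit filter is called with: <tree> [-p <parent>]...
our_tree=$1
our_parent_tree=$(map $3)

if [[ -z ${our_parent_tree} ]] || [[ -n $(git diff-tree ${our_tree} ${our_parent_tree}:) ]] ; then
        git commit-tree "$@"
else
        skip_commit "$@"
fi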

This one could have just tested whether the current tree is the same as our parent’s tree (that is, no changes were made by this commit):

[[ ${our_tree} == $(git rev-parse $(map $3):) ]]

It wouldn’t have made a big difference, though, and I only noticed it while filter-branch was already running. The invocation was something like:

$ git filter-branch -f --commit-filter '. ~/empty-commits.bash' $myrefs
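The simpler test mentioned above (same tree as the parent means nothing changed) can be tried outside filter-branch too. The following is a sketch in a scratch repository; same_tree_as_parent is a hypothetical helper, and it compares against the commit's real first parent rather than the rewritten one that map would return inside the filter.

```shell
# Scratch repository with one real change and one empty commit.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1
git config user.name "Demo User"
git config user.email "demo@example.org"

echo 'hello' > file.txt
git add file.txt
git commit -q -m 'Add a file'
echo 'more' >> file.txt
git commit -q -a -m 'Change the file'
git commit -q --allow-empty -m 'Touch nothing'

# Succeeds when commit $1 has exactly the same tree as its first parent.
# "<commit>:" names the top-level tree of that commit.
same_tree_as_parent() {
    [ "$(git rev-parse "$1:")" = "$(git rev-parse "$1~1:")" ]
}

for commit in HEAD HEAD~1 ; do
    if same_tree_as_parent "${commit}" ; then
        echo "${commit}: empty, could be skipped"
    else
        echo "${commit}: has changes, keep"
    fi
done
```

Comparing two tree ids is a plain string comparison, so it avoids running diff-tree for every commit.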

There’s still stuff to do like tags and adding scratch and probably converting the overlay; but the big thing is done. I think that history is stable already, that is, I won’t have to rewrite it again.

It is sitting in my home in bach and will be published soon.

Update: Re-tagging every Paludis release was the last step. I thought it was going to be cumbersome and boring; however, git makes this kind of stuff pretty easy. Since ciaranm should sign the tags himself, I did:

$ git log --pretty=oneline origin/releases |
> sed -n -e '/^\([0-9a-f]\{40\}\) Tag\( release\)\? \(.*\)/s--\3|\1|Tag release \3-p' \
> > ~/paludis-git-tags

After some hand editing of the file, creating the tags can be done with something like:

$ while IFS='|' read -r name head msg ; do
> git tag -m "${msg}" ${name} ${head} ;
> done < ~/paludis-git-tags
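To see what ends up in the tags file and what the loop would do with it, here is a dry run on made-up input. The hashes and commit subjects are fake, sample_log and extract_tags are hypothetical names, and the sed expression relies on GNU sed for `\?`. Fields are read in the order the sed expression writes them: tag name, commit, message.

```shell
# Three invented lines in "git log --pretty=oneline" format.
sample_log() {
    printf '%s\n' \
        '1111111111111111111111111111111111111111 Tag release 0.4.0' \
        '2222222222222222222222222222222222222222 Fix a typo' \
        '3333333333333333333333333333333333333333 Tag 0.6.1'
}

# Keep only tagging commits and rewrite them as name|commit|message.
# The empty pattern in s--...- reuses the address regex.
extract_tags() {
    sed -n -e '/^\([0-9a-f]\{40\}\) Tag\( release\)\? \(.*\)/s--\3|\1|Tag release \3-p'
}

# Dry run: print the git tag commands instead of running them.
sample_log | extract_tags | while IFS='|' read -r name head msg ; do
    printf 'git tag -m "%s" %s %s\n' "${msg}" "${name}" "${head}"
done
```

Printing the commands first makes it easy to eyeball the result (or to hand-edit the intermediate file) before any tag is actually created.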

To check out an exact version (assuming ~/git/paludis is your repo; it doesn’t have to be a local repo):

$ cd somewhere
$ git archive --format=tar --remote=~/git/paludis --prefix=paludis-0.4.0/ 0.4.0 | tar xf -

— ferdy

