June 11, 2008

How I migrated Paludis to Git

Paludis has been using (and it currently still uses) Subversion to manage its source. We’ve been using git-svn for some months now and recently ciaranm agreed to fully migrate to Git.

To migrate the repository I used my old git-svn clone. This made some stuff a bit trickier, but it was both faster and nicer with pioto’s server. Things that had to be done:

  • Remove ChangeLog and ChangeLog.old.bz2
  • Remove metadata added by git-svn
  • Rewrite authors and emails since I didn’t use an authors-file
  • Remove empty commits (these are commits that only touched ChangeLog or ChangeLog.old.bz2)

This looks like a good task for git-filter-branch. I probably could have done everything in one go, but I decided to do it one at a time.

Since filter-branch is mostly IO-bound, we’ll try to speed it up as much as possible:

$ sudo mount -t tmpfs -o size=100M none paludis/.git-rewrite
$ myrefs="0.4 0.6 0.8 0.20 0.24 0.26 ....."

The first task is easy:

$ git filter-branch -f --index-filter 'git update-index --remove ChangeLog' $myrefs
$ git filter-branch -f --index-filter 'git update-index --remove ChangeLog.old.bz2' $myrefs

To remove metadata created by git-svn I came up with:

tac | sed -n -e '1d' -e '/[^[:blank:]]/,$p' | tac

but dleverton came up with this perl one-liner, and I used it instead:

perl -ne 'print @blanks, $last and undef @blanks if defined $last; if (m/\S/) { $last = $_ } else { undef $last; push(@blanks, $_) }'

I put it in a file and ran:

git filter-branch -f --msg-filter ~/munge-commit-message $myrefs
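
To see what such a message filter does, the tac/sed variant can be tried on a single message outside of filter-branch. This is only a small sketch: the function name strip_git_svn_id and the sample message (including its URL) are made up for illustration, and GNU tac and sed are assumed.

```shell
# Drop the trailing git-svn-id line and the blank lines just before it.
strip_git_svn_id() {
    tac | sed -n -e '1d' -e '/[^[:blank:]]/,$p' | tac
}

# Made-up commit message in the shape git-svn produces.
printf '%s\n' \
    'Fix the frobnicator' \
    '' \
    'Longer description.' \
    '' \
    'git-svn-id: https://svn.example.org/trunk@1234 00000000-0000-0000-0000-000000000000' |
    strip_git_svn_id
```

Only the subject, the blank separator and the description should come out; the blank lines inside the message are kept because sed starts printing at the first non-blank line of the reversed input.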

Changing authors needed a script like the following (with proper mail-addresses):

case ${GIT_AUTHOR_NAME} in
        ciaranm)   n="Ciaran McCreesh"      ; m="foo@bar.com" ;;
        spb)       n="Stephen P. Bennett"   ; m="foo@bar.com" ;;
        halcyon)   n="Mark Loeser"          ; m="foo@bar.com" ;;
        allanonjl) n="John N. Laliberte"    ; m="foo@bar.com" ;;
        steev)     n="Stephen Klimaszewski" ; m="foo@bar.com" ;;
        kugelfang) n="Danny van Dyk"        ; m="foo@bar.com" ;;
        ferdy)     n="Fernando J. Pereda"   ; m="foo@bar.com" ;;
        arachnist) n="Robert S. Gerus"      ; m="foo@bar.com" ;;
        drizzt)    n="Timothy Redaelli"     ; m="foo@bar.com" ;;
        djm)       n="David Morgan"         ; m="foo@bar.com" ;;
        pioto)     n="Mike Kelly"           ; m="foo@bar.com" ;;
        piotr)     n="Piotr Rak"            ; m="foo@bar.com" ;;
        rbrown)    n="Richard Brown"        ; m="foo@bar.com" ;;
        baptux)    n="Baptiste Daroussin"   ; m="foo@bar.com" ;;
        eroyf)     n="Alexander Færøy"      ; m="foo@bar.com" ;;
        compnerd)  n="Saleem Abdulrasool"   ; m="foo@bar.com" ;;
        omp)       n="David Shakaryan"      ; m="foo@bar.com" ;;
        dleverton) n="David Leverton"       ; m="foo@bar.com" ;;
        peper)     n="Piotr Jaroszyński"    ; m="foo@bar.com" ;;
        dev-zero)  n="Tiziano Müller"       ; m="foo@bar.com" ;;
        zlin)      n="Bo Ørsted Andresen"   ; m="foo@bar.com" ;;
        buildtest) n="Nightly Buildtest"    ; m="foo@bar.com" ;;
        flameeyes) n="Diego Pettenò"        ; m="foo@bar.com" ;;
        iluxa)     n="Ilya Volynets"        ; m="foo@bar.com" ;;
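        dercorny)  n="Stefan Cornelius"     ; m="foo@bar.com" ;;
        *)         n=${GIT_AUTHOR_NAME}     ; m=${GIT_AUTHOR_EMAIL} ;;
esac

# git-svn records the svn username as both author and committer,
# so both identities are rewritten here (an assumption about the import)
export GIT_AUTHOR_NAME=${n} GIT_AUTHOR_EMAIL=${m}
export GIT_COMMITTER_NAME=${n} GIT_COMMITTER_EMAIL=${m}

git commit-tree "$@"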

and ran:

$ git filter-branch -f --commit-filter '. ~/rewrite-authors' $myrefs
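
Before a long rewrite like this, it may be worth checking that every svn username actually has an entry in the case statement. The following is a hedged sketch, not part of the original migration: the helper unmapped_authors and the shortened list in known are hypothetical, and the sample names stand in for real git log output.

```shell
# Hypothetical subset of the usernames handled by the case statement.
known="ciaranm spb ferdy dleverton"

# Read author names (one per line) and print those without a mapping.
unmapped_authors() {
    sort -u | while read -r name; do
        case " ${known} " in
            *" ${name} "*) ;;        # already mapped
            *) echo "${name}" ;;     # would keep its svn username
        esac
    done
}

# Sample input standing in for: git log --all --format='%an'
printf '%s\n' ferdy ciaranm nobody ferdy | unmapped_authors
```

With the sample input only "nobody" is reported. Feeding it the real author list from git log would show which usernames still need a line in the script.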

Removing empty commits requires a bit more foo:

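skip_commit() {   # drop this commit; print its rewritten parents instead
        shift
        while [[ -n $1 ]] ; do
                shift
                map "$1"
                shift
        done
}
our_tree=$1
our_parent_tree=$(map $3)

if [[ -z ${our_parent_tree} ]] || [[ -n $(git diff-tree ${our_tree} ${our_parent_tree}:) ]] ; then
        git commit-tree "$@"
else
        skip_commit "$@"
fi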

This one could have just tested whether the current tree is the same as our parent’s tree (that is, no changes were made by this commit):

[[ ${our_tree} == $(git rev-parse $(map $3):) ]]

But it wouldn’t have made a big difference, and I only noticed it while filter-branch was already running. The command was something like:

$ git filter-branch -f --commit-filter '. ~/empty-commits.bash' $myrefs

There’s still stuff to do like tags and adding scratch and probably converting the overlay; but the big thing is done. I think that history is stable already, that is, I won’t have to rewrite it again.

It is sitting in my home in bach and will be published soon.

Update: Re-tagging every paludis release was the last step. I thought it was going to be cumbersome and boring, but git makes this kind of stuff pretty easy. Since ciaranm should sign the tags himself, I did:

$ git log --pretty=oneline origin/releases |
> sed -n -e '/^\([0-9a-f]\{40\}\) Tag\( release\)\? \(.*\)/s--\3|\1|Tag release \3-p' \
> > ~/paludis-git-tags
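
To make the format of that file concrete, here is the same substitution applied to a few made-up log lines. It is a sketch under assumptions: the hashes and subjects are invented, the function name tag_lines is hypothetical, the pattern is written directly in the s command instead of reusing the address, and GNU sed is assumed because of \?.

```shell
# Turn "<hash> Tag [release] <version>" lines into "<version>|<hash>|Tag release <version>".
tag_lines() {
    sed -n -e 's/^\([0-9a-f]\{40\}\) Tag\( release\)\? \(.*\)/\3|\1|Tag release \3/p'
}

# Made-up lines standing in for: git log --pretty=oneline origin/releases
printf '%s\n' \
    '0123456789abcdef0123456789abcdef01234567 Tag release 0.26.0' \
    '89abcdef0123456789abcdef0123456789abcdef Tag 0.4.0' \
    'fedcba9876543210fedcba9876543210fedcba98 Bump version' |
    tag_lines
```

The two tagging commits come out as name|commit|message records and the unrelated commit is dropped, which is why some hand editing afterwards is still a good idea.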

After some hand editing of the file, creating the tags can be done with something like:

$ while IFS='|' read -r name head msg ; do
> git tag -m "${msg}" ${name} ${head} ;
> done < ~/paludis-git-tags
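
Assuming the file keeps the name|commit|message layout that the sed command writes, a dry run is a cheap way to review the tags before creating any of them. This sketch only prints the commands; the function name print_tag_commands and the sample record are made up.

```shell
# Print the git tag commands instead of running them (dry run).
print_tag_commands() {
    while IFS='|' read -r name head msg; do
        printf 'git tag -m "%s" %s %s\n' "${msg}" "${name}" "${head}"
    done
}

# Made-up record in the name|commit|message layout.
printf '%s\n' '0.4.0|89abcdef0123456789abcdef0123456789abcdef|Tag release 0.4.0' |
    print_tag_commands
```

Because the message is the last field, it can contain spaces without confusing read. Whoever holds the signing key could run the real commands with -s added to get signed tags.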

To check out an exact version (assuming ~/git/paludis is your repo; it doesn’t have to be a local repo):

$ cd somewhere
$ git archive --format=tar --remote=~/git/paludis --prefix=paludis-0.4.0/ 0.4.0 | tar xf -

June 9, 2008

Getting rid of ChangeLog files

There’s been some discussion on exherbo’s development list about getting rid of ChangeLog files. With the move of our repositories to git this would be easy.

This doesn’t mean users won’t see ChangeLog files; it means we won’t have to manually generate them. It has some advantages:

  • Maintaining ChangeLog files by hand is a PITA
  • People can’t avoid ChangeLog files: if you commit, it automagically goes to the ChangeLog.
  • We can easily link ChangeLog entries with Git history (by including commit hashes)
  • Cross-package/category commits become less annoying (because of the two bullets above)
  • The repository size doesn’t increase stupidly

Generating ChangeLog files is quite easy using Git, and can be done incrementally.
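
As a rough illustration of what such generation could look like, the sketch below formats log records into ChangeLog-style entries. Everything in it is an assumption made for the example: the function name format_changelog, the hash|date|author|subject record layout and the entry format are hypothetical, not an agreed exherbo convention.

```shell
# Format "hash|date|author|subject" records as ChangeLog-style entries.
format_changelog() {
    while IFS='|' read -r hash date author subject; do
        printf '%s  %s\n\n\t* %s (%s)\n\n' "${date}" "${author}" "${subject}" "${hash}"
    done
}

# Made-up record standing in for:
#   git log --date=short --pretty='tformat:%h|%ad|%an|%s' "${last}.." -- some/package
printf '%s\n' 'abc1234|2008-06-09|Fernando J. Pereda|Add foo' | format_changelog
```

The incremental part would come from remembering the last commit already written to the ChangeLog (the hypothetical ${last} above) and only formatting the commits after it.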

The cons…. I really don’t know whether those exist in this particular case :) If you can think of any, I’m all ears (or eyes in this case).

May 29, 2008

Exherbo Workflow

I’ve found myself using these functions a lot while working on exherbo (they are also suitable for Gentoo overlays that use git):

# Fernando J. Pereda
# exherbo workflow

# Print the repository name: the argument if given, otherwise the name
# of the directory that holds the current git repository.
reponame()
{
    local reponame=$1
    if [[ -z ${reponame} ]] ; then
        local t=$(git rev-parse --git-dir)
        t=${t//\.git}
        if [[ -z ${t} ]] ; then
            t=${PWD}
        else
            t=${t%/}
        fi
        reponame=${t##*/}
    fi
    echo ${reponame}
}

# Throw away anything previously exported into the system copy.
repoclean()
{
    local n=$(reponame $1)
    (
        cd /var/repositories/"${n}"
        sudo git clean -fd
        sudo git checkout -f
    )
}

# List the commits the system copy lacks and apply them there as one patch.
repoexport()
{
    local n=$(reponame $1) rhead repo
    repo=/var/repositories/"${n}"
    rhead=$(git --git-dir="${repo}"/.git rev-parse HEAD)
    if ! git cat-file -t ${rhead} >/dev/null 2>&1 ; then
        echo >&2 "Remote HEAD not found. Need 'git pull --rebase'?"
        return 127
    fi
    GIT_PAGER=cat git log --pretty=format:'- %s' ${rhead}..
    echo
    git diff ${rhead}.. | ( cd ${repo} ; sudo git apply - )
}
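
The string handling inside reponame is easier to follow when it is separated from git. The sketch below repeats the same bash expansions on explicit inputs; the function name name_from_gitdir and the example paths are made up, and it takes the git-dir and the working directory as arguments instead of asking git.

```shell
# Same expansions as reponame, with the git-dir and PWD passed in explicitly.
name_from_gitdir() {
    local t=$1 pwd=$2
    t=${t//.git}              # ".git" -> "", "/path/repo/.git" -> "/path/repo/"
    if [[ -z ${t} ]] ; then
        t=${pwd}              # at the top of the work tree: use the current directory
    else
        t=${t%/}              # elsewhere: strip the trailing slash
    fi
    echo "${t##*/}"           # keep only the last path component
}

name_from_gitdir .git /home/user/git/arbor
name_from_gitdir /home/user/git/x11/.git /home/user/git/x11/packages
```

The two calls should print arbor and x11. Note that the substitution removes every ".git" in the path, so a repository living under a directory whose name contains ".git" would need different handling.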

I usually work in my own clones of the repositories I have configured in paludis, for instance:

$ for repo in x11 arbor ; do git clone --reference /var/repositories/${repo} git://git.exherbo.org/${repo} ; done

Then I work on topic branches until they are ready:

$ git checkout -b mybranch
[ gvim + git commit + ... ]
$ repoclean
$ repoexport
$ sudo paludis ...... # try the package(s) I'm working on until I'm happy

At that point, I do:

$ git fetch
$ git rebase origin/master
$ git checkout master
$ git merge mybranch
$ git push

Update: As rbrown points out, the clone command was wrong.

