A (git) workflow for Debian packaging

martin f krafft madduck at debian.org
Fri Oct 5 13:08:23 UTC 2007


Dear list, after messing up the pkg-mdadm Git repository, I finally
decided that it was time for me to figure out how to *properly*
package with Git. And with the help of the #git/freenode channel,
I think I managed to understand Git enough to the point where
I could come up with an acceptable workflow. I am soliciting your
feedback.

I know about git-buildpackage and gitpkg, but I prefer to do it by
hand, at least for now. I also decided *not* to mention
guilt/quilt/stgit in this post; it's already too long.

Before I publish my findings, I would like to make sure they are clear
and correct, lest I cause more confusion. Thus I am very glad that
a couple of Git experts have recently joined this list.

As I (hope to) receive feedback, I will update the version of this
post at
  http://scratch.madduck.net/blog__drafts__2007.10.03_packaging-with-git.rst

but I shall include my 1.0 version inline to make it easier to reply
in context.

Now I am looking forward to your input on this very long post:

Setting up the infrastructure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
First, we prepare a shared repository on `git.debian.org
<http://git.debian.org>`_ for later use (using ``collab-maint`` for
illustration purposes), download the Debian source package we want to import
(version ``2.6.3+200709292116+4450e59-3`` at time of writing, but I pretend
it's ``-2`` because we shall create ``-3`` further down…), set up a local
repository, and link it to the remote repository::

  $ ssh alioth
  $ cd /git/collab-maint
  $ ./setup-repository pkg-mdadm mdadm Debian packaging
  $ exit
  $ apt-get source --download-only mdadm
  $ mkdir mdadm && cd mdadm
  $ git init
  $ git remote add origin ssh://git.debian.org/git/collab-maint/pkg-mdadm
  $ git config branch.master.merge refs/heads/master

Now we can use ``git-pull`` and ``git-push``, except that the remote repository
is empty and we can't pull from there yet. We'll save that for later and
tell the repository about upstream's Git repository instead::

  $ git remote add upstream-repo git://neil.brown.name/mdadm

and pull down upstream's history and branch a local branch off it. The "no
common commits" warning can be safely ignored: we don't have any commits
at all at that point, so there can't be any in common between the
local and remote repository::

  $ git fetch upstream-repo
  […]
  warning: no common commits
  $ git checkout -b upstream upstream-repo/master
  $ git branch
  * upstream
  $ ls | wc -l
  77
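
If you'd like to rehearse the fetch-and-branch step without touching a real
upstream, here is a minimal sketch that does the same against a throw-away
local repository. The paths, the file name, and the identity are made up for
the demonstration; only the ``upstream-repo`` remote and the ``upstream``
branch mirror the names used above.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for upstream's repository (hypothetical content).
git init -q "$work/upstream-src"
cd "$work/upstream-src"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false
echo 'int main(void) { return 0; }' > mdadm.c
git add mdadm.c
git commit -q -m "Initial upstream commit"

# Empty packaging repository, as after "git init" above.
git init -q "$work/mdadm"
cd "$work/mdadm"
git remote add upstream-repo "$work/upstream-src"
git fetch -q upstream-repo
git checkout -q -b upstream upstream-repo/master

# The only local branch is now "upstream", at upstream's tip.
git branch
```

The local repository never made a commit of its own, which is why Git has
nothing to compare upstream's history against.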

Importing the Debian package
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now it's time to import Debian's ``diff.gz``; remember that I am pretending to
use version control for package maintenance for the first time. Oh, and
sorry about the messy filenames, but I decided it's best to stick
with real data in case you are playing along.

Since we're applying the diff against version ``2.6.3+200709292116+4450e59``,
we ought to make sure to have the repository at the same state. Upstream never
"released" that version, but I encoded the commit ID of the tip when
I snapshotted it: ``4450e59``, so we tag it and branch off there::

  $ git tag -s mdadm-2.6.3+200709292116+4450e59 4450e59
  $ git checkout -b master mdadm-2.6.3+200709292116+4450e59
  $ zcat ../mdadm_2.6.3+200709292116+4450e59-2.diff.gz | git apply

The local tree is now "debianised", but Git does not know about the new and
changed files, which you can verify with ``git-status``. We will split the
changes made by Debian's ``diff.gz`` across several branches.
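
To see what "Git does not know about the changes" means in practice, here is a
small sketch with a home-made ``diff.gz``. Everything in it (file contents,
paths, identity) is invented for illustration; the point is only that
``git apply`` touches the working tree and leaves the index alone.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
git init -q "$work/pkg"
cd "$work/pkg"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

# "Upstream" snapshot.
echo 'CONFFILE = /etc/mdadm.conf' > Makefile
git add Makefile
git commit -q -m "Upstream snapshot"

# Manufacture a diff.gz: change one file, add another, record the
# difference, and return to the pristine snapshot.
echo 'CONFFILE = /etc/mdadm/mdadm.conf' > Makefile
mkdir debian
echo '#!/usr/bin/make -f' > debian/rules
git add -A
git diff --cached | gzip -c > "$work/pkg.diff.gz"
git reset -q --hard

# The actual import: the tree changes, the index does not.
gunzip -c "$work/pkg.diff.gz" | git apply
git status --porcelain
```

``Makefile`` is reported as modified but unstaged, and ``debian/`` as
untracked, which is exactly the state we distribute across branches next.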

The idea of feature branches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We could just create a ``debian`` branch, commit all changes made by the
``diff.gz`` there, and be done with it. However, we might want to keep certain
aspects of Debianisation separate, and the way to do that is with feature
branches (also known as "topic" branches). For the sake of this demonstration,
let's create the following four branches in addition to the ``master`` branch,
which holds the standard Debian files, such as ``debian/changelog``,
``debian/control``, and ``debian/rules``:

* ``upstream-patches`` includes patches against the upstream code, which
  I submit for upstream inclusion.
* ``deb/conffile-location`` makes ``/etc/mdadm/mdadm.conf`` the default over
  ``/etc/mdadm.conf`` and is Debian-specific (thus the ``deb/`` prefix).
* ``deb/initramfs`` includes the ``initramfs`` hook and script, which I want
  to treat separately but not submit upstream.
* ``deb/docs`` similarly includes Debian-only documentation I add to the
  package as a service to Debian users.

If you're importing a Debian package using ``dpatch``, you might want to
convert every dpatch into a single branch, or at least collect logical units
into separate branches. Up to you. For now, our simple example suffices. Keep
in mind that it's easy to merge two branches and less trivial to split one into
two.

Why? Well, good question. As you will see further down, the separation between
``master`` and ``deb/initramfs`` actually makes things more complicated when
you are working on an issue spanning across both. However, feature branches
also bring a whole lot of flexibility. For instance, with the above
separation, I could easily create ``mdadm`` packages without ``initramfs``
integration (see `#434934 <http://bugs.debian.org/434934>`_),
a disk-space-conscious distribution like `grml <http://grml.org>`_ might
prefer to leave out the extra documentation, and maybe another derivative
doesn't like the fact that the configuration file is in a different place from
upstream. With feature branches, all these issues could be easily addressed by
leaving out unwanted branches from the merge into the integration/build branch
(see further down).

Whether you use feature branches, and how many, or whether you'd like to only
separate upstream and Debian stuff is entirely up to you. For the purpose of
demonstration, I'll go the more complicated way.

Setting up feature branches
~~~~~~~~~~~~~~~~~~~~~~~~~~~
So let's commit the individual files to the branches. The output of the
``git-checkout`` command shows modified files that have not been committed yet
(which I trim after the first example); Git keeps these across
checkouts/branch changes. Note that the ``./debian/`` directory does not show
up as Git does not know about it yet (``git-status`` will tell you that it's
untracked, or rather: contains untracked files since Git does not track
directories at all)::

  $ git checkout -b upstream-patches mdadm-2.6.3+200709292116+4450e59
  M Makefile
  M ReadMe.c
  M mdadm.8
  M mdadm.conf.5
  M mdassemble.8
  M super1.c
  $ git add super1.c     #444682
  $ git commit

    # i now branch off master, but that's the same as 4450e59 actually
    # i just do it so i can make this point…
  $ git checkout -b deb/conffile-location master
  $ git add Makefile ReadMe.c mdadm.8 mdadm.conf.5 mdassemble.8
  $ git commit

  $ git checkout -b deb/initramfs master
  $ git add debian/initramfs/*
  $ git commit

  $ git checkout -b deb/docs master
  $ git add RAID5_versus_RAID10.txt md.txt rootraiddoc.97.html
  $ git commit

    # and finally, the ./debian/ directory:
  $ git checkout master
  $ chmod +x debian/rules
  $ git add debian
  $ git commit
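
The trick that makes this work is that uncommitted changes travel with you
from branch to branch. A reduced sketch with two upstream files and a
``debian/`` directory (all contents are placeholders, and the ``snapshot`` tag
stands in for the real upstream tag):

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo 'conf' > Makefile
echo 'code' > super1.c
git add Makefile super1.c
git commit -q -m "Upstream snapshot"
git tag snapshot

# Pretend the diff.gz has just been applied: two edits, one new directory.
echo 'debian conf' > Makefile
echo 'fixed code' > super1.c
mkdir debian
echo '#!/usr/bin/make -f' > debian/rules

# Each branch picks up only the files it is responsible for.
git checkout -q -b upstream-patches snapshot
git add super1.c
git commit -q -m "Fix super1.c"

git checkout -q -b deb/conffile-location master
git add Makefile
git commit -q -m "Move the configuration file"

git checkout -q master
git add debian
git commit -q -m "Add Debian packaging"

git log --oneline --all
```

Each branch ends up exactly one commit ahead of the snapshot, and each commit
touches a different part of the tree.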

And finally, push our work so it won't get lost if, at this moment, aliens
land on the house, or any other completely plausible event of apocalypse
descends upon you.

We'll push our work to ``git.debian.org`` (the ``origin``, which is the
default destination and thus need not be specified) by using ``git-push
--all``, which conveniently pushes all local branches, thus including the
upstream code; you may not want to push the upstream code, but I prefer it
since it makes it easier to work with the repository, and since most of the
objects are needed for the other branches anyway; after all, we branched off
the ``upstream`` branch. The option ``--tags`` pushes all local tags; you
couldn't have guessed that! Current Git versions refuse to combine ``--all``
with ``--tags``, so the two go into separate invocations.

::

  $ git push --all
  $ git push --tags

Done. Well, almost…

Building the package (theory)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let's build the package. There seem to be two (sensible) ways we could do
this, considering that we have to integrate (merge) the branches we just
created, before we fire off the building scripts:

1. by using a temporary (or "throw-away") branch off ``upstream``, where we
   integrate all the branches we have just created, build the package, tag our
   ``master`` branch (it contains ``debian/changelog``), and remove the
   temporary branch. When a new package needs to be built, we repeat the
   process.

2. by using a long-living integration branch off ``upstream``, into which we
   merge all our branches, tag the branch, and build the package off the tag.
   When a new package comes around, we re-merge our branches, tag, and build.

Both approaches have a certain appeal to me, but I settled for the second, for
two reasons, the first of which leads to the second:

1. When I upload a package to the Debian archive, I want to create a tag which
   captures the exact state of the tree from which the package was built, for
   posterity (I will return to this point later). Since the throw-away
   branches are not designed to persist and are not uploaded to the archive,
   tagging the merging commit makes no sense. Thus, the only way to properly
   identify a source tree across all involved branches would be to run
   ``git-tag $branch/$tagname $branch`` for each branch, which is purely
   semantic and will get messy sooner or later.

2. As a result of the above: when Debian makes a new stable release, I would
   like to create a branch corresponding to the package in the stable archive
   at the time, for security and other proposed updates. I could rename my
   throw-away branch, if it still existed, or I could create a new branch and
   merge all other branches, using the (semantic) tags, but that seems rather
   unfavourable.

So instead, I use a long-living integration branch, consistently tag the merge
commits which produced the tree from which I built the package I uploaded, and
when a certain version ends up in a stable Debian release, I create
a maintenance branch off the one, single tag which corresponds to the very
version of the package distributed as part of the Debian release.

So much for the theory. Let's build, already!

Building the package (practice)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So we need a long-living integration branch, and that's easier done than
said::

  $ git checkout -b build mdadm-2.6.3+200709292116+4450e59

Now we're ready to build, and the following procedure should really be
automated. I thus write it like a script, called ``poor-mans-gitbuild``::

  #!/bin/sh
  set -eu
  git checkout master
  debver=$(dpkg-parsechangelog | sed -ne 's,Version: ,,p')
  git checkout build
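  # strip the "* "/"  " prefix so the shell never sees (and expands) a bare
  # "*"; skip build itself as well as the tmp/ and maint/ namespaces
  git merge $(git branch | sed -e 's/^[* ] //' | grep -Ev '^(build$|tmp/|maint/)')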
  git tag -s debian/$debver
  debuild   # will ignore .git automatically
  git checkout master
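
The merge step is the heart of the script, so here is a sketch of just that
part in a scratch repository. The branch names and files are hypothetical, and
I use ``git for-each-ref`` to list the branches, which is an alternative to
filtering the output of ``git branch``; both ``tmp/*`` and ``maint/*`` are
left out of the integration.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "Upstream snapshot"

# One commit on each of a few hypothetical branches.
for b in upstream-patches deb/docs tmp/experiment maint/etch; do
    f=$(echo "$b" | tr / -).txt
    git checkout -q -b "$b" master
    echo "$b" > "$f"
    git add "$f"
    git commit -q -m "Add $f"
done

git checkout -q -b build master
# Plain branch names, one per line; drop build, tmp/* and maint/*.
branches=$(git for-each-ref --format='%(refname:short)' refs/heads/ |
    grep -Ev '^(build$|tmp/|maint/)')
# $branches is deliberately unquoted: one argument per branch.
git merge -q -m "Integrate branches for the build" $branches
ls
```

Leaving a feature out of a build is then a matter of extending the pattern,
which is the flexibility I was after with feature branches.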

If you discover during the build that you forgot something, then you can
remove the tag, undo the merge, checkout the branch to which you need to
commit to fix the issue, and then repeat the above build process::

  $ git tag -d debian/$debver
  $ git reset --hard HEAD^
  $ git checkout master
  $ editor debian/rules    # or whatever
  $ git add debian/rules
  $ git commit
  $ git checkout master
  $ poor-mans-gitbuild
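
A sketch of the undo step on its own, assuming a single feature branch and an
unsigned tag (a stand-in for ``git tag -s``, so that no GPG key is needed). I
force a real merge commit with ``--no-ff`` so that ``HEAD^`` is guaranteed to
be the previous tip of ``build``; the version number is made up.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "Upstream snapshot"

git checkout -q -b deb/docs master
echo docs > md.txt
git add md.txt
git commit -q -m "Add md.txt"

git checkout -q -b build master
before=$(git rev-parse HEAD)
git merge -q --no-ff -m "Integrate deb/docs" deb/docs
git tag debian/1.0-1          # unsigned stand-in for "git tag -s"

# Oops, something is missing: drop the tag and the merge again.
git tag -d debian/1.0-1
git reset -q --hard HEAD^
git log --oneline -1
```

After the reset, ``build`` is back where it was before the merge and the tag
is gone, so the build can simply be repeated.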

Before you upload, it's a good idea to invoke ``gitk --all`` and verify that
all goes according to plan [if someone follows this and could create
a screenshot of ``gitk``, I'd appreciate if you could send it to me so I can
include it here; Mine looks way too complicated to be released].

When you're done and the package has been uploaded, push your work to
``git.debian.org``, as before::

  $ git push --all
  $ git push --tags

Now take your dog for a walk, or play outside, or do something else not
involving a computer or entertainment device.

Uploading a new Debian version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you are as lucky as I am, the package you uploaded still has a bug in the
upstream code *and* someone else fixes it before upstream releases a new
version, then you might be in the position to release a new Debian version. Or
maybe you just need to make some Debian-specific changes against the same
upstream version. I'll let the commands speak for themselves::

  $ git checkout upstream-patches
  $ git apply < patch-from-lunar.diff   #444682 again
  $ git add super1.c
  $ git commit

  $ git checkout master
  $ dch
  $ dpkg-parsechangelog | sed -ne 's,Version: ,,p'
  2.6.3+200709292116+4450e59-3

  $ poor-mans-gitbuild

  $ git push --all
  $ git push --tags

Hacking on the software
~~~~~~~~~~~~~~~~~~~~~~~
Imagine: on a rainy Saturday afternoon you get bored and decide to implement
a better way to tell ``mdadm`` `when to start which array
<http://bugs.debian.org/398310>`_. Since you're a genius, it'll take you only
a day, but you do make mistakes here and there, so what could be better than
to use version control? However, rather than having a branch that will live
forever, you are just creating a local branch, which you will not publish.
When you are done, you'll feed your work back into the existing branches.

Git makes branching really easy and as you may have spotted, the
``poor-mans-gitbuild`` script reserves an entire branch namespace for people
like you::

  $ git checkout -b tmp/start-arrays-rework master

Unfortunately (or fortunately), fixing this issue will require work on two
branches, since the ``initramfs`` script and hook are maintained in a separate
branch. There are (again) two ways in which we can (sensibly) approach this:

* create two separate, temporary branches, and switch between them as you
  work.

* merge both into the temporary branch and later cherry-pick the commits into
  the appropriate branches.

I am undecided on this, but maybe the best would be a combination: merge both
into a temporary branch and later cherry-pick the commits into two additional,
temporary branches until you got it right, and then fast-forward the official
branches to their tips::

  $ git merge master deb/initramfs
  $ editor debian/mdadm-raid                     # …
  $ git commit debian/mdadm-raid
  $ editor debian/initramfs/script.local-top     # …
  $ git commit debian/initramfs/script.local-top
  [many hours of iteration pass…]

  [… until you are done]
  $ git checkout -b tmp/start-arrays-rework-init master
    # for each commit $c in tmp/start-arrays-rework
    # applicable to the master branch:
  $ git cherry-pick $c
  $ git checkout -b tmp/start-arrays-rework-initramfs deb/initramfs
    # for each commit $c in tmp/start-arrays-rework
    # applicable to the deb/initramfs branch:
  $ git cherry-pick $c

This is assuming that all your commits are logical units. If you find several
commits which would better be bundled together into a single commit, this is
the time to do it::

  $ git cherry-pick --no-commit <commit7>
  $ git cherry-pick --no-commit <commit4>
  $ git cherry-pick --no-commit <commit5>
  $ git commit
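
Here is the bundling trick in isolation, as a sketch with two trivial commits
on a hypothetical ``tmp/work`` branch that get folded into one commit on
``tmp/clean``:

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "Base"

# Two small work-in-progress commits.
git checkout -q -b tmp/work master
echo one > a.txt
git add a.txt
git commit -q -m "Add a"
echo two > b.txt
git add b.txt
git commit -q -m "Add b"

# Replay both onto a fresh branch without committing in between...
git checkout -q -b tmp/clean master
git cherry-pick --no-commit tmp/work~1
git cherry-pick --no-commit tmp/work
# ...and record them as one logical unit.
git commit -q -m "Add a and b as one logical change"
git log --oneline master..tmp/clean
```

The resulting tree is identical to that of ``tmp/work``, but the history
contains one commit instead of two.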

Before we now merge this into the official branches, let me briefly intervene
and introduce the concept of a fast-forward. Git will "fast-forward" a branch
to a new tip if it decides that no merge is needed. In the above example, we
branched a temporary branch (T) off the tip of an official branch (O) and then
worked on the temporary one. If we now merge the temporary one into the
official one, Git determines that it can actually squash the ancestry into
a single line and push the official branch tip to the same ref as the
temporary branch tip. In cheap (poor man's), ASCII notation::

  - - - O             >> merge T >>     - - - = - - OT
         ` - - T      >>  into O >>

This works because no new commits have been made on top of O (if there were
any, we might be able to rebase, but let's not go there quite yet; rebasing
is how you shoot yourself in the foot with Git). Thus we can simply do the
following::

  $ git checkout deb/initramfs
  $ git merge tmp/start-arrays-rework-initramfs
  $ git checkout master
  $ git merge tmp/start-arrays-rework-init
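
If you want to convince yourself that such a merge really is a fast-forward,
the following sketch uses ``--ff-only``, which makes Git refuse anything else.
The branch ``tmp/topic`` and its files are invented for the demonstration.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "O: tip of the official branch"

# Work happens on the temporary branch only.
git checkout -q -b tmp/topic master
echo one > one.txt
git add one.txt
git commit -q -m "First step"
echo two > two.txt
git add two.txt
git commit -q -m "T: tip of the temporary branch"

# No commits were made on master in the meantime, so this just moves
# the master ref forward; --ff-only would fail otherwise.
git checkout -q master
git merge -q --ff-only tmp/topic
git log --oneline
```

Both branch names now point at the same commit, and no merge commit was
created along the way.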

Then test, build, and push the result. Or, since you are not an ``mdadm``
maintainer (We^W I have open job positions! Applications welcome!), you might
want to submit your work::

  $ git format-patch -s -M origin/master

This will create a number of files in the current directory, one
for each commit you made since ``origin/master``. Assuming each commit is
a logical unit, you can now submit these to an email address. The
``--compose`` option lets you write an introductory message, which is
optional::

  $ git send-email --compose --to your@email.address <file1> <file2> <…>

Once you've verified that everything is alright, swap your email address for
the bug number (or the `pkg-mdadm-devel list
<http://lists.alioth.debian.org/mailman/listinfo/pkg-mdadm-devel>`_ address).
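
To get a feel for what ``git format-patch`` produces before mailing anything,
here is a sketch that writes the patches into a scratch directory. The local
branch ``published`` is a hypothetical stand-in for ``origin/master``, and the
``-o`` option only redirects the output so the working tree stays clean.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "Base"
git branch published            # stand-in for origin/master

echo one > one.txt
git add one.txt
git commit -q -m "Add one"
echo two > two.txt
git add two.txt
git commit -q -m "Add two"

# One mail-ready file per commit since "published", each with a
# Signed-off-by line (-s) and rename detection enabled (-M).
out=$(mktemp -d)
git format-patch -q -s -M -o "$out" published
ls "$out"
```

Each file is a complete email with the commit message as subject and body, so
you can read it through before handing it to ``git send-email``.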

Thanks (in advance) for your contribution!

Of course, you may also be working on a feature that you want to go upstream,
in which case you'd probably branch off ``upstream-patches`` (if it depends on
a patch not yet in upstream's repository), or ``upstream`` (if it does not)::

  $ git checkout -b tmp/cool-feature upstream
  […]

… when a new upstream version comes around
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
After a while, upstream may have integrated your patches, in addition to
various other changes, to give birth to ``mdadm-2.6.4``. We thus first fetch
all the new refs and merge them into our upstream branch::

  $ git fetch upstream-repo
  $ git checkout upstream
  $ git merge upstream-repo/master

We *could* just as well have executed ``git-pull``, which with the default
configuration would have done the same; however, I prefer to separate the
process into fetching and merging.

Now comes the point when many Git people think about rebasing. And in fact,
rebasing is exactly what you should be doing, iff you're still working on an
*unpublished* branch (yes, **UNPUBLISHED**; that means **NEVER EVER
PUBLISHED**), such as the previous ``tmp/cool-feature`` off ``upstream``. By
rebasing your branch onto the updated ``upstream`` branch, you are making sure
that your patch will apply cleanly when upstream tries it, because potential
merge conflicts would be handled by you as part of the rebase, rather than by
upstream::

  $ git checkout tmp/cool-feature
  $ git rebase upstream

What rebasing does is quite simple actually: it takes every commit you made
since you branched off the parent branch and records the diff and commit
message. Then, for each diff/commit_message pair, it *creates a new commit* on
top of the new parent branch tip, thus rewrites history, and orphans all your
original commits. Thus, you should only do this if your branch has never been
published or else you would leave people who cloned from your published branch
with orphans.

  If this still does not make sense, try it out: create a (source) repository,
  make a commit (with a meaningful commit message), branch B off the tip, make
  a commit on top of B (with a meaningful message), clone that repository and
  return to the source repository. There, checkout the master, make a commit
  (with a …), checkout B, rebase it onto the tip of master, make a commit
  (with a …), and now ``git-pull`` from the clone; use ``gitk`` to figure out
  what's going on.
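
A shorter version of that experiment, without the clone, already shows the
rewriting. In this sketch ``master`` plays the role of the updated
``upstream`` branch, and all file names are placeholders:

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo base > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b tmp/cool-feature master
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add cool feature"
old=$(git rev-parse HEAD)

# Meanwhile, the parent branch moves on.
git checkout -q master
echo newer > upstream.txt
git add upstream.txt
git commit -q -m "New upstream work"

git checkout -q tmp/cool-feature
git rebase -q master
new=$(git rev-parse HEAD)

echo "before rebase: $old"
echo "after rebase:  $new"
```

The commit message survives, but the commit ID does not: the old commit is no
longer reachable from the branch, which is exactly what hurts anyone who
already based work on it.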

So you should never rebase a published branch, and since all your branches
outside of the ``tmp/*`` namespace are published on ``git.debian.org``, you
should never rebase those. Thus, you have to merge. At first, it suffices to
merge the new ``upstream`` into the long-living ``build`` branch, and to call
``poor-mans-gitbuild``, but if you run into merge conflicts or find that
upstream's changes affect the functionality contained in your feature
branches, you need to actually fix those.

For instance, let's say that upstream started providing ``md.txt`` (which
I previously provided in the ``deb/docs`` branch), then I need to fix that
branch::

  $ git checkout deb/docs
  $ git rm md.txt
  $ git commit

That was easy, since I could evade the conflict. But what if upstream made
a change to ``Makefile``, which got in the way of my configuration file
location change? Then I'd have to merge ``upstream`` into
``deb/conffile-location``, resolve the conflicts, and commit the change::

  $ git checkout deb/conffile-location
  $ git merge upstream
  CONFLICT!
  $ git mergetool
  $ git commit
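
Here is a sketch of such a conflict in a scratch repository. The one-line
``Makefile`` is a placeholder, and I resolve the conflict by writing the
merged line by hand, which stands in for the interactive ``git mergetool``
session.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo 'CONFFILE = /etc/mdadm.conf' > Makefile
git add Makefile
git commit -q -m "Upstream snapshot"

# Debian moves the configuration file...
git checkout -q -b deb/conffile-location upstream
echo 'CONFFILE = /etc/mdadm/mdadm.conf' > Makefile
git commit -q -am "Use /etc/mdadm/mdadm.conf"

# ...and upstream later edits the very same line.
git checkout -q upstream
echo 'CONFFILE = /etc/mdadm.conf # default' > Makefile
git commit -q -am "Annotate the default"

git checkout -q deb/conffile-location
if git merge -q upstream; then
    echo "unexpected: merge was clean" >&2
    exit 1
fi
git diff --name-only --diff-filter=U    # lists the conflicted files

# Resolve by hand, then conclude the merge.
echo 'CONFFILE = /etc/mdadm/mdadm.conf # default' > Makefile
git add Makefile
git commit -q -m "Merge upstream into deb/conffile-location"
```

The resolution is recorded in the merge commit on the feature branch, so the
later merge into ``build`` does not have to deal with it again.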

When all conflicts have been resolved, I can prepare a new release, as
before::

  $ git checkout master
  $ dch
  $ dpkg-parsechangelog | sed -ne 's,Version: ,,p'
  2.6.4-1

  $ poor-mans-gitbuild

  $ git push --all
  $ git push --tags

Do note that Git is smart about commits that percolated upstream: since
upstream included the two commits in ``upstream-patches`` in his ``2.6.4``
release, my ``upstream-patches`` branch got effectively annihilated, and Git
was smart enough to figure that out *without* a conflict. Rejoice!
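
The following sketch reproduces that situation in miniature, under the
assumption that upstream applied the patch unchanged: the same edit exists
once in ``upstream-patches`` and once, as a different commit, in ``upstream``.
File contents and the ``snapshot`` tag are invented.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

printf 'line1\nline2\nline3\n' > super1.c
git add super1.c
git commit -q -m "Upstream snapshot"
git tag snapshot

# Our patch, waiting for upstream inclusion.
git checkout -q -b upstream-patches snapshot
printf 'line1\nline2 fixed\nline3\n' > super1.c
git commit -q -am "Fix line2"

# Upstream applies the same change and makes a release.
git checkout -q upstream
printf 'line1\nline2 fixed\nline3\n' > super1.c
git commit -q -am "Apply fix from Debian"
echo '2.6.4' > VERSION
git add VERSION
git commit -q -m "Release 2.6.4"

# Integration: first our patch branch, then the new upstream.
git checkout -q -b build snapshot
git merge -q -m "Integrate upstream-patches" upstream-patches
git merge -q -m "Merge new upstream release" upstream
git log --oneline --graph
```

Both sides made the identical change to ``super1.c``, so the three-way merge
has nothing to argue about and the ``build`` tree ends up identical to
upstream's.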

Creating and using a maintenance branch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let's say Debian "lenny" is released with ``mdadm`` ``2.7.6-1``, then::

  $ git checkout -b maint/lenny debian/2.7.6-1

You might do this to celebrate the release, or you may wait until the need
arises. We've already left the domain of reality ("lenny" is not yet
released), so the following is just theory.

Now, assume that a security bug is found in ``mdadm`` ``2.7.6`` after "lenny"
was released. Upstream is already on ``mdadm`` ``2.7.8`` and commits
``deadbeef`` and ``c0ffee`` fix the security issue, then you'd cherry-pick
them into the ``maint/lenny`` branch::

  $ git checkout maint/lenny
  $ git cherry-pick deadbeef
  $ git cherry-pick c0ffee
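
A sketch of the same backport with made-up history: one release, one
unrelated feature, and one fix on the main line, of which only the fix is
carried over to ``maint/lenny``. The tag is unsigned here, and the file names
are placeholders.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.org"
git config commit.gpgsign false

echo 'v2.7.6' > mdadm.c
git add mdadm.c
git commit -q -m "Release 2.7.6"
git tag debian/2.7.6-1          # unsigned stand-in for the upload tag

# Development continues after the stable release.
echo 'feature' > feature.c
git add feature.c
git commit -q -m "New feature for the next release"
echo 'bounds check' > fix.c
git add fix.c
git commit -q -m "Fix security issue"
fix=$(git rev-parse HEAD)

# Branch off the released version and backport only the fix.
git checkout -q -b maint/lenny debian/2.7.6-1
git cherry-pick "$fix"
ls
```

The maintenance branch gains the fix but none of the development that
happened after the release.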

If there are no merge conflicts (which you'd resolve with ``git-mergetool``),
we can just go ahead to prepare the new package::

  $ dch
  $ dpkg-parsechangelog | sed -ne 's,Version: ,,p'
  2.7.6-1lenny1

  $ poor-mans-gitbuild

  $ git push --all
  $ git push --tags

Future directions
~~~~~~~~~~~~~~~~~
It should be trivial to create the Debian source package directly from the
repository, and in fact, in response to a recent blog post of mine on `the
dispensability of pristine upstream tarballs
<http://blog.madduck.net/debian/2007.10.01_pristine-tarballs-and-vcs>`_, two
people showed me their scripts to do it.

My post also caused `Joey Hess to clarify his position on pristine tarballs
<http://kitenet.net/~joey/blog/entry/pristine-tar_followup/>`_, before he went
out to implement ``dpkg-source v3`` [TODO: URL forthcoming]. This looks very
promising.

In addition to creating source packages from version control, a couple of
other ideas have been around for a while:

* create ``debian/changelog`` from commit log summaries when you merge into
  the ``build`` branch.

* integrate version control with the BTS, bidirectionally:

  * given a bug report, create a temporary branch and apply any patches found
    in the bug report.

  * upon merging the temporary branch back into the feature branch it
    modifies, generate a patch, send it to the BTS and tag the bug report
    ``+ pending patch``.

And I am sure there are more. If you have any, I'd be interested to hear about
them!




Thanks for your time!

-- 
 .''`.   martin f. krafft <madduck at debian.org>
: :'  :  proud Debian developer, author, administrator, and user
`. `'`   http://people.debian.org/~madduck - http://debiansystem.info
  `-  Debian - when you have better things to do than fixing systems
 
"one should never do anything that
 one cannot talk about after dinner."
                                                        -- oscar wilde