How to cope with patches sanely

Fri Feb 29 03:11:48 UTC 2008

Manoj Srivastava wrote:
>> Yes.  Feature branches are effectively forking a particular version of
>> a project - this is not a problem, and is essential for efficient
>> development.  People jumbling together changes in "trunk" branches is
>> perhaps one of the worst upshots of the 2002-2006 or so obsession with
>> poorly designed centralised systems and in my opinion sank many
>> projects.
> 
>         Err. If you go back and read this thread in the archive, You'll
>  note that I have stated that my feature branches are always kept up to
>  date with the latest upstream branch I am basing my Debian package
>  on. 

This technique is also called rebasing the patch set; it's fine, but
it's just one approach.

>         When I have been creating patches for inclusion with upstream, I
>  essentially feed them the source patch and a changelog entry --
>  essentially, creating a single patch series; squashing the underlying
>  history.  Most upstream do not care about the messy history of my
>  development; and most do not grok arch well enough to pull directly.

This is sometimes worthwhile and sometimes a bad idea.  If you want
patches to be easily reviewed, the guiding principle is that each patch
should introduce a single, well-explained change.  I agree that upstream
will not want a messy history, which is why you reshape the individual
changes using a tool such as Quilt, Stacked Git, Guilt or Mercurial
Queues, so that they are more easily reviewed.

>> They mean that a later merge back the other way, to merge the feature
>> branch into the target branch, can happen painlessly.  ASSUMING that
>> you're using a system which has commutative merge characteristics,
>> such as git or mercurial.
> 
>         I use Arch.

Arch is critically deficient in this respect; it doesn't really have a
concept of tracking branches, and merging is not commutative; if you
merge a branch that just merged from your branch, an unnecessary new
changeset is made.  But if you are rebasing then you don't need to worry
about that.  As I said, it's just more work.

>> Can you express this problem with reference to a particular history of
>> an integration branch?  I will provide some short git commands to
>> extract the information in the form you are after.
> 
>  http://arch.debian.org/cgi-bin/archzoom.cgi/srivasta@debian.org--lenny?color=sunny?expand
> 
>         Take any package. Say, flex. Or flex-old. You have all my
>  feature branches there. The --devo branch is the integration branch.
>  Please show me an automated way you can grab the feature branches and
>  generate a quilt series that gives you the devo branch.  The diff.gz is
>  how we get from upstream to the devo branch (modulo ./debian); if you
>  can break that down nicely for the folks who want each feature
>  separate, that would work as well.

Thanks for restating the problem clearly.  While the underlying problem
is easily approached and I would still call it trivial, the details of
what you are asking for make it impossible, because a quilt series cannot
contain merges (someone please correct me if it can, and I can go forward).

Shipping changes for upstream inclusion as a *single* set of quilt
patches is not possible if you are including merges, but if you allow
the patches to be grouped, and introduce a new type of patch which
encapsulates a merge (gitk has one example of this; it uses different
identifiers to represent which file's lines are included), then it can
be done.  The apply-patches script would need extending to support this,
but I don't think that's particularly show-stopping.

However, ignoring the merges, we are already not far from the "script"
being 'git-log -p' or 'git format-patch upstreamrev'.

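To make the format-patch route concrete, here is a minimal Python sketch
of the glue that would be needed, merges aside.  It assumes the patches
have already been written to a directory with something like
'git format-patch -o <dir> upstreamrev'; the function name write_series
and the directory layout are my own invention for illustration.  It
relies on format-patch numbering its output files (0001-..., 0002-...)
in commit order, and on a quilt series file being a plain list of patch
names, one per line.

```python
from pathlib import Path


def write_series(patch_dir):
    """Write a quilt 'series' file listing the *.patch files in patch_dir.

    Hypothetical helper: assumes the patches came from 'git format-patch',
    whose zero-padded numeric prefixes make sorted order the apply order.
    """
    patch_dir = Path(patch_dir)
    names = sorted(p.name for p in patch_dir.glob("*.patch"))
    # One patch name per line, in the order quilt should apply them.
    (patch_dir / "series").write_text("".join(name + "\n" for name in names))
    return names
```

Merge commits are not represented in the result, since format-patch
emits nothing for them; that is exactly the limitation described above.
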
Also, I have never really used arch.  If you can provide me with the
commands to get a copy of those branches (the man page is sadly not very
forthcoming), I'll give the git-archimport script a whirl, see if I can
get it imported, and show how this can work in practice.  If someone
with git-archimport experience can perform this and publish the
repositories somewhere, I'd be very grateful.

>      If you code works well enough every single time a new upstream
> comes around and I release a new version of flex or whatever,  I'll
> throw in the generated quilt patches.

I think what is required is a rethink of the problem.  What are we
trying to achieve, and are there other ways to achieve it that would
solve the problem vastly more effectively?

Version control systems that have content-addressable filesystems
(essentially, git and Monotone) are inherently efficient to distribute,
as only the changes between versions need be distributed.  The notion of
stream compressing tarballs is archaic compared with being able to
search for deltas anywhere in the source tree.
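
As a rough illustration of why content addressing helps, here is a toy
Python sketch.  The git_blob_id function follows the way git names a
blob (SHA-1 over a short header plus the content); the ObjectStore class
and its add_tree method are hypothetical names of mine, and the sketch
leaves out the delta compression that real git packs add on top.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Name git gives a blob: SHA-1 of 'blob <size>' + NUL + content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.objects = {}

    def add_tree(self, files):
        """Store a {path: bytes} snapshot; return (manifest, newly stored ids)."""
        manifest, new_ids = {}, []
        for path, data in files.items():
            oid = git_blob_id(data)
            if oid not in self.objects:
                # Only content not seen before costs any space or transfer.
                self.objects[oid] = data
                new_ids.append(oid)
            manifest[path] = oid
        return manifest, new_ids


store = ObjectStore()
v1 = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
v2 = dict(v1, README=b"hello world\n")  # second release changes one file
_, new_in_v1 = store.add_tree(v1)
_, new_in_v2 = store.add_tree(v2)
print(len(new_in_v1), len(new_in_v2))
```

Importing the second release adds a single object, because the unchanged
main.c is already present under the same name.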

You can see this in effect with git, which is capable of very quickly
identifying which objects are new, and sending them all in impressively
small packs on the network.  It's amazing how many releases' worth of
history will then fit into the space otherwise occupied by, say, three
tarballs.  For example: xteddy -
http://planet.catalyst.net.nz/blog/2006/07/17/samv/xteddy_caught_consuming_rampant_amounts_of_disk_space

The essence of what I'm saying is to view a distributed git archive as a
/replacement/ (or, if you prefer, complement) for the source archives.
Going by previous results, this will give an overall reduction in the
size of the archive, faster distribution - even P2P - and more preserved
history.

Instead of distributing large source archive packages, the upstream
sources are imported (perhaps as a tarball, or perhaps using a rich
history-preserving git archive), and the patches are applied as commits
on feature branches.  When you 'apt-get source', you are simply checking
out the head version (and probably the upstream as well).  All the
information you are after - the individual changes from the upstream - is
available.  If the upstream updates, then the source archive can
represent that in the most convenient fashion to the maintainer - be it
a rebase of the applied patches as you have used previously, or a simple
merge.

And please, I'm not looking to start a VCS flamewar here - I'm talking
about git in its capacity as a file distribution and archival mechanism.
At this task, it excels.  It doesn't matter what the upstream uses;
whatever it is can be converted to git well.

Sam.