Intent to commit craziness - source package unpacking
Ian Jackson
ijackson at chiark.greenend.org.uk
Mon Sep 26 14:37:19 UTC 2016
tl;dr:
* dpkg developers, please tell me whether I am making assumptions
that are likely to become false. Particularly, on the behaviour of
successive runs of dpkg-source --before-build with successively
longer series files.
* git-buildpackage and git-dpm developers, please point me to
information about what metadata to put into the commit message for
a git commit which represents a dpkg-source quilt patch. I would
like these commits to be as convenient for gbp and git-dpm users as
possible.
Hi.
Currently, when dgit needs to import a .dsc into git, it just uses
dpkg-source -x and git-add. The result is a single commit where the
package springs into existence fully formed. This is not as good as
it could be. I would like to represent (in the git pseudohistory) the
way that the resulting tree is constructed from the input objects.
In particular, I would like to: represent the input tarballs as a
commit each (which all get merged together as if by git merge -s
subtree), and for quilt packages, each patch as a commit. But I want
to avoid (as much as possible) reimplementing the package extraction
algorithm in dpkg-source.
dpkg-source does not currently provide interfaces that look like they
are intended for what I want to do. And dgit wants to work with old
versions of dpkg, so I don't want to block on getting such interfaces
added (even supposing that a sane interface could be designed, which
is doubtful).
So I intend to do as follows. (Please hold your nose.)
* dgit will untar each input tarball (other than the Debian tarball).
This will be done by scanning the .dsc for things whose names look
like (compressed) tarballs, and using the interfaces provided by
Dpkg::Compression to get at the tarball.
Each input tarball will be unpacked separately, and each unpack will
be followed by git-add and git-write-tree, to obtain a git tree object
corresponding to the tarball contents.
That tree object will be made into a commit object with no parents.
(The package changelog will be searched for the earliest version
with the right upstream version component, and the information found
there used for the commit object's metadata.)
* dgit will then run dpkg-source -x --skip-patches.
Again, git plumbing will be used to make this into a tree and a
commit. The commit will have as parents all the tarball commits
previously mentioned. The metadata will come from the .dsc and/or the
final changelog entry.
* dgit will look to see if the package is `3.0 (quilt)' and if so
whether it has a series file. (dgit already rejects packages with
distro-specific series files, so we need worry only about a single
debian/patches/series file.)
If there is a series file, dgit will read it into memory. It will
then iterate over the series file, and each time:
- write into its playground a series file containing one
more non-comment non-empty line than previously
- run dpkg-source --before-build (which will apply that
additional patch)
- make git tree and commit objects, using the metadata from
the relevant patch file to make the commit (if available)
- each commit object has as its parent the previous commit
(either the previous patch's commit or, for the first patch,
the commit resulting from dpkg-source -x)
After this the series file has been completely rewritten.
* dgit will then run one final invocation of dpkg-source
--before-build. This ought not to produce any changes, but if
it does, they will be represented as another commit.
* As currently, there will be a final no-change-to-the-tree
pseudomerge commit which stitches the package into the relevant dgit
suite branch; ie something that looks as if it was made with git
merge -s ours.
* As currently, dgit will take steps so that none of the git trees
discussed above contain a .pc directory.
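Most of the steps above are small pieces of text processing around real commands. Below is a rough Python sketch under several stated assumptions. The tarball-name pattern and the ".debian.tar" exclusion are guesses at what "names that look like (compressed) tarballs" means. The comment-stripping rule for series lines is an approximation, and dpkg's own series parser may differ in detail. The .dsc Files list and the changelog versions are assumed to be already parsed into lists. All function names are hypothetical. The last function shows the shape of the per-patch loop using dpkg-source --before-build, git add, git write-tree and git commit-tree. It uses placeholder commit messages rather than the patch metadata. It also restores the full series file before each git add, which is an assumption about how the patch commits are kept free of series-file changes.

```python
import re
import subprocess
from pathlib import Path

# Assumption: roughly what "names that look like (compressed) tarballs" means.
TARBALL_RE = re.compile(r"\.tar(\.(gz|bz2|xz|lzma))?$")


def input_tarballs(dsc_files):
    """Tarballs listed in the .dsc, excluding the Debian tarball."""
    return [f for f in dsc_files
            if TARBALL_RE.search(f) and ".debian.tar" not in f]


def upstream_version(version):
    """Drop any epoch ("1:") and Debian revision ("-2") from a version."""
    version = version.split(":", 1)[-1]
    return version.rsplit("-", 1)[0]


def earliest_entry_for(upstream, changelog_versions):
    """Oldest changelog version with the given upstream component.

    changelog_versions is newest-first, as in debian/changelog.
    """
    for v in reversed(changelog_versions):
        if upstream_version(v) == upstream:
            return v
    return None


def is_patch_line(line):
    """True for a non-comment, non-empty series line (approximation)."""
    return bool(re.sub(r"(?:^|\s+)#.*$", "", line.strip()))


def successive_series(series_text):
    """Yield (patch_name, series_prefix), one more patch line each time.

    Each prefix keeps the original lines, comments included, up to and
    including the next patch line.
    """
    lines = series_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if is_patch_line(line):
            yield line.split()[0], "".join(lines[:i + 1])


def git(workdir, *args):
    """Run a git command in workdir and return its stripped stdout."""
    return subprocess.run(["git", *args], cwd=workdir, check=True,
                          capture_output=True, text=True).stdout.strip()


def import_patches(workdir, base_commit, series_text):
    """Apply patches one at a time, committing after each; return the tip.

    base_commit is the commit made from `dpkg-source -x --skip-patches`.
    Commit messages are placeholders, not the patch metadata.
    """
    series_path = Path(workdir, "debian", "patches", "series")
    parent = base_commit
    for name, prefix in successive_series(series_text):
        series_path.write_text(prefix)
        # Applies the one newly listed patch (per the assumption above).
        subprocess.run(["dpkg-source", "--before-build", "."],
                       cwd=workdir, check=True)
        # Assumption: put the full series back so the commit changes only
        # the files the patch touches.
        series_path.write_text(series_text)
        git(workdir, "add", "-A")
        # Keep dpkg-source's .pc bookkeeping out of the tree.
        git(workdir, "rm", "-r", "-q", "--cached", "--ignore-unmatch", ".pc")
        tree = git(workdir, "write-tree")
        parent = git(workdir, "commit-tree", tree, "-p", parent,
                     "-m", f"Apply {name}")
    return parent
```

Everything except import_patches is pure text handling, so it can be checked without dpkg-dev or git installed. The loop relies on exactly the assumption flagged in the tl;dr, namely that dpkg-source --before-build applies each newly listed patch once and leaves the earlier ones alone.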
This has the following properties:
* Each input tarball is represented by a different commit; in usual
cases these commits will be the same for every upload of the same
upstream version.
* For `3.0 (quilt)', each patch's changes to the upstream files appear
as a single git commit (as does the effect of the debian tarball).
For `1.0' non-native, the effect of the diff is represented as a
commit. So eg `git blame' will show synthetic commits corresponding
to the correct parts of the input source package.
* It is possible to `git-cherry-pick' etc. commits representing `3.0
(quilt)' patches. It is even possible to fish out the patch stack as a
git branch and rebase it elsewhere etc., since the patch stack is
represented as a contiguous series of commits which make only the
relevant upstream changes.
* Every orig tarball in the source package is decompressed twice, but
disk space for only one extra copy of its unpacked contents is
needed. (The converse would be possible in principle but would be
very hard to arrange with the current interfaces provided by the
various tools.)
* No back doors into the innards of dpkg-source (nor changes to
dpkg-dev) are required.
* dgit does grow a dependency on Dpkg::Compression.
* Knowledge of the source format embedded in dgit is restricted to
iterating over tarballs and manipulating debian/patches/series,
which dgit already does.
* dgit now depends on dpkg-source --before-build idempotently applying
patches as they successively appear in debian/patches/series.
* Perhaps the git commits generated by dgit to represent patches can
be made to round-trip nicely into tools like git-dpm and
git-buildpackage.
I have found the information about tags in gbp-dch(1), but that
doesn't seem like it's applicable.
I have also found the information about tags in gbp-pq(1). From
that it looks like I ought to generate "Gbp-Pq: Name" and "Gbp-Pq:
Topic".
* The scheme I describe avoids introducing a dependency from dgit to
git-buildpackage. I might be able to replace the
successive-patch-application part with an appropriate invocation of
gbp-pq. Would that be better?
Bear in mind that because the output of gbp-pq import doesn't
contain debian/patches, I would need to rewrite its output (perhaps
with git-filter-branch).
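For the gbp side, here is a sketch of how a synthetic patch commit's message might carry the gbp-pq(1) tags mentioned above. It assumes, without checking gbp's code, that a patch in a subdirectory of debian/patches maps to a "Gbp-Pq: Topic <subdir>" line plus a "Gbp-Pq: Name <file>" line. The subject and body would come from the patch's own headers, for example a DEP-3 Description. The function name is hypothetical.

```python
import posixpath


def patch_commit_message(series_entry, subject, body=""):
    """Commit message for a quilt patch commit, with gbp-pq tags.

    series_entry is the patch name as listed in debian/patches/series,
    e.g. "upstream/fix-build.patch".
    """
    topic, name = posixpath.split(series_entry)
    parts = [subject.strip()]
    if body.strip():
        parts.append(body.strip())
    tags = []
    if topic:
        # Assumption: subdirectory of debian/patches == gbp-pq topic.
        tags.append(f"Gbp-Pq: Topic {topic}")
    tags.append(f"Gbp-Pq: Name {name}")
    parts.append("\n".join(tags))
    return "\n\n".join(parts) + "\n"
```

Whether these lines are exactly what gbp pq export and git-dpm expect is the open question above. The tags are kept as a trailing block so they could be adjusted without touching the rest of the message.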
Comments welcome. Please be quick - this is very close to the top of
my dgit todo list.
Thanks,
Ian.
--
Ian Jackson <ijackson at chiark.greenend.org.uk> These opinions are my own.
If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.