[Reproducible-builds] Moving towards buildinfo on the archive network

Jonathan McDowell noodles at earth.li
Mon Jul 25 20:29:39 UTC 2016


Having been impressed by the current status of reproducible builds and
the fact it looks like we're close to having the important pieces in
Debian proper, I have started to have a look at how I could help out
with this bug. I've done some poking around in the dak code, and think I
have a vague idea of how to achieve what I think is wanted.

First, it is helpful to describe what I think is wanted. What I think we
need is for the archive network to have, alongside the binary packages
it contains, details of exactly how to build those binaries. This is, I
believe, the information contained in the .buildinfo files.

This bug has previously talked about a tarball of .buildinfo files,
presented as Buildinfos.tgz alongside the Packages file. From looking at
the current architecture of dak I do not believe that this is an easy
option.

I propose instead a Buildinfo.xz (or gz or whatever) file, which is a
single text file containing all of the buildinfo information that
corresponds to the Packages list. What is lost by this approach are the
OpenPGP signatures that .buildinfo files can have on them. I appreciate
this is an important part of the reproducible builds aim, but I believe
one of its strengths is the ability for multiple separate package builds
to attest that they have used that buildinfo information to build the
exact same set of binary artefacts. This is not something that easily
scales on the archive network and I think it is better served by a
separate service; it would be possible to take the package snippet from
the buildinfo file and sign that alone, uploading the signature to the
attestation service. For "normal" Debian operation the usual archive
signatures would provide a basic level of attestation of chain of build
information.
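
To make the "sign the package snippet alone" idea concrete, here is a
minimal sketch in Python. It assumes the Buildinfo file is a sequence of
blank-line-separated stanzas in the same style as Packages; that layout,
the function names and all the field values are illustrative assumptions,
not an agreed format. An attester would sign the exact bytes of one
stanza (or a digest of them) and upload that signature elsewhere.

```python
import hashlib


def split_stanzas(text):
    """Split a Packages-style document into stanzas separated by blank lines."""
    stanzas = []
    current = []
    for line in text.splitlines():
        if line.strip() == "":
            if current:
                stanzas.append("\n".join(current) + "\n")
                current = []
        else:
            current.append(line)
    if current:
        stanzas.append("\n".join(current) + "\n")
    return stanzas


def stanza_digest(stanza):
    """SHA-256 over the exact stanza bytes; this is what would be attested."""
    return hashlib.sha256(stanza.encode("utf-8")).hexdigest()


# Made-up example content; real field names and values may differ.
EXAMPLE = """\
Source: hello
Architecture: amd64
Version: 2.10-1
Installed-Build-Depends: gcc-5 (= 5.4.0-6), libc6-dev (= 2.23-4)

Source: hello
Architecture: all
Version: 2.10-1
Installed-Build-Depends: debhelper (= 9.20160709)
"""

for stanza in split_stanzas(EXAMPLE):
    print(stanza_digest(stanza))
```

Because the digest covers only one stanza, a signature made this way stays
valid however the rest of the file changes, provided that stanza is always
regenerated byte-for-byte identically.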

The rest of this mail continues on the above assumptions. If you do not
agree with the above, the below is probably null and void, so ignore it
and instead educate me about what the requirements are and I'll try to
adjust my ideas based on that.

So. If a single Buildinfo.xz file is acceptable, with the attestation
being elsewhere, I think this is doable without too much hackery in dak.
There are some trade-offs to make though, and I need to check which are
acceptable and which are viewed as too much.

Firstly, there is currently no concept of "build ids" that I can see;
essentially the primary key for a build is (source-package,
architecture, version). This assumes we never have the same version of a
package with different binaries produced; I understand there is
sometimes skew between security + the main archive but it's not clear to
me if this will continue to be the case when we're doing things
reproducibly. Even if it's not, adding a simple build id doesn't actually
help AFAICT.

Secondly, buildinfo files that I've seen so far include arch all .debs
with the architecture .debs. I believe on the archive these should be
separate; so a build + upload that includes arch all + arch amd64 (for
example) debs will actually end up with an entry (for just the all debs)
in the all Buildinfo.xz and an entry (for just the amd64 debs) in the
amd64 Buildinfo.xz. Why? Binary NMUs, which don't rebuild the all .debs.
Otherwise you end up changing the buildinfo information (to drop the
rebuilt amd64 debs) or keeping around old buildinfo information (+ you
have to track the fact you need it and know when to clean it up).
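
As a rough illustration of that split, the following Python sketch groups
the .debs from one upload by architecture, so that each group could become
its own entry in the matching Buildinfo file. The function name and the
file names are made up, and it assumes the usual name_version_arch.deb
naming.

```python
from collections import defaultdict


def split_by_architecture(deb_filenames):
    """Group .deb filenames of the form name_version_arch.deb by arch."""
    groups = defaultdict(list)
    for filename in deb_filenames:
        stem = filename[:-len(".deb")] if filename.endswith(".deb") else filename
        # Split from the right: the last two "_" separate version and arch.
        _name, _version, arch = stem.rsplit("_", 2)
        groups[arch].append(filename)
    return {arch: sorted(files) for arch, files in groups.items()}


# A sourceful upload that built both arch all and amd64 packages.
upload = [
    "hello_2.10-1_amd64.deb",
    "hello-doc_2.10-1_all.deb",
    "hello-dbg_2.10-1_amd64.deb",
]
entries = split_by_architecture(upload)

# A later binary NMU only rebuilds the amd64 packages.
binnmu = split_by_architecture([
    "hello_2.10-1+b1_amd64.deb",
    "hello-dbg_2.10-1+b1_amd64.deb",
])
entries.update(binnmu)
print(entries)
```

With the entries kept separate, the binNMU replaces only the amd64 entry
and the entry for the all .debs is left untouched.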

Thirdly, as the information is generated from a database, there needs to
be a defined order in which the fields are generated. This is purely to
ensure that the buildinfo information for each package is generated in a
reproducible fashion so any external signatures remain valid over time.
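
A minimal sketch of what a defined order could look like, in Python. The
FIELD_ORDER list is purely an assumption for illustration (no order has
been agreed); fields not in the list are emitted afterwards in
alphabetical order so the output never depends on the order rows come
back from the database.

```python
# Hypothetical canonical order; the real list would need to be agreed on.
FIELD_ORDER = ["Source", "Architecture", "Version", "Installed-Build-Depends"]


def render_stanza(fields):
    """Render a dict of field name -> value as text in a fixed order."""
    rank = {name: index for index, name in enumerate(FIELD_ORDER)}
    ordered = sorted(
        fields.items(),
        # Known fields first, by their listed position; unknown fields last, by name.
        key=lambda item: (rank.get(item[0], len(FIELD_ORDER)), item[0]),
    )
    return "".join(f"{name}: {value}\n" for name, value in ordered)


# The same data, as it might come back from two different queries.
a = {"Version": "2.10-1", "Source": "hello", "X-Extra": "1", "Architecture": "amd64"}
b = {"Architecture": "amd64", "X-Extra": "1", "Source": "hello", "Version": "2.10-1"}
print(render_stanza(a) == render_stanza(b))
```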

If these are acceptable I think that projectb needs two additional
tables: buildinfo_keys, similar to metadata_keys, and binaries_buildinfo,
which would have a 3 column primary key of (source-package, architecture,
version), and then key_id/value fields (similar to binaries_metadata) to
hold the buildinfo information that is not already present elsewhere in
the database. At present the main information these will hold is the
Installed-Build-Depends field - the other fields that I've seen are
available already.
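
The two tables could be sketched roughly as below. This uses Python's
sqlite3 module with an in-memory database only so that the example is
self-contained; projectb itself is PostgreSQL, and the column names and
helper functions here are hypothetical. One further assumption: the
sketch adds key_id to the primary key so that several fields can be
stored per build, which goes slightly beyond the three columns described
above.

```python
import sqlite3

# Hypothetical schema; column names are illustrative only.
SCHEMA = """
CREATE TABLE buildinfo_keys (
    key_id   INTEGER PRIMARY KEY,
    key_name TEXT NOT NULL UNIQUE
);
CREATE TABLE binaries_buildinfo (
    source       TEXT NOT NULL,
    architecture TEXT NOT NULL,
    version      TEXT NOT NULL,
    key_id       INTEGER NOT NULL REFERENCES buildinfo_keys(key_id),
    value        TEXT NOT NULL,
    PRIMARY KEY (source, architecture, version, key_id)
);
"""


def open_db():
    """Create an in-memory database holding the two sketch tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def set_field(conn, source, arch, version, key_name, value):
    """Store one buildinfo field for a (source, architecture, version) build."""
    conn.execute(
        "INSERT OR IGNORE INTO buildinfo_keys (key_name) VALUES (?)", (key_name,)
    )
    (key_id,) = conn.execute(
        "SELECT key_id FROM buildinfo_keys WHERE key_name = ?", (key_name,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO binaries_buildinfo VALUES (?, ?, ?, ?, ?)",
        (source, arch, version, key_id, value),
    )


def get_fields(conn, source, arch, version):
    """Return all stored buildinfo fields for one build as a dict."""
    rows = conn.execute(
        "SELECT k.key_name, b.value "
        "FROM binaries_buildinfo b JOIN buildinfo_keys k ON k.key_id = b.key_id "
        "WHERE b.source = ? AND b.architecture = ? AND b.version = ? "
        "ORDER BY k.key_name",
        (source, arch, version),
    )
    return dict(rows.fetchall())


conn = open_db()
set_field(conn, "hello", "amd64", "2.10-1",
          "Installed-Build-Depends", "gcc-5 (= 5.4.0-6)")
print(get_fields(conn, "hello", "amd64", "2.10-1"))
```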

Have I missed anything? I don't think the code to implement the above
ends up particularly complex in dak, and the resulting Buildinfo.xz
files should not add a particularly large amount of new data to the
mirror network. The main loss is that of the attestation information as
part of the mirror network (and actually, I can see a way we could add
that as a buildinfo field that wasn't part of the signature at some
point in the future).

(Additionally, it is not clear to me what the status of buildinfo
creation in dpkg is; I have heard that it's close to happening, but I
can't find anything on recent list archives about it - pointers
appreciated!)

J.

-- 
/-\                             |  I get the feeling that I've been
|@/  Debian GNU/Linux Developer |              cheated.
\-                              |


