[Reproducible-builds] Moving towards buildinfo on the archive network

Ximin Luo infinity0 at debian.org
Sat Aug 20 15:13:00 UTC 2016

Hey, Lunar has stopped working on reproducible builds regularly, and I'm
taking over his previous responsibilities. I was also the other main person
formulating the ideas behind the "next iteration" of buildinfo that dkg
described in message #10 earlier in this thread, with Message-ID
<87vb8f58rg.fsf at alice.fifthhorseman.net>.


Jonathan McDowell:
> Having been impressed by the current status of reproducible builds and
> the fact it looks like we're close to having the important pieces in
> Debian proper, I have started to have a look at how I could help out
> with this bug. I've done some poking around in the dak code, and think I
> have a vague idea of how to achieve what I think is wanted.
> First, it is helpful to describe what I think is wanted. What I think we
> need is the archive network to have, alongside the binary packages it
> contains, details of exactly how to build those binaries. This is, I
> believe, the information contained in the .buildinfo files.

In our newest discussions, this purpose is secondary. The primary purpose of
buildinfo files is to record what *one particular builder actually did in order
to produce some output*. Or, equivalently:

  | A buildinfo file, abstractly, is a *claim* C by some builder entity B that
  | "I executed process P with env/input I to produce output results R".

This latter form is slightly easier to reason about, in terms of security
properties. We securely bind the claim C (the contents of the buildinfo file)
to the entity B using a cryptographic signature.
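
As a rough illustration only, the claim and its binding to the builder could
be modelled as below. The class names, field names and the JSON serialization
are assumptions made up for this sketch, not an agreed format, and the
signature is treated as opaque bytes rather than a real OpenPGP signature.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class Claim:
    """The claim C: 'I executed process P with env/input I to produce R'."""
    process: str             # P, e.g. the build command (hypothetical field)
    inputs: Dict[str, str]   # I, e.g. build-dependency name -> version
    outputs: Dict[str, str]  # R, output file name -> hash


@dataclass(frozen=True)
class SignedClaim:
    """C bound to the builder B by B's own signature."""
    claim: Claim
    builder: str      # identity of B, e.g. a key fingerprint
    signature: bytes  # B's signature over canonical_bytes(claim); opaque here


def canonical_bytes(claim: Claim) -> bytes:
    """Serialize a claim deterministically, so that a signature made over
    these bytes always refers to exactly the same content."""
    return json.dumps(asdict(claim), sort_keys=True).encode("utf-8")
```

The point of keeping `builder` and `signature` next to the claim is that the
claim stays attributable to whoever actually performed the build.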

Note that the builder is a *distinct entity* from the distribution. It's
important to keep the *original* signature by B on C. Stripping the signature
and re-signing C using (e.g.) the Debian archive release keys breaks our
security logic, because the entity in charge of this release key is not the one
that actually performed the build. Doing this would allow malicious builders
to re-attribute their misdeeds so that they look like the fault of Debian.

(Of course there is the special case where the builder *is* Debian, but even in
this case it's good practice to have separate keys for every buildd, plus a
separate release signing key. We can discuss these details separately though.)

Anyway, that's our "next iteration" definition of buildinfo files, along with a
simplified discussion of the rationale. I wrote down more elsewhere, but I'll
keep this short for now, to avoid overwhelming readers.

Now back to the "secondary" purpose:

Using this information "B claims C", other reproduction programs (that we're
also developing) can attempt to actually reproduce the binaries described. Such
a program would do this by (1) reading the buildinfo file, (2) recreating
_some_ of the environment stored in C, and (3) executing the process and
checking whether it gives R.

The "_some_" in clause (2) is currently up for debate, but the important thing
is that this can be changed in the future *without affecting already-produced
buildinfo files*. It may well be the case that in the future we'd want to
support different values for "_some_" for a given reproduction tool.
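
A minimal sketch of such a reproduction attempt follows. It assumes the
buildinfo file has already been read (step 1) into plain dictionaries; the
function name, its parameters and the `run_build` callback are hypothetical
and do not describe any existing tool. The choice of "_some_" is passed in as
`keys_to_recreate`, so it can change without touching the recorded claim.

```python
from typing import Callable, Dict, Iterable


def attempt_reproduction(
    claimed_inputs: Dict[str, str],
    claimed_outputs: Dict[str, str],
    keys_to_recreate: Iterable[str],
    run_build: Callable[[Dict[str, str]], Dict[str, str]],
) -> bool:
    """Return True if rebuilding gives exactly the claimed output hashes."""
    # (2) Recreate only the chosen part of the recorded environment.
    env = {k: claimed_inputs[k] for k in keys_to_recreate if k in claimed_inputs}
    # (3) Execute the process in that environment and compare against R.
    produced = run_build(env)
    return produced == claimed_outputs
```

A tool that recreates a different subset of the environment only needs a
different `keys_to_recreate`; the buildinfo file itself is unchanged.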

The main point is that this is not a concern of the producer or the distributor
of the buildinfo files. I.e.: you guys (the FTP team) only have to care about
making these signed claims available to be downloaded by users, and it is up to
the users to run a tool that "interprets" these claims for purposes such as
actually attempting reproduction of a binary.

In this way, we achieve full end-to-end security properties (verifiability of
the build) between the producers (builders) and consumers (users). Distributors
only need to care about availability; they take no part in the security
(except for the case where they are also a builder, as noted already).

> This bug has previously talked about a tarball of .buildinfo files,
> presented as Buildinfos.tgz alongside the Packages file. From looking at
> the current architecture of dak I do not believe that this is an easy
> option.
> I propose instead a Buildinfo.xz (or gz or whatever) file, which is
> single text file with containing all of the buildinfo information that
> corresponds to the Packages list. What is lost by this approach are the
> OpenPGP signatures that .buildinfo files can have on them. I appreciate
> this is an important part of the reproducible builds aim, but I believe
> one of its strengths is the ability for multiple separate package builds
> to attest that they have used that buildinfo information to build the
> exact same set of binary artefacts. This is not something that easily
> scales on the archive network and I think it is better served by a
> separate service; it would be possible to take the package snippet from
> the buildinfo file and sign that alone, uploading the signature to the
> attestation service. For "normal" Debian operation the usual archive
> signatures would provide a basic level of attestation of chain of build
> information.

I have trouble imagining what could make Buildinfo.tgz hard, but make
Buildinfo.xz easy - could you explain this in more detail, please?

Regarding the OpenPGP signatures, they are vital - but I also see no need to
strip them in your model. From the point-of-view of the FTP archive, there is
no immediate need to read or understand the contents of the buildinfo file. [*]
It's just a dumb data blob; it shouldn't matter to Debian whether it's
clearsigned or not.

Separately, it's OK for the Debian release key to sign this dumb data blob, so
that users can check it is part of a real Debian release - but understand that
the *reproducible* security property is checked against the *builders* and not
the release infrastructure.

[*] You might read it later for "more advanced" behaviours, but we'll leave
these out of the current discussion, since we haven't designed them yet.

> The rest of this mail continues on the above assumptions. If you do not
> agree with the above the below is probably null and void, so ignore it
> and instead educate me about what the requirements are and I'll try and
> adjust my ideas based on that.

In the below, you refer to a "database" but there is no mention of this above.
Did you neglect to edit something? I now feel like what you meant by "single
text file" is not at all what I had imagined, namely a concatenation of all
buildinfo files, which is not much different from a tar archive.

Also I think my explanation above differs significantly from your existing
understanding of the concept, so I'll wait for you to review that as well.

I'll stop detailed comments here, in case I am missing something. Just a few
minor extra points though:

> [..]
> Firstly, there is currently no concept of "build ids" that I can see [..]

I don't imagine a significant use-case where people will want a *specific*
buildinfo file, but if this is needed I guess we could just use the hash of the
whole file (including signature). The majority use-case would be:

  | Given a single binary package b with hash H, give me all buildinfo files C
  | that claim H as an output.
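
To make this lookup concrete, here is a small sketch of an index over
buildinfo files. The class and method names are hypothetical, SHA-256 is an
arbitrary choice for the example, and it assumes the output hashes have
already been extracted from each file by some parser that is not shown. Each
file is identified by the hash of the whole signed blob, and stored without
being modified, so the builder's signature is preserved.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set


class BuildinfoIndex:
    """Maps output hashes to the signed buildinfo files that claim them."""

    def __init__(self) -> None:
        self._by_id: Dict[str, bytes] = {}
        self._by_output: Dict[str, Set[str]] = defaultdict(set)

    def add(self, signed_blob: bytes, output_hashes: List[str]) -> str:
        """Store one buildinfo file as an opaque blob and return its id,
        which is the hash of the whole file including the signature."""
        file_id = hashlib.sha256(signed_blob).hexdigest()
        self._by_id[file_id] = signed_blob
        for h in output_hashes:
            self._by_output[h].add(file_id)
        return file_id

    def claims_for(self, binary_hash: str) -> List[bytes]:
        """Return all buildinfo files C that claim binary_hash as an output."""
        ids = self._by_output.get(binary_hash, set())
        return [self._by_id[i] for i in sorted(ids)]
```

Several builders may claim the same hash H, in which case `claims_for(H)`
returns one signed file per builder.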

> [..] This assumes we never have the same version of a
> package with different binaries produced [..]

I believe my explanation of the "next iteration" concept addresses this issue,
and this is one of the reasons why we chose to alter Lunar's original ideas
from the first post.

> Secondly, buildinfo files that I've seen so far include arch all .debs
> with the architecture .debs. [..]
> Thirdly, as the information is generated from a database, [..]

As mentioned, I'll stop commenting here to let you get synced with the latest
ideas.


GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
