[Reproducible-builds] Moving towards buildinfo on the archive network

Jonathan McDowell noodles at earth.li
Sun Aug 21 14:14:44 UTC 2016

On Sat, Aug 20, 2016 at 03:13:00PM +0000, Ximin Luo wrote:
> Jonathan McDowell:
> > Having been impressed by the current status of reproducible builds
> > and the fact it looks like we're close to having the important
> > pieces in Debian proper, I have started to have a look at how I
> > could help out with this bug. I've done some poking around in the
> > dak code, and think I have a vague idea of how to achieve what I
> > think is wanted.
> > 
> > First, it is helpful to describe what I think is wanted. What I
> > think we need is the archive network to have, alongside the binary
> > packages it contains, details of exactly how to build those
> > binaries. This is, I believe, the information contained in the
> > .buildinfo files.
> > 
> In our newest discussions, this purpose is secondary. The primary
> purpose of buildinfo files is to record what *one particular builder
> actually did in order to produce some output*. Or, equivalently:
>   | A buildinfo file, abstractly, is a *claim* C by some builder entity B that
>   | "I executed process P with env/input I to produce output results R".
> This latter form is slightly easier to reason about, in terms of
> security properties. We securely bind the claim C (the contents of the
> buildinfo file) to the entity B using a cryptographic signature.

I think the problem here is it's not clear (on either side) who "we" or
"our" means. Different people want different things from reproducible
builds, or have different opinions about relative priorities.

As a *minimum* I think distributions should be providing the information
of how a particular binary was produced. I suppose what it sort of maps
to is "I executed process P with env/input I to produce output results
R" (though, of course, distros already provide R; that's the binaries
shipped). You've used all the letters I might want to refer to it by, so
let's call it Z.

The claim, C, is a signature over Z by B. It's useful extra information,
but it's not required for me to ensure that the source I have builds the
binaries I have.

> Note that the builder is a *distinct entity* from the distribution.
> It's important to keep the *original* signature by B on C. It breaks
> our security logic, to strip the signature and re-sign C using (e.g.)
> the Debian archive release keys - because the entity in charge of this
> release key is not the one that actually performed the build. Doing
> this, would allow malicious builders to re-attribute their misdeeds to
> look like it's the fault of Debian.

Debian already does this, in that the Packages files etc. are signed by
the archive key. It's possible to go and grab the .dsc file to see who
did the build, but day-to-day no one is using these to verify the
binaries they receive. I care more that Debian stands behind the
packages I download than being able to verify individually who built
each of the packages I'm running - there's no meaningful way I
can attribute trust to *all* of the people who packaged something I have

> Now back to the "secondary" purpose:
> Using this information "B claims C", other reproduction programs
> (that we're also developing) can attempt to actually reproduce the
> binaries described. It would do this, by (1) reading the buildinfo
> file (2) recreating _some_ of the environment stored in C, and (3)
> executing the process, and see if it gives R.

You don't need the signature to validate the reproducibility.
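
As a rough illustration of that point, here is a minimal Python sketch
that discards any clearsign wrapper without verifying it and compares
the claimed checksums against a local rebuild. It assumes the buildinfo
lists its outputs in a "Checksums-Sha256" field with "hash size
filename" entries; the function names and the rebuilt-files mapping are
hypothetical, and a real tool would need a proper deb822 parser.

```python
import hashlib

def strip_clearsign(text):
    """Return the body of an OpenPGP clearsigned message.

    The signature is discarded, NOT verified. Unsigned text is
    returned unchanged.
    """
    lines = text.splitlines()
    if not lines or lines[0] != "-----BEGIN PGP SIGNED MESSAGE-----":
        return text
    # Skip armor headers (e.g. "Hash: SHA256") up to the first blank line.
    start = lines.index("") + 1
    end = lines.index("-----BEGIN PGP SIGNATURE-----")
    # Undo dash-escaping of body lines that begin with "-".
    body = [l[2:] if l.startswith("- ") else l for l in lines[start:end]]
    return "\n".join(body) + "\n"

def claimed_sha256(buildinfo_text):
    """Map filename -> sha256 from an assumed 'Checksums-Sha256' field."""
    claimed = {}
    in_field = False
    for line in strip_clearsign(buildinfo_text).splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
        elif in_field and line.startswith(" "):
            digest, _size, name = line.split()
            claimed[name] = digest
        else:
            in_field = False
    return claimed

def reproduces(buildinfo_text, rebuilt):
    """rebuilt maps filename -> bytes produced by a local rebuild."""
    claimed = claimed_sha256(buildinfo_text)
    return bool(claimed) and all(
        name in rebuilt
        and hashlib.sha256(rebuilt[name]).hexdigest() == digest
        for name, digest in claimed.items()
    )
```

The check gives the same answer whether or not the input was signed,
which is the whole point: who made the claim and whether the claim
reproduces are separate questions.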

> The "_some_" in clause (2) is currently up-for-debate, but the
> important thing is that this can be changed in the future *without
> affecting already-produced buildinfo files*. It may even well be the
> case that in the future we'd want to support different values for
> "_some_" for a given reproduction tool.
> The main point is that, this is not a concern of the producer nor
> distributor of the buildinfo files. I.e.: you guys (the FTP team) only
> have to care about making these signed-claims available to be
> downloaded by users, and it is up to the users to run a tool that
> "interprets" these claims for purposes such as actually attempting
> reproduction of a binary.

To clarify: I am not a member of the FTP team and do not claim to
represent them. I am a DD who was present at the DebConf talk about
reproducible builds, was impressed by how far it's come, and asked how I
could help get what was missing and still required into Debian.

> In this way, we achieve full end-to-end security properties
> (verifiability of build) between the producers (builders) and
> consumers (users). Distributors only need to care about availability,
> they take no part in the security (except for the case where they are
> also a builder, as noted already).

I think I take a less strict view on this, which may be where some of
the disconnect comes from. I care that Debian stands behind its builds.
I'd like the builder claims to be available (and my original mail did
talk about the fact I didn't think I was preventing that, just that it's
not necessarily something that should be on the entire archive network),
but as something that's mirrored everywhere I am absolutely fine with an
attestation by Debian that it received a build appropriately signed by a
DD. Or that it was able to do a build itself within the buildd network
(either for a non-uploaded arch or if we move to source only uploads).

> > This bug has previously talked about a tarball of .buildinfo files,
> > presented as Buildinfos.tgz alongside the Packages file. From looking at
> > the current architecture of dak I do not believe that this is an easy
> > option.
> > 
> > I propose instead a Buildinfo.xz (or gz or whatever) file, which is a
> > single text file containing all of the buildinfo information that
> > corresponds to the Packages list. What is lost by this approach are the
> > OpenPGP signatures that .buildinfo files can have on them. I appreciate
> > this is an important part of the reproducible builds aim, but I believe
> > one of its strengths is the ability for multiple separate package builds
> > to attest that they have used that buildinfo information to build the
> > exact same set of binary artefacts. This is not something that easily
> > scales on the archive network and I think it is better served by a
> > separate service; it would be possible to take the package snippet from
> > the buildinfo file and sign that alone, uploading the signature to the
> > attestation service. For "normal" Debian operation the usual archive
> > signatures would provide a basic level of attestation of chain of build
> > information.
> > 
> I have trouble imagining what could make Buildinfo.tgz hard, but make
> Buildinfo.xz easy - could you explain this in more detail, please?

Debian's archive information is largely stored within a database; things
like the Packages and Contents files are generated each archive run from
this database, rather than incrementally updating a file. It is easy to
generate a Buildinfo.xz file from information contained within the
database (I have some proof-of-concept code locally that does the
beginnings of this), but generating a tar file like you are describing
is either a case of storing each .buildinfo in the database and
generating the tar each run, or adding and deleting files to an existing
tarball. It seems overly intensive and doesn't really seem to scale.
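
To make the difference concrete, here is a minimal sketch of the
"regenerate from the database each run" approach. It is not the
proof-of-concept code mentioned above and does not use dak's real
schema: the in-memory-friendly sqlite3 connection, the "buildinfo"
table and its columns are all assumptions made for illustration.

```python
import lzma
import sqlite3

def render_buildinfo_index(conn, suite):
    """Build an xz-compressed index of buildinfo stanzas for one suite.

    Assumes a hypothetical table:
        buildinfo(suite TEXT, package TEXT, version TEXT, stanza TEXT)
    The whole file is regenerated from the query result on every call;
    nothing is patched incrementally.
    """
    rows = conn.execute(
        "SELECT stanza FROM buildinfo WHERE suite = ? "
        "ORDER BY package, version",
        (suite,),
    )
    # Stanzas are separated by a blank line, as in a Packages file.
    text = "\n".join(stanza.strip() + "\n" for (stanza,) in rows)
    return lzma.compress(text.encode("utf-8"))  # default container is .xz
```

Writing the result out is then a single file write per suite, in the
same way the Packages file is produced, with no per-package file
bookkeeping inside an archive.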

> Regarding the OpenPGP signatures, they are vital - but I also see no
> need to strip them in your model. From the point-of-view of the FTP
> archive, there is no immediate need to read or understand the contents
> of the buildinfo file. [*] It's just a dumb data blob, it shouldn't
> matter to Debian whether it's clearsigned or not.

What I was trying to do with my proposal was turn it from being a dumb
data blob which wasn't easily mapping to the Debian infrastructure, to
something where almost all the information (everything except the actual
signature from the original builder) could be provided alongside the
binaries themselves, enabling people to have what they required to
confirm they could reproduce the builds themselves. *I* think this is
incredibly useful, even if it doesn't achieve everything possible with
reproducible-builds, and I also think that it would provide a sound
basis for another Debian service (perhaps under debian.net to start
with) where multiple builders (starting with the original builder) would
be able to upload their claims, based directly off the buildinfo
information from the archive network. Yes, that's probably an extra
step for the original builder, but it also (to me) seems to be more
flexible and a stronger statement as multiple independent builders can
all confirm things in a single place.
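
A very rough sketch of the bookkeeping such a service might do, with
signature checking left out entirely. The claim tuple layout, the
function names and the quorum parameter are hypothetical and only meant
to show the idea of counting independent builders who agree on a
result.

```python
from collections import defaultdict

def agreement(claims):
    """Group claims as artifact -> sha256 -> set of builders.

    claims is an iterable of (builder, artifact, sha256) tuples, each
    assumed to have had its signature verified already.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for builder, artifact, digest in claims:
        seen[artifact][digest].add(builder)
    return seen

def confirmed(claims, artifact, digest, quorum=2):
    """True if at least `quorum` distinct builders claim this digest."""
    return len(agreement(claims)[artifact][digest]) >= quorum
```

Here the digest being checked would come from the archive's buildinfo
information, and each additional builder who reproduces it adds one
more name to the set.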

It sounds like this isn't compatible with where reproducible-builds is
heading though, so apologies for the noise.


"Reality Or Nothing!" -- Cold Lazarus
This .sig brought to you by the letter D and the number  4
Product of the Republic of HuggieTag