[Reproducible-builds] Moving towards buildinfo on the archive network

Ximin Luo infinity0 at debian.org
Sun Aug 21 18:22:00 UTC 2016

Jonathan McDowell:
> On Sat, Aug 20, 2016 at 03:13:00PM +0000, Ximin Luo wrote:
>> I have trouble imagining what could make Buildinfo.tgz hard, but make
>> Buildinfo.xz easy - could you explain this in more detail, please?
> Debian's archive information is largely stored within a database; things
> like the Packages and Contents files are generated each archive run from
> this database, rather than incrementally updating a file. It is easy to
> generate a Buildinfo.xz file from information contained within the
> database (I have some proof-of-concept code locally that does the
> beginnings of this), but generating a tar file like you are describing
> is either a case of storing each .buildinfo in the database and
> generating the tar each run, or adding and deleting files to an existing
> tarball. It seems overly intensive and doesn't really seem to scale.
>> Regarding the OpenPGP signatures, they are vital - but I also see no
>> need to strip them in your model. From the point-of-view of the FTP
>> archive, there is no immediate need to read or understand the contents
>> of the buildinfo file. [*] It's just a dumb data blob, it shouldn't
>> matter to Debian whether it's clearsigned or not.
> What I was trying to do with my proposal was turn it from being a dumb
> data blob which wasn't easily mapping to the Debian infrastructure, to
> something where almost all the information (everything except the actual
> signature from the original builder) could be provided alongside the
> binaries themselves, enabling people to have what they required to
> confirm they could reproduce the builds themselves. *I* think this is
> incredibly useful, even if it doesn't achieve everything possible with
> reproducible-builds, and I also think that it would provide a sound
> basis for another Debian service (perhaps under debian.net to start
> with) where multiple builders (starting with the original builder) would
> be able to upload their claims, based directly off the buildinfo
> information from the archive network. Yes, that's probably an extra
> step for the original builder, but it also (to me) seems to be more
> flexible and a stronger statement as multiple independent builders can
> all confirm things in a single place.
> It sounds like this isn't compatible with where reproducible-builds is
> heading though, so apologies for the noise.

I don't mean to suggest a database is not useful. I thought I was talking to
ftp-masters through you, so I wanted to be very clear about the security
properties we're aiming for, and reach a common understanding about that first.

But I'm not sure why you say it's incompatible - could you not also store the
detached signatures within the database, and generate the original file
(including signature) from this and the other information? The signatures are
much smaller than the rest of the file.
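
As a rough illustration of what I mean (my own sketch, not existing archive
code): assuming the .buildinfo is a clearsigned OpenPGP document, one could
store the signed body and the signature block in separate columns, and glue
them back together on demand. The table name, column names and helper names
below are all hypothetical, and the signature in any test data would be a
placeholder rather than a real one.

```python
import sqlite3

BEGIN_MSG = "-----BEGIN PGP SIGNED MESSAGE-----"
BEGIN_SIG = "-----BEGIN PGP SIGNATURE-----"


def split_clearsigned(text):
    """Split a clearsigned document into (armor headers, body, signature block)."""
    lines = text.split("\n")
    if lines[0] != BEGIN_MSG:
        raise ValueError("not a clearsigned document")
    blank = lines.index("")             # blank line ends the "Hash:" armor headers
    sig_start = lines.index(BEGIN_SIG)  # body lines are dash-escaped, so this is unambiguous
    headers = "\n".join(lines[1:blank])
    body = "\n".join(lines[blank + 1:sig_start])
    signature = "\n".join(lines[sig_start:])
    return headers, body, signature


def join_clearsigned(headers, body, signature):
    """Inverse of split_clearsigned: rebuild the original clearsigned text."""
    parts = [BEGIN_MSG]
    if headers:
        parts.append(headers)
    parts += ["", body, signature]
    return "\n".join(parts)


def open_db():
    """Create an in-memory database with a hypothetical buildinfo table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE buildinfo ("
        "name TEXT PRIMARY KEY, armor_headers TEXT, body TEXT, signature TEXT)"
    )
    return db


def store(db, name, text):
    """Store the signed body and the signature block in separate columns."""
    headers, body, signature = split_clearsigned(text)
    db.execute(
        "INSERT INTO buildinfo VALUES (?, ?, ?, ?)",
        (name, headers, body, signature),
    )


def regenerate(db, name):
    """Rebuild the clearsigned file, signature included, from the database."""
    row = db.execute(
        "SELECT armor_headers, body, signature FROM buildinfo WHERE name = ?",
        (name,),
    ).fetchone()
    return join_clearsigned(*row)
```

This keeps the body verbatim, so the regenerated file is byte-identical to what
was uploaded. If the body were instead rebuilt from parsed fields, the field
order and whitespace would need to be preserved for the signature to verify.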

In fact, we do indeed have longer-term plans for Debian infrastructure to look
into this data rather than treat it as a data blob - for example, buildds
themselves could try to reproduce a given buildinfo uploaded by a DD, and send
alerts about packages that can't be reproduced. (I hinted at this by the "more
advanced" behaviours I mentioned in my previous email.) But I wanted to start
off with a simple yet strongly-secure model first.

What I described is not supposed to contradict the ability for users to
"confirm they could reproduce the builds themselves". As I mentioned, a
major use-case is to allow others to download "all the buildinfo files for a
given binary package" and then check them locally.

Perhaps the confusion is in the suggestion of a single Buildinfo.tgz. Let me
disclaim this for now - I wasn't present for the discussions around why all of
this information needs to be in one file, and it actually does *not* make sense
to me. An obvious alternative is to cat all the buildinfo files for a given
source package into one $source-$version.buildinfos.gz file and store this in
pool/. This would also make it easy to look up buildinfo files for a given binary
later. Could someone tell me why this approach isn't suitable?
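
To make the "cat" idea concrete, here is a minimal sketch. It relies on the
fact that gzip files can be concatenated, so each new buildinfo can be appended
as its own gzip member without rewriting the file. The function names are
hypothetical; only the $source-$version.buildinfos.gz naming comes from the
paragraph above.

```python
import gzip
import os
import tempfile


def pool_filename(source, version):
    """File name following the $source-$version.buildinfos.gz pattern."""
    return f"{source}-{version}.buildinfos.gz"


def append_buildinfo(path, text):
    """Append one buildinfo document as a separate gzip member.

    This has the same effect as `cat a.gz b.gz > all.gz`.
    """
    with open(path, "ab") as out:
        out.write(gzip.compress(text.encode("utf-8")))


def read_buildinfos(path):
    """Return the concatenation of every buildinfo document in the file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pool:
        path = os.path.join(pool, pool_filename("hello", "1.0-1"))
        append_buildinfo(path, "Source: hello\nBuild-Architecture: amd64\n\n")
        append_buildinfo(path, "Source: hello\nBuild-Architecture: i386\n\n")
        print(read_buildinfos(path))
```

Since each buildinfo is clearsigned, the individual documents can still be
separated again at their "BEGIN PGP SIGNED MESSAGE" markers.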

Now going back to "users confirming rebuilds":

The reason why I started off with this high-security dumb-data-blob approach is
to make the security arguments and reasoning very simple and obvious, so it's
harder to accidentally weaken or subvert it in the future. Debian isn't even
involved in the security logic - it's purely the end-user verifier program.

Another benefit of signatures is that they give you more information in the
cases where you might not want to build it yourself (e.g. very large programs).
If you strip this information, then only Debian is "attesting" to a particular
hash (which it didn't even build). If you keep this information, then you can
aggregate multiple people's attempts to build a given binary.

Eventually we could have buildinfo-only uploads, just like we have binary-only
or source-only uploads. Then for important binaries like gcc, perhaps 20 people
will want to upload their .buildinfo files to Debian with their signatures
attached, to make us all feel better about that.

Note also in general that you don't actually *want* all of the buildinfo fields
to be the same for everyone. Only the output *has* to be the same, and it is
actually a stronger security property if we get two buildinfo files that started
off with *different* inputs (such as buildpath/time/etc) and got the *same*
binary hashes out.
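
A verifier could aggregate these claims along the following lines. This is only
a sketch: it assumes the output hashes sit in a Checksums-Sha256 field with
"hash size filename" lines, that signature verification has already happened
elsewhere, and that the signer labels ("alice", "bob") stand in for verified
key identities. The function names are hypothetical.

```python
def checksums_sha256(buildinfo_text):
    """Return {filename: sha256} parsed from the Checksums-Sha256 field."""
    result = {}
    in_field = False
    for line in buildinfo_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
        elif in_field and line.startswith(" "):
            # Continuation line of the field: "<sha256> <size> <filename>"
            digest, _size, name = line.split()
            result[name] = digest
        else:
            in_field = False
    return result


def agreement(buildinfos):
    """Map each output file to {sha256: [signers who claim that hash]}.

    `buildinfos` maps a signer label to the text of their buildinfo file.
    Inputs such as the build path may differ between signers; only the
    output hashes are compared.
    """
    by_file = {}
    for signer, text in buildinfos.items():
        for name, digest in checksums_sha256(text).items():
            by_file.setdefault(name, {}).setdefault(digest, []).append(signer)
    return by_file


def reproduced(by_file):
    """List output files for which every signer reported the same hash."""
    return [name for name, claims in by_file.items() if len(claims) == 1]
```

A file with more than one distinct hash in the result is exactly the case
worth alerting on, and the list of signers per hash tells you who to ask.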


GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
