[Reproducible-builds] .buildinfo should contain source hashes (as well as binary hashes)

Ximin Luo infinity0 at debian.org
Sun Sep 20 16:49:16 UTC 2015


Hi list,

BACKGROUND
==========

One of the main points of reproducible builds is to enable Diverse Double-Compiling (DDC): http://www.dwheeler.com/trusting-trust/

To take an example, I can convince myself that my /bin/gcc5 corresponds exactly to the source code /src/gcc5, if I can:

1. assume that at least one of /bin/clang, /bin/gcc4.9 is not compromised

2. /bin/clang /src/gcc5 -o /bin/gcc5_b1; /bin/gcc4.9 /src/gcc5 -o /bin/gcc5_b2

3. /bin/gcc5_b1 /src/gcc5 -o /bin/gcc5_b1a; /bin/gcc5_b2 /src/gcc5 -o /bin/gcc5_b2a

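4. cmp /bin/gcc5_b1a /bin/gcc5_b2a && cmp /bin/gcc5_b1a /bin/gcc5

If this exits 0 and (1) was true (and gcc5 is non-buggy and builds reproducibly), then /bin/gcc5 corresponds exactly to /src/gcc5. If the first cmp fails, then, given (1), one of /bin/clang, /bin/gcc4.9 is compromised (or gcc5 does not build reproducibly). If only the second cmp fails, then /bin/gcc5 does not correspond to /src/gcc5.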
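The decision above can be sketched in Python. This is a minimal illustration, not an existing tool: `sha256_of` and `ddc_verdict` are hypothetical names, the stage-2 binaries (/bin/gcc5_b1a, /bin/gcc5_b2a) are assumed to have been built already as in steps 2 and 3, and the final comparison against /bin/gcc5 is what links the result to the binary actually under test.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def ddc_verdict(stage2_a: Path, stage2_b: Path, under_test: Path) -> str:
    """Compare two stage-2 builds (each bootstrapped through a different
    compiler) with each other and with the binary under test.

    Assumes at least one bootstrap compiler is honest and that the
    compiler builds reproducibly; otherwise the verdict means nothing.
    """
    a, b, t = sha256_of(stage2_a), sha256_of(stage2_b), sha256_of(under_test)
    if a != b:
        # Same source built via two paths disagrees: one bootstrap compiler
        # is compromised, or the build is not reproducible.
        return "bootstrap-mismatch"
    if a != t:
        # The honest path disagrees with the binary we were handed.
        return "binary-differs"
    return "ok"
```

Hashing first rather than calling `cmp` directly makes it easy to record the digests, which is the same kind of data a .buildinfo file stores.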

More generally, if we assume that /bin/cc0 is good, we can pick other compilers /bin/cc1 ... /bin/ccn, run the above for each /bin/cc$i, and conclude that every compiler that produced the same final output as cc0 is also good.
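The generalisation amounts to a filter over the final outputs. In this hypothetical sketch, `final_outputs` maps each bootstrap compiler name to the hash of the stage-2 gcc5 binary obtained through it; the function name and any digests passed in are placeholders, not real data.

```python
from typing import Mapping, Set


def good_compilers(final_outputs: Mapping[str, str], trusted: str) -> Set[str]:
    """Return every bootstrap compiler whose final (stage-2) output matches
    the output obtained via the compiler assumed to be good."""
    reference = final_outputs[trusted]
    return {cc for cc, digest in final_outputs.items() if digest == reference}
```

For example, with placeholder digests `{"cc0": "aa", "cc1": "aa", "cc2": "bb"}` and `trusted="cc0"`, the result is the set `{"cc0", "cc1"}`: cc2 either is compromised or produced a non-reproducible build.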

PROBLEM
=======

With our current .buildinfo setup, the above process is more complicated, because we *only* store hashes of the binary build environment. We can still try to reproduce the build, but running DDC is more awkward, and the file communicates "the wrong thing".

The point of the .buildinfo file is to say "with these build-deps and this environment, you can build this source code to get this binary target". Of course, if you build something with different tools, you expect a different result, which is why we have these files. However, at a human level "these build-deps" refers to the source code, not the binary code. That is, if we replace our binary build-deps with ones *compiled from the same source code*, they should behave identically, and we *should still be able to reproduce the same binary target hash*. This is a key principle of DDC.

Currently, to run a DDC test, we would have to read the .buildinfo file, find the hashes of the binary build-deps, look up the source packages that correspond to these hashes, find different binary build-deps built from those source packages, and run our DDC-checker. This takes many round trips and requires contacting external infrastructure unnecessarily.

If .buildinfo files contained source hashes, the DDC-checker could work more directly, without requiring a remote repository of source hash <-> binary hash mappings. It could even build the build-deps itself, without worrying about the binary hashes of the results, perhaps on a different host architecture. Importantly, it also states the *intentions* of this file much better.
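As a purely hypothetical sketch of the idea, the checker below assumes each build-dep entry records a source hash next to its binary hash. The classes `BuildDep` and `BuildInfo`, their field names, and the functions `deps_equivalent` and `ddc_reproduced` are invented for illustration; they are not the real .buildinfo format. Hashes are compared as opaque strings.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class BuildDep:
    """One build-dependency entry (hypothetical layout)."""
    package: str
    version: str
    source_sha256: str  # proposed: hash of the source the dep was built from
    binary_sha256: str  # current practice: hash of the binary package installed


@dataclass(frozen=True)
class BuildInfo:
    """Hypothetical .buildinfo record: build-deps plus the resulting binary."""
    build_deps: Tuple[BuildDep, ...]
    target_sha256: str  # hash of the binary target produced by the build


def deps_equivalent(recorded: BuildInfo, used_deps: Dict[str, str]) -> bool:
    """Check that every recorded build-dep was replaced by one built from the
    same source. `used_deps` maps package name -> source hash of the dep
    actually installed for the rebuild. Binary hashes are ignored on purpose."""
    return all(used_deps.get(dep.package) == dep.source_sha256
               for dep in recorded.build_deps)


def ddc_reproduced(recorded: BuildInfo, used_deps: Dict[str, str],
                   new_target_sha256: str) -> bool:
    """A rebuild with source-equivalent build-deps should reproduce the target."""
    return (deps_equivalent(recorded, used_deps)
            and new_target_sha256 == recorded.target_sha256)
```

Binary hashes are never consulted here, so the rebuild may use build-deps that the checker compiled itself, possibly on another host architecture. With binary hashes alone, `used_deps` would first have to be resolved through an external binary-hash to source-hash index.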

(Lunar tells me on IRC that this is less feasible, but let's discuss this further and see if we can come up with better solutions.)

X

-- 
GPG: 4096R/1318EFAC5FBBDBCE
git://github.com/infinity0/pubkeys.git
