[Reproducible-builds] Preliminary review of dpkg-genbuildinfo

Guillem Jover guillem at debian.org
Tue Feb 3 20:00:55 UTC 2015


Hi!

On Sun, 2015-02-01 at 10:46:50 +0100, Jérémy Bobbio wrote:
> Guillem Jover:
> > Looking at <git://anonscm.debian.org/reproducible/dpkg.git>.
> > 
> > Have you seen any actual problem to warrant the «Ensure stable order
> > of Checksums-* fields» commit? In principle the output order is
> > preserved from the input one.
> 
> I have seen the ordering differ, but I might have misunderstood the
> source of the problem. Unfortunately, the packages have since been
> tested again and I did not keep track of them. We may well stumble
> on the problem again later.

Ok.

> > And here's a quickish review of the dpkg-genbuildinfo changes, taking
> > into account that I'm looking at this as a general tool for recording
> > the build environment, not specific for just reproducible builds.
> 
> Guillem, thanks for your review! I had not asked for one yet, because we
> are still exercising this code and design decision on the archive. But I
> guess time has come. :)

No problem!

> >  * I'm still somewhat unconvinced that having byte-for-byte identical
> >    container binary .deb packages is the ideal minimal reproducible
> >    unit. This will completely disallow digital signatures embedded in
> >    the binary packages or per-executable signatures in their xattr
> >    metadata for example, and it seems to me that this was completely
> >    ignored last time around. I'd be very unhappy if at some point reproducible
> >    builds were enforced and that'd mean we've painted ourselves into a
> >    corner with other potential additions to the .deb that might not be
> >    reproducible by design.
> 
> This is a fair point. Having byte-for-byte identical .deb files does
> not prevent digital signatures. It only makes the process slightly more
> complex. One solution is to record the signatures in the buildinfo (or
> in another build product). They can then be copied as-is during rebuilds.

Hmm, the current thinking is to store signatures in a separate ar
member as a compressed tarball. This means encoding and embedding that
in the *.buildinfo file would probably bloat it quite a bit. Also,
there has long been an idea floating around of possibly signing the
binary packages at several steps of their processing, say by the
builder, then by the archive, so that one could always check their
provenance even when a binary package is disconnected from an
archive. I guess the archive could mangle the *.buildinfo file at the
same time, but it all starts to look a bit fragile.

This also does not work for the per-executable signatures (or would
make it unbearably more complex), which, although I don't see them being
deployed in Debian, means the two approaches conflict with each other.

One option I mentioned some time ago is to make something like
dpkg-deb generate a unique ID from its contents, instead of its entire
container, which could be recorded in the .buildinfo file, and that
would then be unaffected by this kind of non-reproducible by-design
artifacts.
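
To make the idea a bit more concrete, here is a rough Python sketch,
not anything dpkg implements today. It assumes the ID is a SHA-256 over
the entries of the control.tar* and data.tar* members (names, type,
mode, ownership, size and file contents), deliberately ignoring ar and
tar timestamps and any extra members such as a signature tarball. The
function names and the exact set of hashed fields are hypothetical, and
it only handles the compressors Python's tarfile can read (gzip, bzip2,
xz).

```python
import hashlib
import io
import tarfile

AR_MAGIC = b"!<arch>\n"


def ar_members(data: bytes):
    """Yield (name, mtime, payload) for each member of a plain ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]  # fixed 60-byte ar member header
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        mtime = int(header[16:28].decode("ascii").strip() or 0)
        size = int(header[48:58].decode("ascii").strip())
        pos += 60
        yield name, mtime, data[pos:pos + size]
        pos += size + (size % 2)  # member data is padded to an even offset


def _hash_tar_entries(payload: bytes, h) -> None:
    """Feed tar entries into h in name order, skipping mtimes on purpose."""
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as tar:
        for member in sorted(tar.getmembers(), key=lambda m: m.name):
            h.update(member.name.encode() + b"\0")
            meta = (f"{member.type.decode()}:{member.mode:o}:{member.uid}:"
                    f"{member.gid}:{member.size}:{member.linkname}\0")
            h.update(meta.encode())
            if member.isfile():
                h.update(tar.extractfile(member).read())


def content_id(deb: bytes) -> str:
    """Hypothetical content-based ID: ignores the container, keeps the payload."""
    h = hashlib.sha256()
    for name, _mtime, payload in ar_members(deb):
        # Only control.tar* and data.tar* count; extra members such as a
        # signature tarball, and the compression suffix, do not affect the ID.
        if name.startswith(("control.tar", "data.tar")):
            h.update(name.split(".tar")[0].encode() + b"\0")
            _hash_tar_entries(payload, h)
    return h.hexdigest()
```

With something like this recorded in the .buildinfo file, a package
could gain an embedded signature member, or be rebuilt at a different
time, and still match the recorded ID.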

> >  * Somewhat related to the above point and this new file. Timestamps
> >    in some places (mainly on the actual file metadata, for example in
> >    ar and tar containers) are actually very useful, because as long as
> >    we don't have a stored recording of the build environment, it's the
> >    next "best" approximation to that. Getting rid of those prematurely
> >    makes us miss very valuable information for debugging and similar.
> >    And yes, even replacing the current time with the changelog time is
> >    missing information. This has also been dismissed previously (with,
> >    I'd say, even a bit of contempt!).
> 
> Sorry if it felt contemptuous. You might have more experience than me
> about using this information for debugging.

I don't think this kind of information is used regularly for debugging,
because it's too broad and inaccurate, as it depends on the local clock
being correct, and on the upgrade habits of the sysadmin. But it's
easy to record and (assuming a working clock) it sets an upper time
bound, from which you know for sure that newer versions of other software
cannot have been used; and if you assume an up-to-date system, then you
can probably reproduce the environment to a pretty close state.

For example I'd be able to reproduce the environment for any of my
builds from the timestamp in the .deb ar members, but not from the
changelog, because sometimes I finish the release, but do not build
it until I'm about to upload, which might take days or weeks.
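
As a rough illustration of that upper-bound reasoning, here is a Python
sketch. It assumes the ar member timestamps were set to the build time,
and that you have a list of (version, publication time) pairs for some
build dependency, for example gathered by hand from an archive snapshot
service; the helper names and the data are hypothetical.

```python
from datetime import datetime, timezone

AR_MAGIC = b"!<arch>\n"


def ar_build_time(deb: bytes) -> datetime:
    """Return the mtime of the first ar member of a .deb as a UTC datetime."""
    if not deb.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    header = deb[len(AR_MAGIC):len(AR_MAGIC) + 60]
    mtime = int(header[16:28].decode("ascii").strip())  # 12-byte decimal field
    return datetime.fromtimestamp(mtime, tz=timezone.utc)


def split_by_upper_bound(history, build_time):
    """Split (version, published_at) pairs at build_time.

    Returns the newest version that could have been installed (the likely
    one, if the system was kept up to date), or None, plus the versions
    that cannot have been used because they were published later.
    """
    usable = [(t, v) for v, t in history if t <= build_time]
    ruled_out = [v for v, t in history if t > build_time]
    newest = max(usable)[1] if usable else None
    return newest, ruled_out
```

Note this only bounds the environment from above; it cannot tell how
stale the system actually was, which is why it's only an approximation.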

That's one of the reasons you have found, and will keep finding,
resistance to removing such timestamps from upstream projects: even if
in some cases it might be out of old habits, in many others this is
valuable information that was there and that you are proposing to drop,
without an equivalent or better substitute such as the *.buildinfo
stuff.

> Could you explain what you mean by “prematurely” here? I was not
> expecting the proposed changes for dpkg to be merged before buildinfo
> files could be kept in the archive.

Precisely that, as long as we don't have something better than
"timestamps" to describe the current environment, getting rid of them
seems premature. I guess this didn't seem clear to me from previous
bug reports against dpkg.

> >  * I'm not entirely sure if this really makes sense as a different
> >    file, but at least given that it's controlled by dpkg-buildpackage
> >    we can always fold it into dpkg-genchanges if we deem that's a
> >    better course of action. So it seems fine this way for now.
> 
> dpkg-genbuildinfo as a different file from dpkg-genchanges in the source
> code or *.buildinfo as different control files from *.changes?

I meant as different control files. The *.changes file records
information about, and the files for, a release; I don't see why we
could not also store the environment used to produce those changes
there. Seen that way, it does not seem like much of a misnomer to me.

If the archive has to be changed to store and accept *.buildinfo, it
could as well be changed to make the stored *.changes files public.

In any case, as mentioned above, the current split file is fine for
experimentation, as we can always fold it back in. But it would be nice
to decide before we start to deploy this in dpkg, the archive, etc.

> >  * Some of the information in this file triggers my privacy alarm bells,
> >    for example the Build-Path field.
> 
> Absolutely.
> 
> For packages shipping DWARF symbols, this is only making the issue more
> visible.

AFAICS the DWARF spec says that any file references can be either
absolute or relative, so I guess the problem here is that the build
process or the toolchain is passing absolute paths. The fix here would
be to use relative paths instead.
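
For C/C++ built with GCC, one existing knob for this is
-fdebug-prefix-map=OLD=NEW, which rewrites the OLD prefix in the paths
recorded in the debug information. The Python sketch below only models
the effect of that remapping, as a path-component-wise approximation
(GCC itself does a plain prefix substitution); the helper names and the
example paths are hypothetical.

```python
from pathlib import PurePosixPath


def remap_debug_path(path: str, build_dir: str, new_prefix: str = ".") -> str:
    """Rewrite a recorded path so it no longer leaks the build directory."""
    try:
        rel = PurePosixPath(path).relative_to(PurePosixPath(build_dir))
    except ValueError:
        return path  # outside the build tree (e.g. system headers): untouched
    return str(PurePosixPath(new_prefix) / rel)


def debug_prefix_map_flag(build_dir: str, new_prefix: str = ".") -> str:
    """Build the GCC flag that would ask the compiler for the same mapping."""
    return f"-fdebug-prefix-map={build_dir}={new_prefix}"
```

Mapping to "." yields relative paths, so two builds from different
directories would record the same file references.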

> The initial idea was to push for a canonical build location for Debian
> packages. Adding `Build-Path` to buildinfo gives us the ability to
> change this location later on.
> 
> In any case, considering the build path as part of the build
> environment removes several hard to solve issues. Maybe if there's
> enough people feeling like tackling them, we can revisit this decision.

To me this seems like sidestepping a real issue, by neutralizing it
with the recorded setting. The same route could be taken for other
things like uname, etc, when in this case I think it makes sense to
get rid of the different build path. But then I don't think I can
volunteer myself to fix this so… :)

> I intend to re-work the code according to your comments in a couple of
> days. I am only learning Perl, please bear with my beginner's mistakes!
> :)

Sure, no problem, and some of those are just specific to the dpkg
context.

Thanks,
Guillem