[Reproducible-builds] Preliminary review of dpkg-genbuildinfo
Guillem Jover
guillem at debian.org
Fri Feb 6 06:13:18 UTC 2015
Hi,
On Wed, 2015-02-04 at 14:36:12 +0100, Holger Levsen wrote:
> On Sonntag, 1. Februar 2015, Guillem Jover wrote:
> > * I'm still somewhat unconvinced that having byte-for-byte identical
> > container binary .deb packages is the ideal minimal reproducible
> > unit.
>
> I'm getting more and more convinced it is. Cause that is what we really care
> about.
Different people care about very different things, at very different
scales.
This will also be a problem with rpms, which do support embedded
signatures natively. And the answer there will also clearly not be
“do not use embedded signatures”.
> > This will completely disallow digital signatures embedded in
> > the binary packages or per-executable signatures in their xattr
> > metadata for example, and that seems to me was completely ignored
> > last time around. I'd be very unhappy if at some point reproducible
> > builds were enforced and that'd mean we've painted ourselves into a
> > corner with other potential additions to the .deb that might not be
> > reproducible by design.
>
> I think, not allowing unreproducible fields in .debs is a feature and a nice
> design.
>
> And yes, there has been years of work on embedding signatures into debs, but I
> don't see it as a problem that the result of this work is: don't do that, it
> causes $these problems. Detached signatures are pretty common everywhere.
Detached signatures have their place in various circumstances and
for various motives, they are not always the best option though. You
usually use detached signatures precisely because you cannot embed
them in the files you are signing. When you can, because the format
supports or allows it, it's almost always superior to embed them,
because then you don't easily lose track of those.
> Also, regarding embedded signatures, sure they are nice, but once we have
> reproducible builds, they are also _way_ less meaningful, as the
> reproducibility (and the signed source) make an even more useful statement:
> with embedded signatures you still need to trust the signee that this binary
> derived from the source he/she says it derives from. With reproducible builds
> you can independently verify that the binary indeed comes from this source.
I think trying to confront those two features would be very unwise.
They have different applications, and I'm sorry to say but while
reproducibility has some very nice attributes and implications, it
does not suddenly nullify everything else. It's a bit like saying that
signatures on source packages are way less meaningful because you
can download the sources and review them. You have to trust something
or someone.
Take the example I gave previously of a binary package detached from
an archive, just a .deb package lying around, either from an old
download or passed to you by someone. You have to *know* the origin of
the binary, otherwise you need to first start hunting down where this
binary was built, say Debian, one of its derivatives or even somewhere
else. And sure, once that's known, the user *might* be able to
reproduce the build, but I expect many (if not most) users, say my
relatives, to be unable or unwilling to set up a reproducible build
environment just to verify where a binary came from. If you cannot or
won't do that, you need to trust the distribution, the remote server,
the network, the remote binary including any possible reproducible
information being correct. At that point you or a program might as well
have just verified an embedded signature.
Having the possibility to reproducibly build stuff is very nice, but
many people will be trusting a tiny set of other people doing those
reproducible builds, because they don't fancy running a source based
distribution, building the world on each upgrade.
> > * Some of the information in this file trigger my privacy alarm bells,
> > for example the Build-Path field.
>
> While I don't really share your concerns here, as I think the situation now is
> worse: tools embed this private data into binaries and not many people know
> about this. So I think making the build path visible (and explaining why we
> have to do this) is actually a step in the right direction, on the way to the
> right fix: not embedding the build path at all.
They don't know because the embedded DWARF sections are compressed, so
a simple «strings» on the binaries does not reveal anything suspicious.
You need to use something like «objdump -W» or «readelf -w». Having a
lintian check that at least warns when finding paths like
«/home/[[:alnum:][:punct:]]+/» would go some way to make people aware
of the problem.
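As a rough illustration of the kind of check meant here, a sketch in
Python (not lintian's own Perl, and not an existing lintian check). The
names find_home_paths and dump_debug_info are made up for this example,
and the POSIX classes are approximated with their ASCII equivalents
because Python's re module does not support them:

```python
import re
import string
import subprocess

# Python's re has no POSIX classes, so [[:alnum:][:punct:]] is spelled out
# using its ASCII members (an approximation; locales are ignored).
_CLASS = re.escape(string.ascii_letters + string.digits + string.punctuation)
HOME_PATH = re.compile(r"/home/[" + _CLASS + r"]+/")


def find_home_paths(text):
    """Return the sorted, de-duplicated /home/... paths found in text."""
    return sorted(set(HOME_PATH.findall(text)))


def dump_debug_info(binary):
    """Return the decoded DWARF sections of an ELF file via 'readelf -w'.

    Assumes binutils' readelf is installed and on PATH.
    """
    result = subprocess.run(
        ["readelf", "-w", binary],
        capture_output=True,
        text=True,
        errors="replace",
        check=True,
    )
    return result.stdout
```

Something like find_home_paths(dump_debug_info("path/to/binary")) would
then list the suspicious paths, if any, that a plain «strings» misses.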
And embedding the path for each and every build is actually way, way
worse, because currently the DWARF problem only affects sources building
binary packages with embedded debugging symbols. This does not currently
affect sources only producing other kinds of binary packages, nor sources
that only produce arch:all packages, which are a pretty big part of the
archive. Of course they might still be recording the build path through
other means, but that's a different issue.
> But that is a goal rather far away, so if we want reproducible builds anytime
> soon, which given the feedback at FOSDEM I think *many* people wish for, not
> only on Debian, btw, we need to solve this somehow.
The problem is that if a tool like dpkg-genbuildinfo is run on every
build, even for packages that do not record build paths, then we are
introducing this information that was not there before, w/o an easy
way to opt-out.
> So developer builds in a different build path or otherwise unreproducible
> builds should not become policy violations or even important bugs for the
> immediate future, even in my vision ;-) But that said, I certainly hope that
> reproducibility will be at least a release goal for stretch!
TBH I'd not be very happy if I were coerced to use a specific path
for building (a la RPM).
Regards,
Guillem