[Reproducible-builds] Storing .deb checksums in ADMINDIR/status?

Guillem Jover guillem at debian.org
Fri Jun 26 04:30:39 UTC 2015


Hi!

On Tue, 2015-06-23 at 09:31:05 +0200, Jérémy Bobbio wrote:
> Some people suggested that we should record a checksum of the `.deb`
> installed as a way to unambiguously refer to a specific package.

In principle the tuple pkgname-version-arch should be unique per
archive, otherwise bad-things-will-happen. Of course that does not
cover locally built packages and similar, or mixing different archives
with duplicated tuples, but then those are probably out-of-scope for
reproducible builds *in* Debian anyway, I guess.

> The main benefit that I can think of is that it would allow us to directly
> retrieve the file from snapshot.debian.org based on the hash [2].

Personally I find the point that David mentioned to be a bit more
interesting. :)

> But, as far as I know, this information is currently not recorded by
> dpkg and there is no way to know for sure which `.deb` has been used for
> a currently installed package. I recall a couple of cases where this
> could have been useful outside of the aforementioned use case.
>
> From my limited knowledge of dpkg's internals, computing checksums
> and adding a new field to the status file doesn't seem hard to
> implement.

The general idea seems worthwhile in principle. The devil is in the
details though, and with dpkg, the implementation is usually not the
hard part. :)
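As a rough illustration of the quoted claim that computing the checksum and writing a new field is the easy part, here is a minimal Python sketch (not dpkg code, which is written in C). It hashes a .deb stream with SHA-256 and appends the digest to a status-style stanza. The field name Deb-SHA256 and the helper names are purely hypothetical; dpkg records no such field today.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of everything read from a binary stream."""
    h = hashlib.sha256()
    # Read in fixed-size chunks so large .debs are never loaded whole.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def with_checksum_field(stanza: str, digest: str,
                        field: str = "Deb-SHA256") -> str:
    """Append a checksum field to a deb822-style status stanza.

    "Deb-SHA256" is a hypothetical field name; dpkg writes no such field.
    """
    return stanza.rstrip("\n") + f"\n{field}: {digest}\n"


if __name__ == "__main__":
    # Stand-in bytes; in practice this would be open("some.deb", "rb").
    digest = sha256_stream(io.BytesIO(b"not really a .deb"))
    stanza = "Package: example\nStatus: install ok installed\nVersion: 1.0-1\n"
    print(with_checksum_field(stanza, digest), end="")
```

As the reply points out, the hard part is not this code but deciding what exactly gets hashed and when.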

David also pointed out some of the possible issues. Others that quickly
come to mind would be:

 * Checksum of what exactly? Although the seemingly obvious answer
   might be “the entire .deb container”, depending on what one wants,
   the interesting data might be different. For example, what matters
   to apt would appear to be control.tar and data.tar, and you might not
   want to reinstall if some other member changes; when using signed
   packages, changes to the signatures might also be relevant. Other
   .deb members might also be relevant in case another tool wants to
   use them.
 * Currently dpkg extracts the control.tar with dpkg-deb directly to
   disk, and gets the data.tar contents piped from dpkg-deb, so it never
   has direct access to the whole file. The checksum would therefore
   need to be computed out-of-band, processing the .deb one more time,
   which might be wasteful.
 * A possibility could be to pre-compute the checksum at creation or
   modification time, and store it in the debian-binary member, for
   example. The problem with that is that tools that modify .debs
   might not generate a checksum, or worse, might not update it. This
   would also not benefit old binaries.
 * Another possibility might be to make dpkg-deb compute the checksum
   when parsing the .deb and output it on a supplied fd through a
   command-line option.
 * Even when dpkg was being used through dselect, where the checksums
   from the archive were fresh and within reach in the available file,
   dpkg has never propagated them to the status file. I guess this was
   mainly because at the time of «dpkg -i», there was no guarantee that
   those packages corresponded to the ones from the archive.
 …
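The first two points can be illustrated together. A .deb is an ar archive, so a single read of the file is enough to hash both the whole container and each member (debian-binary, control.tar.*, data.tar.*), in the spirit of the dpkg-deb fd option mentioned above rather than an extra pass. The following is a hedged Python sketch assuming a plain ar layout (8-byte magic, 60-byte member headers, members padded to an even offset). The helper names (HashingReader, deb_digests, make_ar) are hypothetical, and this is not how dpkg-deb is structured internally.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List, Tuple

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60  # fixed-size ar member header


class HashingReader:
    """Wrap a binary stream, feeding every byte read into a whole-file digest."""

    def __init__(self, stream: BinaryIO) -> None:
        self._stream = stream
        self.digest = hashlib.sha256()

    def read(self, n: int) -> bytes:
        data = self._stream.read(n)
        self.digest.update(data)
        return data


def _read_exact(r: HashingReader, n: int) -> bytes:
    data = r.read(n)
    if len(data) != n:
        raise ValueError("truncated ar archive")
    return data


def deb_digests(stream: BinaryIO,
                chunk_size: int = 1 << 16) -> Tuple[str, Dict[str, str]]:
    """Single pass over a .deb: SHA-256 of the container and of each member."""
    r = HashingReader(stream)
    if _read_exact(r, len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members: Dict[str, str] = {}
    while True:
        header = r.read(HEADER_LEN)
        if not header:
            break  # clean end of archive
        if len(header) != HEADER_LEN or header[58:60] != b"`\n":
            raise ValueError("malformed ar member header")
        # Name field is space-padded; some ar tools also append a "/".
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        h = hashlib.sha256()
        remaining = size
        while remaining:
            chunk = _read_exact(r, min(chunk_size, remaining))
            h.update(chunk)
            remaining -= len(chunk)
        if size % 2:
            _read_exact(r, 1)  # members are padded to an even offset
        members[name] = h.hexdigest()
    return r.digest.hexdigest(), members


def make_ar(members: List[Tuple[str, bytes]]) -> bytes:
    """Build a minimal ar archive, for demonstration only."""
    out = bytearray(AR_MAGIC)
    for name, data in members:
        out += (name.encode("ascii").ljust(16) + b"0".ljust(12)
                + b"0".ljust(6) + b"0".ljust(6) + b"100644".ljust(8)
                + str(len(data)).encode("ascii").ljust(10) + b"`\n")
        out += data
        if len(data) % 2:
            out += b"\n"
    return bytes(out)


if __name__ == "__main__":
    blob = make_ar([("debian-binary", b"2.0\n"),
                    ("control.tar.xz", b"fake control"),
                    ("data.tar.xz", b"fake data")])
    whole, per_member = deb_digests(io.BytesIO(blob))
    print("container", whole)
    for name, digest in per_member.items():
        print(name, digest)
```

Which of these digests would be worth recording (the container, or only control.tar and data.tar) is exactly the open question from the first point; the sketch simply computes all of them in the same pass.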

Thanks,
Guillem


