Bug#1072205: prevent re-using package versions for NMUs

Ansgar 🙀 ansgar at 43-1.org
Wed Apr 2 20:41:39 BST 2025


Hi,

Regarding your analysis: I think you could just scan the .changes files,
as they list every *.deb file that was uploaded. Very old .changes files
only carry MD5 hashes, though.

They can be found in
file://mirror.ftp-master.debian.org/srv/ftp-master.debian.org/queue/done
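
A rough sketch of what I mean by scanning them, in Python. It assumes the
usual .changes layout (multi-line "Files", "Checksums-Sha1" and
"Checksums-Sha256" fields whose lines end in the file name); the function
name deb_hashes_from_changes is made up for illustration and this is not
existing tooling:

```python
import sys


def deb_hashes_from_changes(text):
    """Map each *.deb listed in a .changes file to the hashes given for it."""
    # Field name (lower-cased) -> label used in the result.
    fields = {
        "files": "md5",
        "checksums-sha1": "sha1",
        "checksums-sha256": "sha256",
    }
    result = {}
    current = None  # label of the checksum field we are inside, if any
    for line in text.splitlines():
        if line.startswith((" ", "\t")):
            # Continuation line; only relevant inside a checksum field.
            if current is None:
                continue
            parts = line.split()
            if len(parts) < 3:
                continue
            # The digest comes first and the file name last in all three fields.
            digest, filename = parts[0], parts[-1]
            if filename.endswith(".deb"):
                result.setdefault(filename, {})[current] = digest
        else:
            # A new field (or PGP armor, or a blank line) ends the previous one.
            name = line.partition(":")[0].strip().lower()
            current = fields.get(name)
    return result


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for deb, hashes in sorted(deb_hashes_from_changes(fh.read()).items()):
                print(path, deb, hashes)
```

For the very old uploads the result would only contain the "md5" entry,
so any comparison across all years would have to fall back to MD5.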

As for your observation that bash does not show up in any Packages
index: that can happen for (at least) two reasons. The snapshot service
does not retrieve all Packages files, or the package could have been
superseded by a newer version before it was ever published in a
dinstall run.

Regards,
Ansgar

On Wed, 2025-04-02 at 15:26 +0200, Johannes Schauer Marin Rodrigues
wrote:
> Hi,
> 
> On Thu, 30 May 2024 14:26:31 +0000 Holger Levsen
> <holger at layer-acht.org> wrote:
> > very "nice" find, josch!
> 
> with the help of Holger and osuosl4 I have dug into this a bit more and tried
> to get some hard data about this problem. My idea was the following: parse all
> Packages files for all suites, all architectures and all components for all
> timestamps stored on snapshot.d.o and find packages with the same
> name/arch/version tuple that have a different checksum. To this end, I slightly
> (less than 1000 lines of diff) patched the tooling at
> https://salsa.debian.org/metasnap-team/metasnap.git with the patch that I
> attached to this mail on top of 1dadf2575160caf9467c4e21aa6c0a31ac10ffc2.
> 
> After running that script for 3 months and downloading 189 GB of data in 3.5
> million requests (about 2 seconds for every request), we had a database
> (actually a git repository) of 48 GB that we can use to find duplicates. It
> took another 2 months to go through that data. I attached a graph which shows
> the number of duplicate name/arch/version triplets per timestamp. Please note
> the logarithmic y-axis. The total number of duplicates from 2005 until 2024 is
> 334335.
> 
> Problem solved? Not so fast. Processing all Packages files will *not* find the
> original problem with bash. Why? Because according to the Packages files from
> snapshot.debian.org only one version of bash:arm64=5.2.15-2+b3 exists, namely:
> 
> MD5sum: 01ee4cfa3df78e7ff0dc156ff19e2c88
> SHA1:   1a0b12419b69a983bf22ac1d3d9f8bec725487b1
> SHA256: 828ce0b4445921fff5b6394e74cce8296f3038d559845a3e82435b55ca6fcaa8
> 
> The other version never ended up in a Packages file even though it was found
> in the /pool/main/b/bash directory in the snapshot of 2023-07-13 21:11:09
> nearly one year before the other version popped up.
> 
> How can a package be in the pool directory but not in a Packages file? No idea
> but it shows that my method from above does not find a certain class of
> problems. We could find those by creating a fitting query against the
> snapshot.d.o database. Apparently lw07 is DD accessible and has a
> snapshot-guest service. So this is on my TODO list and Nicolas Dandrimont
> already offered to help with constructing an appropriate SQL query during
> MiniDebConf Hamburg this year.
> 
> Lastly there is the problem of packages in incoming. Those packages will be
> used to build other packages that end up in the archive but they might never
> end up in the archive themselves. Thus, we might never know whether one of
> these packages violated the idea that the packagename/architecture/version
> triplet uniquely identifies a Debian binary package in the archive...
> 
> Thanks!
> 
> cheers, josch
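
For readers following along, the duplicate search described in the quoted
mail (same name/arch/version, different checksum across Packages files)
could be sketched roughly as below. This is not the attached metasnap
patch; parse_stanzas, find_duplicates and the default of comparing the
SHA256 field are assumptions made purely for illustration:

```python
from collections import defaultdict


def parse_stanzas(text):
    """Yield one dict of field -> value per stanza of a Packages index."""
    stanza, key = {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if stanza:
                yield stanza
            stanza, key = {}, None
        elif line[0] in " \t":
            # Continuation of the previous field (e.g. a long Description).
            if key is not None:
                stanza[key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            stanza[key] = value.strip()
    if stanza:
        yield stanza


def find_duplicates(packages_texts, field="SHA256"):
    """Return {(name, arch, version): checksums} for tuples with >1 checksum."""
    seen = defaultdict(set)
    for text in packages_texts:
        for stanza in parse_stanzas(text):
            try:
                ident = (stanza["Package"], stanza["Architecture"],
                         stanza["Version"])
                digest = stanza[field]
            except KeyError:
                # Assumption: stanzas lacking the chosen field are skipped;
                # very old indices may need field="MD5sum" instead.
                continue
            seen[ident].add(digest)
    return {ident: sums for ident, sums in seen.items() if len(sums) > 1}
```

By construction this only sees what was published in some Packages file,
so a case like the second bash:arm64=5.2.15-2+b3 build, which only ever
appeared in the pool directory, would not be reported.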