<html><head><style>pre,code,address {
margin: 0px;
}
h1,h2,h3,h4,h5,h6 {
margin-top: 0.2em;
margin-bottom: 0.2em;
}
ol,ul {
margin-top: 0em;
margin-bottom: 0em;
}
blockquote {
margin-top: 0em;
margin-bottom: 0em;
}
</style></head><body><div>Hi,</div><div><br></div><div>regarding your analysis: I think you could just scan the .changes files, as they list all *.deb files uploaded. Note, though, that very old .changes files only have MD5 hashes.</div><div><br></div><div>They can be found in <a href="file://mirror.ftp-master.debian.org/srv/ftp-master.debian.org/queue/done">file://mirror.ftp-master.debian.org/srv/ftp-master.debian.org/queue/done</a></div><div><br></div><div>As for your observation that bash does not show up in any Packages index: that can happen for (at least) two reasons. First, the snapshot service does not retrieve all Packages files. Second, the package could have been superseded by a newer version before it was ever published in a dinstall run.</div><div><br></div><div>Regards,</div><div>Ansgar</div><div><br></div><div>On Wed, 2025-04-02 at 15:26 +0200, Johannes Schauer Marin Rodrigues wrote:</div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>Hi,<br></div><div><br></div><div>On Thu, 30 May 2024 14:26:31 +0000 Holger Levsen &lt;<a href="mailto:holger@layer-acht.org">holger@layer-acht.org</a>&gt; wrote:<br></div><blockquote type="cite" style="margin:0 0 0 .8ex; border-left:2px #729fcf solid;padding-left:1ex"><div>very "nice" find, josch!<br></div></blockquote><div><br></div><div>with the help of Holger and osuosl4 I have dug into this a bit more and tried<br></div><div>to get some hard data about this problem. My idea was the following: parse all<br></div><div>Packages files for all suites, all architectures and all components for all<br></div><div>timestamps stored on snapshot.d.o and find packages with the same<br></div><div>name/arch/version tuple that have a different checksum. 
To this end, I slightly<br></div><div>(less than 1000 lines of diff) patched the tooling at<br></div><div><a href="https://salsa.debian.org/metasnap-team/metasnap.git">https://salsa.debian.org/metasnap-team/metasnap.git</a> with the patch that I<br></div><div>attached to this mail on top of 1dadf2575160caf9467c4e21aa6c0a31ac10ffc2.<br></div><div><br></div><div>After running that script for 3 months and downloading 189 GB of data in 3.5<br></div><div>million requests (about 2 seconds for every request), we had a database<br></div><div>(actually a git repository) of 48 GB that we can use to find duplicates. It<br></div><div>took another 2 months to go through that data. I attached a graph which shows<br></div><div>the number of duplicate name/arch/version triplets per timestamp. Please note<br></div><div>the logarithmic y-axis. The total number of duplicates from 2005 until 2024 is<br></div><div>334335.<br></div><div><br></div><div>Problem solved? Not so fast. Processing all Packages files will *not* find the<br></div><div>original problem with bash. Why? Because according to the Packages files from<br></div><div>snapshot.debian.org only one version of bash:arm64=5.2.15-2+b3 exists, namely:<br></div><div><br></div><div>MD5sum: 01ee4cfa3df78e7ff0dc156ff19e2c88<br></div><div>SHA1: 1a0b12419b69a983bf22ac1d3d9f8bec725487b1<br></div><div>SHA256: 828ce0b4445921fff5b6394e74cce8296f3038d559845a3e82435b55ca6fcaa8<br></div><div><br></div><div>The other version never ended up in a Packages file even though it was found in<br></div><div>the /pool/main/b/bash directory in the snapshot of 2023-07-13 21:11:09, nearly<br></div><div>one year before the other version popped up.<br></div><div><br></div><div>How can a package be in the pool directory but not in a Packages file? No idea,<br></div><div>but it shows that my method from above does not find a certain class of<br></div><div>problems. 
We could find those by creating a fitting query against the<br></div><div>snapshot.d.o database. Apparently lw07 is DD-accessible and has a<br></div><div>snapshot-guest service. So this is on my TODO list and Nicolas Dandrimont<br></div><div>already offered to help with constructing an appropriate SQL query during<br></div><div>MiniDebConf Hamburg this year.<br></div><div><br></div><div>Lastly, there is the problem of packages in incoming. Those packages will be<br></div><div>used to build other packages that end up in the archive but they might never<br></div><div>end up in the archive themselves. Thus, we might never know whether one of<br></div><div>these packages violated the idea that the package name/architecture/version<br></div><div>triplet uniquely identifies a Debian binary package in the archive...<br></div><div><br></div><div>Thanks!<br></div><div><br></div><div>cheers, josch</div></blockquote></body></html>