partial snapshot mirror amd64/bullseye/bookworm

Lucas Nussbaum lucas at debian.org
Tue Mar 2 09:58:55 GMT 2021


On 01/03/21 at 22:41 +0000, Paul Wise wrote:
> On Mon, Mar 1, 2021 at 5:25 PM Holger Levsen wrote:
> 
> > > How would the mirroring work?
> >
> > to be discussed, but my raw idea would be to use rsync, excluding the years
> > before 2015 or 2017. Or can't this work? 8-)
> 
> That won't work, since the filesystem storing the data is hash (SHA1)
> based, so you need to look up hashes for the relevant data in the
> database and then copy only those files.

Hi,

For https://trends.debian.net/, I have a local mirror of snapshot.d.o
(with sources only, and only for specific versions). The code used to
create it is available at https://salsa.debian.org/lucas/dhistory/-/blob/master/dhistory

Specifically, it:
- queries the snapshot DB to identify the files and hashes for each
  source package
- fetches and analyses Sources files to identify the (source, version)
  pairs of interest, and thus the hashes to transfer
- transfers the files with those hashes from snapshot.d.o to my own
  machine using rsync
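As a rough illustration of the second and third steps, here is a minimal Python sketch (not the dhistory code). It reads rows in the one-JSON-object-per-line form produced by the psql query below, using the source_name, source_version and file_hash keys that query selects. It keeps only the hashes belonging to the wanted (source, version) pairs. It assumes those pairs were already extracted from the Sources files. The function name and the sample hashes are made up for illustration.

```python
import json
from typing import Iterable, List, Set, Tuple


def hashes_to_transfer(rows: Iterable[str], wanted: Set[Tuple[str, str]]) -> List[str]:
    """Return the sorted, de-duplicated file hashes of the wanted (source, version) pairs.

    Each element of rows is assumed to be one JSON object, as printed by
    `psql -At -c "select row_to_json(t) from (...) t"`.
    """
    hashes = set()
    for line in rows:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        row = json.loads(line)
        if (row["source_name"], row["source_version"]) in wanted:
            # The same file (hash) can appear in several rows; the set de-duplicates it.
            hashes.add(row["file_hash"])
    return sorted(hashes)


if __name__ == "__main__":
    # Hypothetical sample rows; real hashes are 40-character SHA1 hex strings.
    sample = [
        '{"source_name": "hello", "source_version": "2.10-2", "file_hash": "aa11"}',
        '{"source_name": "hello", "source_version": "2.10-2", "file_hash": "bb22"}',
        '{"source_name": "hello", "source_version": "2.9-1", "file_hash": "cc33"}',
    ]
    print(hashes_to_transfer(sample, {("hello", "2.10-2")}))  # → ['aa11', 'bb22']
```

The resulting list could be written out one entry per line for rsync's --files-from option. However, how hashes map to paths on the snapshot.d.o side is not described here, so that step is left out.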

The query used for the first step is:
  psql -At service=snapshot-guest -c "select row_to_json(t) from
  (select srcpkg.name as source_name, srcpkg.version as source_version,
   file.name as file_name, file.hash as file_hash, file.size as file_size,
   node_with_ts.first_run as file_first_run, node_with_ts.last_run as file_last_run
  from srcpkg
  inner join file_srcpkg_mapping on srcpkg.srcpkg_id = file_srcpkg_mapping.srcpkg_id
  inner join file on file.hash = file_srcpkg_mapping.hash
  inner join node_with_ts on node_with_ts.node_id = file.node_id
  inner join archive on node_with_ts.archive_id = archive.archive_id
  where archive.name = 'debian') t"

That's the query that would have to be adapted for binary packages and
for a specific date range.
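One way to handle the date-range part is to filter on the file_first_run and file_last_run columns the query already returns. The minimal Python sketch below keeps a file if the interval between those two timestamps overlaps the requested range. It rests on several assumptions:

- the two columns are the first and last mirror runs that saw the file;
- naive timestamps are UTC;
- Python 3.11+ is used, whose datetime.fromisoformat parses most ISO 8601 timestamps.

The function names are hypothetical, and the binary-package side of the query is not covered.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp string; naive values are treated as UTC (assumption)."""
    ts = datetime.fromisoformat(value)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts


def seen_in_range(row: dict, start: datetime, end: datetime) -> bool:
    """Return True if [file_first_run, file_last_run] overlaps [start, end].

    start and end must be timezone-aware datetimes.
    """
    first = parse_ts(row["file_first_run"])
    last = parse_ts(row["file_last_run"])
    # Two closed intervals overlap iff each one starts before the other ends.
    return first <= end and last >= start


if __name__ == "__main__":
    start = datetime(2017, 1, 1, tzinfo=timezone.utc)
    end = datetime(2021, 3, 2, tzinfo=timezone.utc)
    old = {"file_first_run": "2012-05-01T00:00:00", "file_last_run": "2016-06-01T00:00:00"}
    current = {"file_first_run": "2015-02-03T10:00:00+00:00",
               "file_last_run": "2021-03-01T22:00:00+00:00"}
    print(seen_in_range(old, start, end), seen_in_range(current, start, end))  # → False True
```

The same overlap condition (first_run <= end and last_run >= start) could instead go into the query's where clause, so that rows outside the range are never fetched.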

Lucas



More information about the Reproducible-builds mailing list