partial snapshot mirror amd64/bullseye/bookworm
Lucas Nussbaum
lucas at debian.org
Tue Mar 2 09:58:55 GMT 2021
On 01/03/21 at 22:41 +0000, Paul Wise wrote:
> On Mon, Mar 1, 2021 at 5:25 PM Holger Levsen wrote:
>
> > > How would the mirroring work?
> >
> > to be discussed, but my raw idea would be to use rsync with excluding the years
> > before 2015 or 2017. or can't this work? 8-)
>
> That won't work, since the filesystem storing the data is hash (SHA1)
> based, so you need to look up hashes for the relevant data in the
> database and then copy only those files.
Hi,
For https://trends.debian.net/, I have a local mirror of snapshot.d.o
(with sources only, and only for specific versions). The code used to
create it is available at https://salsa.debian.org/lucas/dhistory/-/blob/master/dhistory
Specifically, it:
- queries the snapshot DB to identify the files and hashes for each
source package
- fetches and analyses Sources files to identify the (source, version)
pairs of interest, and thus the hashes to transfer
- transfers those hashes from snapshot.d.o to my own machine using rsync
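A rough sketch of how the list for the third step could be prepared, so that
rsync only copies the wanted files (for example through its --files-from
option). The two-level "ab/cd/<hash>" directory layout and the helper names
farm_path and files_from_list are assumptions made for illustration; they are
not taken from dhistory, so check the real layout on the snapshot side first.

```python
import re

# A SHA1 digest is 40 hexadecimal characters.
SHA1_RE = re.compile(r"^[0-9a-f]{40}$")


def farm_path(sha1_hex):
    """Map a SHA1 hash to a relative path in a hash-based file store.

    ASSUMPTION: files live under <first 2 hex>/<next 2 hex>/<full hash>.
    """
    h = sha1_hex.strip().lower()
    if not SHA1_RE.match(h):
        raise ValueError("not a SHA1 hex digest: %r" % sha1_hex)
    return "%s/%s/%s" % (h[:2], h[2:4], h)


def files_from_list(hashes):
    """Return the text of a file list with one relative path per line.

    Duplicates are dropped and paths are sorted to keep the output stable.
    """
    paths = sorted({farm_path(h) for h in hashes})
    return "".join(p + "\n" for p in paths)
```

The resulting text can be written to a file and handed to rsync, which then
transfers only those paths instead of walking the whole store.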
The query used for the first step is:
psql -At service=snapshot-guest -c "select row_to_json(t) from
(select srcpkg.name as source_name, srcpkg.version as source_version,
file.name as file_name, file.hash as file_hash, file.size as file_size,
node_with_ts.first_run as file_first_run, node_with_ts.last_run as file_last_run
from srcpkg
inner join file_srcpkg_mapping on srcpkg.srcpkg_id = file_srcpkg_mapping.srcpkg_id
inner join file on file.hash = file_srcpkg_mapping.hash
inner join node_with_ts on node_with_ts.node_id = file.node_id
inner join archive on node_with_ts.archive_id = archive.archive_id
where archive.name = 'debian') t"
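With -At and row_to_json, psql prints one JSON object per line, which is easy
to post-process. A minimal sketch that groups the files by (source, version);
the function name group_by_source is hypothetical, while the JSON keys are the
column aliases from the query above.

```python
import json
from collections import defaultdict


def group_by_source(lines):
    """Group query output lines by (source_name, source_version).

    Each input line is expected to hold one JSON object as produced by
    row_to_json. Returns a dict mapping (name, version) to a list of
    (file_name, file_hash, file_size) tuples.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        row = json.loads(line)
        key = (row["source_name"], row["source_version"])
        groups[key].append(
            (row["file_name"], row["file_hash"], row["file_size"])
        )
    return dict(groups)
```

Feeding it the lines read from the psql output gives, for each package
version, the hashes that need to be transferred.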
That's the query that would have to be adapted for binary packages and
for a specific date range.
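An untested sketch of what such an adaptation might look like. The binpkg and
file_binpkg_mapping tables, and the architecture column, are assumed by
analogy with the source-package tables above and need to be checked against
the real schema; binary_query is a hypothetical helper name. The date filter
keeps files whose [first_run, last_run] interval overlaps the requested range.

```python
import re
from datetime import date


def binary_query(start, end, arch="amd64", archive="debian"):
    """Build a SQL query for binary package files seen between two dates.

    start and end are ISO dates (YYYY-MM-DD). All inputs are validated
    before being placed in the SQL text, since psql -c takes a plain string.
    """
    # date.fromisoformat raises ValueError on malformed dates.
    if date.fromisoformat(start) > date.fromisoformat(end):
        raise ValueError("start must not be after end")
    for value in (arch, archive):
        if not re.match(r"^[a-z0-9-]+$", value):
            raise ValueError("unexpected value: %r" % value)
    return (
        "select row_to_json(t) from "
        "(select binpkg.name as binary_name, "
        "binpkg.version as binary_version, "
        "file.name as file_name, file.hash as file_hash, "
        "file.size as file_size, "
        "node_with_ts.first_run as file_first_run, "
        "node_with_ts.last_run as file_last_run "
        "from binpkg "  # ASSUMPTION: table name
        "inner join file_binpkg_mapping "  # ASSUMPTION: table name
        "on binpkg.binpkg_id = file_binpkg_mapping.binpkg_id "
        "inner join file on file.hash = file_binpkg_mapping.hash "
        "inner join node_with_ts on node_with_ts.node_id = file.node_id "
        "inner join archive "
        "on node_with_ts.archive_id = archive.archive_id "
        "where archive.name = '%s' "
        "and file_binpkg_mapping.architecture = '%s' "  # ASSUMPTION: column
        "and node_with_ts.first_run <= '%s' "
        "and node_with_ts.last_run >= '%s') t"
    ) % (archive, arch, end, start)
```

The returned string can be passed to psql in the same way as the query above,
for example with start "2017-01-01" and the current date as end.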
Lucas