[Debian-salsa-ci] dumat + salsa-ci
Helmut Grohne
helmut at subdivi.de
Tue Sep 19 09:51:12 BST 2023
Hi Santiago,
On Tue, Sep 19, 2023 at 05:22:58AM -0300, Santiago Ruano Rincón wrote:
> > > It would also be possible to have salsa-ci maintain its own version of
> > > this database by regularly updating it locally. The steps to create this
> > > database are as follows.
> > >
> > > sqlite3 dumat.db < schema.sql
> > > ./import_mirror.py -d dumat.db
> > >
> > > That latter step will download very many packages from deb.debian.org on
> > > the first invocation. In later invocations, it'll do incremental
> > > updates.
>
> Could you please clarify what do you mean with locally?
My personal server currently maintains a dumat.db and provides this SQL
export for download. This database is only meaningful for the particular
snapshot of the Debian archive at the moment it was produced. Using this
database is what I'd consider "external" from a salsa-ci point of view.
We could also have a salsa-ci job create and maintain this database,
storing it in salsa-ci owned storage. That's what I'd call "locally"
here. Does that make sense?
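To make the "local" variant more concrete, here is a minimal sketch of what such a maintenance job might do. It only wraps the two commands quoted above. The function names are made up for illustration, and where the database file would live in salsa-ci owned storage is left open.

```python
import sqlite3
import subprocess
from pathlib import Path


def ensure_db(db_path, schema_path):
    """Create the database from the schema if it does not exist yet.

    Rough equivalent of `sqlite3 dumat.db < schema.sql`.
    Returns True if a new database was created, False otherwise.
    """
    db = Path(db_path)
    if db.exists():
        return False
    schema = Path(schema_path).read_text()
    con = sqlite3.connect(db)
    try:
        con.executescript(schema)
        con.commit()
    finally:
        con.close()
    return True


def import_command(db_path, importer="./import_mirror.py"):
    """Build the import invocation quoted above."""
    return [importer, "-d", str(db_path)]


def refresh(db_path="dumat.db", schema_path="schema.sql"):
    """Create the database on first use, then run the import.

    The first import downloads a lot; later runs are incremental.
    """
    ensure_db(db_path, schema_path)
    subprocess.run(import_command(db_path), check=True)
```

A scheduled job would call refresh() and keep dumat.db between runs, so that only the incremental part is downloaded each time.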
> Do you have any idea about the current size of packages to be
> downloaded?
For the first download, I recall around 0.5TiB. Moving forward, you
would probably download about 0.1TiB per month. On my own server I
partially mitigate this traffic by opportunistically using
mirror.hetzner.de when packages are available there, as that is
considered "internal traffic" and not accounted. In general, though,
mirror.hetzner.de is too slow (except for the initial import, when
it is blazingly fast for 90%).
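The opportunistic mirror use could be sketched roughly as follows. This is not how import_mirror.py does it, just an illustration of "try the cheap mirror first, otherwise fall back". The base URL for mirror.hetzner.de is an assumption about its path layout, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request

# Assumed base URLs. The path layout on mirror.hetzner.de is a guess.
PREFERRED = "https://mirror.hetzner.de/debian/packages"
FALLBACK = "https://deb.debian.org/debian"


def head_ok(url, timeout=10.0):
    """Return True if a HEAD request for url succeeds with a 2xx status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def choose_url(pool_path, mirrors=(PREFERRED, FALLBACK), probe=head_ok):
    """Pick the first mirror that has the file.

    All mirrors except the last are probed. The last one is the
    unconditional fallback and is returned without probing.
    """
    urls = [base.rstrip("/") + "/" + pool_path.lstrip("/") for base in mirrors]
    for url in urls[:-1]:
        if probe(url):
            return url
    return urls[-1]
```

The probe is a parameter so the selection logic can be exercised without touching the network.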
> > > I do not recommend adding this to the default pipeline due to the
> > > resources consumed by this job. It only makes sense for a tiny fraction
> > > of the archive. Maybe we can hard code a list of potentially affected
> > > source packages and skip it in all other cases?
>
> Salsa CI cannot control what packages are going to use it. We can add
> rules so this job is not run by default. A dumat test could be opt-in.
I was trying to suggest a different approach. My idea was to make it
opt-out. The actual job would look up the current source package
name in a hard-coded list and exit successfully unless the package
matches. So the job would always run, but do nothing and exit quickly
for the majority of users. Users could still opt out, and users could
force running it by skipping the source package match. Does that make
sense to you?
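As a rough sketch of that gate, assuming Python for the job script: the package names in the list are placeholders, and the two variable names (SALSA_CI_DISABLE_DUMAT, SALSA_CI_DUMAT_FORCE) are hypothetical, not existing salsa-ci settings. Letting the opt-out win over the force flag is also just an assumption.

```python
# Placeholder entries. The real list would name the affected source packages.
AFFECTED_SOURCES = frozenset({"example-source-a", "example-source-b"})

TRUTHY = ("1", "true", "yes")


def should_run(source, affected=AFFECTED_SOURCES, force=False, disabled=False):
    """Decide whether the expensive check should run for this source package."""
    if disabled:
        return False  # explicit opt-out wins (assumption)
    if force:
        return True  # skip the source package match
    return source in affected


def job(source, environ, run_check):
    """Return the job's exit code.

    run_check is whatever performs the actual check. It is only
    called when the gate lets the package through.
    """
    disabled = environ.get("SALSA_CI_DISABLE_DUMAT", "0").lower() in TRUTHY
    force = environ.get("SALSA_CI_DUMAT_FORCE", "0").lower() in TRUTHY
    if not should_run(source, force=force, disabled=disabled):
        print(f"dumat: skipping {source}")
        return 0  # exit successfully and quickly for most packages
    return run_check(source)
```

With this, a package outside the list gets an immediate successful exit, while a package in the list (or one with the force variable set) goes on to the real check.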
Helmut