[Debian-salsa-ci] dumat + salsa-ci
Santiago Ruano Rincón
santiago at debian.org
Thu Sep 21 20:19:27 BST 2023
On 19/09/23 at 10:51, Helmut Grohne wrote:
> Hi Santiago,
>
> On Tue, Sep 19, 2023 at 05:22:58AM -0300, Santiago Ruano Rincón wrote:
> > > > It would also be possible to have salsa-ci maintain its own version of
> > > > this database by regularly updating it locally. The steps to create this
> > > > database are as follows.
> > > >
> > > > sqlite3 dumat.db < schema.sql
> > > > ./import_mirror.py -d dumat.db
> > > >
> > > > That latter step will download very many packages from deb.debian.org on
> > > > the first invocation. In later invocations, it'll do incremental
> > > > updates.
> >
> > Could you please clarify what you mean by "locally"?
>
> My personal server currently maintains a dumat.db and does this sql
> export to download. This database is only meaningful to a particular
> snapshot of the Debian archive at the moment it was produced. Using this
> database is what I'd consider "external" from a salsa-ci point of view.
>
> We could also have some salsa-ci job create and maintain this database
> storing it in salsa-ci owned storage. That's what I'd call "locally"
> here. Does that make sense?
The "challenge" here is that Salsa CI doesn't (currently) have any
central, common server where jobs from different projects could share
data.
Each job runs in a volatile container, uploads its artifacts for use in
the next stages, and that's all.
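Within those constraints, one workaround could be the GitLab per-project cache (or a scheduled job publishing the database as an artifact). A minimal sketch, assuming hypothetical paths and that schema.sql and import_mirror.py are present in the image:

```shell
#!/bin/sh
# Minimal sketch: keep dumat.db alive between runs via the GitLab cache.
# DB_PATH and the commented-out commands are assumptions from this thread,
# not actual Salsa CI configuration.

dumat_refresh() {
    db="$1"
    if [ -f "$db" ]; then
        # A cached copy survived from a previous run: incremental update.
        echo "cache hit: incremental update of $db"
        # ./import_mirror.py -d "$db"
    else
        # First run (or the cache was evicted): the expensive full import.
        echo "cache miss: full import into $db"
        # sqlite3 "$db" < schema.sql
        # ./import_mirror.py -d "$db"
    fi
}
```

Whether the cache is reliable enough for a ~0.5 TiB-derived database is exactly the open question, of course.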
Let me translate how I understand your thoughts; I would like to
confirm we are on the same page:
* there should be a "dumat" image that keeps a frequently updated copy
  of dumat.sql.zst from https://subdivi.de/~helmut/dumat.sql.zst
* following some rules, run a dumat job that uses the dumat image and
does:
* ./import_mirror.py -d dumat.db --changes $WORKING_DIR/*.changes
(the changes from the preceding build job)
* ./analyze.py -d dumat.db > dumat.yaml
Is that correct? Or is it still necessary to call import_mirror to
"download very many packages from deb.debian.org" before importing the
packages built in the pipeline?
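In script form, my understanding of the job body would be roughly the following. This is only a sketch: the .changes discovery and the early-exit handling are my assumptions, the two commands are the ones from this thread:

```shell
#!/bin/sh
# Sketch of the proposed dumat job body.
set -eu

run_dumat_job() {
    workdir="$1"
    # The .changes files come from the preceding build job's artifacts.
    changes=$(find "$workdir" -maxdepth 1 -name '*.changes')
    if [ -z "$changes" ]; then
        echo "no .changes files in $workdir; nothing to analyze" >&2
        return 1
    fi
    # Import the freshly built packages, then produce the report.
    ./import_mirror.py -d dumat.db --changes $changes
    ./analyze.py -d dumat.db > dumat.yaml
}
```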
> > Do you have any idea about the current size of packages to be
> > downloaded?
>
> For the first download, I recall around 0.5TiB. Moving forward, you
> essentially probably download 0.1TiB per month. On my own server I am
> partially mitigating this traffic by opportunistically using
> mirror.hetzner.de when packages are available there as that is
> considered "internal traffic" and not accounted, but in general,
> mirror.hetzner.de is always too slow (except for the initial import when
> it is blazingly fast for 90%).
If the salsa admins are OK with this, then great!
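For illustration, the opportunistic-mirror trick could look roughly like this. A sketch only: the pool path layout, the mirror base URLs and the probe helper are my assumptions, not how Helmut's server actually does it:

```shell
#!/bin/sh
# Sketch: prefer a "free traffic" mirror when it has the file, fall back
# to deb.debian.org. URLs and layout are illustrative assumptions.

pick_mirror() {
    # $1: pool path, e.g. pool/main/h/hello/hello_2.10-3_amd64.deb
    for base in https://mirror.hetzner.de/debian/packages \
                https://deb.debian.org/debian; do
        if probe "$base/$1"; then
            echo "$base/$1"
            return 0
        fi
    done
    return 1
}

# probe: HEAD-request the URL; kept separate so the selection logic
# can be exercised without network access.
probe() {
    curl -fsIL --max-time 10 -o /dev/null "$1"
}
```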
> > > > I do not recommend adding this to the default pipeline due to the
> > > > resources consumed by this job. It only makes sense for a tiny fraction
> > > > of the archive. Maybe we can hard code a list of potentially affected
> > > > source packages and skip it in all other cases?
> >
> > Salsa CI cannot control what packages are going to use it. We can add
> > rules so this job is not run by default. A dumat test could be opt-in.
>
> I was trying to imply a different approach. My idea here was making it
> opt-out. The actual job would then look up the current source package
> name in a hard coded list and exit successfully unless a package is
> matched. So the job would always run, but do nothing and exit quickly
> for the majority of users. Users could still opt out and users could
> force running it by skipping the source package match. Does that make
> sense to you?
This is technically possible… But keeping a hard-coded list of packages
is something that I don't find very appealing.
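For reference, the opt-out match you describe could be as small as this. The package names and variable names here are purely hypothetical examples, not a real list:

```shell
#!/bin/sh
# Sketch of the opt-out idea: run the real work only for packages on a
# hard-coded list, unless the user forces it. List and variables are
# hypothetical.
AFFECTED_PACKAGES="${AFFECTED_PACKAGES:-bash dash glibc usrmerge}"

should_run_dumat() {
    pkg="$1"
    # Users can force the job by skipping the source package match.
    if [ "${SALSA_CI_FORCE_DUMAT:-no}" = yes ]; then
        return 0
    fi
    for p in $AFFECTED_PACKAGES; do
        [ "$p" = "$pkg" ] && return 0
    done
    # Not on the list: the job would exit successfully without doing work.
    return 1
}
```

The maintenance burden you mention is in keeping AFFECTED_PACKAGES current, which the snippet obviously does not solve.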
Cheers!
-- Santiago