[Soc-coordination] multi-archive support in dak: first report
Ansgar Burchardt
ansgar at debian.org
Fri Jun 1 16:49:43 UTC 2012
Hi,
Here comes my first report on my Google Summer of Code project to
implement multi-archive support in dak:
Getting started
---------------
To get started, I installed dak locally on my machine. As dak usually
runs only on stable, I had to patch a few things to get it running on
wheezy (these patches have already been merged).
I also came across some dak commands I did not know about yet. It
turned out they are no longer of any use, so they are now pending
removal.
First steps
-----------
While there was an archive table in the database, it was not really
useful to have multiple entries there, as dak did not record which
suite belonged to which archive. So as a first step, I added a column
relating suites to archives.
This allows tools to create files relative to the archive root of the
given suite instead of under a fixed path shared by all suites
(Dir::Root). I have patched most tools to do so; still missing are
check-archive, control-suite, copy-installer, generate-index-diffs and
the daklib/queue.py module.
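To illustrate what the new column makes possible, here is a minimal
sketch in Python. It assumes a heavily simplified schema: the `path`
and `archive_id` columns, the example archives and the
`suite_dists_dir` helper are hypothetical, not dak's actual schema or
API, and SQLite stands in for dak's PostgreSQL database.

```python
import sqlite3
from pathlib import PurePosixPath

# Hypothetical, simplified schema (not dak's real one): every suite now
# records the archive it belongs to, and every archive has its own root.
SCHEMA = """
CREATE TABLE archive (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    path TEXT NOT NULL                 -- root directory of this archive
);
CREATE TABLE suite (
    id         INTEGER PRIMARY KEY,
    suite_name TEXT NOT NULL UNIQUE,
    archive_id INTEGER NOT NULL REFERENCES archive(id)  -- the new column
);
"""


def suite_dists_dir(conn: sqlite3.Connection, suite_name: str) -> PurePosixPath:
    """Return dists/<suite> below the suite's own archive root,
    instead of below one global root shared by all suites."""
    row = conn.execute(
        "SELECT a.path FROM suite s JOIN archive a ON a.id = s.archive_id"
        " WHERE s.suite_name = ?",
        (suite_name,),
    ).fetchone()
    if row is None:
        raise KeyError(suite_name)
    return PurePosixPath(row[0]) / "dists" / suite_name


# Example data; archive names and paths are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO archive (id, name, path) VALUES (?, ?, ?)",
                 [(1, "main-archive", "/srv/archive-main"),
                  (2, "security-archive", "/srv/archive-security")])
conn.executemany("INSERT INTO suite (suite_name, archive_id) VALUES (?, ?)",
                 [("unstable", 1), ("wheezy-security", 2)])

print(suite_dists_dir(conn, "unstable"))         # /srv/archive-main/dists/unstable
print(suite_dists_dir(conn, "wheezy-security"))  # /srv/archive-security/dists/wheezy-security
```

Each suite's output ends up below its own archive's root, which is
exactly what a single global Dir::Root cannot express.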
The next, larger step was to allow the same file to exist in multiple
archives at the same time. This was not possible because the files
table links each file to a single location, so that link had to be
replaced by an N:M relation between files and archives
(files_archive_map). I decided to also allow files to exist in
multiple components (main/contrib/non-free) of the same archive. This
is not a large change, but it helps with moving a package between
components while keeping the same .orig.tar.gz.
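A minimal sketch of such an N:M mapping, again assuming a simplified
SQLite schema. Only the name files_archive_map comes from the text
above; its columns, the other tables, the example data and the
`locations` helper are hypothetical. The primary key over (file,
archive, component) lets the same file appear in several archives and
in several components of one archive, but only once per combination.

```python
import sqlite3

# Hypothetical, simplified schema. Instead of each file pointing at a single
# location, files_archive_map records every (archive, component) holding it.
SCHEMA = """
CREATE TABLE archive   (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE files     (id INTEGER PRIMARY KEY, filename TEXT NOT NULL UNIQUE);
CREATE TABLE files_archive_map (
    file_id      INTEGER NOT NULL REFERENCES files(id),
    archive_id   INTEGER NOT NULL REFERENCES archive(id),
    component_id INTEGER NOT NULL REFERENCES component(id),
    PRIMARY KEY (file_id, archive_id, component_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO archive VALUES (?, ?)", [(1, "archive-a"), (2, "archive-b")])
conn.executemany("INSERT INTO component VALUES (?, ?)", [(1, "main"), (2, "contrib")])
conn.execute("INSERT INTO files VALUES (1, 'hello_1.0.orig.tar.gz')")
# The same .orig.tar.gz sits in two components of archive-a and in archive-b.
conn.executemany("INSERT INTO files_archive_map VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 1, 2), (1, 2, 1)])


def locations(conn: sqlite3.Connection, filename: str) -> list[tuple[str, str]]:
    """List every (archive, component) pair that holds the given file."""
    return conn.execute(
        "SELECT a.name, c.name FROM files f"
        " JOIN files_archive_map m ON m.file_id = f.id"
        " JOIN archive a ON a.id = m.archive_id"
        " JOIN component c ON c.id = m.component_id"
        " WHERE f.filename = ? ORDER BY a.name, c.name",
        (filename,),
    ).fetchall()


print(locations(conn, "hello_1.0.orig.tar.gz"))
# [('archive-a', 'contrib'), ('archive-a', 'main'), ('archive-b', 'main')]
```

Moving a package from main to contrib then amounts to adding and
removing rows in the mapping, while the single files row (and the
.orig.tar.gz it describes) stays untouched.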
New problems
------------
While this sounds quite simple, this change has many consequences:
- An upload may now reference files that are already known to dak but
not present in the right archive; these need to be copied over.
- Files might be removed from individual archives, but we need to make
sure not to remove sources that are still needed by binaries in the
same archive (i.e. this is now a per-archive constraint instead of a
global one).
- When moving binaries between archives, we have to make sure the
source is also available in the target archive.
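The second and third points can be illustrated with a small in-memory
model. This is a hypothetical sketch, not dak code: the `Binary` and
`Archive` classes and the `can_remove_source` and `copy_binary`
helpers are made up for illustration, and a real implementation would
check these constraints against the database instead.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model, not dak code. Sources are identified
# by (name, version).
SourceKey = tuple[str, str]


@dataclass(frozen=True)
class Binary:
    name: str
    version: str
    source: SourceKey  # source package this binary was built from


@dataclass
class Archive:
    name: str
    sources: set[SourceKey] = field(default_factory=set)
    binaries: set[Binary] = field(default_factory=set)


def can_remove_source(archive: Archive, source: SourceKey) -> bool:
    """A source may be removed from an archive only if no binary in that
    same archive was built from it; other archives do not matter."""
    return all(b.source != source for b in archive.binaries)


def copy_binary(binary: Binary, target: Archive) -> None:
    """Copy a binary into another archive, but only if its source is
    already available there."""
    if binary.source not in target.sources:
        name, version = binary.source
        raise ValueError(f"source {name} {version} is missing in {target.name}")
    target.binaries.add(binary)


src = ("hello", "1.0-1")
hello = Binary("hello", "1.0-1", src)
archive_a = Archive("archive-a", sources={src}, binaries={hello})
archive_b = Archive("archive-b")

print(can_remove_source(archive_a, src))  # False: hello in archive-a needs it
print(can_remove_source(archive_b, src))  # True: nothing in archive-b uses it

try:
    copy_binary(hello, archive_b)
except ValueError as exc:
    print(exc)  # source hello 1.0-1 is missing in archive-b
archive_b.sources.add(src)     # copy the source over first ...
copy_binary(hello, archive_b)  # ... then the binary can follow
```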
I started to work on teaching the package installation logic to add
the needed entries to files_archive_map (not so hard) and planned to
drop the files.location relation later. However, I came to the
conclusion that another approach might be better.
Next steps
----------
Even if the necessary changes were implemented in process-upload, I
would still need to re-implement parts of it to allow moving packages
between suites/archives from other places, or installing packages into
multiple suites, which I need in order to replace build queues with
regular suites (process-upload and the modules it uses are not really
usable from a different context). Also, the tendency of process-upload
to leave the archive in an inconsistent state in case of bugs and its
"dict-oriented" programming[1] turned out to be quite annoying.
[1] Large parts use dicts as data structures where it is not clear
where and when values are set and what they mean.
So I started to work on a module that allows manipulating the archive
in a safe way[2] and is usable as a library. So far I am progressing
quite well: I can already install and copy packages in normal cases;
there may still be problems with NEW packages, and byhand is not yet
implemented. Some code for removing packages is also written, but not
yet tested.
[2] That is, one that always keeps the archive in a consistent state.
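One way to get the property from footnote [2] is to treat each archive
operation as a transaction covering both the database and the files on
disk. The sketch below is a hypothetical illustration of that general
idea, not the module described above: `ArchiveTransaction`,
`archive_transaction` and the trimmed-down files_archive_map table
(without a component column) are assumptions, and SQLite again stands
in for PostgreSQL.

```python
import os
import shutil
import sqlite3
import tempfile
from contextlib import contextmanager


class ArchiveTransaction:
    """Hypothetical helper: records the files it creates so that a failure
    can undo both the database changes and the copies on disk."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.created: list[str] = []

    def install_file(self, src: str, dst: str, file_id: int, archive_id: int) -> None:
        """Copy a file into the archive and record it in files_archive_map."""
        if os.path.exists(dst):
            raise FileExistsError(dst)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        self.created.append(dst)  # remember it before copying, in case the copy fails
        shutil.copy2(src, dst)
        self.conn.execute(
            "INSERT INTO files_archive_map (file_id, archive_id) VALUES (?, ?)",
            (file_id, archive_id))

    def rollback(self) -> None:
        """Undo the database transaction and remove files created so far
        (newly created directories are left in place)."""
        self.conn.rollback()
        for path in reversed(self.created):
            if os.path.exists(path):
                os.remove(path)
        self.created.clear()


@contextmanager
def archive_transaction(conn: sqlite3.Connection):
    """Commit everything if the block succeeds, undo everything otherwise."""
    txn = ArchiveTransaction(conn)
    try:
        yield txn
        conn.commit()
    except BaseException:
        txn.rollback()
        raise


# Example: a bug after the copy must leave no trace behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files_archive_map (file_id INTEGER, archive_id INTEGER,"
             " PRIMARY KEY (file_id, archive_id))")
conn.commit()
root = tempfile.mkdtemp()
upload = os.path.join(root, "hello_1.0-1_all.deb")
with open(upload, "wb") as fh:
    fh.write(b"not a real package")

target = os.path.join(root, "archive", "pool", "main", "hello_1.0-1_all.deb")
try:
    with archive_transaction(conn) as txn:
        txn.install_file(upload, target, file_id=1, archive_id=1)
        raise RuntimeError("simulated bug")
except RuntimeError:
    pass

print(os.path.exists(target))  # False
print(conn.execute("SELECT COUNT(*) FROM files_archive_map").fetchone()[0])  # 0
```

This only covers errors raised inside the process; a crash between the
commit and the file operations would still need separate handling.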
There is also one additional change: as one of the goals of my GSoC
project is to remove code duplication, I now plan to convert policy
queues to regular suites as well (instead of only converting build
queues). This means fewer special cases to implement in the new code
(as files can *only* be in archives and nowhere else). On the
downside, it means process-policy and queue-report will also need a
few changes...
My work can be found in my Git repository at [3]. Most of the work
happens in the pu/multiarchive-{1,2} branches.
[3] https://ftp-master.debian.org/users/ansgar/dak.git
Ansgar