[Soc-coordination] Aptitude/DPKG Project Idea

David Kalnischkies kalnischkies+debian at gmail.com
Sat Mar 5 12:18:41 UTC 2011


On Thu, Mar 3, 2011 at 22:29, Chris Baines <cbaines8 at gmail.com> wrote:
> After thinking more on the subject of speeding up the Debian package
> management system, I considered the efficiency of the approach used by
> aptitude, at the moment when downloading and installing a large number
> of packages, they are all downloaded then installed.

That's not only a "problem" of aptitude - it's a defect in the complete APT
family, ranging from my beloved apt-get itself to software-center, so fixing
it just for aptitude doesn't make a lot of sense - especially as all these
tools share the code to download and install packages in libapt-pkg
(provided by apt).
Or in short: s/aptitude/APT/ in this context.
(just in case: I have absolutely nothing against aptitude in general)


> My proposal would involve:
> - Allowing aptitude to:
>  - When installing a long list of packages, break down the list in to
> many groups of unrelated packages

Sounds easy at first, but you will need to think about this a lot, as you
want groups that are neither too big nor too small - besides that, you need
to identify these groups in the first place (partly done already).

You probably want to make it configurable, as a good group size will depend
on the hardware we are operating on. My mobile phone, for example, has very
little disk space, so I want very small groups, but my server with access
to a local mirror (with nearly instant download times) will spend more time
forking and starting up dpkg than installing packages if the groups are too
small.
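
To make the grouping idea a bit more concrete, here is a minimal sketch in
Python (APT itself is C++, and none of these names exist in libapt-pkg -
packages, depends and max_group_size are all made up): unrelated groups are
the connected components of the dependency graph among the to-be-installed
packages, and small components can be merged up to the configurable size:

    from collections import defaultdict

    def group_packages(packages, depends, max_group_size):
        """Split the to-be-installed packages into groups which are
        dependency-wise unrelated to each other.

        depends maps a package to the packages it depends on;
        max_group_size is a soft upper bound on the group size."""
        pkgset = set(packages)

        # Undirected view of the dependency graph, restricted to the
        # packages we are actually going to install.
        adj = defaultdict(set)
        for pkg in pkgset:
            for dep in depends.get(pkg, ()):
                if dep in pkgset:
                    adj[pkg].add(dep)
                    adj[dep].add(pkg)

        # Each connected component is one group of related packages;
        # those have to stay together to be configurable.
        seen, components = set(), []
        for pkg in pkgset:
            if pkg in seen:
                continue
            seen.add(pkg)
            component, stack = [], [pkg]
            while stack:
                cur = stack.pop()
                component.append(cur)
                for nxt in adj[cur]:
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            components.append(component)

        # Components are indivisible, but small ones can be merged to
        # avoid starting dpkg for just a handful of packages at a time.
        groups, current = [], []
        for comp in sorted(components, key=len):
            if current and len(current) + len(comp) > max_group_size:
                groups.append(current)
                current = []
            current.extend(comp)
        if current:
            groups.append(current)
        return groups

max_group_size is then exactly the knob from above: tiny on the phone,
huge (or effectively unlimited) on the server with the local mirror.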

The good news: this group-size problem could be avoided if dpkg had (again)
a working --command-fd, so APT could spawn it once and communicate
unpack/configure/remove requests over a file descriptor, eliminating the
need to start dpkg frequently - which reads e.g. its huge info database
every time…

So a valid plan might be:
1. implement --command-fd in dpkg
2. switch APT from forking many dpkg instances to talking to a single one
3. write a nice interface to easily parallelize download/install
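
To illustrate what step 2 could look like from the APT side, a rough Python
sketch - note that the request lines written to the pipe are completely
invented here, as the actual --command-fd protocol would first have to be
(re)defined on the dpkg side:

    import os
    import subprocess

    # One pipe: dpkg inherits the read end, the APT side keeps the
    # write end.
    read_fd, write_fd = os.pipe()

    # Spawn dpkg exactly once; --command-fd tells it where to read
    # requests from. pass_fds keeps the read end open in the child.
    dpkg = subprocess.Popen(["dpkg", "--command-fd", str(read_fd)],
                            pass_fds=(read_fd,))
    os.close(read_fd)  # the parent only writes

    # Stream requests as packages become available instead of forking
    # a fresh dpkg (re-reading its info database) for every group.
    # This request format and the package name are purely hypothetical!
    with os.fdopen(write_fd, "w") as commands:
        commands.write("unpack /var/cache/apt/archives/foo_1.0_amd64.deb\n")
        commands.write("configure foo\n")

    dpkg.wait()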


>  - Choose the group with the lowest overall download size, and then
> begin downloading the packages in this group from the top of the
> dependency tree downwards

That's probably just a naming problem, but if I think of a dependency tree,
I usually make the package I want to install in the first place the root,
while all its dependencies are children, which may themselves have many
subtrees with their own children, and somewhere at the bottom are hopefully
the leaf nodes - packages which have no dependencies (on non-installed
packages).
In such a tree you want to start with the leaves - not with the root -
as you need to install all leaves before the root can work.
So if a leaf fails to install, it doesn't make sense to proceed with trying
to install the root - it will just fail even more horribly.
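
In other words: a valid installation order is a topological order of the
dependency graph with the leaves first. A minimal sketch (made-up inputs
again, and note that real Debian dependency graphs contain cycles which
APT has to break - this sketch simply gives up on them):

    def install_order(packages, depends):
        """Order packages so that every package comes after all of its
        dependencies: leaves first, the requested packages (roots) last.

        depends maps a package to its dependencies, restricted to
        packages that are not installed yet."""
        order, done, visiting = [], set(), set()

        def visit(pkg):
            if pkg in done:
                return
            if pkg in visiting:
                raise ValueError("dependency cycle at " + pkg)
            visiting.add(pkg)
            for dep in depends.get(pkg, ()):
                visit(dep)
            visiting.discard(pkg)
            done.add(pkg)
            order.append(pkg)  # emitted only after all its dependencies

        for pkg in packages:
            visit(pkg)
        return order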

In an install request it might make sense to download in the reverse order
of installation, as you have a known target, but for a general upgrade the
target is just to upgrade packages, so you want to upgrade as much as
possible - even if the network connection fails in the middle:
If you have downloaded in reverse order you can upgrade nothing; if you
have started with the leaves you can at least upgrade these while waiting
for a reconnect. (That's already implemented as --no-download --fix-missing)
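
The nice property of such a leaf-first order is that every prefix of it is
dependency-closed, so whatever happens to be downloaded when the connection
dies can still be installed - a hypothetical helper to show the idea:

    def installable_prefix(order, downloaded):
        """Longest prefix of the leaf-first install order that is fully
        downloaded; since dependencies always come earlier in the order,
        any such prefix can be unpacked and configured right away."""
        prefix = []
        for pkg in order:
            if pkg not in downloaded:
                break
            prefix.append(pkg)
        return prefix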

>  - As soon as aptitude has installable packages downloaded, begin
> installing them
>    - I am not sure about this step, I dont know how exactly how dpkg
> handles package installations, it might be quicker to just unpack and
> prepare the packages as they are downloaded before actually
> installing?

Please make sure you understand what the dpkg states mean. If you tell dpkg
to unpack package A, it does far more than just extract a tarball:
pre* maintainer scripts are executed, for example - and these have
(pre-)depends!
So you should only unpack a package if you know you will be able to
configure it later (even if the connection fails now) - otherwise you have
a package in a broken state, which is not allowed, and your only valid
option to recover from that would be to remove the package and everything
that depends on it!
(downgrades are not supported by Debian, so they shouldn't be done
automatically)
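
So before handing a freshly downloaded package to dpkg for unpacking,
something along these lines has to hold (a hypothetical helper again, and
the real rules - Conflicts, Breaks, dependency cycles - are quite a bit
more involved than this):

    def safe_to_unpack(pkg, pre_depends, depends, configured, downloaded):
        """Hand pkg to dpkg for unpacking only if we cannot end up stuck
        in a broken state: its pre-dependencies must already be
        configured (their maintainer scripts run during unpack), and
        everything needed to configure pkg later must at least be
        downloaded already."""
        for dep in pre_depends.get(pkg, ()):
            if dep not in configured:
                return False
        for dep in depends.get(pkg, ()):
            if dep not in configured and dep not in downloaded:
                return False
        return True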

Sorry, I would love to point you to a nice overview, but I can't find one :(


>  - Perhaps as an addition to this I would like to also look at
> improving the way in which aptitude could use debdelta, possible
> looking and testing the integration available in cupt?
>
> Is this feasible as a project application? This kind of supersedes my
> debdelta only project idea.

I don't know your background, but given that you will likely need to dig
deep into dpkg and APT to work on the "first part" of your proposal, I am
tempted to say that adding a "second part" is (way) too much for a GSoC
project.


How about joining the mailing lists/IRC channels of the teams, picking one
of the open bugs and giving it a try? It should help you and your wannabe
mentors to decide what is possible. A lot of time is "wasted" on reviewing
and discussing a patch after - and most of the time also before and in the
middle of - writing it, so picking a simple bug is a good way to test these
"waste management" skills, as talking is a big part of the work in the
teams…


Best regards

David Kalnischkies
(last year's "MultiArch in APT" GSoC student)


