[Soc-coordination] Report 6 - Pluggable Acquire System for APT

Bogdan Purcareata bogdan.purcareata at gmail.com
Sun Aug 19 19:41:03 UTC 2012


Codename: apt-fetcher
Mentors: Michael Vogt, David Kalnischkies
Project proposal page: [0]
Project design page: [1]

General summary (3 - 5 lines):

The most significant benefit I gained from working as a Debian
developer during GSoC this year was the ability to handle a very
complex codebase. I managed to read and comprehend dozens of source
code files, propose design ideas and entry points for them in the
current architecture, update the build process and makefile hierarchy,
build test cases for the new features, and keep the project backwards
compatible while leaving it open for further improvements. I have
improved my C++ and BASH scripting skills, along with my ability to
communicate with the Debian specialists, implement desired features,
and detect and correct bugs. The final result is an extensible module
for a fundamental and intensively used Debian application.

Last 3 weeks summary:
What I've done:
- replaced the legacy code with the new one
- fixed integration tests
- made it possible to choose at compile time which code (legacy vs.
new) is used in the binaries
What problems I've run into:
- error object bug
- difficulty in understanding the testing scripts framework
- circular header inclusions

In the last 3 weeks of GSoC, after discussing with the mentors, the
project focused mainly on integration rather than on developing new
features. Until now, the apt-get update functionality had been tested
only by calling the framework's methods from libapt. The next step was
to make the necessary changes in apt-get (the end application) so that
it uses the new code.

The mentors and I lost quite a lot of time fixing a bug regarding
error reporting - in APT, error messages are kept in a global queue,
represented by the _error object. The implemented parser uses this
object when parsing sources.list. It first tries to parse a line as a
standard line, as specified in the standard format. If this fails, it
tries to parse it as a comment, and if this fails too, it reads it as
garbage. All these errors are registered in the _error object, and the
calling function takes care to pop one message at a time in case of
failure. The _error object keeps both errors and warnings in the same
place. In one test case, the _error object already contained warnings
before the parsing started, so after a successful parse the parsing
errors were left behind in the queue. The fix was to save the whole
error context before the parsing and restore it afterwards, in case
the parsing succeeds.
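
The mechanics of the fix, as a minimal sketch: the GlobalError class
behind _error (apt-pkg/error.h) exposes a small stack API -
PushToStack, RevertToStack and MergeWithStack - that lets a caller
isolate the messages produced by one operation. The parser entry point
below (TryParseSourceLine) is a hypothetical name, not the actual
function in the branch.

    #include <apt-pkg/error.h>
    #include <string>

    // Hypothetical single-line parse attempt: standard line, then
    // comment, then garbage, queueing an error for each failed attempt.
    bool TryParseSourceLine(std::string const &Line);

    bool ParseLineSafely(std::string const &Line)
    {
       _error->PushToStack();          // save pre-existing warnings/errors
       bool const Parsed = TryParseSourceLine(Line);
       if (Parsed == true)
          _error->RevertToStack();     // success: drop the intermediate parse errors
       else
          _error->MergeWithStack();    // failure: keep them for the caller to pop
       return Parsed;
    }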

The new code was supposed to replace the old one in apt-get, not just
work alongside it. A long time was spent removing all the code that
used apt-pkg/sourcelist and apt-pkg/deb/debmetaindex. All this code
was replaced with references to the framework and the default plugins.
After the project passed the compilation stage, it had to pass
test/integration. Some tests were failing because of the changes, and
they had to be fixed. Others were failing for reasons unrelated to
apt-fetcher, but it took a lot of time to figure this out. Also, the
test scripts use a fairly complex framework which had to be understood
before individual tests could be debugged. Once the changed code no
longer produced any failures in test/integration, I added a test for
downloading Contents files with apt-get update -

After finishing this step, the mentors thought it would be a good idea
to keep the legacy code as well, since we did not want to break
compatibility. I managed to do that with a couple of header files and
some #ifdefs. Fixing all the references to use the right wrapper
functions, so that the switch is made from a single place, was also
part of the final touches to the project.
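
A rough sketch of that compile-time switch, assuming an illustrative
wrapper header and macro name (APT_FETCHER); only the pattern matches
what was actually done, the real names in the branch differ:

    // sourcelist-wrapper.h (illustrative name) - the single place where
    // the legacy and the new backend are selected at compile time.
    #ifndef SOURCELIST_WRAPPER_H
    #define SOURCELIST_WRAPPER_H

    #ifdef APT_FETCHER
       // New pluggable backend (hypothetical header and class name).
       #include <apt-pkg/fetcher/sourcesparser.h>
       typedef FetcherSourcesParser SourcesBackend;
    #else
       // Legacy backend, kept for backwards compatibility.
       #include <apt-pkg/sourcelist.h>
       typedef pkgSourceList SourcesBackend;
    #endif

    #endif /* SOURCELIST_WRAPPER_H */

Building with the macro defined then selects the new code everywhere
the wrapper is included, while the default build keeps the legacy
behaviour.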

Parts of the initial plan that I didn't get to implement:
* "an user interface for the parser" - right now, all the changes to
the parser can only be made through code. The parser can be
customized, in the future, to make use of the APT configuration for
its settings. The parser is a component which exceeded its initial
estimated complexity - predicate based iteration, ability to implement
plugins with parsing methods for other formats (xml). The parser user
interface's importance is not crucial for APT itself, but rather for
other applications that might be using it via libapt.
* "optimizations in the acquire logic" - APT already implements
transfer methods - e.g. http, ftp, copy, gzip, gpgv, pdiffs - as
independent binaries that are launched when a specific metadata file
needs to be processed. The framework plugin model implements the
capability to define acquire algorithms for metadata file types, using
these methods. So rather than improving the acquire logic, apt-fetcher
gives the possibility of defining a new one from scratch for the new
files. As for the present acquire logic, it was completely integrated.
* "a plugin for debtags" - the information about package tags is
located in Packages index files in the Debian archive, not on separate
files. These files are already downloaded by default by apt-get
update, so we may consider that tags are supported by the framework.
Most focus was put on developing a plugin for the Contents files,
since these were, until now, unsupported by apt-get.
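
As mentioned above, here is a sketch of what predicate-based iteration
over parsed sources.list entries could look like from libapt. Every
class and method name here is hypothetical and only illustrates the
idea, not the actual apt-fetcher API:

    #include <functional>
    #include <string>
    #include <vector>

    struct SourceEntry                    // hypothetical parsed entry
    {
       std::string URI;
       std::string Dist;
       std::vector<std::string> Components;
    };

    class SourcesParser                   // hypothetical parser front-end
    {
    public:
       // Visit only the entries matching a caller-supplied predicate,
       // e.g. "all entries for a given distribution".
       void ForEach(std::function<bool(SourceEntry const &)> const &Pred,
                    std::function<void(SourceEntry const &)> const &Visit) const
       {
          for (SourceEntry const &E : Entries)
             if (Pred(E))
                Visit(E);
       }

    private:
       std::vector<SourceEntry> Entries;
    };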

To sum up, the apt-fetcher module is a viable alternative for the
current apt-get update backend. The parser for sources.list is
pluggable and can support other formats; it can be used from libapt in
other projects as well. The framework is pluggable both for metadata
file types and for Release files - in the future, plugins can be
defined for archive formats other than the standard Debian archive. It
implements the standard flow from a list of Source file objects
provided by the parser to a list of Items to be downloaded by the
present acquire module. Extending the Item object is part of the
plugin, so the developer may define specific operations when
downloading a metadata file. The plugin can also access the APT
configuration object - _config - and store custom settings when a
plugin is registered, so they can be used in the metadata acquire
process.
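
To make those plugin responsibilities concrete, here is a purely
illustrative skeleton. The MetadataItem and MetadataPlugin base
classes, the ContentsItem/ContentsPlugin names and the configuration
key are hypothetical stand-ins; only _config and its Set/FindB calls
are the regular APT configuration API (apt-pkg/configuration.h):

    #include <apt-pkg/configuration.h>
    #include <string>

    // Hypothetical minimal bases, standing in for the framework's real ones.
    class MetadataItem
    {
    public:
       virtual ~MetadataItem() {}
       // Plugin-specific operations once the file has been downloaded.
       virtual void Done(std::string const &DestFile) = 0;
    };

    class MetadataPlugin
    {
    public:
       virtual ~MetadataPlugin() {}
    };

    // Hypothetical plugin for Contents files.
    class ContentsItem : public MetadataItem
    {
    public:
       virtual void Done(std::string const &DestFile)
       {
          // e.g. uncompress and move the Contents file into the lists dir
       }
    };

    class ContentsPlugin : public MetadataPlugin
    {
    public:
       ContentsPlugin()
       {
          // Custom settings can be stored in the regular APT configuration
          // when the plugin is registered (hypothetical key name)...
          _config->Set("Acquire::Plugins::Contents::Enabled", "true");
       }

       bool Enabled() const
       {
          // ...and read back later during the metadata acquire process.
          return _config->FindB("Acquire::Plugins::Contents::Enabled", false);
       }
    };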

The APT package is proof that a complex project cannot be fully
understood in the timespan of preparing an application. The proposed
estimations and timeline didn't have the expected accuracy. On one
hand, there was no need to implement a backend acquire logic, since
the present one could easily be integrated; on the other, researching
the state of the art lasted the whole community bonding period and
more than half of the coding period. And that was only enough to
develop what I've developed, since it was difficult to wrap my mind
around the whole picture of the APT code. I changed the design and the
code very often, and I was always missing details which would later
prove to be relevant. The next most difficult thing after
understanding was designing: building an architecture on top of a
construction you don't completely understand.

To me, the project was a success. APT is one of the fundamental
components of Debian and of the distributions derived from it; it has
passed through a long series of revisions, and through them it was
impossible to predict how its architecture would evolve. The result is
very stable but complex code. Refactoring, exposing and extending the
metadata acquire backend was very important in the context of
integrating APT with other package management applications, in Debian
and other distributions (the AppStream project). This component is a
small part of the APT suite, but to be capable of making the changes,
one must first have the big picture. I've organized its current code
and made it extensible for future development. I've integrated it in
APT and tested it with the regression suite. I've implemented a base
on which future developers may build plugins for their specific
formats and files.

My contributions to the APT package can be found in the repo [2].

Bogdan Purcareata

[0] http://wiki.debian.org/SummerOfCode2012/Projects#Pluggable_acquire-system_for_APT
[1] http://wiki.debian.org/BogdanPurcareata/PluggableAptBackend
[2] https://launchpad.net/apt-fetcher


