[Soc-coordination] Pluggable Acquire System for APT - Report 2

Bogdan Purcareata bogdan.purcareata at gmail.com
Mon Jun 18 11:38:14 UTC 2012


Codename: apt-fetcher
Mentors: Michael Vogt, David Kalnischkies
Project proposal page: [0]
Project design page: [1]

Here is a summary of weeks 3 and 4 of the coding period:

What I've done:
- refactored the public parser to be ABI compatible
- designed the pluggable acquire framework, the plugin format and the
plugin objects
- partially implemented this framework
What problems I've run into:
- C++ problems while refactoring the parser - SIGSEGV everywhere
- not really a problem, but it took quite some time to understand the
whole current process and come up with a new design for metadata
download
What I plan to do further:
- finish implementation of the framework
- provide unit tests for it
- integrate with the acquire module and get the default apt-get update
functionality working

Two weeks ago, at the last report's milestone, the public parser
implementation was in a functional state and partially covered by unit
tests. This public parser is one of the project's deliverables,
intended to be exported from libapt and used not only by APT itself,
but also by other package management applications, as needed. Its
purpose is to parse the sources.list entries and expose them as
abstract objects with an access interface and predicate-based
iteration. These Source objects contain and provide all sources.list
information in a structured and coherent way.
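To illustrate what "access interface and predicate-based iteration" means, here is a minimal C++ sketch. The names Source, SourceList, At() and Filter() are hypothetical and only approximate the idea; the actual interface is the one in the parser header [3].

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one parsed sources.list entry; the real class
// in apt-pkg/parser.h may use different names and layout.
struct Source {
    std::string Type;                           // "deb" or "deb-src"
    std::string URI;                            // e.g. "http://ftp.debian.org/debian/"
    std::string Distribution;                   // e.g. "wheezy"
    std::vector<std::string> Sections;          // e.g. {"main", "contrib"}
    std::map<std::string, std::string> Options; // e.g. {{"arch", "amd64"}}
};

// Hypothetical container with read access and predicate-based iteration.
class SourceList {
public:
    void Add(const Source &s) {
        assert(!s.URI.empty()); // every entry must name a location
        sources_.push_back(s);
    }

    std::size_t Size() const { return sources_.size(); }
    const Source &At(std::size_t i) const { return sources_.at(i); }

    // Return copies of every Source for which pred returns true.
    std::vector<Source> Filter(const std::function<bool(const Source &)> &pred) const {
        std::vector<Source> out;
        for (const Source &s : sources_)
            if (pred(s))
                out.push_back(s);
        return out;
    }

private:
    std::vector<Source> sources_;
};
```

For example, list.Filter([](const Source &s) { return s.Type == "deb-src"; }) would return only the source-package entries, without the caller having to walk the raw sources.list lines.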

The first thing I did at the beginning of weeks 3 and 4 of the GSoC
coding period was to integrate the parser tests into the libapt tests.
I'm not yet sure how to update the makefile hierarchy so that a simple
`make test` from the project root would run the parser tests along
with the others; some dependencies can't be resolved that way right
now. I also refactored the parser with ABI forward compatibility in
mind.
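The report doesn't say which technique was used for ABI forward compatibility. A common C++ one is the private-implementation ("d-pointer") idiom, sketched below under that assumption with a hypothetical SourceEntry class: the public class holds only a pointer to a private struct, so fields can be added later without changing the class size or layout that already-compiled clients depend on.

```cpp
#include <cassert>
#include <string>

// "Header" part: only an opaque pointer is part of the public layout.
class SourceEntry {
public:
    explicit SourceEntry(const std::string &uri);
    ~SourceEntry();
    SourceEntry(const SourceEntry &) = delete;            // avoid double delete
    SourceEntry &operator=(const SourceEntry &) = delete;

    std::string GetURI() const;
    bool IsTrusted() const;

private:
    struct SourcePrivate;   // defined only in the implementation file
    SourcePrivate *const d; // the only data member clients ever see
};

// "Implementation file" part: free to grow between library releases.
struct SourceEntry::SourcePrivate {
    std::string URI;
    bool Trusted = false; // a field that could be added without an ABI break
};

SourceEntry::SourceEntry(const std::string &uri) : d(new SourcePrivate) {
    assert(!uri.empty());
    d->URI = uri;
}

SourceEntry::~SourceEntry() { delete d; }

std::string SourceEntry::GetURI() const { return d->URI; }
bool SourceEntry::IsTrusted() const { return d->Trusted; }
```

The trade-off is one extra heap allocation and indirection per object, in exchange for keeping sizeof(SourceEntry) fixed at one pointer.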

The next step was to use these sources to download the Debian Archive
metadata. To achieve this, I first had to understand and follow the
current flow of the metadata download algorithm. The sources.list
entries are parsed by a specific internal parser and transformed into
metaIndex objects. A metaIndex object represents a unique (URI,
Distribution) tuple, i.e. a unique Debian distribution from a specific
location. Each such distribution has a Release file in its root that
enumerates all the Debian index files in this distribution. A Debian
index file is the main storage format for Debian metadata; there are
currently index files for package sources, binaries, translations,
tags, contents, etc. A metaIndex object contains the URI and the
Distribution of the Release file, plus some additional information
such as the sections - main / contrib / non-free - the architectures
and whether the metaIndex is trusted. So a metaIndex object fully
describes what data and metadata to download from a specific Debian
archive distribution. It offers primitives to download the metadata -
GetIndexes() - and to build the objects used for downloading the
actual Debian packages - GetIndexFiles(). The latter are pkgIndexFile
objects, and their main use is to download actual package data. My
main point of interest was downloading the metadata, and as far as
I've seen, the process follows a pipelined flow:
- it first produces metaIndex objects from the entries in the
sources.list file(s), merging entries with the same URI and
Distribution.
- it downloads all the metadata for each metaIndex; the GetIndexes()
method has an intermediate, private step - ComputeIndexTargets() -
that builds the download locations of the Debian index files.
- finally, it produces the objects used to download actual packages.
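The first step of that pipeline, merging sources.list entries that share a URI and Distribution, can be sketched as follows. Entry, MetaIndexSketch and MergeEntries() are hypothetical simplifications, not APT's metaIndex class, and here "merging" just means taking the union of the sections.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; APT's real metaIndex carries more
// state (trust, architectures, index files, ...).
struct Entry {
    std::string URI;
    std::string Distribution;
    std::string Section;
};

struct MetaIndexSketch {
    std::string URI;
    std::string Distribution;
    std::set<std::string> Sections; // merged, duplicates dropped
};

// Each (URI, Distribution) pair yields exactly one MetaIndexSketch whose
// Sections is the union of the sections of all matching entries.
// Results come back ordered by (URI, Distribution).
std::vector<MetaIndexSketch> MergeEntries(const std::vector<Entry> &entries) {
    std::map<std::pair<std::string, std::string>, MetaIndexSketch> byKey;
    for (const Entry &e : entries) {
        assert(!e.URI.empty());
        MetaIndexSketch &m = byKey[std::make_pair(e.URI, e.Distribution)];
        m.URI = e.URI;
        m.Distribution = e.Distribution;
        m.Sections.insert(e.Section);
    }
    std::vector<MetaIndexSketch> result;
    for (const auto &kv : byKey)
        result.push_back(kv.second);
    return result;
}
```

Two "deb http://ftp.debian.org/debian/ wheezy" lines, one listing main and one listing contrib, would thus end up as a single object covering both sections.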

These two weeks' goal was to design a framework that would refactor
and enhance the metadata download process, and make it plugin-based.

The framework's input is a list of Source objects, as produced by the
previously developed public parser. The framework will build metaIndex
objects for these Sources using specific metaIndex plugins; currently
there is only one metaIndex plugin, for Debian, since all Sources are
Debian Sources. As in current APT versions, the metaIndex will contain
additional information such as trust, sections and architectures, plus
one more type of information: the METADATA TO DOWNLOAD. The default
metadata types are Packages, Translations and Sources. Other types of
metadata can be requested through sources.list options, like
[contents=true] (apt-file metadata) or [tags=true] (debtags metadata).
The metaIndex implements the same interface as before, but the
metaIndex plugin additionally provides an interface to build
acquireIndex objects. An acquireIndex is a new abstract object in the
framework that corresponds to an individual Debian Archive index file.
Using the metaIndex information, the metaIndex plugin will build
acquireIndex objects through the framework's registered acquireIndex
plugins, each of which builds the acquireIndex objects for one specific
type of metadata. So by default the framework will have acquireIndex
plugins installed for Packages, Translations and Sources. Adding
support for a new type of metadata will be as simple as extending the
acquireIndex plugin and acquireIndex classes.
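A minimal sketch of the acquireIndex plugin idea, assuming one plugin per metadata type registered under its type name. AcquireIndex, AcquireIndexPlugin, PackagesPlugin, ContentsPlugin, PluginRegistry and RequestedTypes() are hypothetical names, and the option-to-type table is illustrative only; the framework's actual interface is in [6].

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical object standing for one index file to acquire.
struct AcquireIndex {
    std::string Type;    // metadata type, e.g. "Packages"
    std::string Section; // e.g. "main"
};

// Hypothetical plugin interface: one plugin per metadata type.
class AcquireIndexPlugin {
public:
    virtual ~AcquireIndexPlugin() {}
    virtual std::string MetadataType() const = 0;
    virtual AcquireIndex Build(const std::string &section) const {
        AcquireIndex idx;
        idx.Type = MetadataType();
        idx.Section = section;
        return idx;
    }
};

class PackagesPlugin : public AcquireIndexPlugin {
public:
    std::string MetadataType() const override { return "Packages"; }
};

// An extra type, e.g. what an apt-file plugin might register.
class ContentsPlugin : public AcquireIndexPlugin {
public:
    std::string MetadataType() const override { return "Contents"; }
};

// Default types plus any type switched on by an option such as
// [contents=true]; the table below is illustrative only.
std::set<std::string> RequestedTypes(const std::map<std::string, std::string> &options) {
    std::set<std::string> types = {"Packages", "Translations", "Sources"};
    const std::map<std::string, std::string> optional = {
        {"contents", "Contents"}, {"tags", "Tags"}};
    for (const auto &opt : optional) {
        auto it = options.find(opt.first);
        if (it != options.end() && it->second == "true")
            types.insert(opt.second);
    }
    return types;
}

class PluginRegistry {
public:
    void Register(std::unique_ptr<AcquireIndexPlugin> plugin) {
        assert(plugin);
        const std::string type = plugin->MetadataType(); // read before moving
        plugins_[type] = std::move(plugin);
    }

    // One AcquireIndex per (requested type with an installed plugin, section);
    // requested types that have no plugin are skipped.
    std::vector<AcquireIndex> BuildIndexes(const std::set<std::string> &types,
                                           const std::vector<std::string> &sections) const {
        std::vector<AcquireIndex> out;
        for (const std::string &t : types) {
            auto it = plugins_.find(t);
            if (it == plugins_.end())
                continue;
            for (const std::string &s : sections)
                out.push_back(it->second->Build(s));
        }
        return out;
    }

private:
    std::map<std::string, std::unique_ptr<AcquireIndexPlugin>> plugins_;
};
```

In this sketch, supporting a new metadata type only means writing one more AcquireIndexPlugin subclass and registering it, which mirrors the extension model described above.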

The acquireIndex is used to build IndexTarget objects, which represent
locations of index files in the Debian Archive. These are then used by
the APT acquire module to download the files, using the available
techniques - diff files, compression. From what I've read in the APT
source code, the acquire module can download generic index files
through the pkgAcqIndex class. This way, new metadata files can be
added to the Debian Archive and downloaded with the current acquire
module. What the pluggable acquire framework must provide is a way to
translate sources.list metadata information into remote index file
locations.
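That translation follows the standard Debian archive layout under dists/. A minimal sketch, assuming a hypothetical IndexTargetSketch struct and MakeTarget() helper, that returns only uncompressed paths; choosing compressed variants or diffs is left to the acquire module, as described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified IndexTarget: remote location plus a
// human-readable description.
struct IndexTargetSketch {
    std::string URI;
    std::string Description;
};

// Build the uncompressed URI of one index file following the Debian
// archive layout under dists/. archOrLang is the architecture for
// Packages and the language code for Translations; Sources ignores it.
// Unknown types yield an empty URI (a real plugin would know its layout).
IndexTargetSketch MakeTarget(std::string baseURI, const std::string &dist,
                             const std::string &section, const std::string &type,
                             const std::string &archOrLang) {
    assert(!baseURI.empty() && !dist.empty());
    if (baseURI[baseURI.size() - 1] != '/')
        baseURI += '/';
    const std::string dir = baseURI + "dists/" + dist + "/" + section + "/";

    IndexTargetSketch t;
    if (type == "Packages")
        t.URI = dir + "binary-" + archOrLang + "/Packages";
    else if (type == "Sources")
        t.URI = dir + "source/Sources";
    else if (type == "Translations")
        t.URI = dir + "i18n/Translation-" + archOrLang;
    t.Description = dist + "/" + section + " " + type;
    return t;
}
```

For instance, MakeTarget("http://ftp.debian.org/debian", "wheezy", "main", "Packages", "amd64") yields http://ftp.debian.org/debian/dists/wheezy/main/binary-amd64/Packages.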

Here is a framework usage example, as I've thought of it so far:
* the framework object is instantiated, and several plugins are
registered to it:
- the metaIndex plugins - the Debian metaIndex plugin by default. The
Type information in the Source object determines which metaIndex
plugin is used to create the metaIndex. If more kinds of metaIndexes
need to be supported in the future (new Release files, new types of
metaIndex information, etc.), new metaIndex plugins can be
implemented.
- the acquireIndex plugins - the Packages, Sources and Translations
acquireIndex plugins by default, plus the plugins provided by any
installed applications; in the future, we plan to implement plugins
for debtags and apt-file. These acquireIndex plugins are associated
with Debian's metaIndex plugin.
* the framework receives a list of Source objects as input and
transforms them into metaIndex objects, using the metaIndex plugins.
Multiple metaIndex objects can be merged if they refer to the same URI
and Distribution.
* for each of these metaIndex objects, the metaIndex plugin builds
acquireIndex objects, according to the framework's installed
acquireIndex plugins and the metadata types of the metaIndex.
* the acquireIndex objects are then used to compute IndexTarget
objects, which represent index file locations in the Debian Archive.
* in metaIndex->GetIndexes(), these IndexTarget objects are used to
build pkgAcqIndex objects and download the metadata files.
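The dispatch step in this usage example, where a Source's Type selects the metaIndex plugin, can be sketched like this. Framework, SimpleSource, BuiltMetaIndex and the std::function-based plugins are hypothetical stand-ins; merging and acquireIndex building are left out to keep the sketch short.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal Source: only the fields needed for dispatch.
struct SimpleSource {
    std::string Type; // "deb", "deb-src", ...
    std::string URI;
    std::string Distribution;
};

// Hypothetical result produced by a metaIndex plugin.
struct BuiltMetaIndex {
    std::string Plugin;
    std::string URI;
    std::string Distribution;
};

using MetaIndexPluginFn = std::function<BuiltMetaIndex(const SimpleSource &)>;

class Framework {
public:
    // Associate a sources.list Type with the plugin that handles it.
    void RegisterMetaIndexPlugin(const std::string &type, MetaIndexPluginFn fn) {
        assert(fn);
        plugins_[type] = fn;
    }

    // Turn each Source into a metaIndex using the plugin for its Type;
    // Sources with no matching plugin are collected in `unhandled`.
    std::vector<BuiltMetaIndex> Build(const std::vector<SimpleSource> &sources,
                                      std::vector<SimpleSource> &unhandled) const {
        std::vector<BuiltMetaIndex> out;
        for (const SimpleSource &s : sources) {
            auto it = plugins_.find(s.Type);
            if (it == plugins_.end()) {
                unhandled.push_back(s);
                continue;
            }
            out.push_back(it->second(s));
        }
        return out;
    }

private:
    std::map<std::string, MetaIndexPluginFn> plugins_;
};
```

Registering the same Debian plugin under both "deb" and "deb-src" keeps the current behaviour; a future Release format would only need another registered plugin.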

At this point, I've implemented most of this framework, but it is not
finished yet. I hope to finish it during week 5 and also write some
unit tests. I think the initial timeline will be slightly adjusted: as
it turns out, there is no need to provide new download and security
mechanisms for the backend, since the ones already present in the
acquire module are usable. Also, I had to design the plugin model and
implement the support for building IndexTarget objects before I could
fully integrate with the default apt-get update functionality. So even
though I haven't yet managed to provide the default apt-get update
functionality with the new components, I've done the generic plugin
definition and interface, which was planned for later. Also, the
parser implementation is pretty much final.

In conclusion, I'm feeling a lot more comfortable with the APT code,
and I think it's slowly reaching the desired goal. Just a few more
steps and apt-get update will work with the new parser and metadata
acquire framework. After that, the code will be thoroughly tested, new
plugins will be built, and certain aspects - acquire logic, download
security - will be optimized.

My contributions to the APT package can be found in the repo [2]. The
parser: header file [3], implementation file [4] and tester [5]. The
framework: header file [6] and implementation file [7].

Bogdan Purcareata

[0] http://wiki.debian.org/SummerOfCode2012/Projects#Pluggable_acquire-system_for_APT
[1] http://wiki.debian.org/BogdanPurcareata/PluggableAptBackend
[2] https://launchpad.net/apt-fetcher
[3] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/parser.h
[4] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/parser.cc
[5] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/test/libapt/parser_tester.cc
[6] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/framework.h
[7] http://bazaar.launchpad.net/~bogdan-purcareata/apt-fetcher/trunk/view/head:/apt-pkg/framework.cc


