[Soc-coordination] Report 3 - PyPI to Debian repository converter

Sun Jul 1 21:54:51 UTC 2012

Hello,

This is my third report on the progress of my project, the PyPI to Debian
Repository Converter, mentored by Piotr Ożarowski.

Work:
-----
Over the past two weeks I’ve worked mainly on designing the detailed
structure of the database and writing the ORM that corresponds to it.
Based on information from the database, I’ve adjusted most of the
program’s overall algorithm. The main reason for this work was the
implementation of a few new command line options. Thinking more deeply
about the whole program also let me make some improvements to the
plugins’ API.

The status of my project described in previous reports included an
initial implementation of functions that download packages from PyPI,
change their names to comply with Debian policy, copy and extract
archives and finally convert and build Debian packages.

The next task I undertook was to store all useful information gathered
during the tool’s runtime in a database.

Database:
----------
Apart from designing a reasonable schema, an additional challenge for me
was implementing the ORM using the SQLAlchemy[1] library, because I had
never used it before. However, the benefits presented to me by my mentor
(e.g. a rich object model, easier switching between different database
backends, simpler queries, more transparent code) convinced me to spend
some time on its documentation and basic usage, and eventually I applied
it successfully.

After much consideration, many attempts and discussions, the tool’s
database schema currently contains four tables: two main ones
(“processed”, “skipped”) and two auxiliary ones (“commands”, “names”).

Table “processed” contains information, generated while the program
runs, about packages that have been passed to convert plugins, such as:
* package name (renamed to comply with Debian Policy)
* package version (adjusted so that dpkg --compare-versions sorts
them correctly)
* process type (for example: ‘convert’, ‘source package build’,
‘bin package build’)
* name of the plugin executing the process
* return code of the process (to make it easy to identify packages with
the same problem)
* stdout (I want to make the logs public later)
* stderr (might be interesting especially for package developers)
* start time of the process
* end time of the process
* session id (to identify the batch the logs come from; it’s the start
time of the tool)
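The renaming and version adjustment mentioned above could look roughly
like this. It is a minimal sketch under assumptions: the naming rules in
the hypothetical `debianize_name` (lowercase, underscores to hyphens, a
`python-`/`python3-` prefix) and the pre-release mapping in the
hypothetical `debianize_version` (`1.0b1` becomes `1.0~b1`, because `~`
sorts before everything in Debian versions) are simplifications, not the
tool’s actual code. `compare_upstream` reimplements the ordering rules
dpkg applies to upstream version strings (epochs and Debian revisions are
ignored), so a mapping can be checked without calling
`dpkg --compare-versions`.

```python
import re

DIGITS = "0123456789"


def debianize_name(pypi_name, prefix="python"):
    """Map a PyPI project name to a Debian-style binary package name
    (hypothetical rules)."""
    name = pypi_name.lower().replace("_", "-")
    name = re.sub(r"[^a-z0-9+.-]", "-", name)  # keep only characters Debian allows
    if name.startswith("python-"):
        name = name[len("python-"):]  # avoid "python-python-..."
    return f"{prefix}-{name}"


def debianize_version(pypi_version):
    """Turn a trailing pre-release tag (alpha, beta, rc, a, b, c) into '~tag'."""
    return re.sub(r"(?<=\d)[.-]?(alpha|beta|rc|a|b|c)(\d*)$", r"~\1\2", pypi_version)


def _is_digit(s, k):
    return k < len(s) and s[k] in DIGITS


def _order(s, k):
    # dpkg character weights: digits and end of string are 0, '~' sorts
    # before everything, letters sort before other symbols.
    if k >= len(s) or s[k] in DIGITS:
        return 0
    c = s[k]
    if c.isalpha():
        return ord(c)
    if c == "~":
        return -1
    return ord(c) + 256


def compare_upstream(a, b):
    """Return <0, 0 or >0 like dpkg's upstream version comparison."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit run character by character.
        while (i < len(a) and not _is_digit(a, i)) or (j < len(b) and not _is_digit(b, j)):
            ac, bc = _order(a, i), _order(b, j)
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the following digit run numerically, ignoring leading zeros.
        while _is_digit(a, i) and a[i] == "0":
            i += 1
        while _is_digit(b, j) and b[j] == "0":
            j += 1
        first_diff = 0
        while _is_digit(a, i) and _is_digit(b, j):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _is_digit(a, i):
            return 1  # a has the longer number
        if _is_digit(b, j):
            return -1
        if first_diff:
            return first_diff
    return 0
```

With this mapping a beta release sorts before the final release, which is
exactly what the adjusted version field has to guarantee.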

Table “skipped” contains information about packages which for various
reasons couldn’t be converted (i.e. no convert plugin was run on them).
Along with the package name and version, I store the reason for
rejecting the package and the session id.

Table “commands” contains information about each session in which the
tool was run, i.e. the options selected on the command line:
* requested Python version (only packages that support the given version
are converted)
* tarballs path (a directory with archives downloaded from PyPI)
* skip-existing (a boolean indicating whether packages already available
in Debian should be skipped)
* distro (to change the default distribution in generated packages)
* package (to convert only selected packages, or a specific version if
given)
* download-only (to download all tarballs from PyPI without taking any
other action)
* force-update (to clear all data and build the repository from scratch)
All this data is tied to the corresponding session id (which is also
used in the other tables).
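One way to turn these options into a “commands” row is to let argparse
produce them and attach the session id. This is a sketch only: `--path`,
`--package`, `--skip-existing`, `--download-only` and `--force-update`
appear in this report, while the `--python` and `--distro` spellings, the
`dest` names and the `command_row` helper are hypothetical.

```python
import argparse
from datetime import datetime, timezone


def build_parser():
    parser = argparse.ArgumentParser(description="PyPI to Debian repository converter (sketch)")
    # Hypothetical flag name for the requested Python version.
    parser.add_argument("--python", dest="python_version",
                        help="only convert packages supporting this Python version")
    parser.add_argument("--path", dest="tarballs_path",
                        help="directory with tarballs downloaded earlier")
    parser.add_argument("--skip-existing", action="store_true",
                        help="skip packages already available in Debian")
    # Hypothetical flag name for the target distribution.
    parser.add_argument("--distro", help="distribution written to generated packages")
    parser.add_argument("--package", help="convert only this package")
    parser.add_argument("--download-only", action="store_true",
                        help="only download tarballs from PyPI")
    parser.add_argument("--force-update", action="store_true",
                        help="clear all data and rebuild the repository")
    return parser


def command_row(argv):
    """Parse argv into a dict shaped like a 'commands' row; the session id
    is the start time of this run."""
    row = vars(build_parser().parse_args(argv))
    row["session_id"] = datetime.now(timezone.utc)
    return row
```

Each key in the returned dict maps onto one column of the “commands”
table, so storing a run’s options becomes a single insert.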

Table “names” maps the original package name and version to the name
and version obtained by adapting them to comply with Debian policy.
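The four tables described above might be declared with SQLAlchemy’s
declarative ORM roughly as follows. Column names and types are my
assumptions based on the field lists, not the tool’s real models; the
`record_skip` helper is hypothetical and only shows how a skipped package
is written and read back for one session.

```python
from sqlalchemy import (Boolean, Column, DateTime, ForeignKey, Integer, String,
                        Text)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Command(Base):
    """One row per run of the tool: the options it was started with."""
    __tablename__ = "commands"
    session_id = Column(DateTime, primary_key=True)  # start time of the run
    python_version = Column(String)
    tarballs_path = Column(String)
    skip_existing = Column(Boolean, default=False)
    distro = Column(String)
    package = Column(String)
    download_only = Column(Boolean, default=False)
    force_update = Column(Boolean, default=False)


class Processed(Base):
    """One row per process (convert, source/binary build) run on a package."""
    __tablename__ = "processed"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)     # Debian-compliant name
    version = Column(String, nullable=False)  # version adjusted for dpkg ordering
    process = Column(String, nullable=False)  # e.g. "convert"
    plugin = Column(String)
    returncode = Column(Integer)
    stdout = Column(Text)
    stderr = Column(Text)
    start_time = Column(DateTime)
    end_time = Column(DateTime)
    session_id = Column(DateTime, ForeignKey("commands.session_id"), nullable=False)


class Skipped(Base):
    """Packages that never reached a convert plugin, with the reason why."""
    __tablename__ = "skipped"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    version = Column(String, nullable=False)
    reason = Column(Text)
    session_id = Column(DateTime, ForeignKey("commands.session_id"), nullable=False)


class Name(Base):
    """Mapping from the original PyPI name/version to the Debian one."""
    __tablename__ = "names"
    orig_name = Column(String, primary_key=True)
    orig_version = Column(String, primary_key=True)
    name = Column(String, nullable=False)
    version = Column(String, nullable=False)


def record_skip(engine, run_id, name, version, reason):
    """Store a skipped package for the given run; return names skipped so far."""
    with Session(engine) as session:
        if session.get(Command, run_id) is None:
            session.add(Command(session_id=run_id))
        session.add(Skipped(name=name, version=version, reason=reason,
                            session_id=run_id))
        session.commit()
        query = session.query(Skipped).filter_by(session_id=run_id).order_by(Skipped.id)
        return [row.name for row in query]
```

Calling `Base.metadata.create_all(engine)` once creates all four tables;
because the ORM hides the SQL dialect, switching from SQLite to another
database should mostly mean changing the engine URL.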

Once I had implemented writing package information into the database
and reading it back (which replaced the statistics mechanism introduced
earlier; this matters to me because from the beginning I have tried not
to lose any package along the way), I could reliably know the status of
each package. It also simplified resuming an interrupted run.
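Resuming can then be reduced to asking the database which candidates
have no record yet. The sketch below uses the standard-library `sqlite3`
module and stripped-down versions of the “processed” and “skipped”
tables instead of the tool’s SQLAlchemy models; the `pending` helper is
hypothetical and, as a simplification, treats any existing row as “done”.

```python
import sqlite3


def pending(conn, candidates):
    """Return the (name, version) pairs with no row in 'processed' or 'skipped'."""
    done = set(conn.execute(
        "SELECT name, version FROM processed UNION SELECT name, version FROM skipped"))
    return [candidate for candidate in candidates if candidate not in done]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (name TEXT, version TEXT, process TEXT, returncode INTEGER)")
    conn.execute("CREATE TABLE skipped (name TEXT, version TEXT, reason TEXT)")
    conn.execute("INSERT INTO processed VALUES ('python-foo', '1.0', 'convert', 0)")
    conn.execute("INSERT INTO skipped VALUES ('python-bar', '2.1', 'unsupported Python')")
    candidates = [("python-foo", "1.0"), ("python-bar", "2.1"), ("python-baz", "0.3")]
    return pending(conn, candidates)
```

A real resume would probably also look at the process type and return
code, so that a package whose build failed halfway is retried.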

Option support
----------------
My next step was to implement support for the command line options
described above. Unfortunately, at this point I discovered that the set
of generators I had written earlier required a considerable rewrite to
support those options effectively.

First of all, I had to create two paths of action with different
starting points: the first when a network connection is available and
packages have to be downloaded from PyPI, and the second when tarballs
have been downloaded beforehand or come from another source (private
tarballs not available on PyPI). I determined the subsequent steps in
both cases and the actions common to both, and then started the rather
arduous work of moving functions to the right places. The work was
worthwhile: along the way I got a clearer program structure (which I
hope will result in fewer bugs) and a clearer file layout in the
repository. It also made it much easier to implement options such as
--path, --package, --download-only and --force-update.
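The two starting points feeding a shared tail can be sketched with
generators. Everything here is hypothetical: the `fetch` and `convert`
callables stand in for the real download and conversion code, and
tarballs are assumed to be named `<name>-<version>.tar.gz`.

```python
import re
from pathlib import Path

TARBALL_RE = re.compile(r"^(?P<name>.+)-(?P<version>[^-]+)\.tar\.gz$")


def from_pypi(names, fetch):
    """Online entry point: 'fetch' (hypothetical) downloads one project
    from PyPI and returns the local tarball path."""
    for name in names:
        yield Path(fetch(name))


def from_path(path):
    """Offline entry point (--path): tarballs downloaded earlier or
    coming from a private source."""
    yield from sorted(Path(path).glob("*.tar.gz"))


def select(tarballs, package=None, version=None):
    """Common step (--package): keep one project and, optionally, one version."""
    for tarball in tarballs:
        match = TARBALL_RE.match(tarball.name)
        if match is None:
            continue  # not a '<name>-<version>.tar.gz' archive
        if package is not None and match["name"] != package:
            continue
        if version is not None and match["version"] != version:
            continue
        yield tarball


def run(tarballs, convert, download_only=False):
    """Shared tail: --download-only stops once the archives are on disk."""
    if download_only:
        return [tarball.name for tarball in tarballs]
    return [convert(tarball) for tarball in tarballs]
```

Because every stage is a generator, either entry point can be plugged in
front of the same `select`/`run` chain, and nothing is downloaded or
converted until the tail consumes it.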

The --skip-existing option skips packages already present in the
official Debian repository. It uses the system command “apt-cache search
-n python” to generate lists of available packages (separately for
Python 2 and Python 3) and, based on them, ignores packages that would
otherwise be submitted for conversion.
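A minimal sketch of that filter, assuming the usual `apt-cache search`
output format of one `name - description` line per package. Only the
command itself comes from the text above; the helper names are
hypothetical. Parsing is kept separate from the subprocess call so it
can be tested on a machine without APT.

```python
import subprocess


def parse_apt_cache_search(output):
    """Split 'apt-cache search' output into Python 2 and Python 3 package names."""
    py2, py3 = set(), set()
    for line in output.splitlines():
        name = line.split(" - ", 1)[0].strip()
        if name.startswith("python3-"):
            py3.add(name)
        elif name.startswith("python-"):
            py2.add(name)
    return py2, py3


def debian_python_packages():
    """Run the query on the local system (requires apt-cache)."""
    result = subprocess.run(["apt-cache", "search", "-n", "python"],
                            capture_output=True, text=True, check=True)
    return parse_apt_cache_search(result.stdout)


def skip_existing(candidates, existing):
    """Drop candidate Debian package names that the archive already provides."""
    return [name for name in candidates if name not in existing]
```

The Python 2 or Python 3 set is then chosen according to the requested
Python version before filtering the candidates.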

Plugins
--------
In the course of this work I’ve managed to improve the plugins’ API
described in my previous report[2] a little, by adding an “is_usable”
method to the plugin base class, which checks whether the plugin is
installed and can be used. By default, the method checks for the
availability of the commands listed in the “required_commands”
attribute.
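The default behaviour of “is_usable” can be sketched with
`shutil.which`, which looks a command up on PATH. Only `is_usable` and
`required_commands` come from the text; the class names, the example
command and the `usable_plugins` helper are hypothetical.

```python
import shutil


class BasePlugin:
    """Base class for plugins (sketch). Subclasses list the external
    commands they need in 'required_commands'."""

    required_commands = ()

    def is_usable(self):
        """Default check: every required command must be found on PATH."""
        return all(shutil.which(cmd) is not None for cmd in self.required_commands)


class ExamplePlugin(BasePlugin):
    # Hypothetical plugin relying on a hypothetical converter command.
    required_commands = ("example-converter",)


def usable_plugins(plugins):
    """Keep only the plugins that can actually run on this system."""
    return [plugin for plugin in plugins if plugin.is_usable()]
```

Plugins with extra requirements (for example a minimum tool version) can
override `is_usable` and call the base implementation first.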

Summary
---------
I am glad that my tool finally has such an important element as a
database. I am also sure that familiarising myself with such a powerful
library as SQLAlchemy will pay off in the future. Finally adding support
for the new options is also an important step, because such a big
upheaval in the program is better done as early as possible.

Plans
------
Upcoming plans include continued testing of the solutions described
above. But the most important task, which has not been done so far, is
to improve many components of my tool that for now have only basic
functionality, especially the practical implementation of all plugin
methods. I will focus on this in the near term. I also plan to start
working on the Debian source/binary repository that will contain the
converted packages.

------
[1] http://www.sqlalchemy.org/
[2] http://lists.debian.org/debian-python/2012/06/msg00039.html