[Soc-coordination] Report 4 - PyPI to Debian repository converter

Piotr Ożarowski piotr at debian.org
Mon Jul 30 21:07:51 UTC 2012

[Natalia is not able to send this week's report by herself, so I'm sending the
 draft she sent me yesterday, footnotes are mine]


This is my fourth progress report on the PyPI to Debian Repository Converter
project.

Over the past weeks I've worked mainly on improving the components of my tool:
refining the algorithm, implementing all plugin methods and increasing their
efficiency, handling command-line options, and creating the Debian
source/binary repository which contains the converted packages.

Overall functionality
The current status of my program looks like this:
* initial part based on generators:
 - creating a list of packages to convert (from PyPI by its XML-RPC methods or
   from specified directories)
 - preparing packages: downloading tarballs, repacking and renaming them if
   needed, extracting archives
* part of the work based on the plugins system:
 - converting packages (currently implemented: stdeb and pkgme)
 - building source packages (currently implemented: dpkg-source)
 - building binary packages (currently implemented: dpkg-buildpackage)
 - exporting to repository (currently implemented: apt-ftparchive)
* storing all information in the database.
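
As an illustration of the generator-based initial part, here is a minimal
Python sketch (not PyPI2Deb's actual code). The names iter_packages and
prepare, the file-name pattern and the lowercase/underscore naming rule are
assumptions made for this example; only the <name>_<version>.orig.tar.gz form
is the usual Debian convention for upstream tarballs.

```python
import re

# Assumed pattern: "<name>-<version>.tar.gz"; other archive types are ignored.
TARBALL_RE = re.compile(r"^(?P<name>.+)-(?P<version>[^-]+)\.tar\.gz$")


def iter_packages(filenames):
    """Lazily yield (name, version) pairs for names that look like tarballs."""
    for filename in filenames:
        match = TARBALL_RE.match(filename)
        if match:
            yield match.group("name"), match.group("version")


def prepare(pairs):
    """Lazily compute the name a repacked upstream tarball would get."""
    for name, version in pairs:
        # Assumed naming rule: lowercase, underscores turned into hyphens.
        source = name.lower().replace("_", "-")
        yield {
            "pypi_name": name,
            "version": version,
            "orig_tarball": "%s_%s.orig.tar.gz" % (source, version),
        }


if __name__ == "__main__":
    # Nothing is parsed until the consumer asks for the next item.
    for package in prepare(iter_packages(["Foo_Bar-1.2.tar.gz", "README.txt"])):
        print(package)
```

Because both stages are generators, a long package list can be processed one
entry at a time instead of being built in memory first.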

A very nice moment during my recent work on the tool was generating the final
product, that is, a complete Debian repository based on converted packages,
and even successfully installing a package (source and binary) from the
repository created by PyPI2Deb :-)

Following discussions with my mentor, and based on lessons learned from the
trials and several approaches to the design, the algorithm was finally
developed (hopefully) and almost entirely implemented as follows:

[l] get list of packages
[p] select a new package/version pair
    - if there are no more, end the program
[h] get status of selected pair from the database
    - if the package is already in the repo, go to [p]
    - if conversion files are on disk, go to [s]
[c] select next convert plugin:
    - if there are no more, go to [p]
    - if option --force-conversion is enabled, go to [cc]
    - if selected plugin has already been used, go to [c]
[cc] convert package/version using selected convert plugin
    - if it fails, go to [c]
[s] build a source package
    - if it was not successful:
      - go to [c] if --try-next-conversion-plugin-if-building-src-pkg-fails is
        enabled (at the command line or configuration)
      - go to [p]
[b] select next build plugin
    - if there are no more, go to [p]
    - if option --force-build is enabled, go to [bb]
    - if plugin has already been used, go to [b]
[bb] build a binary package using selected build plugin
    - if it failed:
      - go to [b] if --try-next-build-plugin-if-building-fails is enabled
      - go to [c] if --try-next-conversion-plugin-if-building-fails is enabled
      - go to [p]
[t] select next test plugin
    - if there are no more, go to [r]
[tt] run tests for selected test plugin:
    - if test result is less than 50%:
      - if --try-next-converter-if-tests-fail is enabled, go to [c]
      - go to [p]
    - go to [t]
[r] add to the repository
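
The steps [h] to [r] for a single package/version pair can be sketched in
Python as a small state machine. This is an illustration only, not the tool's
code: plugins are plain callables, `status` stands in for the database record,
the option keys are shortened hypothetical names, "already used" is read from
`status` only, and restarting the build plugin list after every source build
is an assumption the report does not settle.

```python
def process_pair(pair, status, converters, build_source, builders, testers, opts):
    """Return True if the pair reaches [r], False if we go back to [p]."""
    if status.get("in_repo"):                        # [h] already in the repo
        return False
    conv_iter = iter(converters)
    build_iter = test_iter = None
    state = "s" if status.get("converted") else "c"  # [h] files on disk?
    while True:
        if state == "c":                             # [c] and [cc]
            plugin = next(conv_iter, None)
            if plugin is None:
                return False
            used = plugin in status.get("used_converters", ())
            if used and not opts.get("force_conversion"):
                continue
            # A real tool would record this attempt in the database here.
            state = "s" if plugin(pair) else "c"
        elif state == "s":                           # [s]
            if build_source(pair):
                build_iter, state = iter(builders), "b"
            elif opts.get("next_converter_if_source_fails"):
                state = "c"
            else:
                return False
        elif state == "b":                           # [b] and [bb]
            plugin = next(build_iter, None)
            if plugin is None:
                return False
            used = plugin in status.get("used_builders", ())
            if used and not opts.get("force_build"):
                continue
            if plugin(pair):
                test_iter, state = iter(testers), "t"
            elif opts.get("next_builder_if_build_fails"):
                state = "b"
            elif opts.get("next_converter_if_build_fails"):
                state = "c"
            else:
                return False
        else:                                        # [t] and [tt]
            plugin = next(test_iter, None)
            if plugin is None:
                return True                          # [r]
            if plugin(pair) < 0.5:                   # test result below 50%
                if opts.get("next_converter_if_tests_fail"):
                    state = "c"
                else:
                    return False
```

A driver would call process_pair for each pair from [l] and export the pair to
the repository whenever it returns True.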

Thanks to storing results in a database, I was able to determine the
most frequent causes of plugin failures. On this basis, I've improved the
implementation of the plugins (particularly the convert plugin's
"post_process" method), which significantly increased their effectiveness.
The most troublesome problems are still missing build dependencies, but I'll
work on that (by adding a new build plugin, pbuilder and/or sbuild, and by
improving build dependency detection in the conversion plugins).¹

I didn't want to confuse users with too many options, but as the work
progressed, an increasing number of settings seemed useful from the
point of view of a future user:

* --config (path to the config file) - seems trivial, but because of the way
  Python imports modules, a combination of config settings and arguments
  given in the command line requires... considerable creativity to supply
  correct current values²
* --pyversion (version of Python which packages have to support³) - the
  classifier system implemented in the PyPI repository is not used correctly
  by many developers, so the previously obvious choice of PyPI XML-RPC
  methods for downloading the list of desired packages didn't bring the
  expected results. Since I think the more converted packages, the better, I
  had to resort to some tricks to download information about many packages
  efficiently. Currently, for Python 2, I'm able to retrieve information from
  PyPI about ~16 000 packages (of about 23 thousand available), and about
  almost all packages that support Python 3.
* --packages (convert only requested packages) - this is my favorite option; it
  makes debugging a lot easier. Implementing it properly cost me a lot of work
  (due to problems with constructing appropriate XML-RPC queries and with
  reasonably searching for the tarballs stored on disk). A problem that
  accompanied me for a long time was which set of package names to operate
  on: the original PyPI ones or those already adapted to Debian's Python
  Policy requirements. I am really happy with this option: I can use it to
  convert a newly released version of a given package and update the
  repository rapidly, it checks whether the requested package/version is
  available for the selected Python interpreter, etc.
* --converter, --builder, --exporter - these allow the user to select which
  (available) plugin(s) should perform a given action. The default plugin
  order is based on priorities set by plugin authors and on the availability
  of the required tools ("is_usable" method)
* --force, --force-conversion, --force-build - these give plugins another try
  at converting/building a package/version pair (if a plugin failed once,
  it's skipped by default for the given package/version)
* --try-next-converter-if-building-src-package-fails,
  --try-next-conversion-plugin-if-building-fails (still looking for
  better names ;-) - these options force build machine to do a bit more
  work, but eventually it's possible that more packages will be
  available in the repository due to this additional work.
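
The difficulty mentioned for --config can be illustrated with a small Python
sketch. One common pitfall (an assumption about what happened here, not
something the report confirms) is that `from config import VALUE` copies the
value at import time, so later updates are not seen. A way around it is to
build one settings dictionary in layers: defaults, then the config file, then
the command line. The default values and the helper names below are
hypothetical; argparse.SUPPRESS is standard argparse and keeps options that
were not given out of the result, so they cannot overwrite earlier layers.

```python
import argparse

# Hypothetical built-in defaults, for illustration only.
DEFAULTS = {"pyversion": "2", "converter": "stdeb", "force_build": False}


def parse_cli(argv):
    """Return only the options that were actually given on the command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pyversion", default=argparse.SUPPRESS)
    parser.add_argument("--converter", default=argparse.SUPPRESS)
    parser.add_argument("--force-build", action="store_true",
                        default=argparse.SUPPRESS)
    return vars(parser.parse_args(argv))


def effective_settings(file_values, argv):
    """Layer the settings: defaults < config file < command line."""
    settings = dict(DEFAULTS)
    settings.update(file_values)      # values read from the --config file
    settings.update(parse_cli(argv))  # explicit command-line options win
    return settings


if __name__ == "__main__":
    print(effective_settings({"converter": "pkgme"}, ["--pyversion", "3"]))
```

Code that needs a setting would then read it from the merged dictionary at the
time of use rather than importing a module-level constant.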

Thanks to all these options, the program can resume its work on a
particular package at any moment: the tool checks the status of the package in
the database, checks which files are already on disk, verifies whether the
package was already converted, built and exported, and skips these steps if
necessary
(unless user forces it to redo the work). I think that's a reasonable
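
A minimal Python sketch of this resume logic, assuming the completed steps for
a package are available as a set (the function name, the step names and the
`forced` parameter are hypothetical):

```python
STEPS = ("convert", "build", "export")  # step order taken from the report


def pending_steps(done, forced=()):
    """Return the steps still to run, keeping the pipeline order.

    A step is run if it has not been recorded as done, or if the user
    explicitly forced it to be redone.
    """
    return [step for step in STEPS if step not in done or step in forced]


if __name__ == "__main__":
    print(pending_steps({"convert"}))
    print(pending_steps({"convert", "build"}, forced={"build"}))
```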

To fulfill the objectives of the program and ideas developed in the course of
its implementation, here's what I want to do next:
* implement some test plugins (lintian, lintian4py (with pyflakes), ...)
* implement more advanced export plugins (mini-dinstall, reprepro)⁴
* add an option to provide HTML logs from all actions (per plugin, per author,
  per package, etc.)
* improve conversion tools to support Python 3 packages
* test and fix bugs
* improve PyPI XML-RPC methods to be more efficient for the tool's needs

Piotr's comments:
[¹] This means Debian would benefit from fixing #652617; would anyone care to
    provide a patch?
[²] Natalia spent some time figuring out why default config values were not
    updated with the ones from command line
[³] i.e. python-foo vs. python3-foo packages
[⁴] simple dput to external repo would be handy as well