[Soc-coordination] Third report: Semantic Package Review Interface for mentors.debian.net

Clément Schreiner clement at mux.me
Mon Jul 2 17:40:43 UTC 2012


Hi,

this is the third bi-weekly report on my Summer of Code project
'Semantic Package Review Interface for mentors.debian.net'.

My project aims to extract metadata from packages submitted to
mentors.d.n[1] and use this data to match a package with a potential
sponsor. Since a lot of packages get stuck in the mentoring process
because their maintainers have difficulty finding a sponsor, this
should make it easier for them to get into Debian.

Over the last two weeks, I have been working on improving the plugin
API[2] of debexpo's importer so that I can import data into the DB when
a package is uploaded, instead of computing it on the fly when a
package page is visited.

Importer plugins are run when a package is uploaded to mentors.d.n and
add package metadata to the database, for example the Lintian or
Debian QA status, the bugs closed by the package, etc. Currently, this
information is stored in a non-standardized way: each plugin's data is
serialized into JSON objects and stored as is in the database, which
prevents us from easily accessing it outside the plugins' HTML
templates.

With Nicolas' help, I have worked out a new database model[3] for this
information and started porting the existing plugins to the new API,
which simplifies the current model and removes the need for JSON data.
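
To illustrate why structured rows are easier to query than JSON blobs,
here is a minimal sketch using Python's sqlite3 module. The table names,
column names and plugin names below are assumptions made for this
example; they are not the model from [3], and debexpo's real schema may
look quite different.

```python
import sqlite3

# In-memory database with a hypothetical two-table layout:
# one row per plugin run, plus one row per extracted fact.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plugin_results (
    id INTEGER PRIMARY KEY,
    package_version_id INTEGER,
    plugin TEXT,
    outcome TEXT
);
CREATE TABLE plugin_result_data (
    result_id INTEGER REFERENCES plugin_results(id),
    key TEXT,
    value TEXT
);
""")


def store_result(conn, package_version_id, plugin, outcome, data):
    """Store one plugin run; `data` is a list of (key, value) pairs."""
    cur = conn.execute(
        "INSERT INTO plugin_results (package_version_id, plugin, outcome) "
        "VALUES (?, ?, ?)",
        (package_version_id, plugin, outcome),
    )
    result_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO plugin_result_data (result_id, key, value) "
        "VALUES (?, ?, ?)",
        [(result_id, key, str(value)) for key, value in data],
    )
    return result_id


def versions_closing_bug(conn, bug_number):
    """Find package versions closing a bug, without parsing any JSON."""
    rows = conn.execute(
        "SELECT r.package_version_id "
        "FROM plugin_results r "
        "JOIN plugin_result_data d ON d.result_id = r.id "
        "WHERE r.plugin = 'closedbugs' AND d.key = 'bug' AND d.value = ? "
        "ORDER BY r.package_version_id",
        (str(bug_number),),
    )
    return [row[0] for row in rows]


store_result(conn, 1, "closedbugs", "info", [("bug", 123456), ("bug", 654321)])
store_result(conn, 2, "closedbugs", "info", [("bug", 123456)])
store_result(conn, 2, "lintian", "warning", [("tag", "no-copyright-file")])
print(versions_closing_bug(conn, 123456))
```

With data stored this way, code outside the plugins' templates (such as
a sponsor-matching step) can select on plugin, key and value directly in
SQL.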

I think the API is (almost) complete, but I haven't been able to test
it yet, because plugins need to access objects (for example,
PackageVersion) that are not yet available to the importer at the time
it calls them. Understanding and editing the importer logic is not
easy, because it consists of a single Python class, with most of the
work done in two 150-line methods that mix DB access, checks and
local repository management.

Since I had to move stuff around and my project is closely related to
the importer, I have started refactoring the Importer class to make it
more easily maintainable.



My progress has been much slower than I (and probably my mentors,
although they haven't said anything yet) expected. The plugin
API redesign and the update of existing plugins should have been ready
for use in metadata extraction last week. I think the main reason is
that, even though I started by writing up a plan, I changed it
unnecessarily while writing code. At least twice, I threw away work
because I thought I had to redesign the model, instead of quickly
implementing my first plan and improving it progressively. Also,
because I changed too many things at the same time, I did not get
regular feedback in the form of tests in debexpo, which has not helped
with motivation and productivity.

In this sense, my work on the importer refactoring has gone better:
I've started with a dummy class for managing a new upload's data[4],
with only docstrings in its methods, and small comments in the importer
code where I thought refactoring was needed. This way, I can make
small changes and test along the way each time I commit, which
helps me focus on small things and thus write code faster. I now
feel confident I can finish this refactoring tomorrow, including the
new plugin system changes. I might be wrong though, especially since
Hofstadter's Law[5] has held at every step of my GSoC so far.
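
As an illustration of the "docstrings first" approach, here is a small
sketch of what such a skeleton could look like. The class and method
names are hypothetical and are not taken from the file in [4]. Unlike a
skeleton with docstrings only, the stubs here raise NotImplementedError
so that an unfinished step fails loudly if it is called by accident.

```python
import inspect


class UploadData:
    """Hypothetical holder for everything known about one upload."""

    def __init__(self, changes_path):
        self.changes_path = changes_path

    def parse_changes(self):
        """Read the .changes file and remember the files it lists."""
        raise NotImplementedError

    def find_uploader(self):
        """Look up the user who uploaded the package."""
        raise NotImplementedError

    def store_package_version(self):
        """Create the database rows describing this package version."""
        raise NotImplementedError


def todo_list(cls):
    """Return (method name, first docstring line) for public methods."""
    return [
        (name, inspect.getdoc(func).splitlines()[0])
        for name, func in inspect.getmembers(cls, inspect.isfunction)
        if not name.startswith("_")
    ]


for name, summary in todo_list(UploadData):
    print(name + ": " + summary)
```

The docstrings double as a to-do list: each method can then be filled
in and committed on its own.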

I have freed my evenings/nights for this week so I can make up for the
time I lost rewriting the plugins; I'd like to get real results for the
actual 'metadata extraction and sponsor recommendation' part before we
hit the mid-term evaluation deadline.

I'm also considering writing nose tests when I write stuff that can't
be immediately integrated into debexpo's codebase, which would provide
me with fast feedback and small tasks to complete one after the other.
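
A sketch of what such a standalone test module might look like follows.
The closed_bugs helper is a hypothetical example of code that does not
need debexpo to run, and its regular expression is a simplified version
of the "Closes:" changelog syntax. Test runners such as nose collect
functions whose names start with test_, so plain assert statements are
enough here.

```python
import re

# Simplified matcher for "Closes: #123, #456" in changelog text.
_CLOSES = re.compile(
    r"closes:\s*(?:bug)?#?\s?\d+(?:,\s*(?:bug)?#?\s?\d+)*",
    re.IGNORECASE,
)


def closed_bugs(changelog_text):
    """Return the sorted bug numbers closed by a changelog entry."""
    bugs = set()
    for match in _CLOSES.finditer(changelog_text):
        bugs.update(int(n) for n in re.findall(r"\d+", match.group(0)))
    return sorted(bugs)


def test_single_bug():
    assert closed_bugs("* New release (Closes: #123456)") == [123456]


def test_several_bugs():
    assert closed_bugs("* Fix stuff (closes: #2, #1)") == [1, 2]


def test_no_bugs():
    # A bare bug reference without "Closes:" must not count.
    assert closed_bugs("* Fix typo in #42 handling") == []
```

Each test is a small, self-contained task, which gives the kind of fast
feedback described above even before the code is wired into debexpo.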

[1] [http://mentors.debian.net/]

[2] [http://anonscm.debian.org/gitweb/?p=debexpo/debexpo.git;a=blob;f=debexpo/plugins/__init__.py;h=d1d2dfba124f889637db3cf9696858bc87edd800;hb=devel]

[3] [http://anonscm.debian.org/gitweb/?p=debexpo/debexpo.git;a=blob;f=debexpo/model/plugin_results.py;h=6ee3f044f69b841f7c8d46a981bd256427ddb6e9;hb=plugin-api]

[4] [http://anonscm.debian.org/gitweb/?p=debexpo/debexpo.git;a=blob;f=debexpo/importer/upload_data.py;h=6e5a3e8832f5536abcf21d8c80ebb3f5910dab02;hb=new-importer]

[5] "Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law." -- Douglas Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid


