[Soc-coordination] Final report: Semantic Package Review Interface for mentors.debian.net
Clément Schreiner
clemux at mux.me
Sun Aug 19 22:29:06 UTC 2012
1 Short summary:
-----------------
My project aimed to gather metadata about packages submitted to
mentors.d.n by new contributors, and to recommend sponsors who could
help them get their packages into Debian. To achieve my goals, I had to
deeply refactor the package importing procedure and metadata storage.
This allowed for integration of debtags heuristics and matching with
similar packages, which can now be used for finding potential
sponsors.
2 Recent work
--------------
2.1 Plugin API
===============
I further improved the plugin API. Maybe I should have finished the
semantic metadata work instead, but I wanted to be sure I could
store data from semantic plugins properly, so I would not have to
rewrite them later. Moreover, having a good way to import metadata
from a package and store it in the database for easy later retrieval
was a key requirement for my project.
2.1.1 Various changes to make the plugins' code less verbose
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- the PluginResult subclasses now guess their 'entity name', used in
the inheritance scheme to associate a SQL table with the right
model
- To define a PluginResult model as the result of a QA test, we use
the new decorator 'test_result'. It sets an attribute on the class,
which the plugin checks when loading the model.
I should explain what I mean by 'test result': QA plugins typically
determine whether a package passes or fails some test. For example:
the package is lintian clean / has lintian warnings; the bugs in
the changes file's 'Closed-Bugs' section really belong to the
package or not, etc.
If needed, the test results' models can return data from other
models. For example, the lintian plugin defines two models:
LintianTest, the test's result, and LintianWarning, which
represents a tag as reported by the ``lintian`` program.
- I wrote another decorator, ``importercmd``, which decorates plugin
methods to make the importer (or, later, a controller) call them
when importing data from a package
2.1.2 'Property factories'
~~~~~~~~~~~~~~~~~~~~~~~~~~~
PluginResult models can now declare ``fields``, using
automatically generated properties. For example, the function
``bool_field`` will return a property for reading/writing a field as a
boolean, instead of explicitly using the underlying string. Currently,
bool_field, string_field and int_field are available.
I call these functions 'property factories', but I need to find a
better name for them.
Let's see a very simple, stripped-down example: the model for the
``native`` plugin (which determines whether the package is native or
not).
@test_result
class NativeTest(PluginResult):
    is_native = bool_field('native')

    def __str__(self):
        # Leave out 'not' (and its trailing space) for native packages.
        return 'Package is %snative' \
            % ('' if self.is_native else 'not ')
The ``bool_field`` function, defined in debexpo.plugins.api, is
roughly equivalent to this property:
def fget(instance):
    # The field is stored as the string 'true' or 'false'.
    return instance.get('native', 'false') == 'true'

def fset(instance, value):
    instance['native'] = 'true' if value else 'false'

is_native = property(fget, fset)
Previously, plugin writers had to write an 'is_native' method
decorated with @property, and explicitly coerce the string into a
boolean. This was especially cumbersome if they also wanted a setter
for coercing a boolean back into a string.
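
To make the 'property factory' idea concrete, here is a minimal,
self-contained sketch of how such functions could be written as
closures. It is not the code from debexpo.plugins.api: the dict-based
``PluginResult`` stub, the ``ExampleResult`` model and the default
values are assumptions made for the example.

```python
def bool_field(name):
    """Return a property exposing the string field `name` as a boolean."""
    def fget(instance):
        return instance.get(name, 'false') == 'true'

    def fset(instance, value):
        instance[name] = 'true' if value else 'false'

    return property(fget, fset)


def int_field(name):
    """Return a property exposing the string field `name` as an integer."""
    def fget(instance):
        return int(instance.get(name, '0'))

    def fset(instance, value):
        instance[name] = str(int(value))

    return property(fget, fset)


class PluginResult(dict):
    """Stand-in for the real model: fields are stored as strings."""


class ExampleResult(PluginResult):
    # Hypothetical model, only here to exercise the factories.
    is_native = bool_field('native')
    warnings = int_field('warnings')


result = ExampleResult()
result.is_native = True
result.warnings = 3
```

The model only names the underlying field once; the coercion in both
directions lives in the factory.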
2.1.3 Port existing plugins to the new API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This took longer than I expected, but I don't think it was a waste of
time. The results from some of these plugins will have to be taken
into account when recommending a sponsor to an uploader, and with my
changes the data is now easy to retrieve.
2.1.4 Almost done: trivial to finish
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- remove the plugin configuration switches and make
'debexpo.plugins' a package with several modules, for example:
+ ``qa`` for QA tests
+ ``post_upload`` for various actions done before any data is
imported from the package (this is the current name, but I think
we need to find a less ambiguous one)
+ ``semantic`` for semantic metadata extraction
- action plugins that can be run before the package has been imported
(getorigtarball should be one of those). They would replace some of
the current plugin types with ambiguous names like ``post_upload``,
``post_upload_to_debian``, ``post_successful_upload``.
- allow plugins to be run outside the importer, for refreshing data
-> I'm not sure when. Maybe with a cron job (either the cronjobs
system used by the debexpo worker, or small scripts that
could be installed in the system's crontab)? Or maybe through a
specific controller, called after certain user actions.
e.g.: after the user has edited a package's tags, the sponsor
recommendation plugin should be called again
2.2 Sponsor recommendation
===========================
- on the package's page, after the results from the QA tests,
potential sponsors could be displayed in a table. This is not very
useful in its current state, though.
3 Final assessment
-------------------
I have not managed to implement everything I had intended to. Here is a summary:
3.1 Successful
===============
New plugin system: This API makes it possible to store data 'in an
almost declarative way' [I need a better qualifier for that] for
the results of plugins, and make it accessible outside the
plugin. With a little more magic code, some plugins won't need to
have their own templates anymore.
Debtags plugin: using debtags heuristics, find tags associated with
the package
Similar packages plugin: making use of apt-xapian-index and
debtags, matches a package with similar ones already in Debian.
Also usable for finding sponsors.
Small but non-negligible detail, my work's documentation: I have
written and kept up-to-date comprehensive docstrings for all new
objects and methods (and some existing ones). This will not
generate perfect documentation, but improving it should be easy
and will mostly be a matter of formatting.
3.2 Unsuccessful, or not finished / needs polishing
====================================================
Debtags: I had planned to write new heuristics to gather a richer
set of metadata for uploaded packages, but I did not
have the time.
Sponsor recommendation: this was the ultimate goal of the
project, and it will not be ready on the final deadline (not
sure it's really a failure, though, because the new plugin
architecture should make it easy to improve my proof-of-concept
code).
Semantic metadata querying: I have not designed a nice UI for
browsing through the packages' metadata.
Documentation: Most of the code has good docstrings, but they
probably are not formatted correctly for sphinx
and they could be improved so that the arguments
and return types are explicitly stated. Also, I
wanted to write a few HOWTOs (writing new plugins,
adding a new model to debexpo's database, ...)
3.3 What I gained thanks to the Summer of Code
===============================================
My work has been useful to debexpo/mentors.d.n and Debian in general
(or at least, I hope it has!), but it was also very positive for me:
First of all, I've learnt a lot about Python development, particularly
about Python's object layer (inheritance, magic methods, attribute
access, among others). I also discovered nice techniques for
abstracting pieces of code while keeping them readable (using
dictionaries, [namedtuples], iterators, first-class functions, etc.).
This project introduced me to the [Pylons] framework and to the
wonderful [sqlalchemy] toolkit, and more generally to web development
and relational databases.
I am now more familiar with Debian and its packaging system, and I am
now motivated to fix bugs in packages or create new packages
when I miss something, instead of waiting for someone to do it for me
and installing software outside APT.
[namedtuples]:
http://docs.python.org/library/collections.html#collections.namedtuple
[Pylons]: http://www.pylonsproject.org/projects/pylons-framework/about
[sqlalchemy]: http://www.sqlalchemy.org/
4 This summer of code is over, now what?
-----------------------------------------
I will continue working on debexpo, and probably other (related) parts
of Debian over the coming months (and perhaps permanently? I like this
project).
My priority is of course to finish what I've started this summer:
4.1 GnuPG wrapper
==================
This was not really part of this summer of code project, but there is
not much work left and it has to be shipped to mentors.d.n soon:
In April I started rewriting debexpo's GnuPG wrapper (see the
[git branch]) and I used it to add a 'Debian Machine Usage Policy'
agreement form to user profiles. I need to polish it, document it and
write tests. Then I will migrate debexpo's codebase to the new API.
Since I have learnt a lot about Python since I wrote that wrapper, I
will be able to make it look nicer than it currently does.
[git branch]:
http://anonscm.debian.org/gitweb/?p=debexpo/debexpo.git;a=blob;f=debexpo/lib/gnupg2.py;hb=refs/heads/gpg-rewrite
4.2 Plugin API
===============
- default template for very simple QA plugins
- New type of plugins, with their own controller, for viewing/editing
semantic metadata:
+ debtags: the user should be able to verify and correct the
results from debtags heuristics
+ similar packages: the maintainer (or any reviewer?) should be
able to remove a package from the similar list, and that should
be taken into account by the sponsor-recommendation plugin
+ feedback for sponsor recommendation: "I'm not interested in
sponsoring that package, remove me from the list"
4.3 Semantic metadata, debtags
===============================
- work with Enrico Zini to make debtags' heuristics easier to use
outside debtagsd, and release them as a new library
- write a lot more debtags heuristics
- manage packaging teams, and associate each with a set of debtags,
for easily matching a package with potential teams
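
As a rough illustration of the last item, matching a package with
teams could be as simple as ranking teams by how much their tag set
overlaps with the package's tags. This is a sketch under assumptions,
not existing debexpo code: the function ``match_teams``, the team
names and the tag sets are hypothetical, and Jaccard similarity is
just one possible scoring choice.

```python
from __future__ import division


def match_teams(package_tags, teams):
    """Rank team names by Jaccard similarity with the package's tags.

    `teams` maps a team name to the set of debtags it cares about.
    Teams sharing no tag with the package are left out.
    """
    package_tags = set(package_tags)
    scored = []
    for team, team_tags in teams.items():
        union = package_tags | set(team_tags)
        if not union:
            continue
        score = len(package_tags & set(team_tags)) / len(union)
        if score > 0:
            scored.append((score, team))
    # Best score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [team for _score, team in scored]


# Hypothetical teams and tags, for illustration only.
teams = {
    'python-team': {'implemented-in::python', 'devel::lang:python'},
    'perl-team': {'implemented-in::perl', 'devel::lang:perl'},
    'games-team': {'use::gameplaying'},
}
package_tags = {'implemented-in::python', 'use::gameplaying',
                'role::program'}
matches = match_teams(package_tags, teams)
```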
4.4 Sponsor preferences
========================
- extend the plugin system to allow writing small 'metadata plugins'
that can easily be used by sponsors to define their 'Sponsoring
preferences'.
- go through the [Sponsor Checklist] on Debian Wiki and the
preferences linked from there. Then write plugins to standardize
all of those, and make it easy to determine whether a package
meets a registered sponsor's preferences. This shall be done in one
or more 'metadata plugins'.
- using the sponsor preference plugin, new maintainers will get
personalized advice for making their package ready for inclusion
in Debian
[Sponsor Checklist]: http://wiki.debian.org/SponsorChecklist
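
One way to picture the preference checks described in this section: each
preference is a description plus a test on the package's metadata, and
the unmet ones become advice for the maintainer. Everything here is
hypothetical (the function ``unmet_preferences``, the metadata keys
and the two example preferences); it is a sketch of the idea, not a
design that exists in debexpo.

```python
def unmet_preferences(package, preferences):
    """Return the descriptions of the preferences the package fails.

    `preferences` is a list of (description, check) pairs, where
    `check` takes the package metadata and returns True when met.
    """
    return [description
            for description, check in preferences
            if not check(package)]


# Hypothetical preferences of one sponsor.
preferences = [
    ('package must be lintian clean',
     lambda pkg: pkg.get('lintian_warnings', 0) == 0),
    ('package must not be native',
     lambda pkg: not pkg.get('native', False)),
]

# Hypothetical metadata gathered by the QA plugins.
package = {'lintian_warnings': 2, 'native': False}
advice = unmet_preferences(package, preferences)
```

An empty list would mean the package meets all of that sponsor's
preferences.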
4.5 Sponsor recommendation
===========================
The current sponsor recommendation is more a proof of concept than a
complete feature, and it will probably not be very useful to new
maintainers yet. I need to improve the UI and the underlying algorithms.
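
For readers who want a picture of the general idea, here is a minimal
sketch of recommending sponsors from similar packages: count how often
each sponsor appears among the packages similar to the upload, and
list the most frequent ones first. This is not the proof-of-concept
code itself; the function ``recommend_sponsors``, the ``sponsors_of``
mapping and the sample data are assumptions made for the example.

```python
from collections import Counter


def recommend_sponsors(similar_packages, sponsors_of, limit=3):
    """Rank sponsors by how many similar packages they are linked to.

    `sponsors_of` maps a package name to the people associated with
    it. Packages missing from the mapping are ignored.
    """
    counts = Counter()
    for package in similar_packages:
        counts.update(sponsors_of.get(package, ()))
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _count in ranked[:limit]]


# Hypothetical data, for illustration only.
sponsors_of = {
    'pkg-a': ['alice', 'bob'],
    'pkg-b': ['alice'],
    'pkg-c': ['carol'],
}
recommended = recommend_sponsors(['pkg-a', 'pkg-b', 'pkg-d'], sponsors_of)
```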
4.6 Next months
================
During the summer, I got ideas for improving debexpo in areas other
than semantic metadata and sponsor recommendation. Some of them will
not benefit the project but others might. I will discuss them with the
rest of the team and implement them accordingly.
I will probably contribute to [debtags] too.
[debtags]: http://debtags.debian.net/
5 Conclusion
-------------
Thanks for reading, and many thanks to Google, Debian, my mentors and Debian's
GSoC admins for making this great experience possible.