[Soc-coordination] Eleventh report: Provide some metrics in Debile

Clément Schreiner clement at mux.me
Sun Aug 3 23:46:45 UTC 2014


Hi, this is the eleventh weekly report on my Summer of Code project 'Provide
some metrics in Debile'[1].

(Previous report:
http://lists.alioth.debian.org/pipermail/soc-coordination/2014-July/002237.html)


coccinelle check
----------------

I met Matthieu Caneill on IRC, and we managed to fix the coccinelle check. We
found the source of the problem in the changelogs for coccinelle version
1.0rc20. That version introduced backwards-incompatible changes in the
syntax of scripts. Matthieu updated the firehose patches[2], and the
coccinelle check now runs fine.[2.5]


uniquify performance
--------------------

Matthieu originally wrote the uniquify function, so we also looked
together for a way to improve its performance. Rather than keeping
the same algorithm and making it faster, I will write a new one with
better SQL queries.

Some explanation first: a firehose result is actually a tree of
results. In addition to the list of errors found in the analysis, there
are elements that give the position of the error in a source code file
(line/column, function, etc.) and various other information. For some
results, this tree can become very large, with numerous elements that
are duplicated (for example functions, files, positions in files, etc.).

The goal of uniquify is to ensure we only add each element once, thus
reducing the size of the database. In order to do that, it calculates a
hash of each element (well, the actual calculation is done in another
function, idify[3]), then looks up that hash in the database, and
inserts the element only if it isn't in the database already.
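
To make the idea concrete, here is a minimal sketch of hash-based
deduplication on a small tree. It is not the actual uniquify or idify
code: element_hash is a hypothetical stand-in for idify (the real
function may hash differently), a plain dict stands in for the
database, and the sample issues are made up.

```python
import hashlib
import json


def element_hash(element):
    """Return a stable hash of a (possibly nested) element.

    Hypothetical stand-in for idify; the real function may differ.
    """
    canonical = json.dumps(element, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def uniquify(element, store):
    """Add element and its nested dict children to store, once each."""
    # Children first, so shared sub-elements are stored before parents.
    for value in element.values():
        if isinstance(value, dict):
            uniquify(value, store)
    key = element_hash(element)
    if key not in store:  # the "look up the hash" step
        store[key] = element  # the "insert only if missing" step
    return key


# Two made-up issues that point at the same location.
location = {"file": "foo.c", "function": "main", "line": 10}
issue_a = {"message": "unused variable", "location": location}
issue_b = {"message": "missing return", "location": dict(location)}

store = {}  # stands in for the database
for issue in (issue_a, issue_b):
    uniquify(issue, store)

print(len(store))  # both issues, plus the shared location stored once
```

The shared location is hashed twice but stored only once, which is
where the space saving comes from on large result trees.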


This lookup-then-insert step is currently done in an awkward way because
SQLAlchemy doesn't offer a way to create an object only if it doesn't
already exist. I'll have to bypass the ORM and do something like this:
http://www.the-art-of-web.com/sql/upsert/#section_4

I'm still not sure how to go about that. As far as I currently
understand, I will need to write an SQL statement for each type of
firehose object, though I guess I'll find a nicer solution by digging
into SQLAlchemy's documentation a little further. (And also by asking
for help, since I don't do that often enough.)
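
As an illustration of what "insert only if missing" can look like in
plain SQL, here is a small sketch using Python's sqlite3 module. It is
one possible pattern, not necessarily the one from the linked article
nor what debile will end up using. The table and column names are
hypothetical, and the helper assumes every table has an "id" column
holding the hash. Building the statement from a column list is one way
to avoid hand-writing a statement per object type.

```python
import sqlite3


def insert_if_missing_sql(table, columns):
    """Build an INSERT that does nothing when the id already exists.

    table and columns are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    cols = ", ".join(columns)
    params = ", ".join(":" + c for c in columns)
    return (
        "INSERT INTO {t} ({c}) SELECT {p} "
        "WHERE NOT EXISTS (SELECT 1 FROM {t} WHERE id = :id)"
    ).format(t=table, c=cols, p=params)


def insert_once(conn, table, row):
    """Insert row (a dict) unless its id is present; True if inserted."""
    cur = conn.execute(insert_if_missing_sql(table, sorted(row)), row)
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
# Hypothetical table; the real firehose schema is different.
conn.execute(
    "CREATE TABLE location (id TEXT PRIMARY KEY, file TEXT, line INTEGER)"
)

row = {"id": "abc123", "file": "foo.c", "line": 10}
first = insert_once(conn, "location", row)
second = insert_once(conn, "location", row)
count = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]
print(first, second, count)
```

With several writers running at once, the NOT EXISTS test alone is not
enough to prevent duplicates; the primary key (or a unique constraint)
on the hash column is what actually guarantees uniqueness.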


create jobs for new checks / new versions of checks
---------------------------------------------------

Once uniquify (and thus debile-incoming) is able to process all firehose
objects generated by debile's checks, there is still something to do
before we can consider debile production-ready.

While I added a command for adding new checks at the beginning of the
summer (before that we could only enable checks when we initialized
debile-master), this is not enough. Indeed, we want debile to
 
 - create jobs for existing packages when a new check is added

 - create jobs for existing packages when a check has been modified (new
 release of the analyzer, or improvement in the debile plugin, etc.)
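
Both cases above can be seen as the same question: which (package,
check, check version) combinations have no job yet? Here is a rough
sketch of that logic. It does not reflect how create_jobs() works
today; the Check tuple, missing_jobs and the package and version
strings are all hypothetical.

```python
from collections import namedtuple

# Hypothetical, simplified view of a check: a name and a version.
Check = namedtuple("Check", ["name", "version"])


def missing_jobs(packages, checks, done):
    """Return (package, check name, check version) triples without a job.

    done is the set of triples that already have a job.
    """
    return [
        (pkg, check.name, check.version)
        for pkg in packages
        for check in checks
        if (pkg, check.name, check.version) not in done
    ]


packages = ["hello_2.9-1", "zlib_1.2.8-1"]
done = {(pkg, "lintian", "2.5.25") for pkg in packages}

# Case 1: a new check is added, so every existing package needs a job.
checks = [Check("lintian", "2.5.25"), Check("coccinelle", "1.0rc20")]
new_check_jobs = missing_jobs(packages, checks, done)
done.update(new_check_jobs)

# Case 2: a check is modified (new version), so it must run again.
checks[0] = Check("lintian", "2.5.26")
upgraded_jobs = missing_jobs(packages, checks, done)

print(new_check_jobs)
print(upgraded_jobs)
```

Keying jobs on the check version means a modified check is handled by
the same code path as a brand new one.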

Supporting both cases will need some refactoring in
debile.master.orm.create_jobs()[4]. I want to be certain I won't break
anything, so I want to write unit tests for that function first (debile
desperately needs more unit tests anyway, imho).

I've had some difficulty with setting up the basis of those tests, but I
managed to get somewhere and all I need now is to actually test
something.[5]

(My most recent (not pushed) code uses factory_boy[6] for easier object
creation. It seems really useful and nice to use, but I'm currently
having problems integrating it correctly with SQLAlchemy's session.)


Thanks for reading,

Clément



[1] https://wiki.debian.org/SummerOfCode2014/StudentApplications/ClementSchreiner
[2] https://github.com/coccinelle/coccinellery/pull/4
[2.5] http://anonscm.debian.org/cgit/pkg-debile/debile.git/commit/?id=1247a6b38d9b44f1817f5d8c530e35ba207c76a1
[3] https://github.com/Debian/firewoes/blob/master/firewoes/lib/hash.py#L47
[4] http://anonscm.debian.org/cgit/pkg-debile/debile.git/tree/debile/master/orm.py#n865
[5] http://anonscm.debian.org/cgit/pkg-debile/debile.git/commit/?h=tests_orm
[6] https://factoryboy.readthedocs.org/en/latest/


