[Soc-coordination] Twelfth report: Provide some metrics in Debile

Clément Schreiner clement at mux.me
Wed Aug 13 20:54:04 UTC 2014



Hi, this is the twelfth weekly report on my Summer of Code project 'Provide
some metrics in Debile'[1].

(Previous report:
http://lists.alioth.debian.org/pipermail/soc-coordination/2014-August/002253.html)

uniquify
--------

My implementation of uniquify is finally functional. I bypassed the ORM
entirely so I could do "insert into... where not exists..." statements
(this took most of last week).
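
As a rough illustration of that kind of statement (this is not the actual
Debile code): the sketch below uses Python's built-in sqlite3 module and a
hypothetical "issues" table with made-up columns, just to show the shape of
an insert that skips rows already present.

```python
import sqlite3

# Hypothetical schema and names, for illustration only.
INSERT_IF_MISSING = (
    "INSERT INTO issues (checker, message) "
    "SELECT ?, ? "
    "WHERE NOT EXISTS ("
    "  SELECT 1 FROM issues WHERE checker = ? AND message = ?"
    ")"
)


def insert_unique(conn, rows):
    """Insert each (checker, message) pair unless an identical row exists."""
    # Each pair is bound twice: once for the values, once for the check.
    params = [(c, m, c, m) for c, m in rows]
    with conn:  # commit on success, roll back on error
        conn.executemany(INSERT_IF_MISSING, params)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (id INTEGER PRIMARY KEY, checker TEXT, message TEXT)"
)
insert_unique(conn, [("lintian", "a"), ("lintian", "a"), ("cppcheck", "b")])
count = conn.execute("SELECT COUNT(*) FROM issues").fetchone()[0]
```

The duplicate ("lintian", "a") pair is only stored once, and the database
does the existence check itself instead of the ORM loading objects first.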

With my usual test firehose file, the new uniquify is twice as fast. With
larger files, the time increases linearly, which is much better than the
original implementation. For example, a file twice as large as my initial
test file takes twice as long instead of four times as long. I haven't
tested the original uniquify on much larger files, since it would have
taken too much time on my laptop.

debile-incoming has now been able to import the huge firehose files
(10-40MB) that would originally make it go OOM after a dozen hours of
work.

There is still room for improvement, as suggested by the call graph[2]
generated by the python profiler and gprof2dot[3]. Indeed, only 20% of
the time is spent executing SQL statements, while 60% is spent on
creating those queries. However, since it is usable, I'm now focusing on
launching the full rebuild, since the end of the GSoC is approaching
very fast.
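
For anyone who wants to produce a similar call graph, here is a minimal
sketch of the profiling step using cProfile and pstats from the standard
library. The build_queries function and the file name are hypothetical
stand-ins, not Debile code; the gprof2dot command in the comment is its
usual invocation for pstats files.

```python
import cProfile
import os
import pstats
import tempfile


def build_queries(n):
    """Hypothetical stand-in for the query-construction step."""
    return ["INSERT INTO issues VALUES (%d)" % i for i in range(n)]


# Profile a single call and keep its return value.
profiler = cProfile.Profile()
queries = profiler.runcall(build_queries, 1000)

# Dump the raw stats to a file that gprof2dot can read.
stats_path = os.path.join(tempfile.mkdtemp(), "uniquify.pstats")
profiler.dump_stats(stats_path)

# Quick textual look at the most expensive calls.
stats = pstats.Stats(stats_path)
stats.sort_stats("cumulative").print_stats(5)

# To render a call graph from the dump (needs gprof2dot and graphviz):
#   gprof2dot -f pstats uniquify.pstats | dot -Tpng -o profile.png
```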


full rebuild
------------

When I did what was supposed to be the last test build before launching
the full rebuild, we realized a new release of dpkg broke our use of
sbuild[4]. I've started looking into sbuild to patch it, but I'm still
not sure exactly how to fix that.


Thanks for reading,

-------
Clément




[1] https://wiki.debian.org/SummerOfCode2014/StudentApplications/ClementSchreiner
[2] http://www.mux.me/debile/profile.png
[3] https://code.google.com/p/jrfonseca/wiki/Gprof2Dot
[4] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=757795


