[Piuparts-devel] piupartsreport timings (was: Re: Bug#698526: Sort known issues by reverse dependency count)

Andreas Beckmann anbe at debian.org
Thu Feb 21 18:05:27 UTC 2013

On 2013-02-21 15:55, Dave Steele wrote:
> sorting was the single original goal). I agree that .tpl's are
> obsolete, but that wasn't an overriding goal for me, and not necessary
> to get issue logic out of piuparts-report. There's no significant
> performance issue.

Timings ... from my noon run, I think the machine was rather idle (i.e.
the slaves were idle)

12:00 start
14:37 finish

piuparts-analyze depends heavily on the speed of the BTS today (and on
the number of logs in */fail/).
A rewrite to process all sections from one instance and *cache* the BTS
responses (we always query the same packages and their bugs - what else)
could speed this up.
But this is not really a performance-critical issue either, as we can
skip it if we want "fast" reports.
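To illustrate the caching idea, here is a minimal sketch. The class name
CachedBTS and the fetch callable are hypothetical; fetch stands in for
whatever piuparts-analyze really uses to ask the BTS for the bugs of a
package, which is not shown in this mail.

```python
class CachedBTS:
    """Memoize BTS lookups so each package is queried at most once per run."""

    def __init__(self, fetch):
        # fetch: hypothetical callable taking a package name and returning
        # its bugs; it stands in for the real (slow) BTS query.
        self._fetch = fetch
        self._cache = {}
        self.queries = 0  # number of lookups that actually hit the BTS

    def bugs_for(self, package):
        if package not in self._cache:
            self.queries += 1
            self._cache[package] = self._fetch(package)
        return self._cache[package]
```

One such instance shared across all sections would only hit the BTS once
per distinct package, no matter how many sections contain a failed log
for it.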

As it looks now, detect_well_known_errors is not really a performance
issue for me either, even though it usually takes 30 minutes when there
are more logs to be processed. It's only a bit nasty that I can't run
piuparts-report sensibly if the .tpl files are not there, so this costs
30 minutes of startup time for piuparts-report.

It may take a very long time for --recheck-all, but that I usually don't

So let's have a look at piuparts-report:

12:40 total: 38476
12:40 source: 18358
12:40 Writing package templates in sid/main
12:48 Writing maintainer summaries in
12:49 Writing section index page

12:49 total: 38476
12:49 source: 18358
12:49 Writing package templates in sid-lo/main
12:54 Writing maintainer summaries in
12:55 Writing section index page

The first one was a bit slow, the second looks better.
Writing 18000 files in 300 seconds at 60 files/second does not look
that bad.
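The rates can be recomputed from the log excerpts above. This is only a
rough sketch: the timestamps have minute resolution, and it assumes the
"source: 18358" count is the number of files written in the "Writing
package templates" phase. The function name files_per_second is made up
for this example.

```python
from datetime import datetime

def files_per_second(start, end, files):
    """Rate between two HH:MM log timestamps on the same day."""
    fmt = "%H:%M"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return files / elapsed.total_seconds()

# "Writing package templates" until "Writing maintainer summaries"
sid_rate = files_per_second("12:40", "12:48", 18358)     # sid/main
sid_lo_rate = files_per_second("12:49", "12:54", 18358)  # sid-lo/main

print(round(sid_rate), round(sid_lo_rate))  # prints "38 61"
```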

86 sections have 305755 source packages in total.
305755 source packages in 7200 seconds is still 42 files per second
(and we are writing 69000 maintainer summaries, too, so 52 files per
second in total, still ignoring a few summary, state, ... pages).
Since we are not running under eatmydata, each file written will cause
some I/Os for both file data and metadata. Regular HDDs probably do
something like 100 (perhaps 200 nowadays) IOPS (I haven't benchmarked
that model), so this still looks reasonably sane and not really
optimizable. The one optimization that could be applied here is hashing
the page content, storing the hashes and skipping the creation of a file
if its contents don't change (as will be the case for most of the 18000
source summaries in sid).
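A minimal sketch of that hash-and-skip idea, not how piuparts-report
currently writes its pages. The function name write_if_changed and the
plain dict used as the hash store are assumptions for this example; a
real version would have to persist the hashes between runs.

```python
import hashlib
import os

def write_if_changed(path, content, hashes):
    """Write content to path unless its hash matches the stored one.

    hashes is a dict mapping path -> hex digest from the previous run
    (assumed here; how it is persisted is left open).
    Returns True if the file was written, False if it was skipped.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if hashes.get(path) == digest and os.path.exists(path):
        return False  # unchanged page: no data or metadata I/O
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(content)
    os.replace(tmp, path)  # rename over the old file in one step
    hashes[path] = digest
    return True
```

This trades the write I/O for hashing the generated page plus one
existence check per page, which should pay off if most pages really are
identical from run to run.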

OK, let's take a look at piatti, too:

00:00 start
Thu Feb 21 00:22:13 UTC 2013
Thu Feb 21 00:34:15 UTC 2013
00:46 Running section lenny2squeeze

00:35 total: 38476
00:35 source: 18358
00:35 Writing package templates in sid
00:36 Writing maintainer summaries in /org/piuparts.debian.org/htdocs/sid
00:36 Writing section index page

Hmm. Interesting. Why is that so much faster there?

