[Piuparts-devel] Bug#698526: Sort known issues by reverse dependency count

Thu Feb 21 09:24:30 UTC 2013

[ Hint: While replying to the BTS, delete the [Piuparts-devel] marker
from the subject as well as any duplicate bug numbers. ]

On 2013-02-21 03:09, Dave Steele wrote:
> On Wed, Feb 20, 2013 at 8:42 PM, Dave Steele <dsteele at gmail.com> wrote:
>> On Mon, Feb 18, 2013 at 5:44 AM, Holger Levsen <holger at layer-acht.org> wrote:
>> ...
>>>
>>> these are quite some different changes, can you please isolate the commits for
>>> "Sort known issues by reverse dependency count" and rebase them onto current
>>> develop?!
>>
>> The new serial branches sort-issues-by-rdep and
>> sort-issues-by-rdep-fast are separated from the rest of the work, and
>> rebased to develop.

Hi,

this work looks really promising and I'm curious to try it some day on
my instance.

But as I wrote before, there is no need to reimplement the .tpl
generation in Python. Instead, these intermediate files should go away
and the HTML generation should be moved directly into piuparts-report.
There will be a package db available.
I think this "requirement" to generate .tpl externally dates back to the
time when all logfiles were grepped daily, i.e. before we remembered the
results in .kpr.

Even if .kpr generation can be sped up significantly, I don't think I
want to run this from inside piuparts-report. Just like piuparts-analyze
(that takes 30-60 minutes for my instance) this is something that will
continue to be run from the generate-piuparts-report driver script ...
and having it sped up by an order of magnitude will make me less
hesitant to run it with --recheck-all.
Also, if the .tpl files are gone, we can actually run piuparts-report
without running piuparts-analyze or detect_well_known_errors directly
before it.

And about speeding up the "grepping": wouldn't it be even faster if we
could run multiple regexes on the input at the same time, either by
ORing them together or by passing a list to re or something similar?
Then we would just need to figure out which one has matched. (No, I
haven't tried anything like this, but I'm considering testing it with
the multiple grep calls in detect_piuparts_issues.)
  grep -lE '(foo)|(bar)|(f[o0]{2}bar|baz)'
should be significantly faster than
  grep -l foo
  grep -l bar
  grep -lE 'f[o0]{2}bar|baz'
And there we only care about 'any match', regardless of which one matched.
Or am I mistaken here?
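
For the Python side, the "OR them together and figure out which one
matched" idea could look roughly like the sketch below. It is untested
against real logs; the pattern names and the helper functions
any_match() and matched_issues() are made up for illustration and are
not taken from detect_well_known_errors. It assumes the individual
patterns contain no named groups of their own.

```python
import re

# Hypothetical issue names and patterns, mirroring the grep example above.
PATTERNS = {
    "foo": r"foo",
    "bar": r"bar",
    "foobar_or_baz": r"f[o0]{2}bar|baz",
}

# One alternation, with each pattern wrapped in its own named group.
COMBINED = re.compile(
    "|".join("(?P<%s>%s)" % (name, pat) for name, pat in PATTERNS.items())
)


def any_match(text):
    """True if at least one pattern matches (the 'grep -l' case)."""
    return COMBINED.search(text) is not None


def matched_issues(text):
    """Names of the groups that produced a match in a single scan."""
    return {m.lastgroup for m in COMBINED.finditer(text)}
```

One caveat: finditer() only returns non-overlapping matches, and at each
position the leftmost alternative wins. So a pattern whose match overlaps
an earlier one can be hidden; e.g. for "foobar" the scan reports foo and
bar, but not foobar_or_baz. That does not matter for the 'any match'
case, but for "which one matched" the patterns would need a second look.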

Andreas