[Reproducible-builds] Better popcon stats for source packages

Bill Allombert ballombe at debian.org
Mon Jun 13 13:42:30 UTC 2016

On Mon, Jun 13, 2016 at 03:17:18PM +0200, Ximin Luo wrote:
> Hi,
> At Reproducible Builds we just added popcon stats to our issues page,
> to help us better understand which issues to prioritise:
> https://tests.reproducible-builds.org/debian/index_issues.html
> However, we work on source packages, but popcon data is based on
> binary packages. This means that that page is currently very
> inaccurate for some packages - for example it thinks "linux" has a
> popcon score of 6.

Hello Ximin,

I could not find anywhere that we report linux as having 6 installs.

> Popcon does provide stats for source packages at http://popcon.debian.org/source/by_inst
> however, these stats are basically useless - the "score" for each
> source package, is simply the sum total of the scores for the binary
> packages produced by that source package. This is *not* the correct
> way to calculate "popularity" for a source package, since it is
> heavily biased in favour of source packages with many binary packages
> that must be co-installed.
> What we really want is the statistic "number of people that have
> installed binary-package-1 OR binary-package-2 OR .. OR
> binary-package-n". It is mathematically impossible to calculate this
> from the data that popcon is currently providing at
> http://popcon.debian.org/, however fixing this is easy - we would
> simply need to change the backend to keep a separate
> "by-source-package" dump of data, that is based on set-union
> (logical-disjunction) and not arithmetic-sum.

Actually we provide two counts. From http://popcon.debian.org/:

Statistics by source packages (sum) sorted by fields:

inst [gz] vote [gz] old [gz] recent [gz] no-files [gz] 

Statistics by source packages (max) sorted by fields:

inst [gz] vote [gz] old [gz] recent [gz] no-files [gz] 

The second set is probably close to what you want.
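
To illustrate how the three counts differ, here is a minimal sketch in
Python. It is not the popcon backend code: the report data, the package
names and the function names are all made up for illustration, and it
assumes we have, for each submitter, the set of binary packages installed.

```python
from typing import Dict, Iterable, Set

# Hypothetical per-submitter data: submitter ID -> installed binary packages.
# Real popcon submissions carry more fields than this.
reports: Dict[str, Set[str]] = {
    "host-a": {"libfoo1", "libfoo-dev", "foo-utils"},
    "host-b": {"libfoo1"},
    "host-c": {"libfoo1", "foo-utils"},
    "host-d": {"bar"},
    "host-e": {"foo-utils"},
}

# Hypothetical source package "foo" and the binary packages it builds.
foo_binaries = ["libfoo1", "libfoo-dev", "foo-utils"]


def binary_inst(reports: Dict[str, Set[str]], binary: str) -> int:
    """Number of submitters that have one given binary package installed."""
    return sum(1 for pkgs in reports.values() if binary in pkgs)


def source_sum(reports: Dict[str, Set[str]], binaries: Iterable[str]) -> int:
    """'sum' count: add up the per-binary counts (co-installs counted twice)."""
    return sum(binary_inst(reports, b) for b in binaries)


def source_max(reports: Dict[str, Set[str]], binaries: Iterable[str]) -> int:
    """'max' count: the count of the most installed binary package."""
    return max((binary_inst(reports, b) for b in binaries), default=0)


def source_union(reports: Dict[str, Set[str]], binaries: Iterable[str]) -> int:
    """Union count: submitters with at least one of the binaries installed."""
    wanted = set(binaries)
    return sum(1 for pkgs in reports.values() if pkgs & wanted)


print(source_sum(reports, foo_binaries))    # 7
print(source_max(reports, foo_binaries))    # 3
print(source_union(reports, foo_binaries))  # 4
```

For any source package, max <= union <= sum, so the max count never
overestimates the union, but it can be lower when no single binary package
is installed on every host that has some of them. Computing the union itself
needs the per-submitter data, not just the per-package totals.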

> I'd be happy to submit a patch for the popcon backend, but I could
> only find the client source code here:
> https://anonscm.debian.org/cgit/popcon/ Could you let me know how I
> could submit a patch for the backend?

The backend is at the same place in the 'examples' subdirectory.

But remember that the correspondence between source and binary packages
changes with time.

Bill. <ballombe at debian.org>

Imagine a large red swirl here. 
