[Popcon-developers] analyzing popcon data for bogus recommends

Enrico Zini enrico at enricozini.org
Wed May 14 09:29:16 UTC 2008


On Tue, May 13, 2008 at 10:51:37PM -0400, Joey Hess wrote:

> It would be nice to have a list which Recommends are ignored/overridden
> the most when installing packages, to identify Recommends that need to be
> downgraded to Suggests. Could we derive such a list from popcon data? I
> think it would need to be done by analyzing each individual popcon data
> submission, so I can't do it as that data is not published.

Yes you can.  Also, there's a Xapian database in my home directory
(~enrico/anapop/something IIRC) on people.debian.org that is built from
the popcon data, and you can query that database to quickly get a count
of "submissions having package X AND NOT package Y" and of "submissions
having package X AND package Y".

That Xapian index indexes popcon submissions as if they were
"documents", and installed packages as if they were "terms".
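
To illustrate the idea, here is a minimal sketch of the two counts in
plain Python.  It does not use Xapian or the real index: the
submission ids, package names and the helper names build_index and
count_with_and_without are all made up for the example, and a dict of
sets stands in for the term index.

```python
from collections import defaultdict


def build_index(submissions):
    """Map each package (term) to the set of submission ids (documents)
    that list it as installed."""
    index = defaultdict(set)
    for sub_id, packages in submissions.items():
        for pkg in packages:
            index[pkg].add(sub_id)
    return index


def count_with_and_without(index, x, y):
    """Return (count of 'X AND Y', count of 'X AND NOT Y')."""
    have_x = index.get(x, set())
    have_y = index.get(y, set())
    return len(have_x & have_y), len(have_x - have_y)


# Hypothetical sample data, not real popcon submissions.
submissions = {
    "host-a": {"foo", "foo-doc"},
    "host-b": {"foo"},
    "host-c": {"foo", "bar"},
    "host-d": {"bar"},
}

index = build_index(submissions)
both, only_x = count_with_and_without(index, "foo", "foo-doc")
print(f"foo AND foo-doc: {both}, foo AND NOT foo-doc: {only_x}")
```

If package X Recommends package Y, a high "X AND NOT Y" count relative
to "X AND Y" would hint that the Recommends is often overridden.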

The database is updated by a weekly cronjob that rescans the whole
popcon database.  I briefly tried in the past[1] to come up with
ways to hook the indexing process into popcon so that I could do
realtime indexing of the data (that would give an up-to-date index and
would not eat 100% CPU on gluck once a week), but I got the impression
that it required more discussion than I was motivated to have at the
time.  If more people are interested in using that Xapian index, it
could make sense to revisit this.


Ciao,

Enrico

[1] http://lists.alioth.debian.org/pipermail/popcon-developers/2007-June/001374.html
-- 
GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini <enrico at debian.org>