[Popcon-developers] Re: Raw data of popcon ?

Alain Schroeder alain@parkautomat.net
Sat Jul 9 14:14:01 UTC 2005


--k1lZvvs/B4yU6o8G
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline

On Sat, Jul 09, 2005 at 01:26:51PM +0200, Petter Reinholdtsen wrote:
[...]

>  - We can wash the complete data set to remove non-official packages
>    from the individual package lists, and then publish these lists.
>    These washed data should be without any information traceable to
>    individual contributors.

I have attached two small scripts. The first (get-packages.sh) builds a
list of all packages in Debian from the current Packages files for
sarge. The second (cleaner.pl) removes every package that does not
appear in that list. Just pipe the reports to it and redirect the output
into another file.

> You mentioned that you wanted to build a recommendation system for
> debian packages.  This is a long time wishlist bug, #73603.  Please
> read it to learn more about what was proposed almost 5 years ago.

I took a data mining class at university just a few weeks ago and
realized that it might be helpful for Debian. At the moment I am
thinking of two methods. The first is Market Basket Analysis, which is
pretty easy. It calculates rules like
		IF (exim4, spamassassin) THEN sa-exim
with a probability.
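
As a rough sketch of what such a rule probability could look like, the
following estimates it as the fraction of reports containing all the IF
packages that also contain the THEN package (usually called the rule's
confidence). It assumes the input has the shape cleaner.pl prints, one
report per line as " ARCH POPCONVER pkg pkg ...". The function name
rule_confidence and the file name "washed" are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: confidence of IF (antecedents) THEN consequent.
# Usage: rule_confidence FILE CONSEQUENT ANTECEDENT...
# Assumes FILE has one washed report per line: " ARCH POPCONVER pkg ...".
rule_confidence() {
	file=$1; consequent=$2; shift 2
	awk -v cons="$consequent" -v ante="$*" '
	BEGIN { n = split(ante, want, " ") }
	NF >= 3 {
		# Fields 1 and 2 are ARCH and POPCONVER; the rest are packages.
		split("", seen)                      # portable way to clear the array
		for (i = 3; i <= NF; i++) seen[$i] = 1
		# Skip reports that lack any of the antecedent packages.
		for (j = 1; j <= n; j++) if (!(want[j] in seen)) next
		with_ante++
		if (cons in seen) with_both++
	}
	END {
		if (with_ante == 0) { print "n/a"; exit }
		printf "%d/%d = %.2f\n", with_both, with_ante, with_both / with_ante
	}' "$file"
}
```

With the washed reports in a file, the example rule would be checked with
something like: rule_confidence washed sa-exim exim4 spamassassin
A real analysis would also look at how many reports support the rule at
all, since a rule seen in only a handful of reports says little.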
The other is Clustering. It calculates groups of similar installations,
and then, depending on the cluster you are in, different packages can be
proposed.
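
Clustering needs some measure of how similar two installations are. One
common choice, used here purely as an assumption and not as a decision
made in this thread, is the Jaccard similarity: shared packages divided by
all packages seen in either report. A minimal sketch, again assuming
lines in the shape cleaner.pl prints (the function name jaccard is made
up):

```shell
#!/bin/sh
# Hypothetical helper: Jaccard similarity of two washed reports.
# Usage: jaccard "LINE_A" "LINE_B"
# Each line is assumed to look like " ARCH POPCONVER pkg pkg ...".
jaccard() {
	printf '%s\n%s\n' "$1" "$2" | awk '
	# Fields 1 and 2 are ARCH and POPCONVER; packages start at field 3.
	NR == 1 { for (i = 3; i <= NF; i++) a[$i] = 1 }
	NR == 2 { for (i = 3; i <= NF; i++) b[$i] = 1 }
	END {
		for (p in a) { total++; if (p in b) common++ }
		for (p in b) if (!(p in a)) total++
		if (total == 0) print "0.00"
		else printf "%.2f\n", common / total
	}'
}
```

A clustering method would then group reports whose pairwise similarity is
high. Which algorithm does the grouping is left open here.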

I will now start working with my own data set, and I have just requested
some additional ones from my neighbors here at debconf5. :-)

Bye,
  Alain

--k1lZvvs/B4yU6o8G
Content-Type: application/x-sh
Content-Disposition: attachment; filename="get-packages.sh"
Content-Transfer-Encoding: 7bit

#!/bin/sh
# Build ./packages: a sorted, duplicate-free list of every package name
# found in the sarge Packages files for all architectures.
tempname=`tempfile`
temp2=`tempfile`

for i in alpha arm hppa i386 ia64 m68k mips mipsel powerpc s390 sparc; do
	# Fetch the Packages index for this architecture.
	wget "http://ftp.fi.debian.org/debian/dists/Debian3.1r0/main/binary-${i}/Packages.gz" -O "$tempname"
	# Keep only the package names.
	zcat "$tempname" | grep "^Package: " | sed -e 's/^Package: //' >> "$temp2"
done

sort -u "$temp2" > ./packages

rm "$tempname" "$temp2"
--k1lZvvs/B4yU6o8G
Content-Type: application/x-perl
Content-Disposition: attachment; filename="cleaner.pl"
Content-Transfer-Encoding: 7bit

#!/usr/bin/perl

use strict;
use warnings;

# Names of all official packages, as written by get-packages.sh.
my %table;

open (PACKAGES, '< packages') or die ("packages not found. please run get-packages.sh first\n");

while (<PACKAGES>) {
	chomp;
	$table{$_} = "1";
}
close (PACKAGES);

# Read popcon reports on STDIN and print one line per report:
# architecture, popcon version, then the official packages only.
while (<STDIN>) {
	if (m/^POPULARITY-CONTEST-0/i) {
		print "\n";
		if (m/^POPULARITY-CONTEST-0.* ARCH:(\S*) POPCONVER:(\S*)/i) {
			print " $1 $2";
		}
	} elsif (m/^\d+ \d+ (\S+)/) {
		# Use $1 only after a successful match; otherwise it would still
		# hold the package from an earlier line and be printed twice.
		if ($table{$1}) { print " $1"; }
	}
}
print "\n";

--k1lZvvs/B4yU6o8G--



