[Popcon-developers] Traces for P2P pub/sub project

Bill Allombert Bill.Allombert at math.u-bordeaux1.fr
Mon Jun 15 15:37:52 UTC 2009


On Mon, Jun 15, 2009 at 10:16:52AM +0200, Spyros Voulgaris wrote:
> Dear PopCon developers,
> 
>   I am an assistant professor in CS at the Vrije Universiteit 
> Amsterdam, and I am running a research project with a student on a P2P 
> pub/sub system targeted at linux package maintenance. The goal is to 
> form a P2P network that will disseminate package updates to _all_ users 
> that have the corresponding packages installed and _only_ them. This 
> would substantially offload the servers, and would provide for an 
> autonomous dissemination framework in place of a hardcoded set of mirrors.

Hello Spyros,

I think pushing the use of P2P to every bandwidth-intensive network
transfer is a good idea. However, did you consider the security
implications of allowing computer A to know which packages are installed
on computer B?

Maybe you are working in a model where B trusts A, but
popularity-contest is not.

>   In this context, I would like to ask you if we could get hold of your 
> raw, pre-aggregated traces. That is, the listing of installed packages 
> explicitly listed _per_ _user_. Of course, any personal information you 
> may be collecting (user name, IP, etc.) can be anonymized, we just need 
> an arbitrary user ID. In fact, package names can also be anonymized if 
> necessary, although we would prefer not.

Due to the above security implications (and the basic privacy
expectations of popularity-contest users), it is not possible for us to
publish a per-user list of packages. Unfortunately, it is also not
possible to anonymize the packages and the users in an
information-theoretically safe way.

Suppose package #123 has 141 installations and 121 votes: you can just
look up the aggregated popcon results and get a small list of candidate
packages, so it is not really anonymized.
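
To make the lookup concrete, here is a minimal Python sketch. The
package names and counts in PUBLIC_AGGREGATE are invented for
illustration and are not real popcon data; the function name
candidates is likewise hypothetical.

```python
# Hypothetical public aggregate: package name -> (installations, votes).
# These names and numbers are made up for the example.
PUBLIC_AGGREGATE = {
    "foo": (141, 121),
    "bar": (141, 121),
    "baz": (9000, 4000),
    "qux": (141, 90),
}


def candidates(installations, votes, aggregate=PUBLIC_AGGREGATE):
    """Return the names whose public counts match an 'anonymized' package."""
    return sorted(
        name
        for name, counts in aggregate.items()
        if counts == (installations, votes)
    )


# The anonymized package #123 with 141 installations and 121 votes
# can only be one of the names returned here.
print(candidates(141, 121))
```

The rarer the pair of counts, the shorter the candidate list, and the
least popular packages are the easiest to identify.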

Suppose that all systems which have foo and bar installed
also have baz installed. If you guess that a popcon submitter has
foo and bar installed, then you can deduce that they also have baz.
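
This kind of inference can be sketched as follows, assuming someone
holds the per-system package sets. The SYSTEMS data and the function
name always_implies are hypothetical and only serve the example.

```python
# Hypothetical per-system package sets (invented data).
SYSTEMS = [
    {"foo", "bar", "baz"},
    {"foo", "bar", "baz", "qux"},
    {"foo", "qux"},
    {"bar"},
]


def always_implies(systems, premise, conclusion):
    """True if at least one system contains every package in `premise`
    and every such system also contains `conclusion`."""
    matching = [s for s in systems if premise <= s]
    return bool(matching) and all(conclusion in s for s in matching)


# Every system with both foo and bar also has baz, so guessing
# "foo and bar" reveals "baz" as well.
print(always_implies(SYSTEMS, {"foo", "bar"}, "baz"))
```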

But surely you are interested in some statistics on the data rather than
in the data itself. Maybe you could tell us which statistics you want,
and a Debian developer could compute them for you without giving you
access to the data. The developer would have to check that the output
does not breach privacy.

>  This would allow us to perform simulations on clustering users based
>  on packages they have (particularly the least popular ones), so that
>  package updates can be propagated in targeted groups. Of course we will
>  keep you up to date for any findings, and if this strives we would be
>  more than happy to see it incorporated in the Ubuntu package management
>  system!

Well, we have nothing to do with the Ubuntu package management system.

Cheers,
Bill.


