[Teammetrics-discuss] Converter for mboxes (Was: Debian mailing lists archives as mbox)

Andreas Tille andreas at an3as.eu
Wed Aug 31 07:49:50 UTC 2011


On Wed, Aug 31, 2011 at 06:17:41AM +0000, Cord Beermann wrote:
> Hello! You (Christian PERRIER) wrote:
> 
> > About 10-20 times more efficient than web-based methods...but it
> > *requires* access to archives as mboxes *without* filtered headers
> > (otherwise, formail, mutt and crm114 would be confused).
> > 
> > So, indeed, mbox archives with filtered headers are...plain useless for
> > such purposes..:-)

Some technical background on this filtering issue:  At

  http://anonscm.debian.org/gitweb/?p=teammetrics/teammetrics.git;a=blob;f=mbox-tools/mboxfilter.py

you can see which HEADERS are kept (IMHO sufficient to use the procedure
described by Christian), and there is a list of possible_HEADERS which
are not yet kept in the mbox file but which I could imagine being of
some use without compromising privacy.

As a practical illustration of which fields the filter deletes: in one
randomly chosen message the fields

Return-path:
Received:
Received:
X-Original-To:
To:
User-Agent:
Sender:
X-Rc-Virus:
X-Rc-Spam:
X-Spam-Checker-Version:
X-Spam-Level:
Resent-Message-ID:
Resent-From:
X-Mailing-List:
X-Loop:
List-Id:
List-Post:
List-Help:
List-Subscribe:
List-Unsubscribe:
List-Archive:
Precedence:
Resent-Sender:
Resent-Date:
Resent-Bcc:
Delivered-To:

are currently removed.  Christian, could you pick the fields from this
list that should be kept to be useful for the spam removal effort?  I
personally do not see anything in the list of removed fields which might
be a privacy issue, but I would like to help listmaster accept a
compromise which finally leads to more open access to the mboxes.
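
To make the filtering idea concrete, here is a minimal sketch of an
allow-list header filter.  It is not the code from mboxfilter.py: the
KEEP set, the function name filter_headers and the sample message are
assumptions made up for illustration, and the real HEADERS list should
be taken from the script linked above.

```python
import email

# Hypothetical allow-list for illustration only; the real HEADERS list
# lives in mboxfilter.py and may differ.
KEEP = {"from", "date", "subject", "message-id", "in-reply-to", "references"}


def filter_headers(msg, keep=KEEP):
    """Remove every header whose name is not in keep (case-insensitive)."""
    for name in set(msg.keys()):
        if name.lower() not in keep:
            # del on a Message removes all occurrences of that header,
            # so repeated fields such as Received: vanish in one step.
            del msg[name]
    return msg


# Made-up sample message containing some of the fields listed above.
raw = (
    "Return-path: <bounce@example.org>\n"
    "Received: from a.example.org\n"
    "Received: from b.example.org\n"
    "From: Alice <alice@example.org>\n"
    "Subject: test\n"
    "List-Id: <list.example.org>\n"
    "\n"
    "Body text\n"
)

filtered = filter_headers(email.message_from_string(raw))
print(filtered.keys())
```

Keeping a field that turns out to be needed for the spam review would
then only mean adding its lower-cased name to the allow-list.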
 
> The nomination stage is something which we allow everyone who is able
> to use one of the available methods. 
> 
> The more important stage, the review, is only available to DDs and
> those people have access to the unfiltered mboxes (via master).

Just for the record:  There are people (including me) who do not see any
sense in hiding the mboxes on master from the general public.  Once we
have found a reasonable filtering procedure I will start a discussion on
debian-project about how we should handle mbox access to make the best
use of this information.

> For the review there is currently only the web interface available.
> When I find the time to rewrite the whole thing, so that we also
> implement automatic handling for review methods 1 and 2, I also
> want to add some more voting methods.
> 
> I put that on my todo list.

Thanks for your work on this.
 
Kind regards

       Andreas.

-- 
http://fam-tille.de


