[Teammetrics-discuss] Web Archive Parser for lists.d.o.

Andreas Tille andreas at an3as.eu
Tue Nov 22 21:34:27 UTC 2011


Hi Sukhbir,

On Tue, Nov 22, 2011 at 05:50:44PM +0530, Sukhbir Singh wrote:
> `git pull` should present you with archiveparser.py. It's ready to the
> point of parsing the message, but I have not implemented it for reason
> which follows.

:-)
 
> Now, you had made an interesting point that we should take into
> account the dates in which the message was sent and *not* the date
> from the 'Date' header. We can easily do this in the web archive
> parser, but what _day_ should we save then? I know the day doesn't
> matter much (!), but we have to save it (and we did it for lists on
> Alioth and the one using NNTP), so if we set the date to the month in
> which the message appeared, what day should we assign to it?

I would like to check whether the date in the mail matches the month
of the archive.  If yes, just use this date.  If the month and year given
in the mail do not match the archive location, we could choose one of the
following options:

  1. (simple) Just use a fixed day like the 1st or 15th of the given month.
  2. (pseudo-random) Just pick the day from the mail header and use the
     month and year of the archive.  Assuming that the errors in broken
     time stamps are statistically spread, our chosen dates are as well.
  3. (advanced) Choose the date of the previous / next message.

All options are fine for me because each of them works for our purpose.
Just pick your preferred one.
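
To make the check and the three fallbacks above concrete, here is a
minimal sketch in Python.  It is not taken from archiveparser.py: the
function name resolve_date, its parameters, the strategy names and the
default day of 15 are all assumptions made for illustration.  Clamping
the day to the length of the archive month (so that a header day of 31
does not break in a shorter month) is also an added assumption.

```python
import calendar
from datetime import date


def resolve_date(header_date, archive_year, archive_month,
                 strategy="fixed", neighbour_date=None, fixed_day=15):
    """Pick the date to store for a message found in a monthly archive.

    Hypothetical helper: names and defaults are illustrative only.
    """
    archive = (archive_year, archive_month)

    # The header agrees with the archive location: just use it.
    if (header_date.year, header_date.month) == archive:
        return header_date

    if strategy == "fixed":
        # Option 1: a fixed day of the archive month.
        day = fixed_day
    elif strategy == "header_day":
        # Option 2: keep the day from the header, take month and year
        # from the archive.
        day = header_date.day
    elif strategy == "neighbour":
        # Option 3: reuse the date of the previous / next message, if
        # that date itself lies in the archive month.
        if (neighbour_date is not None and
                (neighbour_date.year, neighbour_date.month) == archive):
            return neighbour_date
        day = fixed_day
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))

    # Clamp the day so that it always exists in the archive month.
    last_day = calendar.monthrange(archive_year, archive_month)[1]
    return date(archive_year, archive_month, min(day, last_day))
```

The caller would pass the parsed 'Date' header together with the year
and month taken from the archive location, and for option 3 the already
resolved date of an adjacent message.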

Kind regards

       Andreas.

-- 
http://fam-tille.de


