[Teammetrics-discuss] Error in liststat.py

Sukhbir Singh sukhbir.in at gmail.com
Sat Aug 6 06:59:43 UTC 2011


Hi,

> Did you run it on blends.debian.net?

No, on my computer.

>   - blends.d.n is running stable - you might run something else

$ cat /etc/debian_version
6.0.2

>   - it might need certain locale to parse a string which might
>     by chance work on your system

Yes, I have locale support for some Asian languages, and the subject
could well be in one of them ;)

> 2011-08-05 20:49:30,474 INFO: Parsing: pkg-java-maintainers-2011-May

At least now we know which mailing list is the culprit. I will investigate.

> When doing so I realised that *all* mboxes from alioth are fetched from
> scratch as *.gz.  The import process just leaves the uncompressed
> mboxes.  I wonder whether there is a chance to "gzip.open('file.gz')" as
> described in [1] to transparently work on gzip files.
>
> I'm also starting to wonder if we always need to download mboxes from
> past years.  Well, there could be some removals from SPAM removal

This led me to think: why do we need the mbox archives saved locally
at all? Wouldn't it be better to delete them once they have been
downloaded and parsed? They are not used afterwards in any way. Even
for hash comparison, the hashes are read from the hash file
(lists.hash), and the cached mbox archives play no role once their
SHA1 has been calculated.
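
To make the idea concrete, here is a rough sketch of "hash, parse,
delete". It is only an illustration: `process_archive`, `sha1_of_file`
and the `known_hashes` dict are hypothetical names, and the dict merely
stands in for lists.hash (this does not assume how that file is
actually laid out or how liststat.py does it).

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def process_archive(path, known_hashes, parse):
    """Parse an archive only if its hash changed, then delete it.

    known_hashes: dict of archive name -> hex digest (assumed stand-in
                  for the contents of lists.hash).
    parse:        callable that takes the archive path.
    Returns True if the archive was parsed, False if it was unchanged.
    """
    name = os.path.basename(path)
    digest = sha1_of_file(path)
    changed = known_hashes.get(name) != digest
    if changed:
        parse(path)
        known_hashes[name] = digest
    # The archive is not needed once it is hashed and parsed.
    os.remove(path)
    return changed
```

Only the stored digest is consulted on later runs, so the archive
itself can go as soon as this returns.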

This will take care of two problems:

1. We can read a gzipped mbox straight into a string, e.g.
`string = gzip.open(file).read()` (what you suggested).
2. No need for compression.
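
A minimal sketch of point 1, assuming the archive is an ordinary mbox
where each message begins with a "From " separator line. The function
name `iter_messages` is made up for illustration, and the splitting is
deliberately naive: an unescaped body line starting with "From " would
be treated as a new message.

```python
import email
import gzip


def iter_messages(gz_path):
    """Yield parsed messages from a gzip-compressed mbox file.

    The file is decompressed on the fly, so no uncompressed copy is
    written to disk.
    """
    with gzip.open(gz_path, 'rb') as handle:
        current = []
        for line in handle:
            if line.startswith(b'From '):
                # Separator line: emit the message collected so far.
                if current:
                    yield email.message_from_bytes(b''.join(current))
                current = []
            else:
                current.append(line)
        # Emit the last message in the file.
        if current:
            yield email.message_from_bytes(b''.join(current))
```

Usage would be something like
`for msg in iter_messages('list.mbox.gz'): print(msg['Subject'])`.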

Your thoughts?

> Also the output is identical.  Just do your test on blends.debian.net.

I am going to do that.

And some good news: Lars from Gmane says that NNTP puts the least load
possible and we can go ahead with that. So we were right :)

I will also look up 'lists.debian.org' this weekend.


