[Teammetrics-discuss] Web Archive Parser ready for your testing.
Andreas Tille
andreas at an3as.eu
Tue Jan 3 15:19:22 UTC 2012
Hi Sukhbir,
On Mon, Jan 02, 2012 at 12:34:31AM +0530, Sukhbir Singh wrote:
>
> The web archive parser resumes from the last message.
Yep. I noticed the file
/etc/teammetrics/archiveparser.conf
The only problem I have is its location. This file is not actually a
configuration file (it is not meant to be edited manually to
influence the web archive parser directly). So I'd vote for something
like
/var/cache/teammetrics/archiveparser.status
or similar - feel free to find a better name.
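To illustrate what I mean by a status file as opposed to a configuration
file, here is a minimal sketch. It assumes the resume state can be stored
as a small JSON mapping; the function names load_status and save_status
and the JSON layout are hypothetical and not taken from the actual parser.

```python
import json
import os
import tempfile

# Hypothetical default location, following the path suggested above.
STATUS_PATH = "/var/cache/teammetrics/archiveparser.status"


def load_status(path=STATUS_PATH):
    """Return the saved resume state, or an empty dict on the first run."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def save_status(status, path=STATUS_PATH):
    """Write the state via a temporary file and an atomic rename, so an
    interrupted run leaves the previous status file intact."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".status-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(status, fh, indent=2, sort_keys=True)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

The point is only that the program itself writes this file, which is why
/var/cache (or /var/lib) seems a more natural home for it than /etc.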
> The checking
> 'algorithm' is very fast and I optimized it to the maximum so that it
> resumes after fetching the least possible data from lists.d.o. I like
> it.
Seems to work fine. I'm running the parser right now and it fetches a lot
of mails in a short time. However, I do not see any records in the listspam
table. Is this intended behaviour?
> Other than that, try it out for a few lists when you have time. Re-run
> it for a few times and see the log file output. You might notice the
> script takes time to finish on each run, that is because it calls
> updatenames.py on every run. But for our case since the script will
> run once a month, that will be no problem.
It's fine to run updatenames.py several times. In the end it does not
matter whether the run takes, say, 1h longer per month (and it does not
even take that long).
> I suggest you observe the log file output and that should serve your
> purpose well. Let me know if you want any changes to be made.
For the moment it looks good. I removed all data yesterday, and the
current state is:
teammetrics=# SELECT project, count(*) from listarchives group by project;
        project        | count
-----------------------+--------
 debian-amd64          |  27726
 debian-boot           | 155106
 debian-curiosa        |   4842
 debian-devel          | 309831
 debian-embedded       |   6808
 debian-derivatives    |    757
 debian-isp            |   2876
 debian-accessibility  |   4135
 debian-gis            |   4012
 debian-firewall       |   8872
 debian-desktop        |   2938
 debian-edu            |  23873
 debian-custom         |   2728
 debian-ctte           |   2931
 debian-devel-games    |   5356
 debian-arm            |  12978
 debian-blends         |    570
 debian-i18n           |  16617
 debian-enterprise     |    170
 debian-devel-announce |   2526
So for the moment we just need to wait. After the run I will do an
UPDATE listarchives SET project = 'debian-blends' WHERE project = 'debian-custom';
because this list was renamed. The logfile looks perfectly normal
(although here, too, there is no sign of spam handling).
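If we ever want to do such a rename from a script rather than by hand,
it could look roughly like the sketch below. Note the assumptions: it
uses an in-memory SQLite database with made-up rows purely as a stand-in
for our PostgreSQL database, the one-column table is a simplification of
listarchives, and rename_project is a hypothetical helper name.

```python
import sqlite3


def rename_project(conn, old, new):
    """Relabel all rows of a renamed list; return the number of rows changed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE listarchives SET project = ? WHERE project = ?",
            (new, old),
        )
    return cur.rowcount


# Stand-in database with a few made-up rows (not the real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listarchives (project TEXT)")
conn.executemany(
    "INSERT INTO listarchives VALUES (?)",
    [("debian-custom",)] * 3 + [("debian-blends",)] * 2,
)

changed = rename_project(conn, "debian-custom", "debian-blends")
counts = dict(
    conn.execute("SELECT project, count(*) FROM listarchives GROUP BY project")
)
```

After the rename, the messages of the old and the new list name are
counted together under debian-blends, which is what we want for the
graphs.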
I will keep you updated after creating the graphs.
Kind regards
Andreas.
--
http://fam-tille.de