[Teammetrics-discuss] Phase I: The final parts.
Andreas Tille
andreas at an3as.eu
Sun Jun 5 21:02:57 UTC 2011
Hi,
On Mon, Jun 06, 2011 at 01:24:27AM +0530, Sukhbir Singh wrote:
>
> 1. Storing which lists have been downloaded and thus preventing lists
> from being downloaded again. Suggestions are welcome on this. Is using
> a conf file that maps list-names to SHA1 checksums OK?
I'm afraid I do not understand the question correctly. The *names* of
the mailing lists are already in the config file, so I do not see a need
to store the names as hash sums as well. IMHO we only need to store hash
sums of the mbox files so that we do not parse them again.
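A minimal sketch of that idea in Python, assuming a JSON file of SHA1 digests as the store. The file name parsed_mboxes.json and all helper names are hypothetical and not part of the existing scripts. Only the content of each mbox is hashed, so an archive that has changed since the last run is picked up again.

```python
import hashlib
import json
import os

# Hypothetical location of the state file (not part of the existing scripts).
STATE_FILE = "parsed_mboxes.json"


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_seen(state_file=STATE_FILE):
    """Load the set of digests of mbox files that were already parsed."""
    if not os.path.exists(state_file):
        return set()
    with open(state_file) as handle:
        return set(json.load(handle))


def save_seen(seen, state_file=STATE_FILE):
    """Persist the digests as a sorted JSON list."""
    with open(state_file, "w") as handle:
        json.dump(sorted(seen), handle)


def needs_parsing(path, seen):
    """True if this exact mbox content has not been parsed before."""
    return sha1_of_file(path) not in seen
```

After a file has been parsed successfully, its digest would be added with seen.add(sha1_of_file(path)) and the set written back with save_seen(seen).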
> 2. Logging support. I know this has been delayed somewhat, but I am
> not proceeding without this.
OK.
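For the logging support, a minimal sketch using Python's standard logging module. The logger name "teammetrics", the format string and the optional log file are assumptions, not something the scripts define yet.

```python
import logging


def setup_logging(log_file=None, level=logging.INFO):
    """Return the (hypothetical) 'teammetrics' logger, configured once.

    Messages go to stderr and, if log_file is given, also to that file.
    """
    logger = logging.getLogger("teammetrics")
    if logger.handlers:
        # Already configured: avoid attaching duplicate handlers.
        return logger
    logger.setLevel(level)
    formatter = logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file))
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Usage would be along the lines of log = setup_logging("fetch.log") followed by log.info("Downloading archive %s", name). The guard on logger.handlers keeps repeated calls from printing every message twice.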
> 3. For fetching the list of the current month to be parsed:
>
> Earlier I thought that for the current month, Mailman stores the list
> archive in plain text instead of as a gzip archive. That's not correct.
> What actually happens is that if the archive is very small (perhaps
> 1 KB?), it is stored in plain text; as it grows, it is put into a
> gzip archive. So we do not parse the archive of the current month at
> all. When the script next runs in a following month, the archive of
> the previous month can be parsed. This is better than parsing an
> incomplete month, as discussed with Andreas before.
I'd suggest simply checking whether a downloaded file is gzipped and
either gzipping them all before processing or unzipping them all, so
that the parsing routine can handle every file the same way.
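The handling discussed above could look roughly like this in Python. It detects gzip by the two magic bytes rather than by the file extension, reads either form into the same text, and drops the still incomplete current month from the list of archives. The "YYYY-Month.txt.gz" naming is an assumption based on Mailman's default Pipermail archives, and all function names are hypothetical.

```python
import datetime
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)

# English month names listed explicitly: strftime("%B") depends on the locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def is_gzipped(path):
    """Check the gzip magic number instead of trusting the file extension."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def read_mbox_text(path):
    """Return the mbox content as text, whether the download was gzipped or not."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rb") as handle:
        return handle.read().decode("utf-8", errors="replace")


def current_month_archive(today=None):
    """Name of the current month's archive, assuming Pipermail's
    'YYYY-Month.txt.gz' naming (e.g. '2011-June.txt.gz')."""
    if today is None:
        today = datetime.date.today()
    return "%d-%s.txt.gz" % (today.year, MONTHS[today.month - 1])


def archives_to_parse(names, today=None):
    """Drop the current month, in gzipped or plain form, from the archive names."""
    current = current_month_archive(today)
    current_plain = current[:-len(".gz")]
    return [name for name in names if name not in (current, current_plain)]
```

With this split, the parser itself only ever sees plain text, regardless of how Mailman happened to store a given month.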
> 4. Spam filter. I have not looked carefully into this, just a glance
> at Andreas' code, but I will implement it as the final step.
OK.
> 5. And then lastly, pushing the information into a database.
>
> That's the action plan for this week and I hope I can finish stuff
> quickly. Suggestions as always are welcome.
Make sure you have a look at
svn://svn.debian.org/svn/blends/blends/trunk/team_analysis_tools/archives.sql
IMHO there is no real need to change the database structure (but if you
see room for new fields that enhance it, feel free to add them).
Kind regards
Andreas.
--
http://fam-tille.de