[Teammetrics-discuss] Phase I: The final parts.

Andreas Tille andreas at an3as.eu
Sun Jun 5 21:02:57 UTC 2011


Hi,

On Mon, Jun 06, 2011 at 01:24:27AM +0530, Sukhbir Singh wrote:
> 
> 1. Storing which lists have been downloaded and thus preventing lists
> from being downloaded again. Suggestions are welcome on this. Is using
> a conf file that maps list-names to SHA1 checksums OK?

I'm afraid I do not understand the question correctly.  The *names* of
the mailing lists are in the config file and I do not see a need to keep
a hash sum for the names (in addition).  IMHO we only need to store hash
sums of the mbox files so that we do not parse them again.
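
A minimal Python sketch of this idea, not taken from the project code:
it records the SHA1 sum of every parsed mbox file in a small JSON state
file and skips files whose sum is unchanged.  The function names and the
JSON state file are assumptions made for illustration only.

```python
import hashlib
import json
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_seen(state_file):
    """Load the mapping {mbox path: SHA1 sum}; empty if no state yet."""
    path = Path(state_file)
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def needs_parsing(mbox_path, seen):
    """True if the mbox file is new or its content has changed."""
    return seen.get(str(mbox_path)) != sha1_of_file(mbox_path)


def mark_parsed(mbox_path, seen, state_file):
    """Record the current SHA1 sum of the file and save the state."""
    seen[str(mbox_path)] = sha1_of_file(mbox_path)
    Path(state_file).write_text(json.dumps(seen, indent=2, sort_keys=True))
```

Hashing the file content rather than the list name means that an
archive which has grown since the last run is parsed again, while an
unchanged one is skipped.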

> 2. Logging support. I know this has been delayed somewhat, but I am
> not proceeding without this.

OK.

> 3. For fetching the list of the current month to be parsed:
> 
> Earlier I thought that for the current month, Mailman stores the list
> in plain text instead of gzip archive. That's not correct. What in
> fact happens is if the list has a very small size (perhaps 1 KB?), it
> stores it in plain text then as the size grows, it puts them into a
> gzip archive. And so, we do not parse mailing lists of the current
> month at all. Then probably when the script runs the next time or the
> next month, the list of the last month can be parsed. This is better
> instead of parsing an incomplete month, as discussed with Andreas
> before.

I'd suggest simply checking whether a downloaded file is gzipped and
either gzipping them all before processing or unzipping them all, so
that the parsing routine can handle every file in the same way.
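
A minimal Python sketch of such a check, under the assumption that the
first two bytes of the file are enough to tell the cases apart (every
gzip file starts with the magic bytes 0x1f 0x8b).  The helper names are
hypothetical; the parser just calls open_archive() and never needs to
know how the file was stored.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Check the file content, not the file name extension."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def open_archive(path):
    """Return a binary file object yielding the uncompressed mbox data."""
    if is_gzipped(path):
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Usage would be something like `with open_archive(name) as fh: data =
fh.read()` for both the plain text and the gzipped archives.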
 
> 4. Spam filter. I have not looked carefully into this, just a glance
> at Andreas' code but I will be implementing it as the final step.

OK.

> 5. And then lastly, pushing the information into a database.
> 
> That's the action plan for this week and I hope I can finish stuff
> quickly. Suggestions as always are welcome.

Make sure you have a look into

   svn://svn.debian.org/svn/blends/blends/trunk/team_analysis_tools/archives.sql

IMHO there is no real need to change the database structure (but if you
see room for new fields or other enhancements, feel free to add them).

Kind regards

     Andreas.

-- 
http://fam-tille.de


