[Teammetrics-discuss] Updates.
Andreas Tille
andreas at an3as.eu
Thu Jun 30 20:51:08 UTC 2011
On Fri, Jul 01, 2011 at 12:38:14AM +0530, Sukhbir Singh wrote:
>      name      | frequency | rawlen | quotelen | blanklen | siglen
> ---------------+-----------+--------+----------+----------+--------
>  Sukhbir Singh |        77 |  58673 |      998 |     1248 |   1248
>  Andreas Tille |        46 |  66462 |      946 |     1590 |    854
>  Scott Howard  |         4 |   4318 |       48 |       91 |     91
Nice.
> As you can see, 'siglen == blanklen' for Scott because he doesn't have
> a signature, it's just `~Scott`, while Andreas and I do have one. That
> explains the difference in the `siglen` column and perhaps why it is
> important. I feel all the metrics are pretty conclusive for a mailing
> list. The rest you can see for yourself. Here is a summary once again:
>
> rawlen -- total number of characters in the message body.
> blanklen -- total number of lines in the body excluding blank lines
Nitpicking: The name "blanklen" implies that we are counting the number
of blanks and not non-blanks.
> quotelen -- total number of lines excluding blank lines AND lines
> starting with >
> siglen -- total number of lines excluding blank lines AND lines
> starting with >, counted only up to the signature delimiter '-- '
Same here: The naming of the columns is suboptimal.
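To make the four definitions concrete, here is a minimal Python sketch of how I read them. It is an illustration only, not the code that produced the table above: the function name body_metrics and the sample message are made up, and I assume the signature delimiter is a line consisting of exactly '-- '.

```python
def body_metrics(body: str) -> dict:
    """Compute the four per-message counts as defined in this thread."""
    lines = body.splitlines()

    # blanklen: lines that are NOT blank (despite what the name suggests).
    non_blank = [line for line in lines if line.strip()]

    # quotelen: non-blank lines that are not quoted with a leading '>'.
    unquoted = [line for line in non_blank if not line.startswith(">")]

    # siglen: unquoted lines before the signature delimiter '-- '.
    # Assumption: the delimiter is a line that is exactly "-- ".
    before_sig = []
    for line in unquoted:
        if line == "-- ":
            break
        before_sig.append(line)

    return {
        "rawlen": len(body),          # characters, not lines
        "blanklen": len(non_blank),
        "quotelen": len(unquoted),
        "siglen": len(before_sig),
    }


# Hypothetical message body: one blank line, one quoted line, a signature.
sample = "Hi\n\n> quoted text\nMy reply\n-- \nhttp://example.org\n"
metrics = body_metrics(sample)
print(metrics)
```

Each count here is a subset of the previous one, so the names could just as well describe what is kept (say, non-blank, unquoted, unsigned lines) rather than what is dropped.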
> For lists.debian.org, I investigated using the NNTP interface.
> That works perfectly. We get exactly what we want, and it's fast and
> doesn't strain the Gmane server (40,000 Subject/From fields in ~10
> seconds). There is only one drawback, and that is the obfuscation of
> the mail addresses. And that was in only one list I checked. I didn't
> note which one it was (sorry), but out of six lists, only one
> had obfuscated email addresses.
IMHO we could live with just a few obfuscated lists - at least for
the moment.
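Checking a list for obfuscation could be automated roughly like this. It is only a sketch under one assumption: a From field counts as obfuscated when it contains no literal local@domain address (for example "user at example.org"). The real obfuscation on the NNTP side may look different, and the names PLAIN_ADDRESS, is_obfuscated and obfuscated_share are made up for illustration.

```python
import re

# Assumption: a non-obfuscated From field contains a literal local@domain.
PLAIN_ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def is_obfuscated(from_field: str) -> bool:
    """Return True if no plain e-mail address is found in the From field."""
    return PLAIN_ADDRESS.search(from_field) is None


def obfuscated_share(from_fields) -> float:
    """Fraction of From fields without a plain address (0.0 if none given)."""
    fields = list(from_fields)
    if not fields:
        return 0.0
    return sum(is_obfuscated(field) for field in fields) / len(fields)


# Hypothetical From fields, as they might come back from a header query.
fields = [
    "Some One <someone@example.org>",
    "Some One <someone at example.org>",
]
print(obfuscated_share(fields))
```

A list whose share is close to zero can be parsed as is; a list close to one would be a candidate for requesting the mbox archive instead.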
> So what I suggest now is that we go with NNTP access only. I think
> that obfuscation is a rarity and we should go ahead with this. For
> starters, you can point me to some mailing lists that you would want
> to parse first so I can check for obfuscation. Then at DebConf, we can
> discuss how to parse these lists or request mbox archives.
You might like to check my Perl code for all the lists I was observing.
> I will be investigating the CGI thing tomorrow.
Great. BTW, I'll be offline tomorrow - well for *me* it is tomorrow -
for you it is "today". :-)
Kind regards
Andreas.
--
http://fam-tille.de