[Soc-coordination] Debian Teams Activity Metrics - Report IV [Update]

Olly Betts olly at survex.com
Thu Aug 4 06:26:42 UTC 2011


On Thu, Aug 04, 2011 at 11:16:26AM +0530, Sukhbir Singh wrote:
> Hi Olly,
> 
> > Gmane actually has an mbox export feature for this sort of thing:
> 
> I am aware :) There are a couple of reasons why we thought that this
> is a bad idea:
> 
> 1). Gmane actually doesn't favor this. The export page says:

Perhaps I should have mentioned I'm involved with helping to run
Gmane, though mostly the search side.  So I have a reasonable idea
of how it works internally.

>     This interface is a slight CPU and bandwidth hog, so if it's
> abused, it will be shut down. (List admins will then have to get a
> user name/password thing going.)

I believe this is to try to discourage people from using this feature
gratuitously.  Fetching the messages individually will probably use a
fraction more bandwidth than the mbox version.  I'd expect the CPU cost
to be higher for fetching individually too.

> Even though I have a redundancy check in place that prevents the same
> articles from being fetched again, imagine this running for the first time
> on lists that have 40,000 articles. And as we are not going to be
> parsing a single list, for two or three teams the article count goes
> up to ~150,000 articles. This caused two problems:
> 
> a). This is not a good way of fetching articles via Gmane and would
> strain the Gmane server and ...
> b). ... we seemed to be randomly getting disconnected when fetching
> articles in the very initial stages only.
> c). So not only is it possible abuse but it doesn't suit us.
[...]
> Feel free to comment and suggest.

I would talk to Lars about this - he's a friendly guy, if sometimes
rather busy.  He's the man who actually runs the show so will be able
to give you the definitive word on what Gmane would prefer you to do.

However, he said a few months ago that "62K articles isn't a lot":

http://thread.gmane.org/gmane.discuss/13937/focus=13940

I do know that IP addresses and ranges sometimes get blocked for
fetching excessive numbers of articles via NNTP, though I'm not sure how
many counts as excessive.  But your current approach might be viewed as
abusive.
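
One way to keep the volume down is to request only the articles you don't
already have, in small batches with a pause between them.  A minimal Python
sketch, assuming a hypothetical `fetch` callback that retrieves one article
by number (e.g. a wrapper around an NNTP client); the batch size and pause
are illustrative guesses, not anything Gmane has specified:

```python
import time
from typing import Callable, Iterator, List, Set


def missing_articles(first: int, last: int, have: Set[int]) -> List[int]:
    """Article numbers in [first, last] that are not already stored locally."""
    return [n for n in range(first, last + 1) if n not in have]


def batched(numbers: List[int], size: int) -> Iterator[List[int]]:
    """Split a list of article numbers into consecutive chunks of `size`."""
    for i in range(0, len(numbers), size):
        yield numbers[i:i + size]


def polite_fetch(
    first: int,
    last: int,
    have: Set[int],
    fetch: Callable[[int], object],  # hypothetical: fetches one article by number
    batch_size: int = 100,           # illustrative value, not Gmane policy
    pause: float = 5.0,              # seconds to wait between batches (assumed)
    sleep: Callable[[float], object] = time.sleep,
) -> List[int]:
    """Fetch only missing articles, pausing between batches.

    `have` is the local redundancy check: it is updated in place, so a
    rerun skips everything fetched before.
    """
    todo = missing_articles(first, last, have)
    fetched: List[int] = []
    for i, batch in enumerate(batched(todo, batch_size)):
        if i > 0:
            sleep(pause)  # throttle between batches, not before the first one
        for n in batch:
            fetch(n)
            have.add(n)
            fetched.append(n)
    return fetched
```

Because `have` records what is already stored, only the first run over a
large list is expensive; later runs request just the new articles.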

Cheers,
    Olly
