[Teammetrics-discuss] Web Parser

Andreas Tille andreas at an3as.eu
Thu Dec 1 13:39:53 UTC 2011


Hi Sukhbir,

(Just do a git pull: I have made archiveparser.py executable with chmod a+x.)

On Thu, Dec 01, 2011 at 05:08:46PM +0530, Sukhbir Singh wrote:
> Hi,
> 
> Yesterday I updated the web archive parser and fixed some issues
> with the date handling. It is now in a much better operational state
> than before; however, there is one issue that I should
> discuss.
> 
> After a certain number of messages have been downloaded (usually 100+),
> lists.d.o stops responding for some time, which causes
> urllib2.URLError to be raised. To be sure that the problem was not in
> the code, I checked in the browser, and the site failed to load there
> too. After a few seconds, it starts responding again. This is totally
> random. I have handled the exception but *not* implemented a mechanism
> that tries to download the message again.

Sounds like some kind of bandwidth limiting implemented by the Debian
server admins.  Would you mind asking about this on
debian-services-admin at lists.debian.org ?
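
If it does turn out to be rate limiting, a retry with a growing pause
might be enough.  A rough sketch, not tested against lists.d.o:
fetch_with_retry and its parameters are hypothetical names, and it is
written for Python 3 (urllib.request / urllib.error); under Python 2 the
equivalents are urllib2.urlopen and urllib2.URLError.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, retries=5, delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the body of `url`, retrying on URLError with a growing pause.

    `opener` and `sleep` are parameters only so the function can be
    exercised without touching the network (an assumption of this sketch).
    """
    for attempt in range(1, retries + 1):
        try:
            response = opener(url)
            try:
                return response.read()
            finally:
                response.close()
        except urllib.error.URLError:
            if attempt == retries:
                raise  # give up after the last attempt
            sleep(delay * attempt)  # wait a little longer each time
```

The pause grows linearly (2s, 4s, 6s, ...), so a server that stays silent
for "a few seconds" should be covered by the first two or three attempts;
the values are guesses and would need tuning.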

> I was wondering, is this expected? I mean, did you face this issue
> with your code, ever?

No, but I was only reading the index files and not the whole pages, which
might have kept me far below any such limit.

BTW, I just tried

$ ./archiveparser.py 
Traceback (most recent call last):
  File "./archiveparser.py", line 150, in <module>
    main(conn, cur)
  File "./archiveparser.py", line 93, in main
    day, message_month, message_year = ''.join(date).split()
ValueError: need more than 0 values to unpack
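
The unpacking fails because ''.join(date).split() returned an empty
list, i.e. no date text was found for that message.  A minimal sketch of
a guard, assuming date is a list of string fragments scraped from the
page (split_date is a hypothetical helper name, not something in
archiveparser.py):

```python
def split_date(date_parts):
    """Return (day, month, year) from scraped date fragments, or None.

    None is returned whenever the joined text does not split into
    exactly three whitespace-separated fields.
    """
    fields = ''.join(date_parts).split()
    if len(fields) != 3:
        return None
    return tuple(fields)
```

The caller could then log a warning and skip the message when None comes
back instead of aborting the whole run.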


The logfile simply says:

2011-12-01 13:26:08,933 INFO:           Starting Web Archive Parser
2011-12-01 13:26:08,959 INFO:   List 1 of 55
2011-12-01 13:26:08,959 INFO: Fetching 'debian-accessibility'
2011-12-01 13:26:09,261 INFO: List archives are from 2003 to 2011
2011-12-01 13:26:09,510 WARNING: Possible spam: date mismatch in message 871y1t91zy.fsf at lexx.delysid.or
 

Kind regards

        Andreas.

-- 
http://fam-tille.de


