[Teammetrics-discuss] Checksums for NNTP list parsing.

Tue Jul 5 06:37:31 UTC 2011

Hi,

I am going to start implementing the checksum phase, so I thought it
better to discuss the options first.

Here are the choices we have:

1. We implement a checksum for the entire list. We download the entire
list from the server and generate the checksum.
PROS: Easy to implement and manage; it should take only a few hours
to implement.
CONS: The entire list has to be downloaded. For a list with 40,000
messages, that can take a few hours.

2. We don't implement a checksum for now and just save which lists
have been downloaded.
PROS: Very fast because we will keep track of what has been downloaded
without the need to download it again.
CONS: If a message changes in the archives (does this happen
often?), we will not detect it, so our metrics can go out of date.

3. We implement and save the checksum for each article on the server.
This makes it easier to download new articles.
PROS: Makes per-message management and downloading new messages
easier.
CONS: Messages still have to be downloaded (as in 1). Also, as
mentioned, some lists have 40,000 messages, so saving the checksums
to a file is probably not a good idea. A database could be a better
option.
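
To make options 1 and 3 a bit more concrete, here is a rough Python
sketch, not a finished design. Everything in it is an assumption for
illustration: SHA-1 as the checksum, SQLite as the database, the
function names, the "checksums" table, and the idea that the articles
have already been fetched from the server as raw bytes.

```python
import hashlib
import sqlite3


def article_checksum(body):
    """Hex digest of one article's raw bytes (SHA-1 is an assumption)."""
    return hashlib.sha1(body).hexdigest()


def list_checksum(articles):
    """Option 1: a single digest for the whole list.

    Each article is hashed on its own first, so that moving bytes
    between neighbouring articles changes the result.
    """
    digest = hashlib.sha1()
    for body in articles:
        digest.update(hashlib.sha1(body).digest())
    return digest.hexdigest()


def changed_articles(conn, list_name, articles):
    """Option 3: per-article digests kept in a (hypothetical) table.

    `articles` maps article id -> raw bytes. Returns the ids that are
    new or whose content differs from the stored digest, and updates
    the table for those ids.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        "list_name TEXT, article_id TEXT, digest TEXT, "
        "PRIMARY KEY (list_name, article_id))"
    )
    changed = []
    for article_id, body in articles.items():
        digest = article_checksum(body)
        row = conn.execute(
            "SELECT digest FROM checksums "
            "WHERE list_name = ? AND article_id = ?",
            (list_name, article_id),
        ).fetchone()
        if row is None or row[0] != digest:
            changed.append(article_id)
            conn.execute(
                "INSERT OR REPLACE INTO checksums VALUES (?, ?, ?)",
                (list_name, article_id, digest),
            )
    conn.commit()
    return changed
```

With something like this, only the articles returned by
changed_articles() would need to be parsed again, whereas
list_checksum() can only say that something in the list changed.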

So how do you want this handled? Note that I will go with (1) unless
you have any objections :-)

-- 
Sukhbir