[Teammetrics-discuss] No Message-ID found
Andreas Tille
andreas at an3as.eu
Thu Aug 18 12:33:34 UTC 2011
Hi Sukhbir,
git pull
I handled the valid case of duplicated Message-IDs (explanation for
validity in commit log and code comment). This again proves my point
that primary keys are a good idea - we should not count one single
message twice, right? :-)
However, the problems that primary keys uncover always force you to
handle the corresponding cases. Currently you are handling missing
Message-IDs like this:
    if msg_id_raw is None:
        logging.error('No Message-ID found')
        msg_id = ''
This works only once, because the primary key on Message_id throws
Traceback (most recent call last):
  File "./liststat.py", line 525, in <module>
    main(conf_info, total_lists)
  File "./liststat.py", line 453, in main
    parse_and_save(mbox_files)
  File "./liststat.py", line 262, in parse_and_save
    (msg_id, project, name, email_addr, subject, reason)
psycopg2.IntegrityError: FEHLER: doppelter Schlüsselwert verletzt Unique-Constraint »pk_spam_messageid«
DETAIL: Schlüssel »(message_id)=()« existiert bereits.
(Sorry for the German locale - it just says "duplicate key
»(message_id)=()« already exists and violates unique constraint
»pk_spam_messageid«".)
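To illustrate why the empty string works only once, here is a minimal
sketch. It uses an in-memory SQLite database as a stand-in for our
PostgreSQL table, so the table layout and the helper name insert_twice
are assumptions made for the example, not the real schema or code:

```python
import sqlite3


def insert_twice(msg_id):
    """Insert the same message_id twice.

    Returns True if the second insert is rejected by the primary key.
    """
    conn = sqlite3.connect(':memory:')
    # Hypothetical, simplified table: only the primary key matters here.
    conn.execute('CREATE TABLE spam (message_id TEXT PRIMARY KEY, subject TEXT)')
    conn.execute('INSERT INTO spam VALUES (?, ?)', (msg_id, 'first'))
    try:
        conn.execute('INSERT INTO spam VALUES (?, ?)', (msg_id, 'second'))
    except sqlite3.IntegrityError:
        # The empty string is a perfectly valid key value, but only once.
        return True
    finally:
        conn.close()
    return False
```

The first message without a Message-ID is stored under the key '', and
every further one collides with it.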
So what to do? The fact that this is the second case where a message
without an ID is classified as SPAM lets me assume that a missing
Message-ID could well be added to the "reason"s for SPAM. Moreover we
need to "invent" some valid Message-ID which is unique and enables us to
keep a record of this message. So what about the following algorithm:
    md5hash('date' + 'subject') + '@teammetrics-spam.debian.org'
This should work as a unique identifier and would make sure we do not
violate the primary key constraint.
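A minimal sketch of this idea in Python, assuming the date and subject
are available as strings taken from the message headers. The function
name make_message_id and the constant SPAM_DOMAIN are made up for
illustration, and the sketch is written for Python 3:

```python
import hashlib

# Domain part suggested above for invented Message-IDs.
SPAM_DOMAIN = 'teammetrics-spam.debian.org'


def make_message_id(date, subject):
    """Build a replacement Message-ID from the Date and Subject headers."""
    # Treat a missing header as an empty string so hashing never fails.
    raw = (date or '') + (subject or '')
    digest = hashlib.md5(raw.encode('utf-8')).hexdigest()
    return '{0}@{1}'.format(digest, SPAM_DOMAIN)
```

It would be called in the branch where msg_id_raw is None, instead of
setting msg_id to ''. Note that two messages with an identical date and
subject would still get the same ID and thus count as one message.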
Kind regards
Andreas.
--
http://fam-tille.de