[Teammetrics-discuss] [andreas at an3as.eu: No Message-ID found]
Andreas Tille
andreas at an3as.eu
Thu Aug 18 13:33:42 UTC 2011
You can probably respond via the list - I can then reply - which seems to
work. Sometimes this mailing list behaves strangely - I have not
experienced this on other lists ...
On Thu, Aug 18, 2011 at 06:47:35PM +0530, Sukhbir Singh wrote:
> > Traceback (most recent call last):
> > File "./liststat.py", line 525, in <module>
> > main(conf_info, total_lists)
> > File "./liststat.py", line 453, in main
> > parse_and_save(mbox_files)
> > File "./liststat.py", line 262, in parse_and_save
> > (msg_id, project, name, email_addr, subject, reason)
> > psycopg2.IntegrityError: ERROR: duplicate key value violates unique constraint "pk_spam_messageid"
> > DETAIL: Key (message_id)=() already exists.
>
> Oh! I never anticipated that we will come across a duplicate message ID.
Well, for the *validly* duplicated Message-IDs I hoped to provide a
sensible explanation. For the SPAM-caused missing Message-IDs it is
clear that you get duplicates if you set them to ''.
> > record of this message. So what about the following algorithm
> >
> > md5hash('date' + 'subject') + '@teammetrics-spam.debian.org'
>
> Sounds good :) So we do this for messages that have no message ID set,
> right?
Yes, if there is no Message-ID found, then set it to
md5hash('date' + 'subject') + '@teammetrics-spam.debian.org'
*and*
set the SPAM reason flag to "No Message ID".
Could you implement this right now so that I can continue with my
tests?
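A minimal sketch of what I mean, only to illustrate the idea and not
meant as the final code for liststat.py. The function names and the
Python 3 style string handling are assumptions made for this example;
only the hash input (date + subject), the domain and the reason string
are taken from the proposal above.

```python
import hashlib

# Domain suffix from the proposal above.
SPAM_DOMAIN = "teammetrics-spam.debian.org"


def synthetic_message_id(date, subject):
    """Build a replacement Message-ID from the Date and Subject headers.

    Hypothetical helper: md5 over date + subject, followed by the spam domain.
    """
    digest = hashlib.md5((date + subject).encode("utf-8")).hexdigest()
    return digest + "@" + SPAM_DOMAIN


def resolve_message_id(msg_id, date, subject):
    """Return (message_id, spam_reason).

    If the message carries a Message-ID, keep it and report no reason.
    Otherwise generate a synthetic one and flag it as "No Message ID".
    """
    if msg_id and msg_id.strip():
        return msg_id, None
    # Missing headers are treated as empty strings (an assumption).
    return synthetic_message_id(date or "", subject or ""), "No Message ID"
```

Two messages that share both date and subject would still end up with
the same generated ID, so the unique constraint could fire again in
that case.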
> And it can help in better detection of spam also later.
However, it is a bit hard to remove the SPAM messages according to
the Message-ID. :-)
I have no idea if similar things will happen in lists.debian.org mboxes
- but I'd also vote to drop these messages - at least I will suggest this
to listmaster in my next ping...
Kind regards
Andreas.
--
http://fam-tille.de