[Teammetrics-discuss] Observations from current run of `archiveparser.py`.
Andreas Tille
andreas at an3as.eu
Sat Feb 4 22:33:43 UTC 2012
Hi Sukhbir,
On Sun, Feb 05, 2012 at 12:44:12AM +0530, Sukhbir Singh wrote:
> ...
> (As of writing this, there are 22 messages that have been flagged with
> the keywords, but they are all OK)
Seems I might have been too harsh in my first SPAM fighting effort.
Please change this so that we rather leave some SPAM undetected than
lose real mails: all real mails should end up in our listarchives table.
> That is fine. But once a message is flagged as spam, it *does not*
> populate the `listarchives` table. We just populate `listspam` and
> move on.
In principle this is right if we can be sure that a message is really
SPAM. Please remove all those keywords which generate false positives
from our list of potential SPAM keywords.
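
As a rough illustration of what I mean by dropping the keywords that cause false positives, here is a minimal sketch. It is not the code in `archiveparser.py`: the keyword list, the function names and the sample messages are all hypothetical, and it assumes we have a set of messages that we already know to be genuine (for instance the flagged ones that were checked by hand).

```python
import re

# Hypothetical keyword list; the real list used by archiveparser.py is not shown here.
SPAM_KEYWORDS = ["casino", "lottery"]


def find_false_positive_keywords(keywords, known_good_messages):
    """Return the set of keywords that match at least one genuine message."""
    noisy = set()
    for keyword in keywords:
        # Whole-word, case-insensitive match (an assumption about how matching is done).
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        if any(pattern.search(message) for message in known_good_messages):
            noisy.add(keyword)
    return noisy


def prune_keywords(keywords, known_good_messages):
    """Return the keywords that never fire on a genuine message, in original order."""
    noisy = find_false_positive_keywords(keywords, known_good_messages)
    return [keyword for keyword in keywords if keyword not in noisy]
```

Running `prune_keywords` over the hand-checked messages would leave only those keywords that never matched a real mail, which errs on the side of letting some SPAM through.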
> Because we have worked hard on this (and aimed for
> perfection), I want to save every message possible. And right now, we
> are losing lots of messages.
You are definitely right that this should not happen.
> What I suggest is this -- populate `listspam` *and also* populate
> `listarchives`. That way we serve both purposes: help the spam
> fighting effort and not lose any messages. I had discussed this
> earlier but you were not comfortable about it, but given that we are
> losing so many genuine messages, I thought I would bring this up again.
> Have a look at the log file and you will make up your mind hopefully
> about this!
I'm not fully convinced that just dropping those keywords from our list
of SPAMy keywords would not have a similarly helpful effect. But I will
not insist on my opinion if you are not comfortable with this - so if
you really feel better having all messages in the listarchives table I
will not stop you from doing so. However, in this case it does not make
any sense to simply keep copies in the listspam table. We should rather
add a spam flag to the listarchives table instead of stupidly copying data.
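
To make the spam flag idea concrete, here is a minimal sketch. It uses an in-memory SQLite database only so that it runs on its own; the column names (`message_id`, `subject`, `is_spam`) and the helper functions are assumptions for illustration and do not reflect the actual layout of our listarchives table.

```python
import sqlite3


def create_schema(conn):
    """Create a simplified, hypothetical listarchives table with a spam flag."""
    conn.execute(
        """
        CREATE TABLE listarchives (
            message_id TEXT PRIMARY KEY,
            subject    TEXT,
            is_spam    INTEGER NOT NULL DEFAULT 0
        )
        """
    )


def store_message(conn, message_id, subject, is_spam=False):
    """Store every message once; suspected SPAM is only marked, never dropped."""
    conn.execute(
        "INSERT INTO listarchives (message_id, subject, is_spam) VALUES (?, ?, ?)",
        (message_id, subject, int(is_spam)),
    )


def count_messages(conn, include_spam=False):
    """Count messages, leaving out flagged ones unless include_spam is set."""
    query = "SELECT COUNT(*) FROM listarchives"
    if not include_spam:
        query += " WHERE is_spam = 0"
    return conn.execute(query).fetchone()[0]
```

With a layout like this no message is lost, statistics can simply filter on the flag, and a false positive is repaired by flipping one value rather than by moving rows between two tables.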
> PS: Summer of Code 2012 was just announced. Who knows, we might get a
> student :)!
I do not think that this topic will work for it again. If you have some
other ideas, feel free to propose them.
Kind regards
Andreas.
--
http://fam-tille.de