[Teammetrics-discuss] Observations from current run of `archiveparser.py`.

Sukhbir Singh sukhbir.in at gmail.com
Sun Feb 5 06:46:07 UTC 2012


Hi,

> Seems I might have been too harsh in my first SPAM fighting effort.
> Please change in a way that we prefer rather having some SPAMs
> undetected but all real mails are in our listarchives table.

Yes, that's what we are going to do.

> I'm not fully convinced that just dropping those keywords from our list
> of SPAMy keywords would not have a similarly helpful effect.  But I will
> not insist on my opinion if you are not comfortable about this - so if
> you really feel better to have all messages in the listarchives table I
> will not stop you to do so.  However, in this case it does not make any
> sense to simply keep copies in the listspam table.  We should rather add
> a spam flag to the list archives table instead of stupidly copying data.

That's right: we are not going to drop the spam keywords, but we will also
populate listarchives, so that if the spam filter hits a false positive,
at least we won't lose that message.

So here is what we are going to do:

1. Run every message through spamfilter.py.
2. If it is spam:
   - generate a log entry,
   - populate listspam,
   - ALSO populate listarchives, but set the boolean column
     is_spam=True [your suggestion].

That way we don't lose anything and both our purposes are served.
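The plan above could look roughly like the following Python sketch. It is only an illustration under assumptions: an in-memory sqlite3 database stands in for the PostgreSQL teammetrics database, the columns message_id and body are invented, and looks_like_spam() is a hypothetical keyword check standing in for spamfilter.py, not its actual logic.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("archiveparser")

# Hypothetical keyword list; the real spamfilter.py may work differently.
SPAM_KEYWORDS = ("viagra", "lottery")


def looks_like_spam(body: str) -> bool:
    """Stand-in for spamfilter.py: flag a message if it contains a keyword."""
    lowered = body.lower()
    return any(word in lowered for word in SPAM_KEYWORDS)


def store_message(conn: sqlite3.Connection, msg_id: str, body: str) -> bool:
    """Store one message following the plan; return True if it was flagged."""
    spam = looks_like_spam(body)
    if spam:
        # Spam hit: write a log entry and keep a copy in listspam.
        log.info("Message %s flagged as spam", msg_id)
        conn.execute(
            "INSERT INTO listspam (message_id, body) VALUES (?, ?)",
            (msg_id, body),
        )
    # Every message goes into listarchives; is_spam records the verdict,
    # so a false positive is flagged rather than lost.
    conn.execute(
        "INSERT INTO listarchives (message_id, body, is_spam) VALUES (?, ?, ?)",
        (msg_id, body, spam),
    )
    return spam


# Hypothetical schema, reduced to the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listarchives (message_id TEXT, body TEXT, is_spam BOOLEAN)")
conn.execute("CREATE TABLE listspam (message_id TEXT, body TEXT)")

store_message(conn, "<1@example.org>", "Patch for the parser")
store_message(conn, "<2@example.org>", "You won the LOTTERY!")
conn.commit()
```

One caveat for the real table: in PostgreSQL, rows that already exist when is_spam is added without a DEFAULT get NULL, so a filter such as `WHERE NOT is_spam` would silently skip them, while `WHERE is_spam IS NOT TRUE` keeps them.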

    teammetrics=> ALTER TABLE listarchives ADD COLUMN is_spam boolean;
    ERROR:  must be owner of relation listarchives

I translated what this means (heh!): only the owner of the table can
alter it. So please add the column yourself, and when you are done, let
me know and I will start archiveparser.

Thanks for allowing this. I had a strong intuition about this and I
didn't feel comfortable losing any message. Now I feel good!

--
Sukhbir


