[Teammetrics-discuss] Spam filters and encoding handlers in place
Sukhbir Singh
sukhbir.in at gmail.com
Wed Jun 22 20:28:55 UTC 2011
Hi,
repository.update()
I know the problem of the root user still exists but this will be the
last time as the next phase will be creating a deb package! For now,
please run it as root if you want to try it out for yourself, or you
can wait for a day or two and let me remove this hurdle (read
further).
Changes:
+ A working spam filter is in place. This is handled by spamfilter.py.
You can have a look at the source code to see what is being handled.
From what I have seen, the filters in place cut a significant amount
of spam.
+ I have tried to handle all the encoding errors, but a few (very few,
I guess) still remain. Weirdly, all the encoding errors as of now are
with the Subject field *only* and not with the Name field. I will find
out what is causing the problem soon.
+ There is a new table called listspam which saves the reason why a
message was considered spam. This will help us identify how well our
filter is working (as requested by Andreas).
+ Here is what some sample output looks like from listspam (from the
list: debian-med-commit):
          name           |                     subject                     |        reason
-------------------------+-------------------------------------------------+-----------------------
 CORNEL                  | [med-svn] util                                  | Name is in upper case
 CURSURI GRATUITE ONLINE | [med-svn] invitatie la cursuri gratuite online  | Name is in upper case
 EVRIKA GROUP            | [med-svn] LA MULTI ANI !                        | Name is in upper case
 LINO TECH               | [med-svn] Fw: PARDOSELI PVC TRAFIC INTENS       | Name is in upper case
 EVRIKA GROUP            | [med-svn] invitatie la cursuri de perfectionare | Name is in upper case
So, overall, pretty slick!
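
To illustrate the "Name is in upper case" reason shown in the listspam
output, here is a minimal sketch of what such a check could look like.
This is not the code from spamfilter.py; the function name and the
return convention (a reason string, or None for "not spam by this
rule") are assumptions made for the example.

```python
def upper_case_name_reason(name):
    """Return a spam reason if the sender name is all upper case, else None.

    Hypothetical helper, not taken from spamfilter.py.
    str.isupper() is True only when the string has at least one cased
    character and every cased character is upper case, so empty names
    and names made only of digits or punctuation are not flagged.
    """
    if name.isupper():
        return "Name is in upper case"
    return None
```

A caller could store the returned reason together with the name and
subject of the message, which is the shape of the rows shown above.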
And here are the top 10 posters on debian-med-commit by message count:
SELECT name, COUNT(name) FROM listarchives
WHERE project='debian-med-commit'
GROUP BY name ORDER BY count DESC LIMIT 10;
                name                | count
------------------------------------+-------
 Charles Plessy                     |  1352
 Andreas Tille                      |  1261
 tille at alioth.debian.org         |   755
 hanska-guest at alioth.debian.org  |   509
 Mathieu Malaterre                  |   498
 plessy at alioth.debian.org        |   389
 Steffen Möller                     |   346
 smoe-guest at alioth.debian.org    |   344
 charles-guest at alioth.debian.org |   342
 olivier sallou                     |   169
(10 rows)
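
For anyone who wants to run the same kind of query from Python, here
is a small self-contained sketch. It uses the standard-library sqlite3
module with an in-memory database purely so that it runs anywhere; the
real database is a separate server and its schema is not shown in this
mail. Only the table name listarchives and the columns name and
project come from the query above. The function names, the
message_count alias, and the demo rows are made up for illustration.

```python
import sqlite3


def top_posters(conn, project, limit=10):
    """Return (name, message_count) rows for one list, busiest first."""
    query = (
        "SELECT name, COUNT(name) AS message_count "
        "FROM listarchives WHERE project = ? "
        "GROUP BY name ORDER BY message_count DESC LIMIT ?"
    )
    # Parameters are passed separately so values are never pasted into SQL.
    return conn.execute(query, (project, limit)).fetchall()


def build_demo_db():
    """Create an in-memory table with a few made-up rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listarchives (project TEXT, name TEXT)")
    rows = (
        [("demo-list", "alice")] * 3
        + [("demo-list", "bob")]
        + [("other-list", "carol")]
    )
    conn.executemany("INSERT INTO listarchives VALUES (?, ?)", rows)
    return conn
```

Calling top_posters(build_demo_db(), "demo-list") returns the demo
posters ordered by how many messages each one has in that list.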
So let me know your thoughts on this.
The next phases in order:
+ deb package.
+ encoding errors.
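
On the encoding item: the cause of the remaining Subject errors is not
identified yet, so the following is only an illustration of defensive
header decoding, not a fix. It assumes the raw Subject may contain
RFC 2047 encoded words and that an unknown or wrong charset should
degrade to replacement characters instead of raising. The function
name decode_subject is hypothetical.

```python
from email.header import decode_header


def decode_subject(raw):
    """Decode a raw Subject header to text without raising on bad charsets."""
    parts = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            try:
                # Undecodable bytes become U+FFFD instead of raising.
                parts.append(value.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # The header named a charset Python does not know.
                parts.append(value.decode("utf-8", errors="replace"))
        else:
            # Plain headers with no encoded words come back as str.
            parts.append(value)
    return "".join(parts)
```

Logging the raw header whenever a replacement character shows up in
the result would make it easier to see which messages are affected.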
That's all for tonight!