[Teammetrics-discuss] SPAM filter issue

Andreas Tille andreas at an3as.eu
Wed Jan 8 14:10:33 UTC 2014


Hi Sukhbir,

I added 'ImageJ ports on ' to the SPAM filter.  These mails are not
actually SPAM, but they are caused by a broken wrapper in the imagej
package and from time to time produce those peaks in the stats.

$ grep -R "ImageJ ports on " 
spamfilter.py:                'ImageJ ports on ',
hacks/get-archive-pages:                'akavanagh',   # strange "spammer" on Debian-med-packaging mailing list Subject: "ImageJ ports on kasilas" in February 2020
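
For illustration, a substring-based subject filter of this kind could look
like the sketch below.  It is only a sketch under assumptions: the names
SPAM_SUBJECT_PATTERNS and is_spam_subject are hypothetical and not taken
from spamfilter.py, whose actual structure may differ.

```python
# Hypothetical sketch; the names below are illustrative and are not
# taken from spamfilter.py.
SPAM_SUBJECT_PATTERNS = [
    'ImageJ ports on ',
]


def is_spam_subject(subject):
    """Return True if any known pattern occurs anywhere in the subject."""
    return any(pattern in subject for pattern in SPAM_SUBJECT_PATTERNS)


if __name__ == '__main__':
    subject = '[Debian-med-packaging] ImageJ ports on exflpcx18766'
    print(is_spam_subject(subject))
```

Because the match is a plain substring test, the list prefix such as
"[Debian-med-packaging]" in front of the subject does not prevent a hit.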

Unfortunately this does not work: if you try

teammetrics=# SELECT * from listarchives where project = 'debian-med-packaging' and name like 'Malte V%' limit 3; 
       project        |         domain          |      name      |      email_addr       |                       subject                       |               message_id               | archive_date | today_date | msg_raw_len | msg_no_blank_len | msg_no_quotes_len | msg_no_sig_len | is_spam 
----------------------+-------------------------+----------------+-----------------------+-----------------------------------------------------+----------------------------------------+--------------+------------+-------------+------------------+-------------------+----------------+---------
 debian-med-packaging | lists.alioth.debian.org | Malte Vassholz | vassholz at mail.desy.de | [Debian-med-packaging] ImageJ ports on exflpcx18766 | E1VBRMa-0006ib-DW at exflpcx18766.desy.de | 2013-08-19   | 2013-12-03 |          37 |                1 |                 1 |              1 | t
 debian-med-packaging | lists.alioth.debian.org | Malte Vassholz | vassholz at mail.desy.de | [Debian-med-packaging] ImageJ ports on exflpcx18766 | E1VBRMX-0006br-Ft at exflpcx18766.desy.de | 2013-08-19   | 2013-12-03 |          37 |                1 |                 1 |              1 | t
 debian-med-packaging | lists.alioth.debian.org | Malte Vassholz | vassholz at mail.desy.de | [Debian-med-packaging] ImageJ ports on exflpcx18766 | E1VBRMX-0006bS-Fr at exflpcx18766.desy.de | 2013-08-19   | 2013-12-03 |          37 |                1 |                 1 |              1 | t
(3 rows)

you see there are still some occurrences left, and when trying

teammetrics=# SELECT count(*) from listarchives where project = 'debian-med-packaging' and name like 'Malte V%' ;  
 count 
-------  
   603
(1 row)

this is exactly the number that ends up in the graph, which is pure
rubbish.  I wonder why spamfilter.py did not delete these rows, and I'll
leave the data in the database for your inspection (rather than just
removing them manually).
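
One thing that may be worth checking (an assumption based only on the query
output above, not on the spamfilter.py source): the rows shown carry
is_spam = t, so the filter may flag messages rather than delete them.  In
that case the count would have to exclude flagged rows.  The sketch below
illustrates the difference with an in-memory SQLite table; the reduced
schema, the helper names make_demo_db and count_messages, and the one
unflagged row are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with a reduced, hypothetical schema."""
    conn = sqlite3.connect(':memory:')
    # SQLite has no boolean type, so is_spam is stored as 0/1 here;
    # the PostgreSQL output above shows it as t/f.
    conn.execute(
        'CREATE TABLE listarchives ('
        'project TEXT, name TEXT, subject TEXT, is_spam INTEGER)'
    )
    spam_subject = '[Debian-med-packaging] ImageJ ports on exflpcx18766'
    rows = [
        ('debian-med-packaging', 'Malte Vassholz', spam_subject, 1),
        ('debian-med-packaging', 'Malte Vassholz', spam_subject, 1),
        ('debian-med-packaging', 'Malte Vassholz', spam_subject, 1),
        # Invented row, only here to have one unflagged message.
        ('debian-med-packaging', 'Example Poster', 'Example subject', 0),
    ]
    conn.executemany('INSERT INTO listarchives VALUES (?, ?, ?, ?)', rows)
    return conn


def count_messages(conn, project, include_spam=True):
    """Count messages for a project, optionally skipping flagged rows."""
    query = 'SELECT count(*) FROM listarchives WHERE project = ?'
    if not include_spam:
        query += ' AND is_spam = 0'
    return conn.execute(query, (project,)).fetchone()[0]


if __name__ == '__main__':
    conn = make_demo_db()
    print(count_messages(conn, 'debian-med-packaging'))
    print(count_messages(conn, 'debian-med-packaging', include_spam=False))
```

If the graph query counts all rows regardless of is_spam, flagged messages
would still show up as a peak even though the filter matched them.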

Kind regards

        Andreas.


-- 
http://fam-tille.de


