[Teammetrics-discuss] Name fixing (Was: NNTPStat completed successfully.)
Sukhbir Singh
sukhbir.in at gmail.com
Wed Nov 9 11:54:28 UTC 2011
Hello!
> teammetrics=# SELECT name, count(*) from listarchives where name ilike '%tille%' group by name;
> name | count
> -----------------------------------+-------
> Andreas Tille | 7678
> Tille, Andreas | 157
> <tille at debian.org | 6
> 'Andreas Tille' | 2
> <tillea at rki.de | 1
>
Interesting find. To find other names, I did this:
select count(*) from listarchives where name ilike '<%';
count
-------
4681
(1 row)
and
select count(*) from listarchives where name ilike '''%';
count
-------
77
(1 row)
So clearly, there are other such names. When checking the messages in
the lists.d.o web interface, I see that these are messages that have a
'From' header like:
From: <foo at bar.com>
instead of
From: Foo Bar <foo at bar.com>
... the name is missing. So there is nothing we can do in this case
because constructing a name from an email address is not possible.
However, we can strip the `<` and `'` characters from the 'Name' field
which will make: 'Andreas Tille' and Andreas Tille equal.
For the cases of tillea at rki.de and tille at debian.org, the same
problem applies: there is no name in the header, only the address.
So, to summarize:
1. I will strip the `<` and `'` characters.
2. For cases where the name == the email address, there is nothing we
can do except add entries manually.
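A minimal sketch of the two points above, in Python. The function
names clean_name and looks_like_address are hypothetical and not taken
from liststat. The sketch also assumes that stripping only at the ends
of the field is enough, so that an apostrophe inside a name survives,
and that an address-only entry has the shape 'user at host.tld' or
'user@host.tld'.

```python
import re

# Assumption: an address-only "name" has no spaces other than the
# ' at ' separator that the stored data uses in place of '@'.
ADDRESS_RE = re.compile(r"\S+( at |@)\S+\.\S+")


def clean_name(raw_name):
    """Strip leading/trailing whitespace, '<' and "'" from a Name field."""
    return raw_name.strip().strip("<'").strip()


def looks_like_address(name):
    """Return True if the cleaned name is really just an email address."""
    return ADDRESS_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for raw in ["'Andreas Tille'", "<tille at debian.org", "Andreas Tille"]:
        name = clean_name(raw)
        # Entries flagged True here would need a manually added name.
        print(repr(name), looks_like_address(name))
```

Entries for which looks_like_address returns True are the ones that
would have to be mapped to a real name by hand.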
We have another bug in our code, something that I feel stupid about!
I just noticed while investigating the above problem that we are
storing the email address in the form of:
name at domain.com
Stupid me! I don't know what came into my mind when I added this line to liststat:
email_addr = email_raw.replace('@', ' at ')
:(
I will fix all these issues and then push them.
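For the rows that are already stored, something like the following
could undo the replacement. This is only a sketch: restore_address is
a hypothetical helper, not part of liststat, and it assumes that a
stored value containing exactly one ' at ' and no '@' came from that
replace() call. Anything else is left untouched for manual review.

```python
def restore_address(stored):
    """Turn 'user at host.tld' back into 'user@host.tld' when unambiguous."""
    if '@' not in stored and stored.count(' at ') == 1:
        return stored.replace(' at ', '@')
    # Already correct, or ambiguous: return it unchanged.
    return stored


if __name__ == "__main__":
    for value in ["tille at debian.org", "tillea@rki.de"]:
        print(restore_address(value))
```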