[Teammetrics-discuss] Case-insensitive name matching

Andreas Tille andreas at an3as.eu
Fri Jan 4 08:10:45 UTC 2013


Hi,

I've received some positive feedback in private mails, which I could
summarise as a "+2" here on the list for everyone working on this project.

There was one additional hint which can be visualised at:

  http://blends.debian.net/liststats/commitstat_pkg-perl.png

  "gregor h" == "Gregor H"

Currently we can only add a mapping to /etc/teammetrics/names.list to
reflect this.  On the other hand, case-insensitive matching of names
comes to mind.

The only way I see to do this effectively is something like:

CREATE OR REPLACE FUNCTION commit_names_of_project_lc(text,int) RETURNS SETOF RECORD AS '
  SELECT name FROM (
    SELECT lower(name) AS name, COUNT(*)::int
      FROM commitstat
      WHERE project = $1
      GROUP BY lower(name)
      ORDER BY count DESC
      LIMIT $2
  ) tmp
' LANGUAGE 'SQL';

The difference from the current query is:

teammetrics=# SELECT * FROM  commit_names_of_project('pkg-perl', 10)  AS (category text);
       category       
----------------------
 Damyan Ivanov
 Gregor Herrmann
 Martín Ferrari
 gregor herrmann
 Salvatore Bonaccorso
 Roberto Sanchez
 Ryan Niebur
 Niko Tyni
 Jonas Smedegaard
 Jonathan Yu
(10 rows)


teammetrics=# SELECT * FROM  commit_names_of_project_lc('pkg-perl', 10)  AS (category text);
       category       
----------------------
 damyan ivanov
 gregor herrmann
 martín ferrari
 salvatore bonaccorso
 roberto sanchez
 ryan niebur
 niko tyni
 jonas smedegaard
 jonathan yu
 dominique dumont
(10 rows)


This solves the problem of one name being duplicated under different
capitalisation, but on the other hand we end up with all names in lower
case.  We would need to "normalise" them back to the usual spelling, and
we do not know what the usual / preferred spelling is.  We also have no
means to "guess" how gregor really wants to be spelled.  I doubt that
PostgreSQL offers some case-insensitive "GROUP BY" for the very same
reason: there is no way to tell which result set is the "correct" one
when capitalisations differ.
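One possible heuristic, sketched here in Python rather than SQL, is to
group names case-insensitively and label each group with the spelling
that occurs most often.  This is only an assumption about what "usual"
means; it does not tell us which spelling a person actually prefers.
The function name top_committers and the sample commit list are made up
for illustration and are not part of teammetrics.

```python
from collections import Counter, defaultdict


def top_committers(names, limit):
    """Rank names case-insensitively, labelling each group with its
    most frequent original spelling (a heuristic, not a known preference)."""
    spellings = defaultdict(Counter)
    for name in names:
        # Key on the lower-cased name, but remember every original spelling.
        spellings[name.lower()][name] += 1

    ranked = []
    for variants in spellings.values():
        total = sum(variants.values())
        # Most frequent spelling wins; ties are broken alphabetically
        # so that the result is deterministic.
        display = min(variants, key=lambda s: (-variants[s], s))
        ranked.append((display, total))

    # Highest commit count first, then by name for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Invented sample data: one entry per commit.
commits = (["Gregor Herrmann"] * 3
           + ["gregor herrmann"] * 2
           + ["Damyan Ivanov"] * 4)
print(top_committers(commits, 10))
```

With this sample both spellings of Gregor are counted together and shown
under the capitalised form, simply because it is the more frequent one.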

So for the moment we should probably stick to manually adding Gregor to
the config file.  However, we should be aware that we will in any case
"lose" team members whose name appears in different spellings (of
whatever kind): the variants might occupy ranks 11 and 12 separately,
while their combined count belongs in the top ten, and we do not notice
because neither name shows up in the stats.  The same is true for people
who do show up in the stats but should actually be ranked higher because
a different spelling is "hidden" further down the ranking.
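To make the ranking effect concrete, here is a small Python sketch with
invented numbers (the names and counts are hypothetical, not taken from
the commitstat table).  It compares a top-3 list built on exact
spellings with one built on case-folded names; any other normalisation
could be passed as the key function instead.

```python
from collections import Counter

# Hypothetical commit counts per spelling (invented, not real data).
counts = {
    "Alice A": 50,
    "Bob B": 40,
    "Carol C": 30,
    "dave d": 25,
    "Dave D": 20,
}


def top(counts, limit, key=lambda name: name):
    """Return the top `limit` names after merging spellings via `key`."""
    merged = Counter()
    for name, n in counts.items():
        merged[key(name)] += n
    # Sort by descending count, then by name for deterministic output.
    ordered = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ordered[:limit]]


print(top(counts, 3))                 # exact spellings: Dave is missing
print(top(counts, 3, key=str.lower))  # case-folded: Dave enters the list
```

In the exact-spelling list neither variant of Dave appears, although the
combined count of 45 would place that person second.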

For the moment we can only trust the teams to tell us about missing or
wrongly placed members and adjust the names accordingly.  Case-insensitive
matching is not our only problem, and I see neither a simple nor a
"correct" way to fix it.

Kind regards

       Andreas.

-- 
http://fam-tille.de


