[Teammetrics-discuss] Filtering upstream contributors.

Sukhbir Singh sukhbir.in at gmail.com
Sun Jul 17 20:13:26 UTC 2011


Hi,

I did some testing with the 'getent passwd' method suggested by the
Alioth admins. Here are the results using the 'debian-med' Git
repository.

1. If we use their approach, we get the names of the users and then
filter the results -- if the name is in 'getent passwd', save to
database, else skip.

         name          | sum
------------------------+------
 Andreas Hildebrandt    | 1103
 Charles Plessy         |  698
 Aaron M. Ucko          |  283
 Shaun Jackman          |  157
 Michael Hanke          |  128
 Rafael Laboissiere     |  117
 Andreas Tille          |   26
 Steffen Möller         |   11

The only wrong figure here is Steffen's. In some commits he probably
used the Git user name 'Steffen Moller' and in others 'Steffen Möller',
hence the difference.

But let's continue. Here is the result _without_ using any filter,
after I removed the upstream contributors' names manually:

          name           | sum
-------------------------+------
 Andreas Hildebrandt     | 1103
 Charles Plessy          |  698
 Aaron M. Ucko           |  283
 Shaun Jackman           |  157
 Michael Hanke           |  128
 Rafael Laboissiere      |  117
 Steffen Möller          |   74
 Andreas Tille           |   26

Notice that Steffen has 74 commits here, compared with 11 in the
filtered result.

So here is what I think. The method suggested by the admins works,
because we care only about the contributor's name and not his/her
email address. We check whether the author name from the Git
repository exists in Alioth -- if it does, we save it, otherwise we
don't. So IMHO, we really don't need the email address, because we are
going to discard users that don't exist in Alioth anyway. What are
your thoughts?
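
To make the check concrete, here is a minimal sketch of the filter in
Python. It is not the code I mentioned above; the function names and
the sample data are made up for illustration, and it assumes the full
name is the first comma-separated part of the GECOS field in the
'getent passwd' output.

```python
import subprocess


def parse_passwd_names(passwd_text):
    """Return the set of full names found in passwd-format text."""
    names = set()
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) < 5:
            continue  # skip malformed lines
        # Field 5 is GECOS; assume the full name is its first
        # comma-separated part.
        full_name = fields[4].split(",")[0].strip()
        if full_name:
            names.add(full_name)
    return names


def alioth_names():
    """Read full names from 'getent passwd' (to be run on Alioth)."""
    result = subprocess.run(
        ["getent", "passwd"], capture_output=True, text=True, check=True
    )
    return parse_passwd_names(result.stdout)


def filter_authors(commit_counts, known_names):
    """Keep only the authors whose name appears in known_names."""
    return {
        name: count
        for name, count in commit_counts.items()
        if name in known_names
    }
```

With a hypothetical passwd line for "Alice Example" and commit counts
for "Alice Example" and "Bob Upstream", filter_authors() would keep
only Alice's entry. The match is an exact string comparison, so a name
spelled differently in Git and in Alioth is dropped.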

As far as the special case of Steffen is concerned, maybe we will
encounter more of them. For that, I suggest we have a whitelist that:

    if author name is not in Alioth BUT it is in the whitelist,
    save it to the database.

Something similar to updatenames.py that lets us handle this manually.
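
A rough sketch of that rule in Python, just to show the idea. The
function name and the example names are hypothetical, and how the
whitelist would actually be stored is still open.

```python
def should_save(author, alioth_names, whitelist):
    """Save an author if the name is in Alioth or in the whitelist."""
    return author in alioth_names or author in whitelist


# Hypothetical data: one spelling is known to Alioth, the other
# spelling is added to the whitelist by hand.
alioth = {"Jörg Example"}
whitelist = {"Jorg Example"}

saved = [
    name
    for name in ["Jörg Example", "Jorg Example", "Some Upstream"]
    if should_save(name, alioth, whitelist)
]
```

Here 'saved' keeps both spellings and drops "Some Upstream". Merging
the two spellings into one row would still be a separate step.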

I have not pushed this code (getent or filter), but I will do it once
you give me the go-ahead.

There is only one thing left for commitstat after this: implementing
a check for SVN repositories. I have some ideas for that, but I will
implement them tomorrow after researching a bit more.

-- 
Sukhbir


