[Debichem-devel] Formatting of author field in debian/upstream (Was: [Debichem-commits] r3618 - /unstable/gromacs/debian/upstream)

Andreas Tille andreas at an3as.eu
Mon May 7 20:14:30 UTC 2012


On Mon, May 07, 2012 at 12:53:36AM +0200, Michael Banck wrote:
> I've whipped up a patch (attached) which seems to work for most cases which are in
> debian/upstream already (input/output attached as well).

I committed your patch with some changes[1]. Here you can see the
changes to the author-fixing code[2].
 
> The following still fail, due to bugs or unusual authors in debian/upstream:
> 
> Test string input:  Vullo, Alessandro, and Frasconi, Paolo
> 
> (", and" is wrong)

That's actually really wrong and

  SELECT * from bibref where value like '%, and%' ;

has 16 matches in the current UDD database. I'll fix this at the source
by checking for it at debian/upstream import time, and by issuing a
warning that reminds me to fix the upstream files themselves.
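A minimal sketch of what such an import-time check could look like, in Python like blendstasktools.py. It simply mirrors the SQL pattern '%, and%'; the function names and the logger name are hypothetical, not the actual UDD importer code:

```python
import logging

logger = logging.getLogger("udd.upstream")  # hypothetical logger name


def has_comma_and(authors: str) -> bool:
    """True if a BibTeX-style author string contains the bogus ', and' separator.

    BibTeX separates authors with ' and ' only; a ', and' leaves a stray comma
    that confuses any 'Last, First' splitting done later.
    """
    return ", and" in authors  # same substring test as LIKE '%, and%'


def check_authors(package: str, authors: str) -> None:
    """Hypothetical helper: warn so the debian/upstream file gets fixed upstream."""
    if has_comma_and(authors):
        logger.warning("Suspicious ', and' in Author string of %s: %r",
                       package, authors)
```

The string is still imported unchanged; the warning is only a reminder to fix the file itself.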
 
> Test string input:  Li, Heng and Handsaker, Bob and Wysoker, Alec and
> Fennell, Tim and Ruan, Jue and Homer, Nils and Marth, Gabor and
> Abecasis, Goncalo and Durbin, Richard and 1000 Genome Project Data
> Processing Subgroup
> 
> (well, that's kinda tricky - "1000 Genome Project Data
> Processing Subgroup" does not have a "," like the others, so my
> heuristics fail)

Because that's really tricky, I suggest postponing such cases. I added a
simple "emergency brake": the author string is not replaced if the number
of ',' in the new string is higher than the number of ' and ' in the
original string. This also triggers a WARNING in the log (all other
changes are logged only as INFO, just to keep a record).
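A rough Python sketch of the conversion plus the brake, assuming the heuristic only swaps names when every author is in "Last, First" form and otherwise just rejoins the names unchanged (that assumption reproduces the samtools output in the log below, but it is not necessarily how the committed code works). The function names are hypothetical:

```python
import logging

logger = logging.getLogger("udd.upstream")  # hypothetical logger name


def reformat_authors(orig: str) -> str:
    """Turn 'Last, First and Last, First' into 'First Last and First Last'."""
    parts = [p.strip() for p in orig.split(" and ")]
    # Swap only if every author has exactly one comma ('Last, First');
    # otherwise keep the names as they are and only rejoin them.
    if all(p.count(",") == 1 for p in parts):
        parts = [f"{first.strip()} {last.strip()}"
                 for last, _, first in (p.partition(",") for p in parts)]
    if len(parts) == 1:
        return parts[0]
    return ", ".join(parts[:-1]) + " and " + parts[-1]


def fix_author_string(package: str, orig: str) -> str:
    """Apply reformat_authors() unless the 'emergency brake' fires."""
    new = reformat_authors(orig)
    n_commas, n_ands = new.count(","), orig.count(" and ")
    if n_commas > n_ands:
        # A correct conversion of N authors yields N-2 commas but N-1 ' and ',
        # so more commas than ' and ' means the heuristic went wrong.
        logger.warning("Refuse to change Author string in %s: %r(%d) -> %r(%d)",
                       package, orig, n_ands, new, n_commas)
        return orig
    if new != orig:
        logger.info("Author string changed in %s: %r -> %r", package, orig, new)
    return new
```

With this sketch the samtools string yields 17 commas against 9 ' and ' and is left alone, while the seaview string becomes 'Manolo Gouy, Stephane Guindon and Olivier Gascuel'. Like the original patch, it still mangles 'Xavier Didelot, Daniel Falush', which the brake cannot detect.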
 
> Test string input:  Michael Hanke and Yaroslav O. Halchenko and Per B.
> Sederberg and Stephen Jos{\'e} Hanson and James V. Haxby and Stefan
> Pollmann
> 
> (Does {\'e} render fine on the web? - the output is otherwise fine)

Fixed in the debian/upstream file.
 
> Test string input:  Van der Spoel D, Lindahl E, Hess B, Groenhof G, Mark
> AE, Berendsen HJ
> Test string input:  Trapnell C, Williams BA, Pertea G, Mortazavi AM,
> Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L.
> 
> (No "," between last name and first name, no final "and")

This should be fixed in the debian/upstream files. I'll try to issue a
warning at import time to make us aware of such problems. I think a
check for more than one ',' combined with no ' and ' will do here.
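That check fits in a one-line predicate. A hedged Python sketch (the name `looks_like_comma_list` is hypothetical):

```python
def looks_like_comma_list(authors: str) -> bool:
    """Flag 'Van der Spoel D, Lindahl E, ...' style lists.

    Such strings separate authors with commas instead of BibTeX's ' and ',
    so they have several commas but no ' and ' at all.
    """
    return authors.count(",") > 1 and " and " not in authors
```

A two-author string such as 'Xavier Didelot, Daniel Falush' has only one comma and is not flagged.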

> Test string input:  Xavier Didelot, Daniel Falush
> 
> (no "and", this one breaks on my patch)

Hmmm, no idea how to catch this at debian/upstream import time. It could
be a perfectly valid "double lastname, double firstname" string. I
simply suggest fixing the debian/upstream file once you become aware of
it.

As always you can check the log output at

  /var/lib/gforge/chroot/home/groups/blends/webtools/logs

(here debian-med.log has the most entries). Unfortunately there is
currently a problem on alioth caused by timing problems with udd.d.n
(which urgently needs replacement hardware), so for now you can only
see the effect on blends.debian.net. Here is a relevant snippet of the
log:


tille at debian-med:/var/lib/gforge/chroot/home/groups/blends/webtools/logs$ grep "Author string" debian-med.log | grep -A2 -B2 ^WARNING
INFO - blendstasktools.py (1527): Author string changed in r-other-bio3d: 'Grant, Barry J. and Rodrigues, Ana P. C. and ElSawy, Karim M. and McCammon, J. Andrew and Caves, Leo S. D.' -> 'Barry J. Grant, Ana P. C. Rodrigues, Karim M. ElSawy, J. Andrew McCammon and Leo S. D. Caves'
INFO - blendstasktools.py (1527): Author string changed in r-other-mott-happy: 'Mott, Richard and Talbot, Christopher J. and Turri, Maria G. and Collins, Allan C. and Flint, Jonathan' -> 'Richard Mott, Christopher J. Talbot, Maria G. Turri, Allan C. Collins and Jonathan Flint'
WARNING - blendstasktools.py (1525): Refuse to change Author string in samtools: 'Li, Heng and Handsaker, Bob and Wysoker, Alec and Fennell, Tim and Ruan, Jue and Homer, Nils and Marth, Gabor and Abecasis, Goncalo and Durbin, Richard and 1000 Genome Project Data Processing Subgroup'(9) -> 'Li, Heng, Handsaker, Bob, Wysoker, Alec, Fennell, Tim, Ruan, Jue, Homer, Nils, Marth, Gabor, Abecasis, Goncalo, Durbin, Richard and 1000 Genome Project Data Processing Subgroup'(17)
INFO - blendstasktools.py (1527): Author string changed in seaview: 'Gouy, Manolo and Guindon, Stephane and Gascuel, Olivier' -> 'Manolo Gouy, Stephane Guindon and Olivier Gascuel'
INFO - blendstasktools.py (1527): Author string changed in seqan-apps: 'Doring, Andreas and Weese, David and Rausch, Tobias and Reinert, Knut' -> 'Andreas Doring, David Weese, Tobias Rausch and Knut Reinert'
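To turn such a log into a to-do list of debian/upstream files that need manual fixing, a small Python sketch could pull the package names out of the WARNING lines. It assumes the line format shown above stays stable; `refused_packages` is a hypothetical helper, not part of the webtools code:

```python
import re

# Matches lines like
# "WARNING - blendstasktools.py (1525): Refuse to change Author string in samtools: ..."
REFUSED = re.compile(
    r"^WARNING - \S+ \(\d+\): Refuse to change Author string in ([^\s:]+):")


def refused_packages(lines):
    """Return the packages whose author string the emergency brake left unchanged."""
    packages = []
    for line in lines:
        match = REFUSED.match(line)
        if match:
            packages.append(match.group(1))
    return packages
```

Usage would be along the lines of `refused_packages(open("debian-med.log", encoding="utf-8"))`.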


Thanks for your inspiring patch; now back to making the needed changes
in the UDD importer.

Kind regards

        Andreas.

[1] http://anonscm.debian.org/viewvc/blends?view=revision&revision=3316
[2] http://anonscm.debian.org/viewvc/blends/blends/trunk/webtools/blendstasktools.py?view=diff&r1=3315&r2=3316&diff_format=h

-- 
http://fam-tille.de


