Bug#867302: licensecheck: incorrectly parses multi-line copyright notices

Jonas Smedegaard jonas at jones.dk
Wed Jul 5 18:15:02 UTC 2017


Control: reassign -1 libstring-copyright-perl
Control: retitle -1 libstring-copyright-perl: incorrectly parses multi-line copyright notices
Control: found -1 0.003005-1

Hi Ximin,

Quoting Ximin Luo (2017-07-05 17:45:17)
> For https://sources.debian.net/src/sagemath/7.6-2/sage/src/sage/misc/edit_module.py/
> 
> $ licensecheck --copyright src/sage/misc/edit_module.py 
> src/sage/misc/edit_module.py: GPL
>   [Copyright: 2007 Nils Bruin <nbruin at sfu.ca> and]
> 
> This is wrong, but I can work around it with the following sed script:
> 
> $ cat src/sage/misc/edit_module.py | tr '\n' '\t' | sed -e 's/\(,\|\band\)\s*\t#\?\s*/\1 /g' | tr '\t' '\n' > fixed.py
> $ licensecheck --copyright fixed.py 
> fixed.py: GPL
>   [Copyright: 2007 Nils Bruin <nbruin at sfu.ca> and William Stein <wstein at math.ucsd.edu>]
> 
> It would be good if this logic were incorporated into licensecheck 
> itself. I'd help, but my perl is really bad.
> 
> (Also perhaps the # in the regex should be a (?:#|//|/*) or something 
> like that)

I agree (unsurprisingly) that this is wrong.

Unfortunately it is not as simple as throwing a regex at it: One of my 
reasons for taking over and working on licensecheck was a remark once on 
d-devel@ that it was far too slow to be usable for Chromium, and I 
wanted to (silently so as to not make too much of a fool of myself) take 
the challenge of optimizing it.

Unlike in its days living in devscripts, the licensecheck routines to 
match copyright holders have been separated into a new library, 
String::Copyright (libstring-copyright-perl in Debian), and the code has 
been refactored to use a single large RE2-compatible regex to match each 
copyright statement, in the hope of some day switching to the RE2 
engine and becoming faster...

My first brief look at this has revealed a few bugs: In the next release 
of licensecheck the leading # is stripped _before_ handing over to 
String::Copyright code (as was intended for years).
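
To illustrate the idea of stripping comment markers before matching, here 
is a minimal sketch in Python (not the Perl code licensecheck actually 
uses). The helper name strip_comment_markers and the set of markers it 
handles are assumptions chosen for illustration only:

```python
import re

# Hypothetical helper, not part of licensecheck or String::Copyright.
# Matches a leading comment marker at the start of each line:
# "#", "//", "/*", or a bare "*" continuation (but not a closing "*/"),
# plus at most one following blank.
COMMENT_PREFIX = re.compile(
    r"^[ \t]*(?:#+|//+|/\*+|\*+(?!/))[ \t]?",
    re.MULTILINE,
)


def strip_comment_markers(text):
    """Remove leading comment markers from every line of text."""
    return COMMENT_PREFIX.sub("", text)


sample = "# Copyright 2007 A and\n#   B\n"
print(strip_comment_markers(sample))
```

With the markers gone, the pattern that matches copyright holders no 
longer needs to know about any comment syntax.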

Have a look (if interested) at /usr/share/perl5/String/Copyright.pm and 
in particular the (huge when expanded) $signs_and_more_re at line 138.

Replacing $blank_re with $blank_or_break_re in $owners_re (line 136) 
succeeds in detecting the second copyright holder, but then also bogusly 
includes the license statement as a copyright holder.
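
The effect can be reproduced with a much simplified sketch in Python. The 
two patterns below are stand-ins for $blank_re and $blank_or_break_re, not 
the real definitions from String::Copyright, and the sample text is an 
illustrative header with comment markers already stripped, not a verbatim 
copy of the file:

```python
import re

# Illustrative sample only; comment markers are assumed already stripped.
TEXT = (
    "Copyright (C) 2007 Nils Bruin <nbruin at sfu.ca> and\n"
    "                   William Stein <wstein at math.ucsd.edu>\n"
    "\n"
    "Distributed under the terms of the GNU General Public License (GPL)\n"
)

BLANK = r"[ \t]"        # stand-in for $blank_re: blanks only
BLANK_OR_BREAK = r"\s"  # stand-in for $blank_or_break_re: blanks or newlines


def owners(text, sep):
    """Return the owner string matched when words may be joined by sep."""
    # Simplified stand-in for the owners pattern: words separated by sep.
    pattern = rf"Copyright \(C\) (\d{{4}}) ((?:\S+{sep}+)*\S+)"
    match = re.search(pattern, text)
    # Collapse whitespace so multi-line matches print on one line.
    return re.sub(r"\s+", " ", match.group(2))


print(owners(TEXT, BLANK))           # stops at the end of the first line
print(owners(TEXT, BLANK_OR_BREAK))  # runs on into the license statement
```

Allowing line breaks between words picks up the second holder, but nothing 
then tells the pattern where the list of holders ends, so it keeps going 
across the empty line and into the license text.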


> X

That is the most elegant signature I have seen. Ever!

It beats my primary school teacher who used "kh" to mean both her 
initials and an abbreviation of the Danish equivalent of "kind regards".


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private