Bug#950363: licensecheck reports dubious (may be misleading) information for image files

Jonas Smedegaard jonas at jones.dk
Thu Feb 6 10:21:01 GMT 2020


control: severity -1 wishlist
control: retitle -1 licensecheck: detect additional (i.e. non-SPDX) qualities

Quoting Dominique Dumont (2020-02-06 10:17:26)
> On Friday, 31 January 2020 20:28:50 CET Jonas Smedegaard wrote:
> > Concretely I do think that you have spotted an issue with an image 
> > containing non-free code, and I recommend that you report or fix it.
> 
> Thanks for the advice. This worked quite well:
> https://github.com/libuv/libuv/issues/2670
> https://github.com/libuv/libuv/pull/2672
> 
> I'm pretty sure the stripped images will be merged for libuv1 release.
> 
> Thanks for pushing me to do this :-)

I am very happy that it worked out well.  Their response is similar to 
what I generally experience: Positive surprise and (maybe after a bit of 
confusion) full agreement that it should be fixed.

(I must admit I am commonly a bit scared of filing bugreports - which 
makes me wonder not if but when and how I myself am scary to approach)


> > Just yesterday I wrote down in the TODO file for licensecheck (but 
> > not yet added that edit to git) that it would be nice if a set of 
> > "qualities" was expressed, besides the concrete task of finding 
> > copyright and licensing statements.  It was inspired by what is 
> > currently the only "side note" tracked - "(with wrong address)" - 
> > and presented only in default output (it really should be added as a 
> > Comment when generating DEP-5 output), but fits well with this 
> > example too.
> 
> ok, I'm wondering if you plan to include this information in "machine" 
> output. That may break the processing done by cme.

There are two machine-readable outputs currently, enabled by either of 
options "--machine" or "--deb-machine" - I assume you are talking about 
the latter.

Yes, I plan to include as much as possible in machine-readable output, 
but will (for the "--deb-machine" format) keep within the boundaries of 
https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/ - so 
you only need to worry if you are making too strict assumptions about 
that format.  In particular, beware that it is plain 
wrong to only expect explicitly defined fields (as per § 4: "Extra 
fields can be added to any paragraph").
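
A minimal sketch of a reader that tolerates such extra fields, assuming 
the usual "Field: value" lines, indented continuation lines and blank 
lines between paragraphs.  The function names, the "X-Quality" field and 
the sample data are made up for illustration; they are not part of 
licensecheck or cme.

```python
from typing import Dict, List

# Fields a consumer might expect in a Files paragraph (illustrative subset).
KNOWN_FILES_FIELDS = {"Files", "Copyright", "License", "Comment"}


def parse_paragraphs(text: str) -> List[Dict[str, str]]:
    """Split copyright-format style text into paragraphs of field -> value."""
    paragraphs: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_field = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
            current, last_field = {}, None
        elif line[0] in " \t":
            # Continuation line: append to the previous field's value.
            if last_field is not None:
                prev = current[last_field]
                current[last_field] = f"{prev}\n{line.strip()}" if prev else line.strip()
        else:
            name, _, value = line.partition(":")
            last_field = name.strip()
            current[last_field] = value.strip()
    if current:
        paragraphs.append(current)
    return paragraphs


def extra_fields(paragraph: Dict[str, str]) -> Dict[str, str]:
    """Return unexpected fields instead of rejecting the paragraph."""
    return {k: v for k, v in paragraph.items() if k not in KNOWN_FILES_FIELDS}


# Made-up sample; "X-Quality" is a hypothetical extra field.
SAMPLE = """\
Files: example.png
Copyright: Hewlett-Packard Company
License: UNKNOWN
X-Quality: unlicensed buried unaligned
Comment:
 FIXME: needs human confirmation
"""

paragraphs = parse_paragraphs(SAMPLE)
print(extra_fields(paragraphs[0]))
```

The point of the design is that unknown fields are carried along and 
reported, never treated as a parse error.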

On a related note, beware that the FIXMEs added to output are deliberate: 
Licensecheck does not claim to know better than human reasoning what is 
accurate, and therefore flags all its findings as needing human 
confirmation.  If you wrap Licensecheck and hide/strip those FIXMEs then 
_you_ take responsibility for the output being perceived as requiring 
less/no human validation.  Now that I write this, it occurs to me that 
it probably makes sense to expand those FIXMEs to add some explanatory 
text.


> > Here is the full list I wrote down:
> > 
> >  * Quality flagging
> >    + ambiguous: license ref pointing to multiple license fulltexts
> >      (e.g. "MIT" or "GNU" or "GPL")
> >    + unlicensed: copyright holder(s) but no licensing
> >    + ungranted: license fullref requiring explicit grant,
> >      but no corresponding license grant
> >    + incomplete: fractions of license fullref, but no complete fullref
> >    + alien: license label but no license name
> >    + unowned: license but no copyright holder
> >    + uncertain: license ref and more unknown text
> >      in same sentence/paragraph/section
> >    + buried: license or copyright not at top of file
> >    + unstructured: license/copyright not at ideal place of data structure
> >      (e.g. in comment field of EXIF data, or in content of PDF/HTML)
> >    + unaligned: license/copyright out of sync between layers of structure
> >      (e.g. ICC data and EXIF data of PNG, or content and metadata of
> >      PDF/HTML)
> >    + imperfect: license ref not following format documented in
> >      license fulltext
> >    + conflict: incompatible licenses
> >      (e.g. GPL-3+ and GPL-2-only, or OpenSSL and GPL)
> > 
> > The example you present here would ideally (continue to report HP as
> > copyright holder - and more reliably so, but that's a separate issue -
> > and) be flagged as "unlicensed", "buried" and "unaligned".
> > 
> > Does that make sense?  
> 
> yes, but I'm not sure how I could exploit this information with cme.

I imagine that qualities are of different importance for different uses 
of licensecheck.  An author might be interested in correcting errors, 
and a larger organization of authors (e.g. KDE) might want to ensure 
coherence both in writing style and in licensing "regime" (for lack of a 
better word: which political field they want to stay within - e.g. 
"GNU-compatible copyleft" or "Apache semi-copyleft without 
GPL-contamination"), whereas a distributor like Debian is less 
interested in style (we cannot change it anyway) except for details 
directly harmful for our work (e.g. wrong contact information as has 
happened with FSF changing postal address).
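
One way a consumer could act on this is to treat the qualities as a set 
of flags and filter them per use.  The sketch below takes the flag names 
from the TODO list quoted above; which qualities each kind of consumer 
cares about is purely an assumption made for illustration, and none of 
this exists in licensecheck today.

```python
from enum import Flag, auto
from functools import reduce
from operator import or_


class Quality(Flag):
    """Quality flags, named after the TODO list quoted in this thread."""
    AMBIGUOUS = auto()
    UNLICENSED = auto()
    UNGRANTED = auto()
    INCOMPLETE = auto()
    ALIEN = auto()
    UNOWNED = auto()
    UNCERTAIN = auto()
    BURIED = auto()
    UNSTRUCTURED = auto()
    UNALIGNED = auto()
    IMPERFECT = auto()
    CONFLICT = auto()


ALL = reduce(or_, Quality)

# Hypothetical profiles: the selection per consumer is an assumption.
PROFILES = {
    "author": ALL,
    "organization": (Quality.IMPERFECT | Quality.BURIED | Quality.UNSTRUCTURED
                     | Quality.UNALIGNED | Quality.CONFLICT),
    "distributor": (Quality.AMBIGUOUS | Quality.UNLICENSED | Quality.UNGRANTED
                    | Quality.UNOWNED | Quality.CONFLICT),
}


def relevant(found: Quality, consumer: str) -> Quality:
    """Keep only the qualities this kind of consumer cares about."""
    return found & PROFILES[consumer]


def names(flags: Quality) -> list:
    """List the names of the single flags contained in a combined value."""
    return [q.name for q in Quality if q in flags]


# The image discussed in this bug: "unlicensed", "buried" and "unaligned".
found = Quality.UNLICENSED | Quality.BURIED | Quality.UNALIGNED
print(names(relevant(found, "distributor")))  # ['UNLICENSED']
```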


> Maybe a mechanism similar to "and/or" in license: a license statement 
> with "and/or" is allowed but triggers a warning inviting cme user to 
> investigate manually the problematic file and override the information 
> extracted by licensecheck.

...then maybe I should add " and/or UNKNOWNS" to _all_ detections - 
which is currently implied by the "FIXME" comments.

To clarify: When licensecheck says "GPL-2+ and/or MIT" then it means 
"this file is seemingly licensed under GPL-2+ and/or MIT (and/or 
additional terms not auto-detected)" (not "this file is _only_ licensed 
under GPL-2+ and/or MIT").

If cme warns about "and/or" needing human investigation but not FIXMEs, 
then it implicitly says FIXMEs need less human investigation which is 
plain wrong!
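
A rough sketch of a wrapper-side check that treats both signals as 
reasons for human review.  It assumes the FIXME marker shows up as the 
literal string "FIXME" in the license field or an accompanying comment; 
the function name and the reason texts are invented here and are not 
taken from cme or licensecheck.

```python
import re

_AND_OR = re.compile(r"\band/or\b")


def review_reasons(license_expr: str, comment: str = "") -> list:
    """Collect reasons why a detected license needs human investigation."""
    reasons = []
    # A FIXME is checked first: it must never rank below "and/or".
    if "FIXME" in license_expr or "FIXME" in comment:
        reasons.append("FIXME: finding not confirmed by a human")
    if _AND_OR.search(license_expr):
        reasons.append("and/or: several licenses detected, relation unclear")
    return reasons


for expr, comment in [
    ("GPL-2+ and/or MIT", "FIXME"),
    ("MIT", "FIXME"),
    ("MIT", ""),
]:
    print(expr, "->", review_reasons(expr, comment))
```

Only the last case, with neither marker, yields an empty list; a wrapper 
that strips the FIXMEs before this check would silently turn the second 
case into the third.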


> > Would you agree to turn this bugreport into a
> > wishlist reminder for making that side-note spiffy-ness happen?
> 
> Sure.

Done.  Thanks a lot for your input!


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private
