Bug#951726: licensecheck: option --encoding is not propagated during recursive scan

Jonas Smedegaard jonas at jones.dk
Thu Feb 20 20:29:40 GMT 2020


Control: tag -1 confirmed

Hi Dominique,

Quoting Dominique Dumont (2020-02-20 17:15:29)
> While packaging nqp, I've noticed a discrepancy in licensecheck output:
> 
> licensecheck correctly reports the absence of information when
> scanning nqp/115-nums.t file from nqp directory:
> 
> $ licensecheck --encoding utf8 --copyright --machine --recursive nqp | grep 115 
> nqp/115-nums.t  UNKNOWN *No copyright*
> 
> licensecheck incorrectly reports garbage when scanning nqp/115-nums.t
> file from current directory:
> 
> $ licensecheck --encoding utf8 --copyright --machine --recursive . | grep 115
> ./nqp/115-nums.t        UNKNOWN ೨೪, ೫e-೩೨೪, '6e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩, '9e-324 denormal is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is recognized and is not 0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0 (Uni)'); / e-೩೨೪, ೫e-೩೨೪, '2e-324 denormal equates to 5e-324 denormal (Uni)');
> 
> The mis-decoded file contains © character hence the mojibake garbage.
> 
> I would expect --encoding utf8 option to be used to read all files.

Thanks for an excellently framed bug report!

The cause for the difference in output is revealed in --verbose mode:

$ licensecheck --encoding utf8 --copyright --machine --recursive --verbose . | grep '115\|cannot be read'
file moar/05-decoder.t cannot be read with App::Licensecheck=HASH(0x563009ec7500)->encoding; encoding, will try latin-1:
----- nqp/115-nums.t header -----
./nqp/115-nums.t	UNKNOWN	೨೪, ೫e-೩೨೪, '6e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩, '9e-324 denormal is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is recognized and is not 0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0 (Uni)'); / e-೩೨೪, ೫e-೩೨೪, '2e-324 denormal equates to 5e-324 denormal (Uni)');

Licensecheck chokes on moar/05-decoder.t and re-reads it as latin-1.

...but then licensecheck _continues_ to read following files as latin-1, 
which is wrong.
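
To illustrate the intended behaviour, here is a minimal sketch in Python (not licensecheck's actual Perl code; the function names read_header, scan and demo are made up for this example). It assumes the fix is to keep the latin-1 fallback local to the one file that failed to decode, so that the encoding requested by the user is tried again for every following file:

```python
import os
import tempfile


def read_header(path, encoding):
    """Decode one file, falling back to latin-1 for this file only."""
    with open(path, "rb") as fh:
        raw = fh.read()
    try:
        return raw.decode(encoding), encoding
    except UnicodeDecodeError:
        # The fallback is returned, never stored in shared state.
        return raw.decode("latin-1"), "latin-1"


def scan(paths, encoding="utf-8"):
    """Scan files in order; 'encoding' is never reassigned in the loop."""
    results = {}
    for path in paths:
        text, used = read_header(path, encoding)
        results[os.path.basename(path)] = (used, text)
    return results


def demo():
    """Scan one undecodable file followed by one valid UTF-8 file."""
    with tempfile.TemporaryDirectory() as tmp:
        bad = os.path.join(tmp, "05-decoder.t")
        good = os.path.join(tmp, "115-nums.t")
        with open(bad, "wb") as fh:
            fh.write(b"\xff\xfe not valid utf-8\n")
        with open(good, "wb") as fh:
            fh.write("೫e-೩೨೪\n".encode("utf-8"))
        return scan([bad, good])
```

In this sketch the first file is read as latin-1, while the second is still decoded as UTF-8, which is the behaviour expected from --encoding utf8.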

(Enabling --verbose also reveals that licensecheck wrongly treats Encode
objects as strings, as seen with the HASH string in the warning message.)
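
As a rough analogy only (Python, not the Perl code in licensecheck; the Scanner class and the message wording are hypothetical), this is the kind of mistake that puts an object reference into a warning instead of the encoding name:

```python
class Scanner:
    """Hypothetical stand-in for an object that carries an encoding setting."""

    def __init__(self, encoding):
        self.encoding = encoding


scanner = Scanner("utf8")

# Formatting the object itself yields its default representation,
# including a memory address, much like the HASH(0x...) string above.
bad = f"file cannot be read with {scanner}; will try latin-1"

# Formatting the attribute yields the readable encoding name.
good = f"file cannot be read with {scanner.encoding}; will try latin-1"
```

Whether the actual cause in licensecheck is of exactly this shape is an assumption on my part; the sketch only shows why the message ends up containing an address.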


 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private