Bug#951726: licensecheck: option --encoding is not propagated during recursive scan
Jonas Smedegaard
jonas at jones.dk
Thu Feb 20 20:29:40 GMT 2020
Control: tag -1 confirmed
Hi Dominique,
Quoting Dominique Dumont (2020-02-20 17:15:29)
> While packaging nqp, I've noticed a discrepancy in licensecheck output:
>
> licensecheck correctly reports the absence of information when
> scanning nqp/115-nums.t file from nqp directory:
>
> $ licensecheck --encoding utf8 --copyright --machine --recursive nqp | grep 115
> nqp/115-nums.t UNKNOWN *No copyright*
>
> licensecheck incorrectly reports garbage when scanning nqp/115-nums.t
> file from current directory:
>
> $ licensecheck --encoding utf8 --copyright --machine --recursive . | grep 115
> ./nqp/115-nums.t UNKNOWN ೨೪, ೫e-೩೨೪, '6e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩, '9e-324 denormal is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is recognized and is not 0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0 (Uni)'); / e-೩೨೪, ೫e-೩೨೪, '2e-324 denormal equates to 5e-324 denormal (Uni)');
>
> The mis-decoded file contains a © character, hence the mojibake garbage.
>
> I would expect the --encoding utf8 option to be used to read all files.
Thanks for an excellently framed bug report!
The cause of the difference in output is revealed in --verbose mode:
$ licensecheck --encoding utf8 --copyright --machine --recursive --verbose . | grep '115\|cannot be read'
file moar/05-decoder.t cannot be read with App::Licensecheck=HASH(0x563009ec7500)->encoding; encoding, will try latin-1:
----- nqp/115-nums.t header -----
./nqp/115-nums.t UNKNOWN ೨೪, ೫e-೩೨೪, '6e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೫e-೩೨೪, '5e-324 denormal equates to 5e-324 denormal (Uni)'); / ೨೪, ೧e-೩೨೩, '9e-324 denormal is 1e-323 (Uni)'); / ೨೪, ೦e೦, 'denormal 5e-324 is recognized and is not 0 (Uni)'); / ೨೪, ೦e೦, '2e-324 denormal is 0e0 (Uni)'); / e-೩೨೪, ೫e-೩೨೪, '2e-324 denormal equates to 5e-324 denormal (Uni)');
Licensecheck fails to decode moar/05-decoder.t as UTF-8 and re-reads it
as latin-1.
...but then licensecheck _continues_ to read all following files as
latin-1, which is wrong.
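The behaviour described above can be illustrated with a minimal sketch. This is not licensecheck's code (licensecheck is written in Perl); it is a Python analogue, and the function names read_sticky, read_per_file and demo are hypothetical. It assumes the bug has the shape of a fallback that overwrites the shared encoding setting instead of applying to the failing file only:

```python
import os
import tempfile


def read_sticky(paths, encoding="utf-8"):
    """Buggy pattern: the fallback overwrites the shared encoding."""
    results = {}
    for path in paths:
        try:
            with open(path, encoding=encoding) as fh:
                results[path] = fh.read()
        except UnicodeDecodeError:
            encoding = "latin-1"  # bug: leaks into every later file
            with open(path, encoding=encoding) as fh:
                results[path] = fh.read()
    return results


def read_per_file(paths, encoding="utf-8"):
    """Fixed pattern: the fallback applies to the failing file only."""
    results = {}
    for path in paths:
        try:
            with open(path, encoding=encoding) as fh:
                results[path] = fh.read()
        except UnicodeDecodeError:
            # The requested encoding stays untouched for the next file.
            with open(path, encoding="latin-1") as fh:
                results[path] = fh.read()
    return results


def demo():
    """Read an undecodable file first, then a valid UTF-8 file."""
    with tempfile.TemporaryDirectory() as tmp:
        bad = os.path.join(tmp, "05-decoder.t")
        good = os.path.join(tmp, "115-nums.t")
        with open(bad, "wb") as fh:
            fh.write(b"\xff\xfe not valid UTF-8\n")
        with open(good, "wb") as fh:
            fh.write("\u00a9 2020\n".encode("utf-8"))
        sticky = read_sticky([bad, good])[good]
        fixed = read_per_file([bad, good])[good]
    return sticky, fixed
```

With the sticky variant, the © in the second file comes back as two characters (mojibake); with the per-file variant it is decoded as a single ©. Scanning a directory that does not contain the undecodable file never triggers the fallback, which would match the difference between the two invocations in the report.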
(Enabling --verbose also reveals that Licensecheck wrongly treats Encode
objects as strings, as seen with the HASH string in the warning message.)
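The second problem, an object ending up in a message where its name was intended, can be sketched the same way. Again this is a Python analogue and not the actual Perl code; the Encoding class and the warn_buggy and warn_fixed helpers are hypothetical names made up for illustration:

```python
class Encoding:
    """Stand-in for an encoding object that carries a readable name."""

    def __init__(self, name):
        self.name = name


def warn_buggy(path, enc):
    # Formats the object itself, so the message shows a memory address.
    return f"file {path} cannot be read with {enc}; will try latin-1"


def warn_fixed(path, enc):
    # Formats the object's name, which is what a reader needs to see.
    return f"file {path} cannot be read with {enc.name}; will try latin-1"
```

The buggy variant yields an opaque identifier in the message, comparable to the HASH(0x...) string in the warning above, while the fixed variant names the encoding.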
- Jonas
--
* Jonas Smedegaard - idealist & Internet-arkitekt
* Tlf.: +45 40843136 Website: http://dr.jones.dk/
[x] quote me freely [ ] ask before reusing [ ] keep private