Bug#867305: licensecheck: doesn't parse unicode files correctly
Jonas Smedegaard
jonas at jones.dk
Fri May 15 12:03:42 BST 2020
Control: retitle -1 licensecheck: misparses utf8-encoded files by default
Quoting Ximin Luo (2017-07-05 18:00:28)
> licensecheck seems to generate bad output for unicode files such as:
>
> https://sources.debian.net/src/sagemath/7.6-2/sage/src/doc/ja/tutorial/tour_rings.rst
>
> An example command line is:
>
> $ licensecheck -l250 --deb-machine --merge-licenses src/doc/ja/tutorial/tour_rings.rst
>
> I get glyphs like <U+008D>ã<U+0081> suggesting that maybe it is
> getting utf-8-encoded twice.
Licensecheck reads data as Latin1 by default.
To work around that, explicitly tell licensecheck to use (or, more accurately, to first try) utf8:
licensecheck -l250 --deb-machine --merge-licenses --encoding utf8 tour_rings.rst
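For illustration, the glyphs in the report are what you would expect when utf8 bytes are decoded as Latin1. The following is only a minimal Python sketch of that effect, not licensecheck's actual code: the helper name misread_as_latin1 and the sample string are my own assumptions, chosen because hiragana encode to byte sequences beginning with 0xE3 0x81.

```python
# Illustration only: this is NOT how licensecheck is implemented.
# It shows what happens when UTF-8 bytes are decoded as Latin-1.

def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as Latin-1."""
    raw = text.encode("utf-8")
    # Latin-1 maps every byte 0x00-0xFF to the code point of the same
    # value, so this never fails; it silently yields one character per byte.
    return raw.decode("latin-1")


# Hypothetical sample, not taken from tour_rings.rst: the hiragana
# U+304D encodes to the three bytes E3 81 8D in UTF-8.
sample = "\u304d\u304d"
garbled = misread_as_latin1(sample)

# Each 3-byte character becomes 3 Latin-1 characters:
# U+00E3 (a with tilde), U+0081, U+008D.
print(ascii(garbled))

# Writing the garbled string back out as UTF-8 gives the
# "encoded twice" appearance described in the report.
double_encoded = garbled.encode("utf-8")
print(len(sample.encode("utf-8")), len(double_encoded))
```

Where two such characters meet, the misread text contains the sequence U+008D, U+00E3, U+0081, which matches the <U+008D>ã<U+0081> pattern quoted above.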
I agree that this is not optimal: Nowadays licensecheck should use utf8
by default. I am just not quite certain how to go about that: whether it
is ok to simply switch, or whether I should make a minor or major version
bump when making such a change.
--
* Jonas Smedegaard - idealist & Internet-arkitekt
* Tlf.: +45 40843136 Website: http://dr.jones.dk/
[x] quote me freely [ ] ask before reusing [ ] keep private