Bug#867305: licensecheck: doesn't parse unicode files correctly

Jonas Smedegaard jonas at jones.dk
Fri May 15 12:03:42 BST 2020


Control: retitle -1 licensecheck: misparses utf8-encoded files by default

Quoting Ximin Luo (2017-07-05 18:00:28)
> licensecheck seems to generate bad output for unicode files such as:
> 
> https://sources.debian.net/src/sagemath/7.6-2/sage/src/doc/ja/tutorial/tour_rings.rst
> 
> An example command line is:
> 
> $ licensecheck -l250 --deb-machine --merge-licenses src/doc/ja/tutorial/tour_rings.rst
> 
> I get glyphs like <U+008D>ã<U+0081> suggesting that maybe it is 
> getting utf-8-encoded twice.

Licensecheck reads data as Latin1 by default, so each byte of a 
multi-byte utf8 sequence is treated as a separate character.
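
To illustrate where the reported glyphs come from, here is a minimal 
Python sketch (not licensecheck's own Perl code). It assumes the file 
contains a hiragana character such as "き"; the specific character is 
chosen only for illustration.

```python
# Hiragana "き" (U+304D) is the three bytes E3 81 8D in UTF-8.
raw = "き".encode("utf-8")

# Reading those bytes as Latin-1 maps every byte to its own code point,
# giving "ã" (U+00E3) followed by the control characters U+0081 and U+008D.
misread = raw.decode("latin-1")
print([f"U+{ord(c):04X}" for c in misread])

# Writing the misread text back out as UTF-8 encodes it a second time.
double_encoded = misread.encode("utf-8")
print(double_encoded.hex(" "))
```

This matches the reporter's guess: the output looks as if it had been 
utf-8-encoded twice.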

Explicitly tell licensecheck to use (or more accurately first try) utf8:

  licensecheck -l250 --deb-machine --merge-licenses --encoding utf8 tour_rings.rst
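
The "first try" behaviour can be pictured roughly as below. This is only 
a Python sketch of the general try-then-fall-back idea: the function 
name decode_with_fallback is hypothetical, and the fallback to Latin1 is 
an assumption, not a description of licensecheck's actual code.

```python
def decode_with_fallback(data: bytes, preferred: str = "utf-8") -> str:
    """Decode with the preferred encoding, else fall back to Latin-1."""
    try:
        # Strict decoding raises on byte sequences invalid in `preferred`.
        return data.decode(preferred)
    except UnicodeDecodeError:
        # Assumed fallback: Latin-1 accepts every byte value, so it never fails.
        return data.decode("latin-1")


print(decode_with_fallback("き".encode("utf-8")))  # valid UTF-8 input
print(decode_with_fallback(b"caf\xe9"))            # invalid UTF-8, Latin-1 input
```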

I agree that this is not optimal: nowadays licensecheck should use utf8 
by default.  I am just not quite certain how to go about that - whether 
it is ok to simply switch, or whether I should make a minor or major 
version bump when making such a change.

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136  Website: http://dr.jones.dk/

 [x] quote me freely  [ ] ask before reusing  [ ] keep private