Bug#787821: libhtml-parser-perl: encode_entities() convert chars to à instead of their proper entity

gregor herrmann gregoa at debian.org
Fri Jun 5 15:21:18 UTC 2015


On Fri, 05 Jun 2015 16:20:24 +0200, Mathieu ROY wrote:

> > > I obviously can adjust the script to pre convert UTF-8 to ISO-8859
> > Or just add "use utf8;" to your script if it contains utf8-encoded
> > strings.
> That works for the test script all right.
> But in the script I'm actually working on, the string is imported from an
> image's EXIF data. And in this case, use utf8 has no effect at all.

Right, "use utf8;" only affects the _script_ but not input and
output.

> The string is 
> utf8 and encode_entities fails to convert it properly.

In this case I'd probably try "use utf8::all;" or tell open()
about the encoding:

>   $ cat test.pl 
> #!/usr/bin/perl
> use utf8;
> use HTML::Entities;
> 
> open(INPUT, "< testdata");

open(my $fh, '<:encoding(UTF-8)', 'testdata') or die "Cannot open testdata: $!";

(Untested.)
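
A slightly fuller sketch of the same idea, in case it helps. The sample
string "café" and the in-memory filehandle (standing in for the real
'testdata' file) are my assumptions, not something from your script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities;

# Assumption: the input is UTF-8 encoded. These are the raw bytes of
# "café" plus a newline; an in-memory handle stands in for 'testdata'.
my $raw = "caf\xc3\xa9\n";

# The :encoding(UTF-8) layer decodes bytes into characters while reading.
open(my $fh, '<:encoding(UTF-8)', \$raw) or die "Cannot open input: $!";
my $line = <$fh>;
close($fh);
chomp $line;

# $line now holds four characters, the last one being U+00E9.
my $html = encode_entities($line);
print "$html\n";
```

With the layer in place, encode_entities() sees one character "é" rather
than two bytes, so it should emit "caf&eacute;".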

> "When UTF-8 becomes the standard source format, this pragma will effectively 
> become a no-op."
> 
> Well, that day, if that day comes, HTML::Entities will definitely have to deal 
> properly with UTF-8 first hand. :-)

In my understanding, HTML::Entities doesn't have a problem with
UTF-8; it's just about telling perl itself how the data in the
script, or the data read from an external file, is encoded.
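
To illustrate the difference, here is a minimal sketch, assuming the
string arrives as raw UTF-8 bytes (as it might from a file or an EXIF
field read without decoding). The variable names are made up for the
example; decode() is from the core Encode module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities;

# Assumption: the raw UTF-8 bytes of a single "é" (U+00E9).
my $bytes = "\xc3\xa9";

# Undecoded, perl treats each byte as a separate Latin-1 character
# (0xC3 and 0xA9), so two unrelated entities come out.
my $from_bytes = encode_entities($bytes);

# After decoding, perl knows this is one character.
my $chars      = decode('UTF-8', $bytes);
my $from_chars = encode_entities($chars);

print "undecoded: $from_bytes\n";    # &Atilde;&copy;
print "decoded:   $from_chars\n";    # &eacute;
```

The module behaves the same in both calls; only what perl was told about
the data differs.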
 

Cheers,
gregor

-- 
 .''`.  Homepage: http://info.comodo.priv.at/ - OpenPGP key 0xBB3A68018649AA06
 : :' : Debian GNU/Linux user, admin, and developer -  https://www.debian.org/
 `. `'  Member of VIBE!AT & SPI, fellow of the Free Software Foundation Europe
   `-   NP: Peter, Paul and Mary: For Loving Me