Bug#787821: libhtml-parser-perl: encode_entities() convert chars to à instead of their proper entity
gregor herrmann
gregoa at debian.org
Fri Jun 5 15:21:18 UTC 2015
On Fri, 05 Jun 2015 16:20:24 +0200, Mathieu ROY wrote:
> > > I obviously can adjust the script to pre convert UTF-8 to ISO-8859
> > Or just add "use utf8;" to your script if it contains utf8-encoded
> > strings.
> That works for the test script all right.
> But in the script I'm actually working on, the string is imported from an
> image exif data. And in this case, use utf8 has no effect at all.
Right, "use utf8;" only affects the _script_ but not input and
output.
> The string is
> utf8 and encode_entities fails to convert it properly.
In this case I'd probably try "use utf8::all;" or tell open()
about the encoding:
> $ cat test.pl
> #!/usr/bin/perl
> use utf8;
> use HTML::Entities;
>
> open(INPUT, "< testdata");
open(my $fh,'<:encoding(utf8)', 'testdata');
(Untested.)
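A slightly fuller sketch of the same idea, in case it helps. It is
self-contained: it writes its own small sample file (a stand-in for
"testdata"; the contents are made up for illustration) and reads it
back through a decoding layer, so that encode_entities() gets
characters rather than raw bytes:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Entities;

# Write a small UTF-8 sample file (hypothetical stand-in for "testdata").
my ($out, $path) = tempfile();
binmode($out, ':raw');
print {$out} "caf\xC3\xA9\n";    # "café" as UTF-8 bytes
close($out) or die "close: $!";

# Read it back with a decoding layer, so perl sees characters, not bytes.
open(my $fh, '<:encoding(UTF-8)', $path) or die "open $path: $!";
my $line = <$fh>;
close($fh);
chomp $line;

# In non-void context encode_entities() returns the encoded copy.
my $encoded = encode_entities($line);
print "$encoded\n";              # caf&eacute;

unlink $path;
```

Without the ":encoding(UTF-8)" layer the two bytes of "é" would be
read as two Latin-1 characters and encoded separately.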
> "When UTF-8 becomes the standard source format, this pragma will effectively
> become a no-op."
>
> Well, that day, if that day comes, HTML::Entities will definitely have to deal
> properly with UTF-8 first hand. :-)
In my understanding, HTML::Entities doesn't have a problem with
UTF-8; it's just about telling perl itself how the data in the
script or read from an external file are encoded.
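For a string that doesn't come from a file handle (the exif case,
for instance), decoding it explicitly should have the same effect.
A minimal sketch, assuming the string arrives as undecoded UTF-8
bytes; the $bytes value below is a made-up stand-in, not real exif
output:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities;

# Hypothetical input: "é" as the two UTF-8 bytes 0xC3 0xA9.
my $bytes = "\xC3\xA9";

# Undecoded, each byte is taken as one Latin-1 character.
my $wrong = encode_entities($bytes);
print "$wrong\n";                # &Atilde;&copy;

# Decoded first, perl sees the single character U+00E9.
my $chars = decode('UTF-8', $bytes);
my $right = encode_entities($chars);
print "$right\n";                # &eacute;
```

Whether a given exif module hands back bytes or already-decoded
characters depends on the module, so that part needs checking.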
Cheers,
gregor
--
.''`. Homepage: http://info.comodo.priv.at/ - OpenPGP key 0xBB3A68018649AA06
: :' : Debian GNU/Linux user, admin, and developer - https://www.debian.org/
`. `' Member of VIBE!AT & SPI, fellow of the Free Software Foundation Europe
`- NP: Peter, Paul and Mary: For Loving Me