Bug#787821: libhtml-parser-perl: encode_entities() convert chars to &Atilde; instead of their proper entity
gregor herrmann
gregoa at debian.org
Fri Jun 5 12:31:17 UTC 2015
On Fri, 05 Jun 2015 13:35:24 +0200, Mathieu Roy wrote:
> However, here:
>
> $ cat test.pl
> #!/usr/bin/perl
>
> use HTML::Entities;
> $input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
> print encode_entities($input), "\n"
>
> # EOF
>
> $ perl test.pl
> vis-&Atilde;&nbsp;-vis Beyonc&Atilde;&copy;&#39;s na&Atilde;&macr;ve
> papier-m&Atilde;&cent;ch&Atilde;&copy; r&Atilde;&copy;sum&Atilde;&copy;
Oh, fun with encodings in general and UTF-8 in particular again.
This works:
% cat test.pl
#!/usr/bin/perl
use utf8;
use HTML::Entities;
$input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
print encode_entities($input), "\n"
% perl test.pl
vis-&agrave;-vis Beyonc&eacute;&#39;s na&iuml;ve
papier-m&acirc;ch&eacute; r&eacute;sum&eacute;
> Where do these &Atilde; come from?
From perl not knowing that the script is utf8-encoded and treating it
as Latin-1, so every two-byte UTF-8 character becomes two characters.
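A minimal sketch of what goes wrong, using only the core Encode module (the variable names and the code_points() helper are made up for illustration): in a UTF-8-encoded script without "use utf8;", the literal "à" reaches perl as the two bytes 0xC3 0xA0. Read as Latin-1, those are "Ã" (&Atilde;) and a no-break space (&nbsp;), which is what encode_entities() then escapes. The \x escapes stand in for the raw source bytes, so the example does not depend on how the file itself is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# What perl sees for a literal "à" in a UTF-8 file without "use utf8;":
# the two bytes 0xC3 0xA0, each treated as one Latin-1 character.
my $as_bytes = "\xC3\xA0";

# Decoding those bytes as UTF-8 yields the intended single character.
my $as_chars = decode('UTF-8', $as_bytes);

# Hypothetical helper: list the code points of a string as U+XXXX.
sub code_points {
    my ($s) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $s;
}

printf "undecoded: %d chars: %s\n", length($as_bytes), code_points($as_bytes);
printf "decoded:   %d char:  %s\n", length($as_chars), code_points($as_chars);
```

The undecoded string holds U+00C3 and U+00A0; the decoded one holds the single U+00E0 that encode_entities() turns into &agrave;.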
So, I'm not sure there is actually a bug somewhere.
With "use utf8;" this works; perl simply needs to be told about the
encoding ...
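The same applies beyond string literals: "use utf8;" only covers the script's own source text, so data read at run time (files, STDIN, @ARGV) is still a byte string and has to be decoded before it is handed to encode_entities(). A sketch, assuming the input is UTF-8; the sample bytes and the to_entities() helper are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Stand-in for UTF-8 data read at run time ("naïve résumé" as raw bytes).
my $utf8_bytes = "na\xC3\xAFve r\xC3\xA9sum\xC3\xA9";

# Hypothetical helper: decode first, so encode_entities() sees
# characters rather than bytes.
sub to_entities {
    my ($bytes) = @_;
    return encode_entities(decode('UTF-8', $bytes));
}

print encode_entities($utf8_bytes), "\n";   # undecoded: byte-level entities
print to_entities($utf8_bytes), "\n";       # decoded: proper entities
```

For STDIN and STDOUT, the core "open" pragma (use open qw(:std :encoding(UTF-8));) can do the decoding instead of an explicit decode() call.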
> Plus, as a side bug (does it require a report of its own?),
> man HTML::Entities prints
>
> For example, this:
>
> $input = "vis-a-vis Beyonce's naieve\npapier-mache resume";
> print encode_entities($input), "\n"
>
> Prints this out:
>
> [...]
>
> Yes, the man page example is actually stripped of entities to encode!
Ouch, ugly.
Yes, please report a separate bug.
Cheers,
gregor
--
.''`. Homepage: http://info.comodo.priv.at/ - OpenPGP key 0xBB3A68018649AA06
: :' : Debian GNU/Linux user, admin, and developer - https://www.debian.org/
`. `' Member of VIBE!AT & SPI, fellow of the Free Software Foundation Europe
`- NP: Penelope Swales: Lost & Found
More information about the pkg-perl-maintainers mailing list