Bug#787821: libhtml-parser-perl: encode_entities() convert chars to &Atilde; instead of their proper entity

gregor herrmann gregoa at debian.org
Fri Jun 5 12:31:17 UTC 2015


On Fri, 05 Jun 2015 13:35:24 +0200, Mathieu Roy wrote:

> However, here:
> 
>   $ cat test.pl 
> #!/usr/bin/perl
> 
> use HTML::Entities;
> $input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
> print encode_entities($input), "\n"
> 
> # EOF 
> 
>   $ perl test.pl 
> vis-&Atilde;&nbsp;-vis Beyonc&Atilde;&copy;&#39;s na&Atilde;&macr;ve
> papier-m&Atilde;&cent;ch&Atilde;&copy; r&Atilde;&copy;sum&Atilde;&copy;

Oh, fun with encodings in general and UTF-8 in particular again.

This works:

% cat test.pl 
#!/usr/bin/perl

use utf8;

use HTML::Entities;
$input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
print encode_entities($input), "\n"


% perl test.pl
vis-&agrave;-vis Beyonc&eacute;&#39;s na&iuml;ve
papier-m&acirc;ch&eacute; r&eacute;sum&eacute;

> Where do these &Atilde; come from?

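From perl not knowing that the script is UTF-8-encoded and taking it
as Latin-1: each two-byte UTF-8 sequence is then read as two separate
characters, e.g. the bytes of "à" (0xC3 0xA0) become "Ã" plus a
no-break space.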
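
A minimal sketch of that misreading, using only the core Encode module.
The variable names ($bytes, $chars) are made up for illustration, and
the bytes are written as escapes so that the encoding of the sketch
itself does not matter:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 bytes for "à", written as escapes so that this file's own
# encoding plays no role.
my $bytes = "\xC3\xA0";

# Undecoded, perl treats each byte as one character.
printf "undecoded: %d chars: %s\n", length($bytes),
    join(' ', map { sprintf 'U+%04X', ord } split //, $bytes);

# Decoding tells perl that the two bytes form a single character.
my $chars = decode('UTF-8', $bytes);
printf "decoded:   %d chars: %s\n", length($chars),
    join(' ', map { sprintf 'U+%04X', ord } split //, $chars);
```

The undecoded string holds U+00C3 (Ã) and U+00A0 (no-break space),
which encode_entities() renders as &Atilde; and &nbsp;; the decoded
string holds the single character U+00E0 (à).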


So, I'm not sure there is actually a bug somewhere.
With "use utf8;" this works, and perl needs to be told about the
encoding ...
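
"use utf8;" only covers literals in the source file. For text that
arrives as bytes from elsewhere, one way of telling perl about the
encoding is to decode it explicitly before calling encode_entities().
A sketch, assuming the input really is UTF-8; the variable names
($bytes, $wrong, $right) are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities;

# UTF-8 bytes for "résumé", as they might arrive from a file or socket.
my $bytes = "r\xC3\xA9sum\xC3\xA9";

# Undecoded: every byte is treated as a Latin-1 character.
my $wrong = encode_entities($bytes);

# Decoded first: perl now sees two "é" characters.
my $right = encode_entities(decode('UTF-8', $bytes));

print "$wrong\n$right\n";
```

The first line printed shows the &Atilde;-style output from the
report, the second the expected &eacute; entities.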


> Plus, as a side bug (require a report on its own?),
> man HTML::Entities prints
> 
>    For example, this:
> 
>         $input = "vis-a-vis Beyonce's naieve\npapier-mache resume";
>         print encode_entities($input), "\n"
> 
>        Prints this out:
> 
>         [...]
> 
> Yes, the man page example is actually stripped of entities to encode!

Ouch, ugly.
Yes, please report a separate bug. 


Cheers,
gregor

-- 
 .''`.  Homepage: http://info.comodo.priv.at/ - OpenPGP key 0xBB3A68018649AA06
 : :' : Debian GNU/Linux user, admin, and developer -  https://www.debian.org/
 `. `'  Member of VIBE!AT & SPI, fellow of the Free Software Foundation Europe
   `-   NP: Penelope Swales: Lost & Found