Bug#787821: libhtml-parser-perl: encode_entities() convert chars to &Atilde; instead of their proper entity
Mathieu Roy
yeupou at gnu.org
Fri Jun 5 11:35:24 UTC 2015
Package: libhtml-parser-perl
Version: 3.71-1+b3
Severity: important
Hello,
According to http://search.cpan.org/dist/HTML-Parser/lib/HTML/Entities.pm
use HTML::Entities;
$input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
print encode_entities($input), "\n"
prints
vis-&agrave;-vis Beyonc&eacute;'s na&iuml;ve
papier-m&acirc;ch&eacute; r&eacute;sum&eacute;
That's correct.
However, here:
$ cat test.pl
#!/usr/bin/perl
use HTML::Entities;
$input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
print encode_entities($input), "\n"
# EOF
$ perl test.pl
vis-&Atilde;&nbsp;-vis Beyonc&Atilde;&copy;'s na&Atilde;&macr;ve
papier-m&Atilde;&cent;ch&Atilde;&copy; r&Atilde;&copy;sum&Atilde;&copy;
Where do these &Atilde; come from?
According to http://www.w3schools.com/charsets/ref_html_entities_4.asp that entity is for Ã.
I tested the same script on Debian stable and on an Ubuntu system, with exactly the same result.
I don't know what I'm doing wrong here, but a simple copy/paste of the documented example does not work.
Other similar commands work as expected. For instance:
echo "vis-à-vis Beyoncé's naïve\npapier-mâché résumé" | recode utf8..html
vis-&agrave;-vis Beyonc&eacute;'s na&iuml;ve\npapier-m&acirc;ch&eacute; r&eacute;sum&eacute;
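A possible explanation, offered as an assumption and not something confirmed in this report: recode is told explicitly that its input is UTF-8, while test.pl has no "use utf8", so Perl may be reading the literal as a string of bytes and encoding each byte on its own. A minimal sketch of that effect follows. The hard-coded byte escapes and the use of the core Encode module are illustrative choices, not part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities;

# Assumption: this is how Perl sees "vis-à-vis Beyoncé" when the source
# file is UTF-8 but "use utf8" is absent: one string element per byte.
my $bytes = "vis-\xC3\xA0-vis Beyonc\xC3\xA9";

# Each byte is treated as a separate Latin-1 character and encoded alone.
my $from_bytes = encode_entities($bytes);

# Decoding the bytes first yields real characters (U+00E0, U+00E9).
my $chars      = decode('UTF-8', $bytes);
my $from_chars = encode_entities($chars);

print "$from_bytes\n";   # vis-&Atilde;&nbsp;-vis Beyonc&Atilde;&copy;
print "$from_chars\n";   # vis-&agrave;-vis Beyonc&eacute;
```

If that assumption holds, adding "use utf8;" to test.pl, or decoding the input before calling encode_entities(), should give the documented output.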
Plus, as a side bug (does it require a report of its own?),
man HTML::Entities prints
For example, this:
$input = "vis-a-vis Beyonce's naieve\npapier-mache resume";
print encode_entities($input), "\n"
Prints this out:
[...]
Yes, the man page example has actually been stripped of the accented characters it is supposed to encode!
-- System Information:
Debian Release: stretch/sid
APT prefers testing
APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 3.16.0-4-amd64 (SMP w/6 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)
Versions of packages libhtml-parser-perl depends on:
ii libc6 2.19-18
ii libhtml-tagset-perl 3.20-2
ii liburi-perl 1.64-1
ii perl 5.20.2-6
ii perl-base [perlapi-5.20.1] 5.20.2-6
libhtml-parser-perl recommends no packages.
Versions of packages libhtml-parser-perl suggests:
pn libdata-dump-perl <none>
-- no debconf information