Bug#787821: libhtml-parser-perl: encode_entities() convert chars to &Atilde; instead of their proper entity

Mathieu Roy yeupou at gnu.org
Fri Jun 5 11:35:24 UTC 2015


Package: libhtml-parser-perl
Version: 3.71-1+b3
Severity: important

Hello,

According to http://search.cpan.org/dist/HTML-Parser/lib/HTML/Entities.pm


 use HTML::Entities;
 $input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
 print encode_entities($input), "\n"

prints

 vis-&agrave;-vis Beyonc&eacute;'s na&iuml;ve
 papier-m&acirc;ch&eacute; r&eacute;sum&eacute;


That's correct.


However, here:

  $ cat test.pl 
#!/usr/bin/perl

use HTML::Entities;
$input = "vis-à-vis Beyoncé's naïve\npapier-mâché résumé";
print encode_entities($input), "\n"

# EOF 

  $ perl test.pl 
vis-&Atilde;&nbsp;-vis Beyonc&Atilde;&copy;'s na&Atilde;&macr;ve
papier-m&Atilde;&cent;ch&Atilde;&copy; r&Atilde;&copy;sum&Atilde;&copy;


Where do these &Atilde; come from?
According to http://www.w3schools.com/charsets/ref_html_entities_4.asp it's for Ã.
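
One plausible explanation (an assumption, not something confirmed in this report): without `use utf8`, Perl treats the string literal in test.pl as raw UTF-8 bytes, so encode_entities() sees each accented character as two separate Latin-1 characters, the first of which is 0xC3 (Ã). A minimal sketch of that difference, with the UTF-8 bytes of "à" written as escapes so that the result does not depend on how this file is saved:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# "vis-à-vis" as raw UTF-8 bytes: à is the two bytes 0xC3 0xA0.
# This is what a literal looks like to Perl when the script has no `use utf8`.
my $bytes = "vis-\xC3\xA0-vis";

# The same text decoded into Perl characters: à becomes the single character U+00E0.
my $chars = decode('UTF-8', $bytes);

# Called in non-void context, encode_entities() returns an encoded copy.
my $from_bytes = encode_entities($bytes);
my $from_chars = encode_entities($chars);

print "$from_bytes\n";   # vis-&Atilde;&nbsp;-vis
print "$from_chars\n";   # vis-&agrave;-vis
```

If this assumption holds, adding `use utf8;` to test.pl (or decoding the input before encoding it) should give the documented output.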

I tested the same script on Debian stable and on some Ubuntu systems, with exactly the same result.

I don't know what I'm doing wrong here, but a simple copy/paste of the documented example does not work.

Other similar commands work as expected. For instance:

echo "vis-à-vis Beyoncé's naïve\npapier-mâché résumé" | recode utf8..html
vis-&agrave;-vis Beyonc&eacute;'s na&iuml;ve\npapier-m&acirc;ch&eacute; r&eacute;sum&eacute;




Plus, as a side bug (does it require a report of its own?),
man HTML::Entities prints

   For example, this:

        $input = "vis-a-vis Beyonce's naieve\npapier-mache resume";
        print encode_entities($input), "\n"

       Prints this out:

        [...]

Yes, the man page example is actually stripped of the accented characters it is supposed to encode!






-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/6 CPU cores)
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages libhtml-parser-perl depends on:
ii  libc6                       2.19-18
ii  libhtml-tagset-perl         3.20-2
ii  liburi-perl                 1.64-1
ii  perl                        5.20.2-6
ii  perl-base [perlapi-5.20.1]  5.20.2-6

libhtml-parser-perl recommends no packages.

Versions of packages libhtml-parser-perl suggests:
pn  libdata-dump-perl  <none>

-- no debconf information


