Bug#750946: libhtml-html5-parser-perl: UTF-8 character breaks parse_file

Vincent Lefevre vincent at vinc17.net
Mon Aug 7 02:29:43 UTC 2017


On 2017-08-06 18:11:34 -0700, Gregory Williams wrote:
> The above patch should handle the LWP case which the previously
> suggested patch avoids. It still passes the test suite (which should
> probably be improved to verify this case), and also supports the
> test case detailed in this bug report (though I should mention that
> I believe the test script included by Vincent Lefevre includes a
> double-encoding bug as $doc->toString() actually returns utf8
> encoded bytes, which the :encoding(UTF-8) PerlIO layer on stdout
> will attempt to encode a second time).

Indeed. I would rather see this as a Perl design bug (at least
under UTF-8 locales).
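
For reference, the double encoding described in the quoted message can
be reproduced with a minimal sketch like the one below. It makes some
assumptions: $bytes is a hypothetical stand-in for whatever
$doc->toString() returns in the test script (taken here to be UTF-8
encoded bytes, as Gregory describes), and in-memory filehandles replace
stdout so that the resulting bytes can be inspected.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for $doc->toString(): "é" (U+00E9) already
# encoded as UTF-8 bytes, i.e. the two bytes C3 A9.
my $bytes = encode('UTF-8', "\x{e9}");

# Printing those bytes through an :encoding(UTF-8) layer treats each
# byte as a character and encodes it again (double encoding).
my $double = '';
open(my $fh, '>:encoding(UTF-8)', \$double) or die "open: $!";
print {$fh} $bytes;
close($fh);

# Decoding the bytes to characters first lets the layer encode once.
my $single = '';
open(my $fh2, '>:encoding(UTF-8)', \$single) or die "open: $!";
print {$fh2} decode('UTF-8', $bytes);
close($fh2);

printf "double: %s\n", join ' ', map { sprintf '%02X', ord } split //, $double;
printf "single: %s\n", join ' ', map { sprintf '%02X', ord } split //, $single;
```

The other way to avoid the problem would be to drop the
:encoding(UTF-8) layer and print the bytes unchanged; which of the two
is appropriate depends on what the rest of the script writes to the
same handle.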

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


