Bug#661736: HTML::FormatText and UTF-8
brian m. carlson
sandals at crustytoothpaste.net
Thu Dec 6 02:26:04 UTC 2012
The issue is this line (line 135):
$text =~ tr/\xA0\xAD/ /d;
This works great if your data is in a Unicode (character) string. It also
works great if your data is a byte string using Latin-1. It works very poorly
if your UTF-8 data is in a byte string, because any 0xA0 or 0xAD byte inside
a multibyte sequence then gets replaced or deleted. In the example given in
the original bug report, -Mutf8 was not used, so the "à" (bytes c3 a0) is
treated as a series of (two) Latin-1 characters, and the second one matches
\xA0.
vauxhall ok % perl -MHTML::FormatText -Mutf8 -C6 -E 'print HTML::FormatText->new->format_string("à")' |hd
00000000 c3 a0 0a |...|
00000003
vauxhall ok % perl -MHTML::FormatText -Mutf8 -E 'print HTML::FormatText->new->format_string("à")' |hd
00000000 e0 0a |..|
00000002
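
To make the difference concrete, here is a minimal sketch that applies the same tr to a byte string and to a decoded character string. It assumes the tr line quoted above is the only transformation that matters here; strip_nbsp_shy is a hypothetical helper name for illustration, not part of HTML::FormatText.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: applies the same substitution as the line quoted
# above (\xA0 becomes a space, \xAD is deleted) to a copy of the input.
sub strip_nbsp_shy {
    my ($text) = @_;
    $text =~ tr/\xA0\xAD/ /d;
    return $text;
}

# UTF-8 encoding of U+00E0 ("à") held in a byte string.
my $bytes = "\xC3\xA0";

# Byte-string case: the second byte looks like a Latin-1 NBSP and is
# turned into a space, which corrupts the UTF-8 sequence.
my $from_bytes = strip_nbsp_shy($bytes);

# Character-string case: decode first, transform, then encode for output.
my $chars      = decode('UTF-8', $bytes);
my $from_chars = encode('UTF-8', strip_nbsp_shy($chars));

# Print each result as hex bytes.
printf "bytes: %s\n", join ' ', map { sprintf '%02x', ord } split //, $from_bytes;  # c3 20
printf "chars: %s\n", join ' ', map { sprintf '%02x', ord } split //, $from_chars;  # c3 a0
```

Decoding input before it reaches the formatter and encoding on output (or using -Mutf8 with -C6 for literals, as in the first command above) keeps the à intact.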
I suspect the correct fix for this bug is documentation.
--
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187