Bug#959761: Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
gregor herrmann
gregoa at debian.org
Tue May 5 11:16:17 BST 2020
On Tue, 05 May 2020 10:53:29 +0200, Axel Beckert wrote:
> > Perhaps the strings in wml need to be decoded from UTF-8 so that they
> > aren't treated as a sequence of independent bytes?
> ... and would have expected "use feature unicode_strings;" to already
> activate all of this.
(I haven't read the thread in detail …).
Personally I often use "use utf8::all" (from libutf8-all-perl) if I'm
reasonably sure that the input is not weird and I want to output
UTF-8. It is sometimes a bit slow, but in my experience it handles all
the en/decoding.
> > Explicitly using Encode helps:
> >
> > echo 包 | perl -E 'use Encode qw(decode_utf8); while(<>) { $_ = decode_utf8($_); s|\s+\n|\n|sg; print }'
> > Wide character in print at -e line 1, <> line 1.
> > 包
% time echo 包 | perl -E 'use Encode qw(decode_utf8); while(<>) { $_ = decode_utf8($_); s|\s+\n|\n|sg; print }'
Wide character in print at -e line 1, <> line 1.
包
echo 包 0.00s user 0.00s system 42% cpu 0.002 total
perl -E 0.03s user 0.01s system 97% cpu 0.034 total
% time echo 包 | perl -Mutf8::all -E ' while(<>) { s|\s+\n|\n|sg; print }'
包
echo 包 0.00s user 0.00s system 63% cpu 0.002 total
perl -Mutf8::all -E ' while(<>) { s|\s+\n|\n|sg; print }' 0.04s user 0.01s system 98% cpu 0.050 total
% time echo 包 | perl -CS -E 'while(<>) { s|\s+\n|\n|sg; print }'
包
echo 包 0.00s user 0.00s system 60% cpu 0.002 total
perl -CS -E 'while(<>) { s|\s+\n|\n|sg; print }' 0.00s user 0.00s system 83% cpu 0.005 total
Cheers,
gregor
--
.''`. https://info.comodo.priv.at -- Debian Developer https://www.debian.org
: :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D 85FA BB3A 6801 8649 AA06
`. `' Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
`- BOFH excuse #378: Operators killed by year 2000 bug bite.