Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
Boyuan Yang
byang at debian.org
Tue May 5 03:19:02 BST 2020
Hi,
On Tue, 2020-05-05 at 03:34 +0200, Axel Beckert wrote:
> → echo 包 | perl -pe 's|\s+\n|\n|sg;'
> 包
> → echo 包 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
> �
>
> Which kinda sounds like a Perl bug. Cc'ing the maintainers of Debian's
> perl package (not the whole Debian Perl Team), maybe they have some
> insight what actually goes wrong here and if that's indeed a Perl bug.
I guess it is a Perl bug. Here are more Chinese characters, besides "包",
that trigger the problem:
% echo 包 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 赠 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 传 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 阅 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 加 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 者 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo -n 赠 | hexdump -C
00000000 e8 b5 a0
% echo -n 传 | hexdump -C
00000000 e4 bc a0
% echo -n 包 | hexdump -C
00000000 e5 8c 85
% echo -n 阅 | hexdump -C
00000000 e9 98 85
% echo -n 加 | hexdump -C
00000000 e5 8a a0
% echo -n 者 | hexdump -C
00000000 e8 80 85
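(Note the 0xA0 or 0x85 byte at the end of each sequence. Read as Latin-1
code points, U+00A0 NO-BREAK SPACE and U+0085 NEXT LINE are both whitespace,
so \s would match them if the UTF-8 bytes were treated as individual
characters.)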
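To make the pattern in the hexdumps easier to check, here is a small Python sketch (not Perl, and not a statement about what Perl does internally). It only tests the hypothesis that the last UTF-8 byte of each failing character, when read as a Latin-1 code point, counts as whitespace. The helper name last_byte_is_latin1_space is made up for this illustration, and "中" is a control character I added that does not appear in the tests above.

```python
def utf8_hex(ch):
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


def last_byte_is_latin1_space(ch):
    """True if the final UTF-8 byte of ch, reinterpreted as a single
    Latin-1 code point, is whitespace by Unicode rules.

    Hypothetical helper for illustration only.
    """
    last = ch.encode("utf-8")[-1]
    # chr(last) is the Latin-1 character with that byte value.
    return chr(last).isspace()


if __name__ == "__main__":
    # The six characters reported above, plus one added control ("中").
    for ch in "包赠传阅加者中":
        print(ch, utf8_hex(ch), last_byte_is_latin1_space(ch))
```

If the hypothesis holds, the six reported characters should all be flagged and the control should not, which would be consistent with the trailing byte being stripped by \s+ and the remaining bytes no longer forming valid UTF-8.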
Mwei (https://nm.debian.org/person/mwei/) just told me that it could be a
bug in the isSPACE_L1 macro in perl's pp.c. He will be replying to this
email soon.
--
Thanks,
Boyuan Yang