Bug#959474: Issues with Chinese language (all variants) when building some pages in buster

Boyuan Yang byang at debian.org
Tue May 5 03:19:02 BST 2020


Hi,

On Tue, 2020-05-05 at 03:34 +0200, Axel Beckert wrote:
> → echo 包 | perl -pe 's|\s+\n|\n|sg;'
>> → echo 包 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
>> 
> Which kinda sounds like a Perl bug. Cc'ing the maintainers of Debian's
> perl package (not the whole Debian Perl Team), maybe they have some
> insight what actually goes wrong here and if that's indeed a Perl bug.

I suspect it is a Perl bug. Here are more Chinese characters, besides "包",
that trigger the problem:


% echo 包 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 赠 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 传 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 阅 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 加 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�
% echo 者 | perl -M"feature unicode_strings" -pe 's|\s+\n|\n|sg;'
�

% echo -n 赠 | hexdump -C
00000000  e8 b5 a0
% echo -n 传 | hexdump -C
00000000  e4 bc a0
% echo -n 包 | hexdump -C
00000000  e5 8c 85
% echo -n 阅 | hexdump -C
00000000  e9 98 85
% echo -n 加 | hexdump -C
00000000  e5 8a a0
% echo -n 者 | hexdump -C
00000000  e8 80 85

(Note the 0xA0 or 0x85 byte at the end of each sequence.)

Mwei (https://nm.debian.org/person/mwei/) just told me that it could be a bug
in the isSPACE_L1 macro in perl's pp.c. He will be replying to this email
soon.

-- 
Thanks,
Boyuan Yang