Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
Yao Wei
mwei at debian.org
Tue May 5 04:14:31 BST 2020
On Mon, May 04, 2020 at 10:19:02PM -0400, Boyuan Yang wrote:
> Mwei (https://nm.debian.org/person/mwei/) just talked to me saying that it
> could be a bug with isSPACE_L1 macro in perl's pp.c. He will be replying the
> email soon.
>
Hi,
(I used reportbug to reply to this thread, and I missed a lot of
recipients here. This is a resend of my reply in #959474. Sorry for the
noise.)
After a bit of investigation of the Perl source code (5.31.11 downloaded
from upstream) I found that it has weird handling of whitespace when
`feature unicode_strings` is turned on. I am not a perl person and I
haven't executed the source code yet, so my interpretation might be
wrong.
When `unicode_strings` is on, `in_uni_8_bit` should be true internally,
and in three places in pp.c (pp.c:6040, pp.c:6076, pp.c:6114)
`isSPACE_L1` is called to check whether the character being examined is
whitespace, by checking whether it is 0x85 or 0xA0 (handy.h:1611). In
the case of the character 包 (U+5305), the last byte of its 3-byte UTF-8
encoding (E5 8C 85) is 0x85, hence the problem.
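
To illustrate the suspicion, here is a minimal Python sketch (not Perl's
actual code). `is_space_l1` is a hypothetical stand-in for `isSPACE_L1`
that I wrote for this example: it assumes the check amounts to "ASCII
whitespace, or 0x85, or 0xA0" applied to one byte at a time. Under that
assumption, the last UTF-8 byte of 包 is taken for whitespace:

```python
# Hypothetical stand-in for a Latin-1 style whitespace check applied
# byte by byte. This is an assumption for illustration, not a copy of
# the macro in handy.h.
LATIN1_EXTRA_SPACES = {0x85, 0xA0}  # NEL and NBSP in Latin-1
ASCII_SPACES = b" \t\n\r\f\v"


def is_space_l1(byte):
    """Return True if a single byte value counts as whitespace."""
    return byte in ASCII_SPACES or byte in LATIN1_EXTRA_SPACES


def utf8_bytes_seen_as_space(ch):
    """List the UTF-8 bytes of ch that the byte-wise check would flag."""
    return [b for b in ch.encode("utf-8") if is_space_l1(b)]


encoded = "包".encode("utf-8")
print(encoded.hex(" "))                 # e5 8c 85
print(utf8_bytes_seen_as_space("包"))   # [133], i.e. 0x85
```

If the real code does look at the encoded bytes rather than the decoded
code point, this would match what we see for 包 and for any other
character whose UTF-8 encoding contains 0x85 or 0xA0.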