Bug#959474: Issues with Chinese language (all variants) when building some pages in buster
Yao Wei (魏銘廷)
mwei at lxde.org
Tue May 5 03:18:39 BST 2020
Package: www.debian.org
Followup-For: Bug #959474
Hi,
After a bit of investigation of the Perl source code (5.31.11, downloaded
from upstream), I found that it handles whitespace oddly when
`feature unicode_strings` is turned on. I am not a Perl person and I
haven't run the code yet, so my interpretation might be
wrong.
When `unicode_strings` is on, `in_uni_8_bit` should be true internally, and
in three places (pp.c:6040, pp.c:6076, pp.c:6114) `isSPACE_L1` is
called to check whether the character being examined is whitespace, by
checking whether it is 0x85 or 0xA0 (handy.h:1611). In the
case of the character 包, the last byte of its 3-byte UTF-8 encoding
(0xE5 0x8C 0x85) is 0x85, hence the problem.
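To make the byte-level claim easy to check, here is a minimal sketch in
Python (not Perl, and not the code path in pp.c). It only shows which
UTF-8 bytes of a string equal 0x85 or 0xA0. The helper names `utf8_hex`
and `space_like_bytes` are made up for this illustration, and the byte
set reflects my reading above rather than the actual definition of
`isSPACE_L1`.

```python
# Bytes that, per the reading above, get treated as whitespace.
# Assumption: this set is illustrative, not Perl's real isSPACE_L1.
L1_EXTRA_SPACE = {0x85, 0xA0}


def utf8_hex(ch):
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def space_like_bytes(text):
    """Return offsets of UTF-8 bytes in text that equal 0x85 or 0xA0."""
    encoded = text.encode("utf-8")
    return [i for i, b in enumerate(encoded) if b in L1_EXTRA_SPACE]


if __name__ == "__main__":
    print(utf8_hex("包"))          # hex bytes of the encoded character
    print(space_like_bytes("包"))  # offsets that look like whitespace
```

If the string is processed byte by byte as if it were Latin-1, any
offset reported by `space_like_bytes` is a place where a whitespace
check of this kind could fire in the middle of a multi-byte character.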
-- System Information:
Debian Release: bullseye/sid
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Kernel: Linux 5.6.0-1-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled