Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal
Niko Tyni
ntyni at debian.org
Sun Jan 29 18:24:25 UTC 2017
Control: found -1 5.24.1-1
On Sun, Jan 29, 2017 at 06:23:30PM +0100, Leszek Dubiel wrote:
> Package: perl
> Version: 5.20.2-3+deb8u6
> Severity: normal
>
> This is stripped out program version that causes error:
>
> printf "\x41\x9c\x5a\x0a" | perl -CS -e '$_ = <>; /^(.*)$/ && print "($1)\n"; /[^#]*/;'
>
> It displays:
>
> (A�Z)
> Malformed UTF-8 character (fatal) at -e line 1, <> line 1.
>
> Locale is pl_PL.UTF-8 .
This still happens with 5.24.1-1. It can be reduced to

printf "\x9c\x5a" | perl -CI -ne '/[^#]*/'

The byte sequence is indeed invalid UTF-8 (0x9c is a continuation byte
with no lead byte before it, and iconv rejects it as well), but you're
explicitly telling Perl (with -CS) that it's getting UTF-8 on stdin.
This is a recipe for problems.
So I'm not sure if it's a bug at all. At most the failure should be
handled a bit more gracefully.
--
Niko Tyni ntyni at debian.org