Bug#798727: Encode::Unicode decode() dies unnecessarily
Damian Lukowski
damian.lukowski at credativ.de
Fri Sep 11 23:40:29 UTC 2015
Package: perl
Version: 5.20.2-2
The Encode::Unicode documentation states the following:
When BE or LE is omitted during decode(), it checks if BOM is at the
beginning of the string; if one is found, the endianness is set to what
the BOM says. If no BOM is found, the routine dies.
To reproduce:
---
use Encode qw/decode/;
decode("utf-16be", "Hello World"); # does not die
decode("utf-16le", "Hello World"); # does not die
decode("utf-16", "\xFE\xFFHello World"); # does not die
decode("utf-16", "Hello World"); # dies with "UTF-16:Unrecognised BOM"
---
Dying in the last case contradicts the Unicode Standard version 8.0:
The UTF-16 encoding scheme may or may not begin with a BOM. However,
when there is no BOM, and in the absence of a higher-level protocol, the
byte order of the UTF-16 encoding scheme is big-endian.
RFC2781:
If the first two octets of the text is not 0xFE followed by
0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be
interpreted as being big-endian.
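The rule quoted above can be sketched in C roughly as follows. This is only an illustration of the BOM check with a big-endian fallback, not the code from Unicode.xs; the names utf16_order, utf16_start, utf16_detect and utf16_unit are made up for this example. Note that the sketch skips two bytes only when a BOM was actually present, so that text without a BOM is decoded from its first octet.

```c
#include <stddef.h>

/* Hypothetical names for this sketch; none of them come from Encode. */
enum utf16_order { UTF16_BE, UTF16_LE };

struct utf16_start {
    enum utf16_order order; /* byte order to decode the text with */
    size_t skip;            /* 2 if a BOM was found, otherwise 0 */
};

/* Inspect the first two octets and choose the byte order.
 * Without a BOM the result is big-endian and nothing is skipped,
 * as RFC 2781 and the Unicode Standard recommend. */
static struct utf16_start utf16_detect(const unsigned char *buf, size_t len)
{
    struct utf16_start r = { UTF16_BE, 0 };

    if (len >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF) {
            r.skip = 2;                 /* big-endian BOM */
        } else if (buf[0] == 0xFF && buf[1] == 0xFE) {
            r.order = UTF16_LE;         /* little-endian BOM */
            r.skip = 2;
        }
    }
    return r;
}

/* Read one 16-bit code unit at p in the given byte order. */
static unsigned utf16_unit(const unsigned char *p, enum utf16_order order)
{
    return order == UTF16_BE
        ? ((unsigned)p[0] << 8) | p[1]
        : ((unsigned)p[1] << 8) | p[0];
}
```

A caller would start decoding at buf + skip and read each code unit with utf16_unit() in the detected order.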
There is a simple fix: do nothing when no BOM is found, instead of croaking:
diff --git a/cpan/Encode/Unicode/Unicode.xs b/cpan/Encode/Unicode/Unicode.xs
index cf42ab8..7caf1c1 100644
--- a/cpan/Encode/Unicode/Unicode.xs
+++ b/cpan/Encode/Unicode/Unicode.xs
@@ -164,9 +164,18 @@ CODE:
endian = 'V';
}
else {
- croak("%"SVf":Unrecognised BOM %"UVxf,
- *hv_fetch((HV *)SvRV(obj),"Name",4,0),
- bom);
+ /* No BOM found, use big-endian fallback as specified in
+ * RFC2781 and the Unicode Standard version 8.0:
+ *
+ * The UTF-16 encoding scheme may or may not begin with
+ * a BOM. However, when there is no BOM, and in the
+ * absence of a higher-level protocol, the byte order
+ * of the UTF-16 encoding scheme is big-endian.
+ *
+ * If the first two octets of the text is not 0xFE
+ * followed by 0xFF, and is not 0xFF followed by 0xFE,
+ * then the text SHOULD be interpreted as big-endian.
+ */
}
}
#if 1
CPAN bug report: https://rt.cpan.org/Ticket/Display.html?id=107043