[xml/sgml-pkgs] Bug#378411: Buffer overflow in XML::Parser::Expat triggered by utf8

Steinar H. Gunderson sgunderson at bigfoot.com
Tue Sep 5 10:17:30 UTC 2006


On Mon, Aug 07, 2006 at 10:53:38AM +0200, Joris van Rantwijk wrote:
> PS. (and slightly off-topic) My personal opinion is that Perl has
> utterly messed up Unicode handling. The documentation uses the terms
> "Unicode" and "UTF8" as if they were interchangeable. In fact, and as we
> see with this bug, there is a very important conceptual difference
> between "a string containing N raw utf8 bytes" and "a string containing
> M logical Unicode characters".

This isn't relevant for the bug report, but I think you got it wrong -- Perl
_does_ distinguish between them. In Perl, a scalar is either binary (in which
case it contains raw bytes), or it is text. In the latter case, it is
logically Unicode, but can be stored internally in ISO-8859-1 or UTF-8 as Perl
sees fit (and is converted transparently between the two). The fact that XS
modules also have to care about this is, of course, another matter :-)
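
To make that concrete, here is a minimal sketch, assuming a stock Perl 5.8 or
later. The sample string and the variable names are only illustrative; the
utf8:: functions used are the built-in ones and need no "use utf8".

```perl
use strict;
use warnings;

# Illustrative sample: "cafe" with an e-acute, written with an escape so the
# encoding of this source file does not matter. 0xE9 is the code point.
my $latin1 = "caf\xe9";        # stored internally one byte per character
my $utf8   = "caf\xe9";
utf8::upgrade($utf8);          # same characters, now stored as UTF-8

# The internal storage flag differs, the logical text string does not.
my $flag_latin1 = utf8::is_utf8($latin1) ? 1 : 0;
my $flag_utf8   = utf8::is_utf8($utf8)   ? 1 : 0;
my $same        = ($latin1 eq $utf8)     ? 1 : 0;
my $chars       = length($utf8);       # counts characters

# Encoding explicitly yields a different scalar: raw UTF-8 bytes, not text.
my $bytes = $utf8;
utf8::encode($bytes);
my $nbytes = length($bytes);           # counts bytes

printf "flags: %d %d, equal: %d, chars: %d, bytes: %d\n",
    $flag_latin1, $flag_utf8, $same, $chars, $nbytes;
# prints "flags: 0 1, equal: 1, chars: 4, bytes: 5"
```

Pure-Perl code never has to look at that flag. An XS module that reads the
scalar's buffer directly does, since the same text can arrive in either
storage form (on the C side the flag is what SvUTF8 reports).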

/* Steinar */
-- 
Homepage: http://www.sesse.net/



