Bug#521177: libwww-perl: LWP::UserAgent::request() fails with 'Wide character in syswrite' when posting UTF-8 encoded body
Ruzsa Balazs
ruzsa.balazs at interware.co.hu
Wed Mar 25 13:53:25 UTC 2009
Package: libwww-perl
Version: 5.813-1
Severity: important
Here is what I tried to do:
------------cut------------
#!/usr/bin/perl
use strict;
use warnings;
use encoding 'iso-8859-2';
use Encode;
use LWP::UserAgent;
use HTTP::Request;
my $POST_URL = "http://somewhere.net/webservice.php";
my $xml = <<"EOT";
<?xml version="1.0" encoding="utf-8" ?>
<PACKET>
<TEXT>Árvíztűrő tükörfúrógép</TEXT>
</PACKET>
EOT
my $ua = LWP::UserAgent->new();
my $request = HTTP::Request->new('POST', $POST_URL);
my $content = encode('utf-8', $xml);
$request->header('Content-Type' => 'text/xml; charset=utf-8');
$request->header('Content-Length' => length($content));
$request->content($content);
my $response = $ua->request($request);
------------cut------------
Here is what I get when Perl tries to execute the last line:
------------cut------------
failed: 500 Wide character in syswrite
Content-Type: text/plain
Client-Date: Wed, 25 Mar 2009 13:21:30 GMT
Client-Warning: Internal response
500 Wide character in syswrite
------------cut------------
The message in the <TEXT> tag is a test phrase containing all possible accented
characters in the Hungarian language. It is encoded as 'iso-8859-2' in the
source file. Thanks to the 'use encoding' pragma this is converted to
character semantics (utf8 flag on) when Perl reads the source.
After some bughunting, I identified the source of the problem in
/usr/share/perl5/LWP/Protocol/http.pm:
202: my $req_buf = $socket->format_request($method, $fullpath, @h);
...
235: if ($has_content) {
...
249: my $buf = $req_buf . $$content_ref; # <--- HERE
If $$content_ref contains a byte-string (a string with byte semantics) and
$req_buf is a character-string (a string with character semantics) then upon
concatenation, $$content_ref will be converted to character semantics with the
default 'iso-8859-1' encoding (this conversion happens even if $req_buf
contains only ASCII characters). In my example, this means that Perl converts
my utf-8 encoded test phrase to a string that contains consecutive bytes of
utf-8 sequences masquerading as separate characters.
What I don't understand is where the "wide characters" come from. I assume
they are characters with code points >= 256, but the string produced by the
concatenation, although "semantically" wrong, is "syntactically" right: it
contains only characters with code points < 256, so LWP::UserAgent should be
able to send it over the wire.
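The upgrade on concatenation described above can be reproduced outside LWP. The following is a minimal sketch, not code from http.pm: the variable names are made up for illustration, and it deliberately runs without the 'use encoding' pragma, so it only shows Perl's default behaviour.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# UTF-8 bytes for U+0151 (o with double acute): 0xC5 0x91
my $bytes = encode('UTF-8', "\x{151}");

# An ASCII-only string that nevertheless carries the utf8 flag,
# standing in for the header buffer (illustrative value only)
my $header = "POST / HTTP/1.1\r\n";
utf8::upgrade($header);

# Concatenation upgrades the byte string, one character per byte
my $buf = $header . $bytes;

printf "result has utf8 flag: %s\n", is_utf8($buf) ? 'yes' : 'no';
printf "last two code points: %s\n",
    join ' ', map { sprintf '0x%02X', ord } split //, substr($buf, -2);
# result has utf8 flag: yes
# last two code points: 0xC5 0x91
```

Under the default rules both resulting code points are below 256, which matches the expectation stated above. The documentation of the 'encoding' pragma describes a side effect in which byte strings are upgraded using the named encoding instead of ISO-8859-1; whether that is what produces the wide characters in this case has not been verified here.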
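Anyway, the problem can be resolved by adding the lines below after line
#202 (the first line shown is the existing line #202, repeated for context):
my $req_buf = $socket->format_request($method, $fullpath, @h);
use Encode;
if (Encode::is_utf8($req_buf)) {
    Encode::_utf8_off($req_buf);    # clear the utf8 flag; the bytes stay unchanged
}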
This simply makes sure that the buffer storing the HTTP headers does not have
the 'utf8' flag turned on. I can only hope that the $req_buf returned by
format_request does not contain non-ASCII characters (it shouldn't).
With this change, the concatenation above does not touch $$content_ref and the
request gets posted without errors.
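A more defensive variant of the same idea can be sketched with utf8::downgrade, which converts a string to byte semantics and reports failure if a code point above 255 is present, instead of just clearing the flag. This is only an illustration under assumptions: join_as_bytes is a hypothetical helper, not part of LWP, and the header value is made up.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# Hypothetical helper (not part of LWP): join header and body as bytes.
sub join_as_bytes {
    my ($head, $body) = @_;    # copies, so the caller's strings are untouched
    # With a true second argument, utf8::downgrade returns false
    # instead of dying when the string cannot be represented as bytes.
    utf8::downgrade($head, 1)
        or die "header contains characters above 0xFF\n";
    utf8::downgrade($body, 1)
        or die "body contains characters above 0xFF\n";
    return $head . $body;
}

my $head = "POST /webservice.php HTTP/1.1\r\n\r\n";
utf8::upgrade($head);                      # simulate a flagged header buffer
my $body = encode('UTF-8', "\x{151}");     # two bytes: 0xC5 0x91

my $buf = join_as_bytes($head, $body);
print is_utf8($buf) ? "character string\n" : "byte string\n";
```

Because both parts are byte strings by the time they are joined, the concatenation does not upgrade the body, and a header that unexpectedly contains wide characters is reported rather than silently mangled.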
-- System Information:
Debian Release: 5.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 2.6.28.7prana (PREEMPT)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages libwww-perl depends on:
ii  libhtml-parser-perl       3.56-1+b1      A collection of modules that parse
ii  libhtml-tagset-perl       3.20-2         Data tables pertaining to HTML
ii  libhtml-tree-perl         3.23-1         represent and create HTML syntax t
ii  liburi-perl               1.35.dfsg.1-1  Manipulates and accesses URI strin
ii  netbase                   4.34           Basic TCP/IP networking system
ii  perl [libdigest-md5-perl] 5.10.0-19      Larry Wall's Practical Extraction

Versions of packages libwww-perl recommends:
ii  libcompress-zlib-perl     2.012-1        Perl module for creation and manip
pn  libhtml-format-perl       <none>         (no description available)
ii  libmailtools-perl         2.03-1         Manipulate email in perl programs

Versions of packages libwww-perl suggests:
ii  libio-socket-ssl-perl     1.16-1         Perl module implementing object or
-- no debconf information