Bug#745823: libwww-perl: an https request with iso-8859-1 headers, chunked transfer and data with utf8 bit on is corrupted.
John Hughes
john at calva.com
Fri Apr 25 15:10:36 UTC 2014
Package: libwww-perl
Version: 5.836-1
Severity: normal
This was horrible to narrow down, but:
1. I'm doing a POST to an HTTPS URL
2. Some of my headers contain iso-8859-1 data
3. The body is sent with transfer-encoding: chunked
4. The "is_utf8" bit is set on the data (although it happens to be
all in code points < 256).
(Changing *any* of these conditions makes the bug go away.)
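To illustrate condition 4 on its own, here is a minimal sketch, independent of LWP, of how a string can carry the utf8 bit while holding only code points below 256. It uses the same substr() trick as the test program; the variable names are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A substring taken from a string that contains a wide character keeps
# the internal utf8 bit, even if the result only holds code points < 256.
my $flagged = substr("\x{f00f}\xae", 1, 1);

# The same character written directly is stored without the bit.
my $plain = "\xae";

# As character strings the two are identical ...
my $same = ($flagged eq $plain) ? 1 : 0;

# ... only the internal representation differs.
my $flag_on_flagged = utf8::is_utf8($flagged) ? 1 : 0;
my $flag_on_plain   = utf8::is_utf8($plain)   ? 1 : 0;

printf "equal=%d utf8(flagged)=%d utf8(plain)=%d\n",
    $same, $flag_on_flagged, $flag_on_plain;
```

Code that only looks at the characters cannot tell the two strings apart, which is probably part of why this was hard to narrow down.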
The request headers get corrupted: they are sent in utf-8 instead of
iso-8859-1, and some of the data doesn't get sent, messing up the
chunked counts or even trashing the request headers.
The number of missing bytes seems related to the difference in length
between the iso-8859-1 headers and the incorrect utf-8 versions.
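As a rough check of that length difference, the following sketch computes the byte length of the Subject header value from the test program in both encodings. It only does the arithmetic with the core Encode module; it does not reproduce the bug, and the variable names are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Header value used in the test program: twelve U+00AE characters.
my $subject = "\xae" x 12;

# Byte length when each character is sent as one iso-8859-1 byte.
my $latin1_len = length encode('iso-8859-1', $subject);

# Byte length when each character is sent as a two-byte utf-8 sequence.
my $utf8_len = length encode('UTF-8', $subject);

# Extra bytes taken up by the utf-8 form of the header.
my $diff = $utf8_len - $latin1_len;

print "iso-8859-1: $latin1_len bytes, utf-8: $utf8_len bytes, difference: $diff\n";
```

For this header the utf-8 form is 12 bytes longer than the iso-8859-1 form.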
For example my request should look like:
----
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:4433
User-Agent: LWP UTF8 BUG
Subject: ®®®®®®®®®®®®
Transfer-Encoding: chunked

1
®
0
----
But it is sent as:
----
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:4433
User-Agent: LWP UTF8 BUG
Subject: ®®®®®®®®®®®®
Transfer-Encoding: chunk0
----
Here's my test program:
----
#! /usr/bin/perl
use strict;
use LWP::UserAgent;
my $agent = LWP::UserAgent->new (agent => 'LWP UTF8 BUG');
# Bug only happens if https
my $req = HTTP::Request->new (POST => 'https://localhost:4433');
# Bug only happens if utf8 bit is set on data to be written
my $body = substr ("\x{f00f}\xae", 1, 1);
print "utf8 bit set\n" if utf8::is_utf8($body);
# Bug only happens with chunked content
my $read_body = sub {
    my $buf = $body;
    $body = "";
    return $buf;
};
$req->content ($read_body);
# Bug only happens if header with iso-8859-1 data
$req->header (Subject => "\xae" x 12);
my $ret = $agent->request ($req);
# Request sent is malformed - iso-8859-1 data sent as utf-8 and
# bytes missing from output (number of bytes missing equal to
# difference in length between iso-8859-1 and utf-8 representations).
----
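Since clearing any one of the four conditions makes the bug go away, one possible workaround is to clear the utf8 bit on each chunk before handing it to LWP. The sketch below is an assumption based on that observation, not a fix for the underlying problem. make_body_reader is a hypothetical helper name, and the sketch only checks the callback itself, not a real HTTPS request.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same flagged one-character body as in the test program.
my $body = substr("\x{f00f}\xae", 1, 1);

# Hypothetical helper: wraps the data in a content callback that returns
# it once, with the utf8 bit cleared, and an empty string afterwards.
sub make_body_reader {
    my ($data) = @_;
    return sub {
        my $buf = $data;
        $data = "";
        # utf8::downgrade converts the internal representation in place.
        # It dies if $buf contains a code point above 255.
        utf8::downgrade($buf);
        return $buf;
    };
}

my $read_body = make_body_reader($body);

# In the test program this would replace the original callback:
#   $req->content ($read_body);
my $first  = $read_body->();
my $second = $read_body->();

print "utf8 bit still set\n" if utf8::is_utf8($first);
```

utf8::downgrade only works when every character fits in one byte. A body with wider characters would have to be encoded explicitly to whatever charset the request declares.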
-- System Information:
Debian Release: 6.0.7
APT prefers oldstable
APT policy: (500, 'oldstable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages libwww-perl depends on:
ii libhtml-parser-perl 3.66-1 collection of modules that parse H
ii libhtml-tagset-perl 3.20-2 Data tables pertaining to HTML
ii libhtml-tree-perl 3.23-2 Perl module to represent and creat
ii liburi-perl 1.54-2 module to manipulate and access UR
ii netbase 4.45 Basic TCP/IP networking system
ii perl 5.10.1-17squeeze6 Larry Wall's Practical Extraction
Versions of packages libwww-perl recommends:
ii libhtml-format-perl 2.04-2 format HTML syntax trees into text
ii libio-compress-perl 2.024-1 bundle of IO::Compress modules
ii libmailtools-perl 2.06-1 Manipulate email in perl programs
ii perl [libio-compress-p 5.10.1-17squeeze6 Larry Wall's Practical Extraction
Versions of packages libwww-perl suggests:
ii libcrypt-ssleay-perl 0.57-2 Support for https protocol in LWP
ii libio-socket-ssl-perl 1.33-1+squeeze1 Perl module implementing object or
-- no debconf information