Bug#516129: perl-modules: CGI.pm unwanted UTF-8 conversion in URLs
Gabor Kiss
kissg at ssg.ki.iif.hu
Thu Feb 19 12:54:48 UTC 2009
Package: perl-modules
Version: 5.10.0-19
Severity: normal
The url(-path-info=>1) function mangles URLs that contain ISO-8859-2
accented characters. The utility function CGI::Util::escape()
unconditionally forces an ISO-8859-1 -> UTF-8 conversion:
# force bytes while preserving backward compatibility -- dankogai
$toencode = pack("C*", unpack("U0C*", $toencode));
This code turns the original URL "...&word_to_search=v%E1ros&..."
into a different one: "...&word_to_search=v%C3%A1ros&..."
The first time, the search engine receives the word "város", but in the
next round it becomes "vÃ¡ros", because the whole program works with
8-bit characters rather than UTF-8.
(In the following cycle it becomes v%C3%83%C2%A1ros, and so on.)
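The growth described above can be reproduced with a small stand-alone
sketch. It does not call CGI::Util; escape_forcing_utf8() and
unescape_octets() are hypothetical stand-ins, and utf8::encode() is used
here only to model the forced ISO-8859-1 -> UTF-8 step, on the assumption
that this is the net effect of the pack/unpack line quoted above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the reported escape() behaviour: every
# character is treated as a Latin-1 code point and encoded as UTF-8
# before percent-encoding.
sub escape_forcing_utf8 {
    my ($s) = @_;
    utf8::encode($s);    # octet 0xE1 becomes the two octets 0xC3 0xA1
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Plain percent-decoding that leaves the resulting octets untouched.
sub unescape_octets {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

my $value = "v%E1ros";    # the query value from the original URL
my @seen;
for my $round (1 .. 2) {
    # One request cycle: decode the incoming value, then re-escape it.
    $value = escape_forcing_utf8(unescape_octets($value));
    push @seen, $value;
    print "round $round: $value\n";
}
```

Each cycle re-encodes the octets produced by the previous one, so every
non-ASCII octet turns into two.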
The underlying problem is that the functions CGI::Util::unescape()
and CGI::Util::escape() are not inverses of each other:
unescape(escape($string)) does not return the original $string.
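For contrast, here is a minimal sketch of an escape routine that does
round-trip for 8-bit input. It is only an illustration, not a proposed
patch to CGI::Util: escape_octets() and unescape_octets() are
hypothetical names, and the sketch assumes that strings without Perl's
internal UTF-8 flag should be percent-encoded octet by octet.

```perl
use strict;
use warnings;

# Hypothetical byte-preserving escape: only strings that Perl has flagged
# as character strings are encoded to UTF-8; octet strings pass through.
sub escape_octets {
    my ($s) = @_;
    utf8::encode($s) if utf8::is_utf8($s);
    $s =~ s/([^A-Za-z0-9_.~-])/sprintf("%%%02X", ord($1))/ge;
    return $s;
}

# Plain percent-decoding that leaves the resulting octets untouched.
sub unescape_octets {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

my $original  = "v\xE1ros";    # ISO-8859-2 octets for "város"
my $escaped   = escape_octets($original);
my $roundtrip = unescape_octets($escaped);

print "escaped:   $escaped\n";
print "roundtrip: ", ($roundtrip eq $original ? "matches" : "differs"), "\n";
```

With this variant the 8-bit value survives any number of request cycles
unchanged.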
Gábor
-- System Information:
Debian Release: 5.0
APT prefers proposed-updates
APT policy: (500, 'proposed-updates'), (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)
Kernel: Linux 2.6.26-1-686 (SMP w/2 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash
Versions of packages perl-modules depends on:
ii perl 5.10.0-19 Larry Wall's Practical Extraction
perl-modules recommends no packages.
perl-modules suggests no packages.
-- no debconf information