Bug#516129: perl-modules: CGI.pm unwanted UTF-8 conversion in URLs

Gabor Kiss kissg at ssg.ki.iif.hu
Thu Feb 19 12:54:48 UTC 2009


Package: perl-modules
Version: 5.10.0-19
Severity: normal

The function url(-path-info=>1) mangles the URL if it contains
ISO-8859-2 accented characters. The utility function CGI::Util::escape()
unconditionally forces an ISO-8859-1 -> UTF-8 conversion:

  # force bytes while preserving backward compatibility -- dankogai
  $toencode = pack("C*", unpack("U0C*", $toencode));

This code turns the original URL "...&word_to_search=v%E1ros&..."
into a different one: "...&word_to_search=v%C3%A1ros&..."

The first time, the search engine gets the word "város", but in the
next round it becomes "vÃ¡ros", because the whole program is based
on 8-bit chars instead of UTF-8.

(In the next cycle it will be v%C3%83%C2%A1ros, and so on.)
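
The growth from round to round can be reproduced with a small sketch.
It does not call CGI::Util; escape_like_report() is a hypothetical
stand-in that assumes escape() behaves as described above (input bytes
read as ISO-8859-1, re-encoded as UTF-8, then percent-encoded), and
unescape_bytes() is a plain byte-wise decoder written for this example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for the behaviour described in the report:
# read the bytes as ISO-8859-1, re-encode them as UTF-8, then
# percent-encode every byte outside a small unreserved set.
sub escape_like_report {
    my ($bytes) = @_;
    my $utf8 = encode('UTF-8', decode('ISO-8859-1', $bytes));
    $utf8 =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $utf8;
}

# Plain byte-wise percent-decoding, no charset handling at all.
sub unescape_bytes {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $str;
}

my $word = "v\xE1ros";    # "város" as ISO-8859-2 bytes
my @rounds;
for (1 .. 3) {
    my $escaped = escape_like_report($word);
    push @rounds, $escaped;
    # The 8-bit program decodes the URL and feeds the bytes back in.
    $word = unescape_bytes($escaped);
}
print "$_\n" for @rounds;
```

Under these assumptions the first two lines printed are v%C3%A1ros and
v%C3%83%C2%A1ros, the same values quoted above, and each further round
doubles the number of escaped bytes.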

The basic problem is that the functions CGI::Util::unescape()
and CGI::Util::escape() are not inverses of each other:
unescape(escape($string)) does not return the original $string.
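
For comparison, here is a minimal sketch of an escape/unescape pair
for which the round trip does hold. Both function names are
hypothetical and this is not CGI::Util's code; it assumes the input is
a byte string and simply percent-encodes each byte without any charset
conversion.

```perl
use strict;
use warnings;

# Hypothetical byte-preserving escape: percent-encode each byte as is.
# Assumes a byte string (no characters above 0xFF).
sub escape_bytes {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# Matching byte-wise decoder.
sub unescape_bytes {
    my ($str) = @_;
    $str =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $str;
}

my $original  = "v\xE1ros";                  # ISO-8859-2 bytes
my $escaped   = escape_bytes($original);     # v%E1ros
my $roundtrip = unescape_bytes($escaped);

print $roundtrip eq $original
    ? "round trip ok\n"
    : "round trip changed the string\n";
```

With this pair the URL keeps v%E1ros on every round, which is what an
8-bit program would expect.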

Gábor

-- System Information:
Debian Release: 5.0
  APT prefers proposed-updates
  APT policy: (500, 'proposed-updates'), (500, 'testing'), (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-1-686 (SMP w/2 CPU cores)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages perl-modules depends on:
ii  perl                          5.10.0-19  Larry Wall's Practical Extraction 

perl-modules recommends no packages.

perl-modules suggests no packages.

-- no debconf information





