Bug#320406: Unicode::MapUTF8 does not handle one byte BIG5 characters properly

Niko Tyni ntyni at iki.fi
Mon Dec 26 20:07:41 UTC 2005


reassign 320406 libunicode-map-perl 0.112-8
tags 320406 patch
forwarded 320406 http://rt.cpan.org/NoAuth/Bug.html?id=16734
thanks

> Unicode::MapUTF8 fails to handle one byte BIG5 characters properly;

Hi,

this bug is actually in Unicode::Map, which Unicode::MapUTF8 uses.  The
BIG5 map distributed in Unicode::Map 0.112 (Map/EASTASIA/BIG5.map) is
missing the single-byte codes 0-127 (0x00-0x7F), which BIG5 shares
with ASCII.
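
The expected behaviour can be cross-checked independently of
Unicode::Map. This is only a minimal sketch, assuming an iconv(1) with a
BIG5 converter is installed; the function name is made up for
illustration. It converts the same bytes the new test uses (two
double-byte characters followed by "0" and two spaces) and dumps the
UTF-16BE result as hex.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch.
# Assumes iconv(1) is available and knows the BIG5 charset.
big5_sample_utf16be_hex() {
    # Octal escapes: \245\100 = 0xA5 0x40, \245\101 = 0xA5 0x41.
    # The single-byte part ("0" and two spaces) is passed via %s.
    printf '\245\100\245\101%s' '0  ' \
        | iconv -f BIG5 -t UTF-16BE \
        | od -An -tx1 \
        | tr -d ' \n'
    echo
}

big5_sample_utf16be_hex
```

With a working BIG5 converter this should print 4e164e15003000200020,
which matches the expected values in the new test: the single-byte
characters come out as their ASCII code points.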

The actual error is in the original input file, currently at
< ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT >.
Since the Unicode Consortium now considers that file obsolete,
I doubt they are interested in updating it.

I have re-reported the bug in the upstream CPAN bug tracker against the
correct package. The URL is < http://rt.cpan.org/NoAuth/Bug.html?id=16734 >.

The attached patch fixes the problem by modifying the binary mapping
file at package build time.  It also includes a new test that checks
for the correct BIG5 behaviour.
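
For reference, the bytes that debian/fix-big5 splices in after the
12-byte header can be reproduced and inspected on their own. This is a
minimal sketch and the function name is hypothetical. It uses octal
escapes because POSIX printf does not guarantee the \x form, and the
field descriptions in the comments are taken from the script's own
comments, not from a specification of the map format.

```shell
#!/bin/sh
# Hypothetical helper: emit only the submap record that the patch inserts.
big5_ascii_submap() {
    printf '\000\010\000'          # partial key-value mappings
    printf '\010\001\020\001'      # in: 1 char of 8 bits, out: 1 char of 16 bits
    printf '\200\000\200\000\000'  # 128 chars from 0x00 -> 128 chars from 0x0000
    printf '\000\000\000'          # end of submap
}

big5_ascii_submap | od -An -tx1    # hex dump of the record
big5_ascii_submap | wc -c          # total length in bytes
```

The record is 15 bytes long, so the patched BIG5.map should be exactly
15 bytes larger than the distributed one.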

We cannot include the original input file, modified or not, since its
header explicitly forbids redistributing it to third parties. Strictly
speaking, this means the character maps licensed in this way are
non-free, since their sources (as in "preferred form for modification")
are not distributable.  As it's relatively easy to modify the binary
maps as well, I suppose this is not a critical issue.

Cheers,
-- 
Niko Tyni	ntyni at iki.fi
-------------- next part --------------
diff -urN libunicode-map-perl-0.112/debian/fix-big5 libunicode-map-perl-0.112-big5/debian/fix-big5
--- libunicode-map-perl-0.112/debian/fix-big5	1970-01-01 02:00:00.000000000 +0200
+++ libunicode-map-perl-0.112-big5/debian/fix-big5	2005-12-26 20:42:27.790460730 +0200
@@ -0,0 +1,33 @@
+#!/bin/sh
+# Insert the US-ASCII compatible characters into the BIG5 binary map,
+# since the original source file (currently  
+#  ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
+# ) doesn't include them. Note that this depends on the binary map format.
+# Hopefully by the time it is changed, this problem has been fixed upstream.
+#
+# See http://bugs.debian.org/320406
+#
+# Copyright Niko Tyni <ntyni at iki.fi> 2005
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of either:
+# 
+# a) the GNU General Public License as published by the Free Software
+#    Foundation; either version 1, or (at your option) any later
+#    version, or
+# 
+# b) the "Artistic License" which comes with Perl.
+
+set -e
+
+IN=Map/EASTASIA/BIG5.map
+DD=/bin/dd
+PF=/usr/bin/printf
+
+$DD if=$IN bs=1 count=12 # header, 12 bytes
+$PF "\x0\x8\x0"          # partial key-value mappings
+$PF "\x8\x1\x10\x1"      # input 1 char of 8-bits at a time, output 1 char of 16 bits
+$PF "\x80\x0\x80\x0\x0"  # 128 characters starting at 0x00 -> 128 chars starting at 0x0000
+$PF "\x0\x0\x0"          # end of submap
+$DD if=$IN bs=1 skip=12  # rest of the file
+
diff -urN libunicode-map-perl-0.112/debian/rules libunicode-map-perl-0.112-big5/debian/rules
--- libunicode-map-perl-0.112/debian/rules	2005-12-26 20:15:47.328299427 +0200
+++ libunicode-map-perl-0.112-big5/debian/rules	2005-12-26 19:33:57.176463414 +0200
@@ -18,10 +18,11 @@
 		dh_testroot
 		[ ! -f Makefile ] || $(MAKE) realclean
 		dh_clean
-		rm -f build-stamp install-stamp
+		[ ! -f fix-big5-stamp ] || mv debian/BIG5.map.dist Map/EASTASIA/BIG5.map
+		rm -f build-stamp install-stamp fix-big5-stamp
 
 build:		build-stamp
-build-stamp:
+build-stamp:	fix-big5
 		dh_testdir
 		perl Makefile.PL INSTALLDIRS=vendor
 		$(MAKE) OPTIMIZE="-O2 -g -Wall"
@@ -37,6 +38,15 @@
 		$(MAKE) install PREFIX=$(PWD)/$(TMP_DIR)/usr
 		touch install-stamp
 
+# fix the BIG5 map on the fly
+fix-big5:	fix-big5-stamp
+fix-big5-stamp:
+		dh_testdir
+		debian/fix-big5 > debian/BIG5.map.new
+		mv Map/EASTASIA/BIG5.map debian/BIG5.map.dist
+		mv debian/BIG5.map.new Map/EASTASIA/BIG5.map
+		touch fix-big5-stamp
+
 binary-indep:
 
 binary-arch:	build install
@@ -58,6 +68,6 @@
 
 binary:		binary-indep binary-arch
 
-.PHONY:		clean build install binary-indep binary-arch binary
+.PHONY:		clean build install binary-indep binary-arch binary fix-big5
 
 ## ----------------------------------------------------------------------
diff -urN libunicode-map-perl-0.112/t/map.t libunicode-map-perl-0.112-big5/t/map.t
--- libunicode-map-perl-0.112/t/map.t	2001-01-07 23:51:18.000000000 +0200
+++ libunicode-map-perl-0.112-big5/t/map.t	2005-12-26 19:33:20.566730033 +0200
@@ -6,7 +6,7 @@
 # Change 1..1 below to 1..last_test_to_print .
 # (It may become useful if the test is moved to ./t subdirectory.)
 
-BEGIN { $| = 1; print "1..5\n"; }
+BEGIN { $| = 1; print "1..6\n"; }
 END {print "not ok 1\n" unless $loaded;}
 use Unicode::Map;
 $loaded = 1;
@@ -27,6 +27,7 @@
    ["GB2312",         "n->m: GB2312 (GB2312-80^8080 + ISO8859-1)"],
    ["DEVANAGA",       "n->m: DEVANAGA"],
    ["EUC_JP",         "n->m: EUC-JP"],
+   ["BIG5",           "n->m: BIG5"],
 );
 
 {
@@ -133,6 +134,21 @@
    return testMapping ( "APPLE-DEVANAGA", $_locale, $_unicode );
 }
 
+sub BIG5 {
+   my $_locale  =
+      "\xA5\x40"
+      ."\xA5\x41"
+      ."\x30"
+      ."  "
+   ;
+   my $_unicode =
+      "\x4E\x16"
+      ."\x4E\x15"
+      ."\x00\x30\x00\x20\x00\x20"
+   ;
+   return testMapping ( "BIG5", $_locale, $_unicode );
+}
+
 sub testMapping {
     my ( $charsetId, $txtLocale, $txtUnicode ) = @_;
     return 0 if ! ( my $Map = new Unicode::Map($charsetId) );

