Bug#273600: [xml/sgml-pkgs] Bug#273600: After switch to native transcoder, KOI8-R support in xerces23 is broken

Jay Berkenbilt <ejb@ql.org>, 273600@bugs.debian.org
Tue, 28 Sep 2004 14:31:03 +0000


I really appreciate your help on this -- I'm American (What do you
call someone who speaks one language?  American.) and don't have much
experience with working with different encodings, so I am likely to
make mistakes in this area.  With your help, we can make the xerces
packages satisfy the needs of a wider range of users.  Let's stick to
this until it's done right.  (Note: I'll be traveling some this week
and may not be very responsive for a few days.)  I will add your
KOI8-R example to my test suite to make sure this gets fixed and stays
fixed one way or another.
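
A regression check of that kind could look roughly like the sketch
below.  It is only an illustration: it uses Python's standard-library
parser as a stand-in rather than xerces, and the helper name
round_trip and the sample text are hypothetical, not taken from the
bug report or from the actual test suite.

```python
import xml.etree.ElementTree as ET

# "privet" ("hello") in Cyrillic, written with escapes so the file stays ASCII.
SAMPLE = "\u043f\u0440\u0438\u0432\u0435\u0442"


def round_trip(text, encoding):
    """Encode a tiny XML document in `encoding`, parse it, and return the
    element text, or None if the encoding is unknown or parsing fails.
    (Hypothetical helper, not part of any xerces test suite.)"""
    doc = '<?xml version="1.0" encoding="{}"?><doc>{}</doc>'.format(encoding, text)
    try:
        data = doc.encode(encoding)       # bytes as they would sit on disk
        return ET.fromstring(data).text   # parser must honor the declaration
    except (ET.ParseError, LookupError, ValueError):
        return None


if __name__ == "__main__":
    for enc in ("UTF-8", "KOI8-R"):
        # True means the text survived the encode/parse round trip intact.
        print(enc, round_trip(SAMPLE, enc) == SAMPLE)
```

The same shape of test (a document declared as KOI8-R whose text must
come back unchanged) is what would need to pass against each xerces
build.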

>   >  2.  Does this happen if you use the icu version of xerces, available
>   >      in the libxercesicu23 (or libxercesicu25) package?  If these
>   >      work, are they a possibility?
>
>   With libxercesicu23 the problem is not reproducible. It may be a
>   workaround for now, but it requires the 'icu' package to be installed
>   manually.  Anyway, see below.

Why does icu have to be installed manually?  libxercesicu23 has a
dependency on libicu21c102.  Does it need additional dependencies in
order for your application to work correctly?  If so, this is a bug
(in the xerces packaging) that should be fixed.  (I'll experiment with
this and see if I can figure out what the problem is here, but any
additional hints you can provide would be helpful.)

>   > It's possible that the native transcoder supports fewer encodings, but
>   > we found a handful of other problems that went away when we switched
>   > from gnu to native.
>
>   I've checked the code by rebuilding the package with debugging symbols
>   and using gdb on the test attached to the original bug report.
>
>   Actually, the 'native transcoder' does not support any encodings at all!

Well, it supports UTF-8 and ISO-8859-1, both of which are exercised in
my test suites.  I guess it isn't able to load any non-built-in
encodings.  Anyway, clearly it doesn't support what you need it to
support, so there is a problem here that has to be resolved, and I
appreciate your bringing it up.

>   I believe the transcoder switch should be reverted. Removing useful
>   features is not the proper way to deal with bugs.

I agree whole-heartedly.  I didn't think I was removing useful
features -- I didn't realize (in spite of asking questions on the
xerces mailing lists and digging through documentation) that there
would be a loss of functionality, and I thought having an icu version
of the library would satisfy the needs of users with deeper encoding
needs.

>   If you really really think that supporting encodings is not
>   important to you, at least you should explicitly mention that in
>   package description (I really want localization-related bugs to
>   become RC one day, to make package maintainers take them
>   seriously... but that's just a dream...).  However, a better
>   solution is to re-enable encoding support (revert transcoder
>   change) and make real fixes for the bugs "closed" by the change.

Rest assured, I am taking this seriously.  Before I switch back to the
GNU transcoder, I need to study more deeply the underlying cause of
the bugs that were closed after switching to native.  (I was not able
to reproduce any of them myself because I don't have access to
non-i386 architectures.)  Also, please explain to me why simply using
libxercesicu25 isn't sufficient.  If there is something wrong with the
dependencies, I can fix that.  It may then be suitable to say
something in the description and/or README.Debian about the native
version being faster but not supporting encodings other than the
default built-in encodings.  Alternatively, maybe libxerces{23,24,25}
should be icu versions (and should provide libxercesicu{23,24,25} for
backward-compatibility) and the native version should be provided as
libxercesnative{23,24,25} for people who don't need the additional
functionality and want to avoid the overhead.

If the right fix is to revert the transcoder change, I'll make sure it
gets done quickly so we don't ship a broken version in sarge.

-- 
Jay Berkenbilt <ejb@ql.org>
http://www.ql.org/q/