[debiandoc-sgml-pkgs] Re: Debiandoc/zh-cn fix + UTF-8 modifications
Danai SAE-HAN ( 韓達耐 )
danai.sae-han at edpnet.be
Thu Apr 12 20:54:02 UTC 2007
Hi!
From: Osamu Aoki <osamu at debian.org>
> I think you are on the right path but you need to be careful not to
> break old behavior too.
>
> 'charset' in tools/lib/Locale/{SG,XML} uses traditional non-UTF-8
> encodings: for Japan, EUC-JP; for Western Europe, Latin-1; for Russia,
> KOI-8.
I see. So I could just make zh_CN.UTF-8/SGML and change "iso-8859-1"
into "utf-8", right?
> This is what we should do.
>
> We convert all locale specific data to UTF-8 and use them as the base
> data.
>
> We also generate the traditional non-UTF-8 encoded data at package
> build time to keep the traditional behavior available.
Shouldn't be too much of a problem with iconv, but why keep the
traditional encodings for the languages that support UTF-8?
If one needs the GB2312 version, just reencode the files in zh-cn and
use zh_CN.GB2312. In the modifications that I have (locally), I can
use both zh_CN.UTF-8 and zh_CN.GB2312; I only need to reencode the
contents of zh-cn with iconv.
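To make the re-encoding step concrete, here is a minimal sketch in Python of what I do with iconv (roughly "iconv -f UTF-8 -t GB2312"). The function name and the file arguments are hypothetical, not part of debiandoc-sgml; the sketch assumes the source file is valid UTF-8 and uses only characters that exist in GB2312.

```python
from pathlib import Path


def reencode(src: Path, dst: Path,
             from_enc: str = "utf-8", to_enc: str = "gb2312") -> None:
    """Re-encode one file, e.g. a locale file from zh-cn.

    Hypothetical helper: raises UnicodeDecodeError if src is not valid
    from_enc, and UnicodeEncodeError if a character cannot be
    represented in to_enc.
    """
    text = src.read_bytes().decode(from_enc)
    dst.write_bytes(text.encode(to_enc))
```

Going the other way (GB2312 to UTF-8) is the same call with the two encodings swapped, and that direction cannot fail on unrepresentable characters.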
Three files in qref need changing as well: default.ent and
bin/getdocdate (re-encode the zh-cn part back to GB2312), and
bin/getlocale (s/zh-cn/zh_CN.GB2312/ instead of zh_CN.UTF-8).
Perhaps we could just add this info in README.Debian for those who
still want the traditional encoding?
> By using a new script option (e.g. -u) or specifying the full locale
> name with ".utf-8", this script should accept UTF-8 encoded data. Oh,
> the HTML generation code needs to be switchable too.
Hmmm, I don't really like this idea. Why should we keep non-UTF-8
data for the languages that are supported? Such as zh_CN: I got UTF-8
support, so I see no reason to keep GB2312-encoded files.
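For reference, the switch you describe could look roughly like the Python sketch below. It is only an illustration of the selection rule, not the code in the debiandoc-sgml scripts: the function name, the table, and its exact codec names are my assumptions, filled in from the examples you gave (EUC-JP, Latin-1, KOI-8) plus GB2312 for zh_CN.

```python
# Hypothetical table of traditional encodings; real data would come
# from the per-locale files, not from a hard-coded dict.
LEGACY_CHARSETS = {
    "ja": "euc-jp",
    "ru": "koi8-r",
    "zh_CN": "gb2312",
}
DEFAULT_CHARSET = "iso-8859-1"  # Western European fallback


def pick_charset(locale_name: str, force_utf8: bool = False) -> str:
    """Choose the input encoding for a locale name.

    force_utf8 models a "-u"-style option; an explicit codeset suffix
    such as ".UTF-8" wins over the legacy table.
    """
    base, _, codeset = locale_name.partition(".")
    if force_utf8:
        return "utf-8"
    if codeset:
        return codeset.lower()
    if base in LEGACY_CHARSETS:
        return LEGACY_CHARSETS[base]
    language = base.split("_")[0]
    return LEGACY_CHARSETS.get(language, DEFAULT_CHARSET)
```

With this rule, "zh_CN" alone keeps the old GB2312 behaviour, while "zh_CN.UTF-8" or the option selects UTF-8.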
> Another easier and safer approach is to create a new UTF-8 version of
> debiandoc-sgml (say, a debiandoc-sgml-utf8 package conflicting with
> debiandoc-sgml). Simply change the encoding. Fix the HTML header and
> LaTeX code generation. This is more like what you are thinking.
Indeed, but not by creating a new package. Let's just switch to UTF-8
for the languages that are already supported.
I could support many more languages, but I first need DFSG-free TTF
fonts. Once I have those, latex-cjk will support just about every
language (with a possible exception for scripts with difficult
ligatures, such as Arabic or the Indic scripts).
> Once you are successful, start filing bugs against all packages
> depending on debiandoc-sgml so they start using the new UTF-8 version
> while converting their source text to UTF-8.
>
> I was thinking of the first option but that may be too complicated.
> Your idea may be good for migration since we still have the old
> package for a while.
Perhaps it's best if I uploaded my patch so you could have a look, and
see if it breaks things.
I'll make a patch file, so you can patch it locally. And if it's
okay, then I'll upload it to CVS.
Best regards
Danai SAE-HAN
韓達耐
--
題目:《偶題》
作者:張耒(1052-1112)
相逢記得畫橋頭,花似精神柳似柔。
莫謂無情即無語,春風傳意水傳愁。