[debiandoc-sgml-pkgs] Re: Debiandoc/zh-cn fix + UTF-8 modifications
Osamu Aoki
osamu at debian.org
Thu Apr 12 14:38:40 UTC 2007
On Thu, Apr 12, 2007 at 03:24:50AM +0200, Danai SAE-HAN wrote:
>
> Hi!
>
> 1.
> zh_CN.GB2312 doesn't work because of one character: ä.
> The a with an umlaut doesn't exist in GB2312.
> So I'd like to ask to change "Esko Arajärvi" into "Esko Araj&auml;rvi"
> twice in zh-cn/append.sgml [qref].
>
> I could do it myself, but I ought to go to bed now. =_=.zZ
>
> 2.
> I've built qref with TeXLive2007 and it works perfectly.
> What we need to do now is to find out which packages qref exactly
> needs from the TL packages. "texlive-base-bin" and
> "texlive-latex-extra" look like obvious candidates, but how about
> other packages? Perhaps texlive-lang-*?
>
> 3.
> I have reencoded the zh_CN files into UTF-8, and it works (with TL2007
> + CJK4.7.0). I have also made a few changes locally in my qref and
> debiandoc-sgml tree to allow zh_CN.UTF-8. I could build all packages
> from qref without breaking anything; the resulting PDF and PS files
> compiled without problems. All the fonts are embedded.
>
> If you want, I could upload these changes to qref and debiandoc-sgml
> tomorrow, and if it works also for you, then I'll do the same for
> zh_TW and ja.
>
> I'm not sure if 'charset' in tools/lib/Locale/{SG,XML}
> [debiandoc-sgml] should be changed or not.
I think you are on the right track, but you need to be careful not to
break the old behavior.
'charset' in tools/lib/Locale/{SG,XML} uses the traditional non-UTF-8
encoding for each locale: EUC-JP for Japan, Latin-1 for Western Europe,
KOI8-R for Russia.
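
To make the per-locale 'charset' idea concrete, here is a minimal sketch
in Python (debiandoc-sgml itself is not written in Python). The table
and the function name are hypothetical and only illustrate the mapping;
the real values are whatever tools/lib/Locale/{SG,XML} define.

```python
# Hypothetical table, for illustration only. The authoritative values
# are the 'charset' entries in tools/lib/Locale/{SG,XML}.
TRADITIONAL_CHARSET = {
    "ja": "EUC-JP",
    "fr": "ISO-8859-1",
    "de": "ISO-8859-1",
    "ru": "KOI8-R",
    "zh_CN": "GB2312",
    "zh_TW": "Big5",
}


def traditional_charset(locale_name, default="ISO-8859-1"):
    """Return the legacy charset for a locale such as 'ja_JP.eucJP'."""
    base = locale_name.split(".", 1)[0]   # drop any '.codeset' suffix
    if base in TRADITIONAL_CHARSET:       # exact match, e.g. 'zh_CN'
        return TRADITIONAL_CHARSET[base]
    lang = base.split("_", 1)[0]          # fall back to the language part
    return TRADITIONAL_CHARSET.get(lang, default)
```

The lookup tries the full language_TERRITORY name first because zh_CN
and zh_TW share a language code but need different encodings.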
Here is what we should do:
We convert all locale-specific data to UTF-8 and use it as the base
data.
We also generate the traditional non-UTF-8 encoded data at package build
time, so that the traditional behavior remains available.
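
The build-time step could look roughly like the following Python sketch.
It is an assumption about how such a conversion might be done, not the
actual build code, and to_legacy is a made-up name. It re-encodes UTF-8
base data into a legacy charset and reports any characters the target
cannot represent, which is exactly the problem the "ä" causes for
GB2312.

```python
def to_legacy(utf8_bytes, charset):
    """Re-encode UTF-8 base data into a legacy charset.

    Returns (legacy_bytes, unencodable_chars). If any character cannot
    be represented in the target charset, legacy_bytes is None and the
    offending characters are listed once each, in order of appearance.
    """
    text = utf8_bytes.decode("utf-8")
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    if bad:
        return None, bad
    return text.encode(charset), bad
```

For example, to_legacy("Arajärvi".encode("utf-8"), "gb2312") returns
(None, ["ä"]), while the same call with "latin-1" succeeds. A build
script could use the list to fail early and name the characters that
must be replaced in the source.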
With a new script option (e.g. -u), or when the full locale name is
given with ".utf-8", the script should accept UTF-8 encoded data. Oh, the
HTML generation code needs to be switchable too.
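
A rough Python sketch of that switch follows. The -u flag is only the
example suggested above, and the -l locale option, wants_utf8 and
parse_args are hypothetical names used for illustration, not the
scripts' real interface.

```python
import argparse


def wants_utf8(locale_name, force_utf8=False):
    """True if -u was given or the locale's codeset part is UTF-8."""
    if force_utf8:
        return True
    _, _, codeset = locale_name.partition(".")
    codeset = codeset.split("@", 1)[0]    # strip a modifier such as '@euro'
    # Accept the common spellings: 'UTF-8', 'utf-8', 'utf8'.
    return codeset.lower().replace("-", "") == "utf8"


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("-u", dest="utf8", action="store_true",
                        help="treat input as UTF-8 (hypothetical flag)")
    parser.add_argument("-l", dest="locale", default="en",
                        help="locale name, e.g. zh_CN.UTF-8")
    return parser.parse_args(argv)
```

With this, "-l zh_CN.UTF-8" and "-u -l zh_CN" both select UTF-8 input,
while "-l zh_CN.GB2312" or a bare "-l zh_CN" keeps the traditional
encoding, so existing callers see no change.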
Another, easier and safer approach is to create a new UTF-8 version of
debiandoc-sgml (say, a debiandoc-sgml-utf8 package conflicting with
debiandoc-sgml). Simply change the encoding, then fix the HTML header and
LaTeX code generation. This is more like what you are thinking.
Once you are successful, start filing bugs against all packages that
depend on debiandoc-sgml, asking them to switch to the new UTF-8 version
while converting their source text to UTF-8.
I was thinking of the first option, but that may be too complicated. Your
approach may be good for migration, since we will still have the old
package for a while.
Osamu