[Ankur-core] Re: [Debian-in-workers] Bengali in new Debian Installer

Thu Sep 22 19:29:04 UTC 2005

On Thursday 22 September 2005 13:01, Jamil Ahmed wrote:
> Soumyadip Modak wrote:
> >On Thu, 2005-09-22 at 13:40 +0600, Omi Azad wrote:
> >>Deepayan Sarkar wrote:
> >>>AFAICS, Debian itself is not Unicode 4.1.0 compliant yet (the unstable
> >>> version of console-data has [1]
> >>> /usr/share/unidata/UnicodeData-4.0.1d1b.txt). The change is the
> >>> addition of one (moderately rare) code point, but the font editor we
> >>> use (fontforge) doesn't know about it yet (I haven't checked the CVS
> >>> version recently though).
> >>>
> >>>In any case, this is a minor problem (that will get solved eventually),
> >>> and more importantly, completely orthogonal to the issue at hand.
> >>>
> >>>Deepayan
> >>
> >>But Unicode 4.1.0 works fine on my Ubuntu 5.04 box. Windows and its
> >>range of software haven't been updated yet either, but they work
> >>without any problem. It's just a character at a particular code point, so
> >>there should be no problem getting the output if we fix the font.
> >>In fact I'm not having any problems on either Windows or Linux. I just
> >>added the code point to the glyph with FontLab (Windows) and it's working
> >>well. You can check it by downloading any of the fonts from
> >>http://www.ekushey.org.
> >
> >What Deepayanda probably meant is that there are significant parts of
> >Debian-specific (and by extension Ubuntu-specific) software that is not
> >Unicode 4.1.0 compliant. The compliance of the Desktop portion of the
> >distribution doesn't mean we can call the distribution Unicode 4.1.0
> >compliant. Of course the modularity of a *nix system may be a trifle too
> >difficult to understand for people used to MS Windows :)
>
> So what is the solution now? :)

I don't see the problem yet.

The only question would be one of policy. If a translated string involves a 
khanda-ta (which, for those who don't know, is the only Bengali 'character' 
whose encoding changed from 4.0.1 to 4.1.0), should it be encoded using the 

(a) 4.0.1 rule or 
(b) 4.1.0 rule?
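For readers who haven't met the two rules, here is a minimal Python sketch. It
assumes the 4.0.1 convention is the sequence TA + HASANT + ZERO WIDTH JOINER
(U+09A4 U+09CD U+200D) and the 4.1.0 form is the single code point U+09CE
BENGALI LETTER KHANDA TA. The constant and function names are hypothetical,
and a plain string substitution ignores any context-sensitive corner cases,
so treat it as an illustration rather than a vetted converter:

```python
import unicodedata

# Hypothetical names for the two encodings of Bengali khanda-ta.
KHANDA_TA_4_0 = "\u09A4\u09CD\u200D"  # (a) 4.0.1: TA + HASANT (virama) + ZWJ
KHANDA_TA_4_1 = "\u09CE"              # (b) 4.1.0: dedicated code point

def to_unicode_4_1(text: str) -> str:
    """Replace every 4.0.1-style khanda-ta sequence with U+09CE."""
    return text.replace(KHANDA_TA_4_0, KHANDA_TA_4_1)

def to_unicode_4_0(text: str) -> str:
    """Expand U+09CE back into the 4.0.1 sequence, e.g. for older fonts."""
    return text.replace(KHANDA_TA_4_1, KHANDA_TA_4_0)

# Sanity check against Python's own Unicode database.
assert unicodedata.name(KHANDA_TA_4_1) == "BENGALI LETTER KHANDA TA"

old = "\u0989\u09A4\u09CD\u200D\u09B8\u09AC"  # "utsab" spelled the 4.0.1 way
new = to_unicode_4_1(old)
print([f"U+{ord(c):04X}" for c in new])  # ['U+0989', 'U+09CE', 'U+09B8', 'U+09AC']
```

The reverse function is only there to show that rule (b) doesn't lose
anything: a 4.0.1-only font could still be fed the old sequence if needed.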

I think (b) would be the obvious choice, as long as it doesn't crash anything.
Upside: this would remove any future conversion headaches. Downside: at worst,
khanda-tas would not be displayed correctly. It is quite possible, as Omi
suggests, that dropping in a replacement Unicode 4.1-compatible font would
solve any display problems. My guess is that this problem would be mostly
irrelevant because the translated strings will have few (if any) khanda-tas
(which we cannot verify until enough translations are done). So I suggest
that we get on with the non-trivial task of translation rather than worry
about unimportant problems.
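Once translations exist, checking how many khanda-tas they actually contain is
straightforward. A minimal sketch, assuming the translated strings have
already been pulled out of the installer's translation files into a Python
list (that extraction step, the function name and the sample data are all
hypothetical), counting both the 4.0.1 sequence and the 4.1.0 code point:

```python
from typing import Iterable, Tuple

KHANDA_TA_4_0 = "\u09A4\u09CD\u200D"  # 4.0.1: TA + HASANT + ZERO WIDTH JOINER
KHANDA_TA_4_1 = "\u09CE"              # 4.1.0: BENGALI LETTER KHANDA TA

def count_khanda_ta(strings: Iterable[str]) -> Tuple[int, int]:
    """Return (4.0.1-style, 4.1.0-style) khanda-ta counts across all strings."""
    old_style = new_style = 0
    for s in strings:
        old_style += s.count(KHANDA_TA_4_0)
        new_style += s.count(KHANDA_TA_4_1)
    return old_style, new_style

# Hypothetical sample: one string per encoding, plus one with no khanda-ta.
sample = [
    "\u0989\u09CE\u09B8\u09AC",              # "utsab", 4.1.0 form
    "\u0989\u09A4\u09CD\u200D\u09B8\u09AC",  # "utsab", 4.0.1 form
    "Debian Installer",
]
print(count_khanda_ta(sample))  # (1, 1)
```

A plain TA + HASANT without the joiner is an ordinary conjunct and is not
counted; a non-zero first number would flag strings that need converting
under rule (b).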

Deepayan