[Debian-l10n-devel] Translations cleanup

Christian PERRIER christian at perrier.eu.org
Mon Jun 7 17:56:42 UTC 2010

Quoting Arne Goetje (arne.goetje at canonical.com):
> Hi Christian,

CC'ing the debian-l10n-devel mailing list, which gathers everyone
involved in the Debian i18n infrastructure and scripts.

> I just stumbled upon this translation overview page:
> http://www.debian.org/international/l10n/po/
> The script which generates the page seems to need some improvement:
>  * it seems not to use the iso_639_3.xml file from the iso-codes
> package, since many language codes are marked as "Unknown language".

I'm not sure this is easy to achieve. Nicolas, any idea?

>  * language codes with @ modifiers are not parsed correctly. The
> script should split the string at the @ and display it like this:
> ca@valencia Catalan (valencia).

That can probably be fixed, even though my personal opinion about this
ca@valencia "joke" is, to put it in a politically correct way, mixed :-)
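
A minimal sketch of the split suggested above, in Python. The
LANGUAGE_NAMES table and the describe_locale name are made up for
illustration; the real script would presumably take its names from the
iso-codes data instead.

```python
# Hypothetical lookup table standing in for the iso-codes data.
LANGUAGE_NAMES = {"ca": "Catalan", "sr": "Serbian"}


def describe_locale(code):
    """Turn a code such as 'ca@valencia' into 'Catalan (valencia)'."""
    # Split off the modifier at the first '@', if there is one.
    base, sep, modifier = code.partition("@")
    # Drop an optional country part (ca_ES -> ca) before the name lookup.
    language = base.split("_", 1)[0]
    name = LANGUAGE_NAMES.get(language, "Unknown language")
    return f"{name} ({modifier})" if sep else name
```

With that table, describe_locale("ca@valencia") gives
"Catalan (valencia)", and a code with no modifier is returned as the
plain language name.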

>  * some entries look bogus, e.g. vi_AR. There are no translations
> with that code, so it needs to be investigated where this code comes
> from.

Certainly from some bogus package providing a vi_AR.po file.

> Also, I'd like to ask if there is any coordinated effort planned or
> underway to fix the .po file names in the packages themselves? Quite
> a few files need to be renamed in order to be useable.

There have been some initiatives. In the fairly distant past, I reported
a few such errors against the relevant packages.

> Examples:
>  * dk -> should be da, according to the translations inside
>  * sr_SR -> the country code for Serbia is RS. It should actually be
> just 'sr'. Likewise with sr_YU.
>  * sr@Latn and sr@latin are actually the same and should be merged
> into sr@latin. sr@Latn doesn't exist as a locale.
>  * no and no_NO are discouraged. Translations should be either nb or
> nn. In most cases, these 'no' translations are actually nb.
>  * zh is also discouraged, they should be either zh_CN or zh_TW.
>  * codes with country codes, where the language is only mainly
> spoken in one country should be merged with the country-less
> language codes to avoid confusion. E.g. ca_ES@valencia should get
> merged into ca@valencia
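
The renames listed above could be collected into a small table, roughly
as follows. This is only a sketch: the RENAMES and NEEDS_REVIEW names
and the suggest_po_name helper are invented here, and 'no' and 'zh' are
deliberately left for a human to decide, since the right target depends
on the translation inside the file.

```python
import os

# Unambiguous renames taken from the examples in the message.
RENAMES = {
    "dk": "da",
    "sr_SR": "sr",
    "sr_YU": "sr",
    "sr@Latn": "sr@latin",
    "ca_ES@valencia": "ca@valencia",
}

# Codes whose replacement depends on the file's content (nb or nn,
# zh_CN or zh_TW), so no automatic rename is proposed.
NEEDS_REVIEW = {"no", "no_NO", "zh"}


def suggest_po_name(filename):
    """Return (suggested filename, needs_manual_review)."""
    code, ext = os.path.splitext(filename)
    if code in NEEDS_REVIEW:
        return filename, True
    return RENAMES.get(code, code) + ext, False
```

For instance, suggest_po_name("dk.po") proposes "da.po", while
"no_NO.po" comes back unchanged and flagged for review.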

I even go further: fr_FR.po when there is no fr.po file and no other
fr_* file is plain stupid. Indeed, my own personal opinion is that
there is no serious argument for using country modifiers for most of
the "multiple country" languages.

I have had this debate many times on Debian lists... and, of course,
there is always someone popping up in a more or less pedantic way and
"kindly" explaining to me that "French as spoken in Belgium" is
different from "French as spoken in France", but:

- after over 10 years in l10n, I know about all this and probably all
specificities of most languages in the world. That's pedantic too but
I think I deserve the right to be pedantic on that matter

- software l10n is about *written* languages, not spoken ones, and
apart from very specific, well-known cases ("ordenador" vs
"computador" in es_ES and es_everywhere-else), there are no practical
differences in most cases

- only having fr_CA (for instance) translation files for French
deprives users of other French locales from the French translation
unless this file is copied as fr_FR, fr_CH, fr_BE, fr_LU, etc. Huge
waste of resources. Of course, French is only an example, here.

- exceptions to this (that is, really good reasons to use xx_YY.po
files) are very limited:
  - pt vs pt_BR
  - zh_CN vs zh_TW (all all practical implications for users of zh_HK,
  - possibly pa_IN/pa_PK and bn_BD/bn_IN
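
To illustrate the fr_CA point above, here is a simplified model of the
catalog lookup: the exact locale is tried first, then the bare language
code, and never a sibling country. The function names are made up, and
real gettext implementations also take the codeset, the modifier and
the LANGUAGE variable into account, which this sketch ignores.

```python
def lookup_order(locale):
    """Simplified candidate list: exact locale, then bare language."""
    language = locale.split("_", 1)[0]
    return [locale] if language == locale else [locale, language]


def find_catalog(locale, available):
    """Return the first candidate that has a catalog, or None."""
    for candidate in lookup_order(locale):
        if candidate in available:
            return candidate
    return None
```

Under this model, a user in fr_BE gets nothing from a package shipping
only fr_CA.po, but is served as soon as the same file is named fr.po.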

So, in short, all occurrences of xx_YY.po files (apart from the
abovementioned exceptions) should be hunted down....and I would
wholeheartedly welcome an initiative about this. Of course, most of
these errors belong to upstream software, but we can expect Debian
developers to relay them upstream (and of course, then, have fun times
arguing with upstream developers when they tell us we are wrong..:-))
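
As a starting point for such a hunt, something like the following could
flag candidate files. It is a sketch only: the ALLOWED set simply
mirrors the exceptions listed above, the pattern assumes plain
xx_YY.po or xx_YY@modifier.po names, and suspicious_po_files is a
hypothetical helper, not part of any existing Debian script.

```python
import re

# Country-specific codes accepted as legitimate in the list above.
ALLOWED = {"pt_BR", "zh_CN", "zh_TW", "pa_IN", "pa_PK", "bn_BD", "bn_IN"}

# Matches names such as fr_FR.po or ca_ES@valencia.po.
PO_WITH_COUNTRY = re.compile(r"^([a-z]{2,3}_[A-Z]{2})(@[A-Za-z0-9]+)?\.po$")


def suspicious_po_files(filenames):
    """Return the xx_YY.po names that are not on the exception list."""
    flagged = []
    for name in filenames:
        match = PO_WITH_COUNTRY.match(name)
        if match and match.group(1) not in ALLOWED:
            flagged.append(name)
    return flagged
```

Given fr.po, fr_FR.po, pt_BR.po and vi_AR.po, this flags fr_FR.po and
vi_AR.po and leaves the other two alone. Whether a flagged file is
really wrong still needs a human look before reporting it upstream.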
