[Debian-l10n-devel] po statistics

Helge Kreutzmann debian at helgefjell.de
Fri Nov 17 16:37:14 GMT 2023


Hello Laura et al.,
On Fri, Nov 17, 2023 at 12:41:54PM +0100, Laura Arjona Reina wrote:
> (adding debian-i18n and debian-l10n-devel lists to CC because I don't know
> very well the codebase)
> 
> On 17/11/23 at 11:44, Thomas Lange wrote:
> > Hi all,
> > 
> > currently I'm looking at the po statistics. I think I do not understand
> > much of it, but I wonder why so many statistics are generated.

Because Debian has a wide user base and we try to cater to all language
combinations.

> I believe that these stats use a list of languages created by the script
> gen-files.pl:
> 
> https://salsa.debian.org/webmaster-team/webwml/-/blob/master/english/international/l10n/scripts/gen-files.pl
> 
> and the list of languages is done by gathering all the languages present in
> .po files in testing or unstable archive, using the material in:
> 
> https://i18n.debian.org/material/
> 
> (that is created by the
> https://salsa.debian.org/l10n-team/dl10n/-/blob/master/cron/gen-material
> script)
> 
> > If I look at
> > /srv/www.debian.org/www/international/l10n/po
> > I see more than 9400 html files. Wow!
> > We provide the "Status of PO files for language code: de — German " in
> > 15 different languages (ls de.*.html) and all german files (including
> > de_*.*.html) are 112. What is this good for?
> > I'll just pick one:
> > de_PY.nl.html is
> > "Toestand van de PO-bestanden voor de taalcode: de_PY — German @tmpl_lang at ndash; Paraguay"
> > who needs this?

Users who do not speak English (very well). Your example combines two
things: a (probably wrong or very rare) language variant (as a German I
was not aware of it at all) and a translator (probably using templates)
who decided that this string is useful for Dutch-speaking users.

> I guess that the different scripts just combine all the different language
> pairs, but frankly I don't know.

Most likely.
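
If that guess is right, the file count is simply the product of the two
lists. A minimal sketch in Python, assuming the naming pattern
<PO language>.<page language>.html seen in de_PY.nl.html; the function
name and the sample lists are made up for illustration and are not taken
from gen-files.pl or the Makefile:

```python
from itertools import product


def page_names(po_langs, ui_langs):
    """Return one file name per (PO language, page language) pair."""
    return [f"{po}.{ui}.html" for po, ui in product(po_langs, ui_langs)]


# Hypothetical sample data, not the real lists used on www.debian.org.
po_langs = ["de", "de_PY", "es"]
ui_langs = ["en", "nl", "de"]

names = page_names(po_langs, ui_langs)
print(len(names))                 # 3 PO languages * 3 page languages = 9
print("de_PY.nl.html" in names)   # True
```

With a few hundred language codes and 15 or so page languages, a total
in the thousands would not be surprising.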

> > First, why is the list of languages on page
> > https://www.debian.org/international/l10n/po/
> > so long? Shouldn't we just list the main languages, without their variants?
> > Should we ignore all "Unknown language"?

Who decides what is "main"? And why should we skip "minor" languages?
If there are translators willing to do the work, we should welcome
them. I remember that there was a big translation effort for Bhutan
(I hope my memory serves me right; Bubulle coordinated it). And I have
even heard that some countries prefer free software exactly for this
reason: they are not forced to use a "main" language but can get the
best results in their native one.

And to do this, these statistics are *very* helpful, as they show the
current status and help focus on the work.

And maybe some German speaking community in Paraguay just happens to
decide right now to do the work - then they could ramp up their
efforts quickly, as the infrastructure is already there.

> This is a bug processing languages with ISO in ISO 639-2 set:
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1025749
> 
> > 
> > Let's look at these three:
> > https://www.debian.org/international/l10n/po/log
> > https://www.debian.org/international/l10n/po/log_DE
> > https://www.debian.org/international/l10n/po/man_DE
> > 
> > They list some packages for which
> > "translation is underway" or "already translated".
> > Fine.
> > 
> > But why do we then add a long list of package names under the section
> > "Packages already i18n-ed"
> > "These packages are translated in other languages, and then could be translated into your own language."
> 
> This is the list of "pending translations", in fact.

Yes, and this is again very helpful. For German, I look at it again
and again to see where my translation effort is best spent; I have
actually picked entries from there several times.

> > And at the end of the web page we list again all languages, same list
> > as on the top of the page.
> 
> The list of languages is repeated at top and end of the page since almost
> the first revision of the corresponding template, I guess it was done by
> convenience.

Probably. And it doesn't hurt.

> I've just updated /english/international/l10n/po/tmpl.src removing the line
> that inserts the list again at the end of the pages.

Why?

> > I like to know which information is really needed for the translators
> > then I'll try to fix the Makefile to get rid of the unneeded files.
> 
> I'm CC'ing debian-i18n list, but many translators are not subscribed there,
> just only subscribed to their -l10n-language lists, and I guess that this
> info is mostly used by the translators translating package information (not
> website pages), so I'm not sure how to reach them.
> 
> For my case (Spanish), I'll send a mail to -l10n-spanish linking to this
> message and your message, for the case they can provide some feedback.

Please keep the information there. It *is* used. And I would really
hate to have to create it manually; that would drive translators away.

I really do not understand why such a good resource should be
destroyed or crippled. This e-mail contains *no* rationale for this.

Greetings

        Helge

-- 
      Dr. Helge Kreutzmann                     debian at helgefjell.de
           Dipl.-Phys.                   http://www.helgefjell.de/debian.php
        64bit GNU powered                     gpg signed mail preferred
           Help keep free software "libre": http://www.ffii.de/