[Debian-handbook-translators] XML markup problems

Raphael Hertzog hertzog at debian.org
Mon Nov 12 21:54:59 UTC 2012


Hello,

On Fri, 09 Nov 2012, Raphael Hertzog wrote:
> Unfortunately several translations are not buildable due to markup errors
> (invalid XML syntax). The translations which have errors are the
> following:
> * de-DE
> * it-IT
> * pt-BR

I fixed the markup errors so that the documents can be built. The results
are here:
http://debian-handbook.info/browse/de-DE/stable/
http://debian-handbook.info/browse/it-IT/stable/
http://debian-handbook.info/browse/pt-BR/stable/

Unfortunately the number of mistakes in it-IT and pt-BR was very high. I
believe that those translation teams should make sure to explain
to contributors how XML is supposed to be translated.

Here are some important points where I saw lots of errors:

* you should not add spaces where there's no space in the original string,
  in particular within the XML markup.

  If the original is “<emphasis>foo</emphasis>”, the translation
  “<emphasis>bar</emphasis>” is OK but
  “<emphasis> bar </emphasis>” is NOT OK and
  “<emphasis>bar</ emphasis>” is NOT OK.

  If the original is “<xref linkend="foo" />”, you should not change
  anything. In particular “<xref linkend="foo" / >” is NOT OK.

  If the original is “<filename>/etc/foo/config</filename>”, you should not
  change anything. “<filename> / Etc / foo / config </filename>” is NOT
  OK.

* you should not translate the XML tags (neither the opening tag, nor the
  closing tag): it's "<command>" not "<comando>", it's "<primary>" not
  "<principal>", it's "<filename>" not "<nomefile>".

* you should not translate attribute names, nor attribute values that are
  not displayed to the user (it's “<ulink type="block"…” not
  “<ulink tipo = "block"…”).

* you should not translate filenames, URLs, etc.

* you should not invert the order of tags in
  “<primary>foo</primary><secondary>bar</secondary>”. Those are index
  entries, where you look up the first word first and then the second word
  as a sub-entry of the first. In languages where the order of words
  would be the opposite, you should instead reword the entries to
  accommodate the inverted order (usually there are already two entries
  with both orders when it's relevant).

* you should not insert all those "zero width space" characters (Unicode
  U+200B). I don't know why translators included them, but I don't believe
  there is any legitimate reason for those characters here. Furthermore,
  they are often doubled.
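Most of the rules above are mechanical enough to be checked automatically.
Below is a minimal Python sketch, assuming each original string and its
translation are available as plain strings (for example the msgid/msgstr
pair of a PO entry). The function names and the VERBATIM_TAGS set are
hypothetical, not part of any existing tool. The check is deliberately
strict: it requires the translation to keep exactly the same tags in the
same order, which may occasionally flag a legitimate reordering of inline
markup.

```python
import re

# Splits on any XML tag (opening, closing or self-closing such as <xref ... />).
# The capturing group makes re.split keep the tags in the resulting list.
TAG_RE = re.compile(r"(<[^>]*>)")

# Hypothetical set: elements whose content must be copied unchanged.
VERBATIM_TAGS = {"filename"}

ZWSP = "\u200b"  # Unicode U+200B ZERO WIDTH SPACE


def split_markup(s):
    """Return (texts, tags); texts[i] is the text right after tags[i - 1]."""
    parts = TAG_RE.split(s)
    return parts[0::2], parts[1::2]


def opening_tag_name(tag):
    """Name of an opening tag like <filename>, or None for a closing tag."""
    m = re.match(r"<([\w:.-]+)", tag)
    return m.group(1) if m else None


def check_translation(orig, trans):
    """Return a list of problems found in `trans` compared to `orig`."""
    problems = []
    if ZWSP in trans:
        problems.append("contains zero width space (U+200B)")

    o_texts, o_tags = split_markup(orig)
    t_texts, t_tags = split_markup(trans)

    # Tags must be copied exactly and in the same order: no translated
    # element or attribute names, no extra spaces inside tags, no swapped
    # <primary>/<secondary> index entries.
    if o_tags != t_tags:
        problems.append(f"tags differ: {o_tags} != {t_tags}")
        return problems  # text segments no longer line up, stop here

    for i, (o, t) in enumerate(zip(o_texts, t_texts)):
        # Whitespace right after or right before a tag must match the original.
        if o[:1].isspace() != t[:1].isspace() or o[-1:].isspace() != t[-1:].isspace():
            problems.append(f"whitespace changed in segment {i}: {t!r}")
        # Content of verbatim elements (file names) must not be translated.
        if i > 0 and opening_tag_name(o_tags[i - 1]) in VERBATIM_TAGS and o != t:
            problems.append(f"verbatim content changed: {o!r} -> {t!r}")
    return problems
```

For instance, checking “<emphasis> bar </emphasis>” against the original
“<emphasis>foo</emphasis>” reports one whitespace problem, while
“<emphasis>bar</emphasis>” reports none.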

Note that I fixed only the mistakes that caused build errors. Many other
mistakes (in particular extraneous spaces) do not break the build but
will render incorrectly in the book.

It would be nice if all the translators could be more careful from now on
so that all the translations keep being buildable and can be regularly
updated on the website (I'll do it once a month). I know it's a bit more
difficult with Weblate since you can't easily verify whether you have
introduced errors. That's why I requested a new feature to ensure
that translations are valid XML:
https://github.com/nijel/weblate/issues/145
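A check of that kind could look roughly like the following Python sketch.
It is only an illustration, not Weblate's actual implementation, and the
function name is hypothetical. Each translated fragment is wrapped in a
dummy root element, because a single string may contain bare text or
several sibling elements, and is then parsed with the standard library
XML parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if a translated markup fragment parses as XML.

    Assumption: the fragment uses only the five entities predefined by XML;
    other named entities (e.g. &nbsp;) are not declared here and would be
    reported as errors.
    """
    try:
        # Wrap in a dummy root so text and sibling elements are accepted.
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True
```

Such a check only catches the mistakes that break the build: a string like
“<emphasis> bar </emphasis>” is still well-formed XML, so extraneous spaces
need a separate comparison against the original string.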

Cheers,
-- 
Raphaël Hertzog ◈ Debian Developer

Get the Debian Administrator's Handbook:
→ http://debian-handbook.info/get/