Bug#978315: xgettext complains about UTF conformance of strings not marked for translation

Bruno Haible bruno at clisp.org
Sun Dec 27 18:40:41 GMT 2020


Hi Santiago, Samuel,

> The upload of gettext 0.21 for Debian unstable has made package "dasher",
> maintained by Samuel Thibault (in Cc), not to build anymore, as reported here
> by Lucas Nussbaum:
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=978315
> 
> We are not sure where is exactly the problem (either "dasher" or "gettext").
> 
> In short: xgettext seems to parse and complain about UTF conformance
> of strings even if they are not marked for translation.
> 
> Here is a minimal test case provided by Samuel:
> 
> ----- Begin forwarded message -----
> 
> € cat test.c
> 
> #include <wchar.h>
> 
> void f(const wchar_t *str) { }
> 
> void g(void) {
> 	f(L"\xABCDFF");
> }
> 
> 
> € xgettext test.c
> xgettext: x-c.c:1666: phase5_get: Assertion `UNICODE_VALUE (c) >= 0 && UNICODE_VALUE (c) < 0x110000' failed.
> 
> Samuel
> 
> ----- End forwarded message -----

This behaviour was introduced in gettext 0.20, with the ability to grok
C11 and C++11 string literals.

In the next gettext release, functions like 'f' (which take a 'const wchar_t *'
argument) can be designated as gettext-like functions, for which the argument
needs to be extracted and put into the POT file. For this, it must be possible
to convert it to UTF-8.
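
To illustrate why the test case cannot be converted: the escape \xABCDFF
denotes the value 0xABCDFF, which lies outside the Unicode range checked by the
assertion (values below 0x110000). Below is a minimal sketch of such a range
check followed by a UTF-8 encoder. It is not xgettext's actual code; the names
is_unicode_scalar and utf8_encode are hypothetical, and the rejection of
surrogates is an extra assumption that the quoted assertion does not make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from gettext): returns 1 if c can be represented
   in UTF-8. The upper bound mirrors the quoted assertion; excluding the
   surrogate range 0xD800..0xDFFF is an additional assumption of this sketch. */
static int is_unicode_scalar(uint32_t c)
{
    return c < 0x110000 && !(c >= 0xD800 && c <= 0xDFFF);
}

/* Hypothetical helper: encodes c into buf (room for 4 bytes) and returns the
   number of bytes written, or 0 if c is not encodable. */
static size_t utf8_encode(uint32_t c, unsigned char buf[4])
{
    if (!is_unicode_scalar(c))
        return 0;
    if (c < 0x80) {
        buf[0] = (unsigned char) c;
        return 1;
    }
    if (c < 0x800) {
        buf[0] = (unsigned char) (0xC0 | (c >> 6));
        buf[1] = (unsigned char) (0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        buf[0] = (unsigned char) (0xE0 | (c >> 12));
        buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (c & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char) (0xF0 | (c >> 18));
    buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
    buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
    buf[3] = (unsigned char) (0x80 | (c & 0x3F));
    return 4;
}
```

With these helpers, utf8_encode(0xABCDFF, buf) returns 0, whereas a value such
as 0x20AC encodes to the three bytes E2 82 AC. A real implementation would
report the failure with a file name and line number instead of returning 0.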

The assertion could be converted to a reasonable error message, sure.

Having a reasonable error message (with line number) *and* emitting this error
message only when the string actually gets extracted would make xgettext more
complex.

Since Samuel says:

  ... the file that poses problem is Testing/gtest/test/gtest_unittest.cc
  This is not something that contains anything to be translated, we'd need
  some option to just ignore Testing/ entirely.

excluding Testing/ from the files passed to xgettext looks like the better
option.

Bruno


