[sane-devel] Character encoding used for sane_strstatus() strings

Mon Jul 18 11:19:10 BST 2022

Hi John,

On 2022-07-18 05:25, John Scott wrote:
> The SANE spec says that all strings are encoded in ISO-8859-1 ("Latin-
> 1"). However, from inspecting the code for sane_strstatus(), it appears
> that it just returns ordinary string literals, which use whatever
> encoding the compiler prescribes for narrow string literals and need not
> be the same.

Agreed. Going by the letter of the spec, this is indeed a problem.

> So, what character encoding should I be assuming for strings coming from
> sane_strstatus() as an application writer? One solution to this dilemma
> is, since sane_strstatus() appears to only use characters from ASCII in
> the strings, is to use UTF-8 string literals, like this:
> 	u8"Hello, world"

This would bump the compiler requirement to C11. I don't think that is
bad: we already require C++ for at least one popular backend, so it's
unlikely that many of our platforms have only an ancient C compiler.
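
To make the suggestion concrete, here is a rough sketch of what a
function returning a u8 literal could look like. The function name and
the message are made up for illustration and are not taken from the SANE
sources. The cast is there only because C23 changes the element type of
u8 literals to char8_t; in C11 they are plain char arrays.

```c
#include <string.h>

/* Hypothetical helper, not part of the SANE API. */
const char *example_status_text(void)
{
    /* A u8 literal is stored as UTF-8 regardless of the compiler's
       execution character set. For ASCII-only text the bytes are also
       valid ISO-8859-1. */
    return (const char *)u8"Hello, world";
}
```

The caller sees an ordinary NUL-terminated string, so nothing would
change on the application side.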

I'm CC'ing Ralph for a second opinion on whether we can start requiring C11.

By the way, does the current assumption actually break in practice? That
is, are there compilers whose narrow string literals do not encode ASCII
text as a subset of ISO-8859-1?
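
One way to find out on a given compiler might be a small check along
these lines. It is only a spot check under my own assumptions: the
sampled characters and the message are arbitrary, the function name is
made up, and it needs C11 for _Static_assert and u8 literals.

```c
#include <string.h>

/* Compile-time spot check: a few characters of the execution character
   set have their ASCII values. This samples four characters only. */
_Static_assert('A' == 0x41 && 'a' == 0x61 && '0' == 0x30 && ' ' == 0x20,
               "execution character set is not ASCII-compatible");

/* Hypothetical helper: returns 1 if the narrow literal and the u8
   literal are byte-for-byte identical, terminator included. */
int narrow_matches_utf8(void)
{
    return memcmp("Hello, world", u8"Hello, world",
                  sizeof "Hello, world") == 0;
}
```

On an ASCII-based platform the static assertion passes and the function
returns 1. On an EBCDIC platform I would expect the build to fail at the
static assertion.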

> If you can affirm that the specification needs to prevail, I can send a
> merge request to adjust the string literals accordingly.

Let's wait until Ralph replies and then we can see how to proceed.

Thanks a lot for noticing this.

Regards,
Povilas