[Nut-upsdev] ctype(3) warnings
Greg Troxel
gdt at lexort.com
Mon Mar 31 13:48:08 BST 2025
I'm building the alpha for nut-2.8.3 and get these warnings (not saying this is new):
common.c:675:26: warning: array subscript has type 'char' [-Wchar-subscripts]
powerp-txt.c:415:40: warning: array subscript has type 'char' [-Wchar-subscripts]
nutdrv_qx.c:3047:17: warning: array subscript has type 'char' [-Wchar-subscripts]
The issue is that the ctype(3) functions are defined only for a limited
set of values, and negative values of char (which may be signed) are
invalid.
So a cast to unsigned char is probably in order.
----------------------------------------
STANDARDS
These functions, with the exception of isblank(), conform to ANSI
X3.159-1989 (“ANSI C89”). All described functions, including isblank(),
also conform to IEEE Std 1003.1-2001 (“POSIX.1”).
CAVEATS
The argument of these functions is of type int, but only a very
restricted subset of values are actually valid. The argument must either
be the value of the macro EOF (which has a negative value), or must be a
non-negative value within the range representable as unsigned char.
Passing invalid values leads to undefined behavior.
Values of type int that were returned by getc(3), fgetc(3), and similar
functions or macros are already in the correct range, and may be safely
passed to these ctype functions without any casts.
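As the caveat says, a value from `getc()` is already in range. The one thing to get right is to keep it in an `int`: narrowing it to `char` first would reintroduce the problem. A minimal sketch, using a hypothetical `count_printable()` helper:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: count the printable bytes in a stream.
 * c holds either EOF or a value in 0..UCHAR_MAX, which is exactly the
 * domain isprint() accepts, so no cast is needed. */
static size_t count_printable(FILE *fp)
{
	int c;		/* int, not char: it must be able to hold EOF */
	size_t n = 0;

	assert(fp != NULL);
	while ((c = getc(fp)) != EOF) {
		if (isprint(c))
			n++;
	}
	return n;
}
```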
Values of type char or signed char must first be cast to unsigned char,
to ensure that the values are within the correct range. Casting a
negative-valued char or signed char directly to int will produce a
negative-valued int, which will be outside the range of allowed values
(unless it happens to be equal to EOF, but even that would not give the
desired result).
Because the bugs may manifest as silent misbehavior or as crashes only
when fed input outside the US-ASCII range, the NetBSD implementation of
the ctype functions is designed to elicit a compiler warning for code
that passes inputs of type char in order to flag code that may pass
negative values at runtime that would lead to undefined behavior:
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

int
main(int argc, char **argv)
{

	if (argc < 2)
		return 1;
	setlocale(LC_ALL, "");
	printf("%d %d\n", *argv[1], isprint(*argv[1]));
	printf("%d %d\n", (int)(unsigned char)*argv[1],
	    isprint((unsigned char)*argv[1]));
	return 0;
}
When compiling this program, GCC reports a warning for the line that
passes char. At runtime, you may get nonsense answers for some inputs
without the cast — if you're lucky and it doesn't crash:
% gcc -Wall -o test test.c
test.c: In function 'main':
test.c:12:2: warning: array subscript has type 'char'
% LC_CTYPE=C ./test $(printf '\270')
-72 5
184 0
% LC_CTYPE=C ./test $(printf '\377')
-1 0
255 0
% LC_CTYPE=fr_FR.ISO8859-1 ./test $(printf '\377')
-1 0
255 2
Some implementations of libc, such as glibc as of 2018, attempt to avoid
the worst of the undefined behavior by defining the functions to work for
all integer inputs representable by either unsigned char or char, and
suppress the warning. However, this is not an excuse for avoiding
conversion to unsigned char: if EOF coincides with any such value, as it
does when it is -1 on platforms with signed char, programs that pass char
will still necessarily confuse the classification and mapping of EOF with
the classification and mapping of some non-EOF inputs.
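To make the EOF point concrete, consider a platform where char is signed and EOF is -1, such as x86 with GCC. There, the byte 0xFF held in a `char` converts to the same `int` as `EOF`. The following sketch uses a hypothetical `ctype_arg()` helper to show why converting through `unsigned char` removes the ambiguity:

```c
#include <assert.h>
#include <stdio.h>

/* The C standard requires EOF to be a negative int. */
static_assert(EOF < 0, "EOF must be negative");

/* Hypothetical helper: widen a plain char to the argument ctype(3)
 * expects. Every byte maps into 0..UCHAR_MAX, a range that can never
 * contain EOF, so the byte 0xFF (y-diaeresis in ISO 8859-1) stays
 * distinguishable from end of file. */
static int ctype_arg(char c)
{
	return (unsigned char)c;
}
```

On such a platform, `(int)c` for that byte is -1 while `ctype_arg(c)` is 255, so `isprint(ctype_arg(c))` classifies the character rather than EOF.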