Bug#892228: libsphinxbase3: Causes pocketsphinx to FTBFS on 64-bit big-endian architectures (fills testsuite logs on disk with errors)

James Clarke jrtc27 at debian.org
Wed Mar 7 00:33:05 UTC 2018


Package: libsphinxbase3
Version: 0.8+5prealpha+1-1
Severity: important
Tags: upstream
Control: affects -1 src:pocketsphinx

Hi,
The pocketsphinx build fails on 64-bit big-endian architectures with
"No space left on device", as the testsuite log files fill up with
hundreds of gigabytes of warnings. The first indication of the problem in the
log files is:

> Sorry, this does not support more than 33554432 n-grams of a particular order.  Edit util/bit_packing.hh and fix the bit packing functions

where 33554432 is 0x2000000, i.e. 2 byte-swapped as a 32-bit value. This error
isn't fatal though, and libsphinxbase3 continues to try to build the trie, with
tons of duplicate word warnings, as it's reading all kinds of garbage. The
issues stem from widespread use of fread to read multi-byte values with no
regard for their endianness, with the first error, the wrong number of n-grams,
coming from reading into the "counts" array in ngram_model_trie_read_bin. The library
has functions like bio_fread which can do the byte-swapping for the caller, so
presumably these should be used instead, though for this file format there does
not seem to be an easy way to determine the endianness of the file based on
some header magic like for some of the others (but maybe it's intended to
always be little-endian).
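
To illustrate the arithmetic: a 32-bit value of 2 stored as the little-endian
bytes 02 00 00 00 comes out as 0x02000000 (33554432) when those bytes are
copied straight into an integer on a big-endian host. The sketch below is not
sphinxbase code; swap32 and load_le32 are hypothetical helper names, and it
assumes the file is meant to be little-endian, which the format does not
actually state.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/*
 * Decode four bytes as a little-endian 32-bit value. This builds the
 * result from individual bytes, so it gives the same answer on any
 * host, regardless of the host's own byte order.
 */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

Here swap32(2) yields 33554432, while load_le32 on the bytes 02 00 00 00
yields 2 on both little- and big-endian hosts.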

32-bit big-endian architectures have the same underlying bugs, but it seems
they die a lot earlier, failing to calloc huge sizes (presumably these same
calls are made on 64-bit architectures but can be satisfied thanks to
overcommitting) and thus don't actually try to build the trie and spew all the
warnings.

There are "only" 62 calls to fread in sphinxbase (and a further 45 in
pocketsphinx) so it shouldn't be too hard for someone with knowledge of the
codebase to audit their uses, especially since my guess is that most of them
can be turned into something like `bio_fread(..., IS_BIG_ENDIAN)`. Similarly,
the corresponding fwrite calls should be audited too.
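
As a rough picture of what an audited read/write pair could look like, here is
a sketch that serialises through an explicit byte buffer, so the on-disk
layout no longer depends on the host. It assumes a little-endian file format
and uses only standard C; write_le32 and read_le32 are hypothetical names, not
functions from sphinxbase, and the real fix would more likely go through the
library's own bio_fread-style helpers.

```c
#include <stdint.h>
#include <stdio.h>

/* Write v as four little-endian bytes. Returns 0 on success, -1 on error. */
static int write_le32(FILE *fp, uint32_t v)
{
    unsigned char buf[4];

    buf[0] = (unsigned char)(v & 0xFFu);
    buf[1] = (unsigned char)((v >> 8) & 0xFFu);
    buf[2] = (unsigned char)((v >> 16) & 0xFFu);
    buf[3] = (unsigned char)((v >> 24) & 0xFFu);
    return fwrite(buf, 1, sizeof buf, fp) == sizeof buf ? 0 : -1;
}

/* Read four little-endian bytes into *out. Returns 0 on success, -1 on error. */
static int read_le32(FILE *fp, uint32_t *out)
{
    unsigned char buf[4];

    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return -1;
    *out = (uint32_t)buf[0] |
           ((uint32_t)buf[1] << 8) |
           ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[3] << 24);
    return 0;
}
```

A file written with write_le32 on one architecture reads back identically
with read_le32 on any other, which is the property the raw fread/fwrite calls
currently lack.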

Regards,
James


