Bug#767138: fftw3: runtime detection of NEON is perhaps broken

Edmund Grimley Evans edmund.grimley.evans at gmail.com
Tue Oct 28 23:18:38 UTC 2014


> > Is this signal-handling approach the best way of detecting NEON? The
> > following blog suggests using HWCAP, but I don't know if that would
> > work with the freebsd kernels:
> 
> That looks like a better way to do it. FreeBSD doesn't matter for us, as we
> don't have an ARM port of it.
> I don't know ARM well, nor can I test, so I would need a patch, ideally
> something that can be upstreamed (so it would need proper guards against
> kernels not supporting it).
> 
> Though for Debian, wouldn't it be enough to change the encoding to what
> we are using?
> Possibly that could be automated by compiling a VAND snippet and getting
> the encoding with objdump at configure time.

That sounds as if it could be fragile and cause problems for Debian
ports and derivatives that might target a different ARM variant.

I think the following in neon.c would probably work for Debian, whose
ARM ports only use the Linux kernel, as you point out:

#include "ifftw.h"

#if HAVE_NEON

#include <sys/auxv.h>

  extern void X(check_alignment_of_sse2_pm)(void);
  // Presumably this was put here for a reason ...

  int X(have_simd_neon)(void)
  {
       return !!(getauxval(AT_HWCAP) & HWCAP_ARM_NEON);
  }

#endif

Alternatively, you could perhaps use the current signal-catching
mechanism but run something based on the actual FFT code as a test
instead of a VAND instruction. By the looks of it the actual code uses
intrinsics such as vaddq_f32. Is there any reason why one couldn't use
one of the same intrinsics to see if NEON is working? Well, I can
think of a couple of issues:

1. The signal-catching trick is probably not thread-safe.

2. If you use an intrinsic, rather than asm volatile, then you need to
take care that the instruction really gets executed as the compiler
(or a future compiler) could do constant propagation, dead code
removal, etc., perhaps even between translation units.

Another idea might be to use an ARM "VAND Q0,Q0,Q0", numerically
encoded, in a separate .s file. Then it wouldn't matter if the C code
calling it is ARM or Thumb, though clearly you are then making the
assumption that nobody will want to use the library on a Thumb-only
system. And you'd have to fiddle with the build system to include the
.s file in the build.

Or one could use inline assembler with the mnemonic (the line that's
commented out) and do something with the build system to make sure
that the compiler is invoked with an appropriate -mfpu=... option.

At the moment I think that AT_HWCAP is probably the right way to go
for Linux and for Debian. Upstream could guard it with #ifdef __linux__
or a configure test for getauxval() and <sys/auxv.h>.
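
A guarded version might look roughly like the sketch below. The
function name have_simd_neon_guarded and the NEON_PROBE_AVAILABLE
macro are invented for illustration, and I am using compiler-predefined
macros where a real build would more likely use the result of a
configure test. The fallback definition of HWCAP_ARM_NEON assumes the
same bit as the kernel's HWCAP_NEON (1 << 12). On anything other than
32-bit ARM Linux the sketch simply reports 0; upstream would presumably
fall back to some other detection method there.

```c
#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>             /* getauxval, AT_HWCAP (glibc >= 2.16) */
#ifndef HWCAP_ARM_NEON
#define HWCAP_ARM_NEON (1 << 12)  /* assumed: same bit as kernel HWCAP_NEON */
#endif
#define NEON_PROBE_AVAILABLE 1
#else
#define NEON_PROBE_AVAILABLE 0
#endif

/* Returns 1 if the kernel reports NEON, 0 otherwise (including on
   platforms where this sketch has no way to ask). */
int have_simd_neon_guarded(void)
{
#if NEON_PROBE_AVAILABLE
    return !!(getauxval(AT_HWCAP) & HWCAP_ARM_NEON);
#else
    return 0;  /* placeholder: no auxv-based probe on this platform */
#endif
}
```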
