Bug#767138: fftw3: runtime detection of NEON is perhaps broken

Edmund Grimley Evans edmund.grimley.evans at gmail.com
Tue Oct 28 17:13:37 UTC 2014


Source: fftw3
Version: 3.3.4-1.1

In simd-support/neon.c I found:

  static int really_have_neon(void)
  {
       void (*oldsig)(int);
       oldsig = signal(SIGILL, sighandler);
       if (setjmp(jb)) {
            signal(SIGILL, oldsig);
            return 0;
       } else {
            /* paranoia: encode the instruction in binary because the
               assembler may not recognize it without -mfpu=neon */
            /*asm volatile ("vand q0, q0, q0");*/
            asm volatile (".long 0xf2000150");
            signal(SIGILL, oldsig);
            return 1;
       }
  }

Of course, that binary encoding of the VAND instruction is only valid
for ARM mode, not Thumb, and the library is mostly compiled for Thumb,
I think.
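
One possible way to make the probe independent of the instruction set is to pick the encoding at compile time. The following is only a sketch under assumptions: the helper name probe_vand is hypothetical and not part of fftw3, the compiler is assumed to define __arm__ and __thumb2__ as GCC does, and the target is assumed to be little-endian. The Thumb-2 form of VAND q0, q0, q0 is the halfword pair 0xef00, 0x0150, while the ARM form stays 0xf2000150.

```c
/* Hypothetical helper, not part of fftw3: emit VAND q0, q0, q0 using the
   encoding that matches the instruction set the compiler is targeting.
   Returns 2 if the Thumb-2 encoding was emitted, 1 for the ARM encoding,
   and 0 when not building for 32-bit ARM (nothing is emitted). */
static int probe_vand(void)
{
#if defined(__arm__) && defined(__thumb2__)
    /* Thumb-2 encoding: two halfwords, first one at the lower address */
    __asm__ volatile (".short 0xef00, 0x0150");
    return 2;
#elif defined(__arm__)
    /* ARM encoding, the same word as in the original code */
    __asm__ volatile (".long 0xf2000150");
    return 1;
#else
    /* not a 32-bit ARM target: nothing to probe */
    return 0;
#endif
}
```

The call would replace the asm statement inside really_have_neon(), so the SIGILL handler and setjmp logic stay as they are.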

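In fact, I think I have tracked down where this code appears in the
binary. In libfftw3f.so.3.4.4 I found the following; the two lines
marked "!" appear to be the .long 0xf2000150 word decoded as Thumb
instructions: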

   a9f84:       490f            ldr     r1, [pc, #60]   ; (a9fc4 <fftwf_guru64_kosherp+0xa4>)
   a9f86:       2004            movs    r0, #4
   a9f88:       b500            push    {lr}
   a9f8a:       4479            add     r1, pc
   a9f8c:       b083            sub     sp, #12
   a9f8e:       f765 ed0c       blx     f9a8 <_init+0x33c>
   a9f92:       9001            str     r0, [sp, #4]
   a9f94:       480c            ldr     r0, [pc, #48]   ; (a9fc8 <fftwf_guru64_kosherp+0xa8>)
   a9f96:       4478            add     r0, pc
   a9f98:       f765 ec66       blx     f868 <_init+0x1fc>
   a9f9c:       b948            cbnz    r0, a9fb2 <fftwf_guru64_kosherp+0x92>

!  a9f9e:       0150            lsls    r0, r2, #5
!  a9fa0:       f200 2004       addw    r0, r0, #516    ; 0x204
   a9fa4:       9901            ldr     r1, [sp, #4]
   a9fa6:       f765 ed00       blx     f9a8 <_init+0x33c>
   a9faa:       2001            movs    r0, #1
   a9fac:       b003            add     sp, #12
   a9fae:       f85d fb04       ldr.w   pc, [sp], #4

   a9fb2:       9901            ldr     r1, [sp, #4]
   a9fb4:       2004            movs    r0, #4
   a9fb6:       f765 ecf8       blx     f9a8 <_init+0x33c>
   a9fba:       2000            movs    r0, #0
   a9fbc:       b003            add     sp, #12
   a9fbe:       f85d fb04       ldr.w   pc, [sp], #4
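
The halfwords at a9f9e and a9fa0 are consistent with the 32-bit word 0xf2000150 being fetched as Thumb code: on a little-endian target the low halfword 0x0150 comes first, then 0xf200, which the disassembler pairs with the following 0x2004 (the same bytes as the movs r0, #4 at a9f86). A minimal sketch of that split, assuming little-endian storage; the function name is hypothetical:

```c
#include <stdint.h>

/* Split a 32-bit word, as stored by ".long" on a little-endian target,
   into the two 16-bit units a Thumb decoder would fetch, in fetch order. */
static void split_thumb_halfwords(uint32_t word, uint16_t out[2])
{
    out[0] = (uint16_t)(word & 0xffffu);  /* lower address, fetched first */
    out[1] = (uint16_t)(word >> 16);      /* higher address, fetched second */
}
```

For 0xf2000150 this gives 0x0150 followed by 0xf200, matching the two lines marked "!" above.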

This may explain some problems that people have experienced with
libfftw3:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=752514

http://lists.debian.org/debian-arm/2014/10/msg00051.html

Is this signal-handling approach the best way of detecting NEON? The
following blog post suggests using HWCAP, but I don't know if that
would work with the FreeBSD kernels:

http://community.arm.com/groups/android-community/blog/2014/10/10/runtime-detection-of-cpu-features-on-an-armv8-a-cpu
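
For illustration, an HWCAP-based check on Linux might look roughly like the sketch below. It assumes glibc's getauxval() from <sys/auxv.h> (glibc 2.16 or later) and the HWCAP_NEON bit from <asm/hwcap.h> on 32-bit ARM. The function name have_neon_hwcap is hypothetical, and the fallback simply reports no NEON, which leaves the question of other kernels open.

```c
#if defined(__linux__) && defined(__arm__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_NEON */
#endif

/* Hypothetical alternative to really_have_neon(): returns 1 if the kernel
   reports NEON in the auxiliary vector, 0 otherwise or when this query
   is not available on the platform. */
static int have_neon_hwcap(void)
{
#if defined(__linux__) && defined(__arm__) && defined(HWCAP_NEON)
    return (getauxval(AT_HWCAP) & HWCAP_NEON) != 0;
#else
    return 0;  /* not Linux on 32-bit ARM: another mechanism would be needed */
#endif
}
```

Unlike the SIGILL probe, this does not execute any NEON instruction, so it does not depend on whether the code is built for ARM or Thumb.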


