Bug#767138: fftw3: runtime detection of NEON is perhaps broken

Gert Wollny gw.fossdev at gmail.com
Wed Oct 29 22:15:12 UTC 2014


On Wed, 2014-10-29 at 18:36 +0100, Julian Taylor wrote:
> the flags are only added to files which do the computations, the rest
> of the program should not have this flag, this should include the file
> that has the neon runtime check.
Makes sense, but then adding an intrinsic should be okay. 

Regarding the actual disassembly: 

really_have_neon is probably inlined into fftwf_have_simd_neon, because 

  strings  /usr/lib/arm-linux-gnueabihf/libfftw3f.so| \
    grep really_have_neon 

returns nothing.

(gdb) 

Dump of assembler code for function fftwf_have_simd_neon:

   0x000a9fdc <+00>:	10 b5	push	{r4, lr}
   0x000a9fde <+02>:	08 4c	ldr	r4, [pc, #32]	; (0xaa000 <fftwf_have_simd_neon+36>)
   0x000a9fe0 <+04>:	7c 44	add	r4, pc
   0x000a9fe2 <+06>:	d4 f8 88 31	ldr.w	r3, [r4, #392]	; 0x188
   0x000a9fe6 <+10>:	13 b1	cbz	r3, 0xa9fee <fftwf_have_simd_neon+18>
   0x000a9fe8 <+12>:	d4 f8 8c 01	ldr.w	r0, [r4, #396]	; 0x18c
   0x000a9fec <+16>:	10 bd	pop	{r4, pc}
   0x000a9fee <+18>:	ff f7 c9 ff	bl	0xa9f84
   0x000a9ff2 <+22>:	01 23	movs	r3, #1
   0x000a9ff4 <+24>:	c4 f8 88 31	str.w	r3, [r4, #392]	; 0x188
   0x000a9ff8 <+28>:	c4 f8 8c 01	str.w	r0, [r4, #396]	; 0x18c
   0x000a9ffc <+32>:	10 bd	pop	{r4, pc}
   0x000a9ffe <+34>:	00 bf	nop
   0x000aa000 <+36>:	34 46	mov	r4, r6
   0x000aa002 <+38>:	0b 00	movs	r3, r1

No "f2 00 01 50" to be seen. I have to admit, though, that I don't
really read ARM assembly, so I can't tell what this code actually does.

Considering [1], it seems that one cannot just put an asm statement
into the code and assume that the instruction will really be emitted
at that very spot. Given the dump, I can only assume that the compiler
might even decide to optimize the asm statement away, since it doesn't
reference any variable.
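
To make the concern concrete, here is a minimal sketch of a SIGILL-based
probe in which the asm statement is marked volatile and given a "memory"
clobber. As far as I understand the GCC documentation, that should keep
the compiler from deleting the statement or moving memory accesses
across it. This is not the FFTW code: probe_neon, on_sigill and
probe_env are hypothetical names, and using ".fpu neon" so that the
assembler accepts the mnemonic without -mfpu=neon is my assumption. On
anything other than 32-bit ARM the sketch simply reports 0.

```c
/* Sketch only: probe for NEON by executing one NEON instruction and
   catching SIGILL. All names are hypothetical, not taken from FFTW. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

#if defined(__arm__)
static sigjmp_buf probe_env;

static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);   /* jump back out of the faulting probe */
}
#endif

/* Returns 1 if the NEON instruction executed, 0 otherwise. */
int probe_neon(void)
{
#if defined(__arm__)
    struct sigaction sa, old;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return 0;               /* cannot install the handler: assume no NEON */

    if (sigsetjmp(probe_env, 1) == 0) {
        /* volatile: do not delete the statement;
           "memory": do not move memory accesses across it.
           Assumption: ".fpu neon" lets the assembler accept the mnemonic. */
        __asm__ volatile (".fpu neon\n\tvand q0, q0, q0" : : : "memory");
        result = 1;
    } else {
        result = 0;             /* we arrived here via SIGILL */
    }
    sigaction(SIGILL, &old, NULL);  /* restore the previous handler */
    return result;
#else
    return 0;                   /* not a 32-bit ARM build: nothing to probe */
#endif
}
```

A caller would invoke probe_neon() once and cache the result. Whether
this changes the generated code in the library is something I have not
verified.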

[1] 
https://stackoverflow.com/questions/6517860/arm-gcc-inline-assembler-optimization-problem
 
Best 
Gert 



More information about the debian-science-maintainers mailing list