Bug#767138: fftw3: runtime detection of NEON is perhaps broken
Gert Wollny
gw.fossdev at gmail.com
Wed Oct 29 22:15:12 UTC 2014
On Wed, 2014-10-29 at 18:36 +0100, Julian Taylor wrote:
> the flags are only added to files which do the computations, the rest
> of the program should not have this flag, this should include the file
> that has the neon runtime check.
Makes sense, but then adding an intrinsic should be okay.
Regarding the actual disassembly:
really_have_neon is probably inlined into fftwf_have_simd_neon, because
strings /usr/lib/arm-linux-gnueabihf/libfftw3f.so| \
grep really_have_neon
returns nothing.
(gdb)
Dump of assembler code for function fftwf_have_simd_neon:
0x000a9fdc <+00>: 10 b5 push {r4, lr}
0x000a9fde <+02>: 08 4c ldr r4, [pc, #32] ; (0xaa000 <fftwf_have_simd_neon+36>)
0x000a9fe0 <+04>: 7c 44 add r4, pc
0x000a9fe2 <+06>: d4 f8 88 31 ldr.w r3, [r4, #392] ; 0x188
0x000a9fe6 <+10>: 13 b1 cbz r3, 0xa9fee <fftwf_have_simd_neon+18>
0x000a9fe8 <+12>: d4 f8 8c 01 ldr.w r0, [r4, #396] ; 0x18c
0x000a9fec <+16>: 10 bd pop {r4, pc}
0x000a9fee <+18>: ff f7 c9 ff bl 0xa9f84
0x000a9ff2 <+22>: 01 23 movs r3, #1
0x000a9ff4 <+24>: c4 f8 88 31 str.w r3, [r4, #392] ; 0x188
0x000a9ff8 <+28>: c4 f8 8c 01 str.w r0, [r4, #396] ; 0x18c
0x000a9ffc <+32>: 10 bd pop {r4, pc}
0x000a9ffe <+34>: 00 bf nop
0x000aa000 <+36>: 34 46 mov r4, r6
0x000aa002 <+38>: 0b 00 movs r3, r1
No "f2 00 01 50" to be seen. I have to admit that I don't really
read ARM assembler, so I can't tell what this code actually does.
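For readers in the same position, here is a rough C reading of the dump
above. It is only a sketch: the names probe_neon, init_done and cached
are hypothetical, and the body of the routine at 0xa9f84 is not shown in
the dump, so a stub stands in for it. The dump appears to load a flag at
offset 392, return the value cached at offset 396 if the flag is set,
and otherwise call 0xa9f84 and store both. The "bl 0xa9f84" is a call to
a separate routine, so the probe instruction would have to be looked for
there rather than in this function. The two words at +36 and +38 are the
literal that "ldr r4, [pc, #32]" loads, shown as if they were
instructions.

```c
#include <stdio.h>

/* Hypothetical stand-in for the routine at 0xa9f84; its real body is
   not part of the dump, so this stub just reports success. */
static int probe_calls = 0;

static int probe_neon(void)
{
    probe_calls++;
    return 1;
}

static int init_done = 0; /* corresponds to the word at [r4, #392] */
static int cached = 0;    /* corresponds to the word at [r4, #396] */

/* Sketch of the control flow seen in fftwf_have_simd_neon. */
int have_simd_neon_sketch(void)
{
    if (!init_done) {          /* cbz r3, <+18> */
        cached = probe_neon(); /* bl 0xa9f84, then str.w r0, [r4, #396] */
        init_done = 1;         /* movs r3, #1, then str.w r3, [r4, #392] */
    }
    return cached;             /* result is returned in r0 */
}
```

Calling have_simd_neon_sketch() twice runs the stub only once; the
second call returns the cached value.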
According to [1], one cannot just put an asm statement into the code
and assume that the assembler code will really be inserted at that very
spot. Given the dump, I can only assume that the compiler might even
decide to optimize the assembler code away, since it doesn't reference
any variable.
[1]
https://stackoverflow.com/questions/6517860/arm-gcc-inline-assembler-optimization-problem
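As a minimal sketch of the usual countermeasure in GCC-style inline
assembler: the volatile qualifier asks the compiler not to delete the
statement, and a "memory" clobber keeps memory accesses from being
cached in registers or reordered across it. The asm template is left
empty here so the snippet builds on any target; on ARM it would hold the
instruction being probed. The names run_probe_sketch and probe_ran are
hypothetical, and this is not claimed to be what fftw3 does. Even with
volatile, GCC documents that the statement may still be moved relative
to other code, so this narrows the problem rather than ruling it out.

```c
/* Set around the asm statement so a caller can see the function ran. */
static int probe_ran = 0;

void run_probe_sketch(void)
{
    probe_ran = 0;

    /* volatile: do not remove the statement although it has no outputs.
       "memory": treat the statement as reading and writing memory, so
       the stores to probe_ran stay on their side of it.
       The empty template is an assumption made for portability. */
    __asm__ __volatile__("" : : : "memory");

    probe_ran = 1;
}
```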
Best
Gert