Bug#738981: Fwd: Bug#738981: Switch to use generic_fpu for ARM

Lennart Sorensen lsorense at csclub.uwaterloo.ca
Fri Feb 21 16:25:12 UTC 2014


On Fri, Feb 21, 2014 at 01:29:40AM +0000, peter green wrote:
> Thomas Orgis wrote:
> >So, I got conversion to float implemented now and tested with the
> >generic_nofpu decoder on x86-64. It _should_ of course work with ARM,
> >too;-) If you'd like to check the current snapshot of mpg123,
> >
> >	http://mpg123.org/snapshot/mpg123-20140220132548.tar.bz2 ,
> >
> >you will hopefully find that any normal build of mpg123 (unless
> >specifying --disable-float explicitly) now offers all usual formats. As
> >a bonus, I even implemented the 8 Bit A-Law output, which has always
> >just been a placeholder (nobody missed it, apparently).
> >
> >I'd be interested in some timings of
> >
> >	mpg123 -t -e s16 test.mp3
> >	mpg123 -t -e f32 test.mp3
> >
> >with the various builds you'll do for the ARM variants. Best would be running
> >
> >	perl scripts/benchmark-cpu.pl src/mpg123 convergence_-_points_of_view/*.mp3
> >
> >with
> >
> >	http://mpg123.orgis.org/convergence_-_points_of_view.tar.gz
> >
> >as reference album, as mentioned on
> >
> >	http://mpg123.orgis.org/benchmarking.shtml
> >
> >to be able to compare the performance of the code and machine to
> >others. This yields output like this:
> >
> >#mpg123 benchmark (user CPU time in seconds for decoding)
> >#decoder	t_s16/s	t_f32/s
> >x86-64	3.39	4.05
> >generic	6.15	6.01
> >generic_dither	6.36	5.97
> >
> >... or this, with --with-cpu=generic_fpu:
> >
> >#mpg123 benchmark (user CPU time in seconds for decoding)
> >#decoder	t_s16/s	t_f32/s
> >generic	6.14	6.29
> >
> >(on a Core2Duo machine)
> OK, on a 1 GHz Freescale IMX53 (Cortex-A8) in a (probably somewhat
> out-of-date) Debian sid armhf chroot, I tested with "perl
> scripts/benchmark-cpu.pl src/mpg123
> convergence_-_points_of_view/*.mp3" in the following configurations:
> 
> Built with ./configure --with-cpu=arm_nofpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder        t_s16/s t_f32/s
> ARM     30.36   34.26
> 
> Built with ./configure --with-cpu=generic_fpu
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder        t_s16/s t_f32/s
> generic 148.66  138.49
> 
> Built with CFLAGS=-mfpu=neon ./configure --with-cpu=neon
> #mpg123 benchmark (user CPU time in seconds for decoding)
> #decoder        t_s16/s t_f32/s
> NEON    0.03    0.04
> 
> I found the NEON result unbelievable, so I decided to run the test
> program you mentioned to me in our private mail exchange about how to
> run the benchmarks.
> root at plugwash:/mpg123-test# LD_LIBRARY_PATH=/mpg123-20140220132548-arm_nofpu/src/libmpg123/.libs/ \
>     perl compliance.pl /mpg123-20140220132548-arm_nofpu/src/mpg123
> 
> ==== Layer 1 ====
> --> 16 bit signed integer output
> fl1.bit:        RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:        RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
> fl3.bit:        RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
> fl4.bit:        RMS=1.510105e-01 (FAIL) maxdiff=5.277658e-01 (FAIL)
> fl5.bit:        RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
> fl6.bit:        RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
> fl7.bit:        RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
> fl8.bit:        RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
> --> 32 bit integer output
> fl1.bit:        RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:        RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
> fl3.bit:        RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
> fl4.bit:        RMS=1.513207e-01 (FAIL) maxdiff=4.787517e-01 (FAIL)
> fl5.bit:        RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
> fl6.bit:        RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
> fl7.bit:        RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
> fl8.bit:        RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
> --> 24 bit integer output
> fl1.bit:        RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:        RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
> fl3.bit:        RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
> fl4.bit:        RMS=1.494715e-01 (FAIL) maxdiff=4.984906e-01 (FAIL)
> fl5.bit:        RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
> fl6.bit:        RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
> fl7.bit:        RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
> fl8.bit:        RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
> --> 32 bit floating point output
> fl1.bit:        RMS=3.486054e-02 (FAIL) maxdiff=5.002832e-02 (FAIL)
> fl2.bit:        RMS=3.485670e-02 (FAIL) maxdiff=5.008233e-02 (FAIL)
> fl3.bit:        RMS=3.485293e-02 (FAIL) maxdiff=5.008245e-02 (FAIL)
> fl4.bit:        RMS=1.137037e-01 (FAIL) maxdiff=4.459082e-01 (FAIL)
> fl5.bit:        RMS=3.109439e-01 (FAIL) maxdiff=4.475173e-01 (FAIL)
> fl6.bit:        RMS=1.649138e-01 (FAIL) maxdiff=4.589995e-01 (FAIL)
> fl7.bit:        RMS=2.211659e-02 (FAIL) maxdiff=2.959942e-01 (FAIL)
> fl8.bit:        RMS=3.484906e-02 (FAIL) maxdiff=5.002034e-02 (FAIL)
> 
> ==== Layer 2 ====
> --> 16 bit signed integer output
> fl10.bit:       RMS=3.528939e-02 (FAIL) maxdiff=6.501251e-02 (FAIL)
> fl11.bit:       RMS=3.528947e-02 (FAIL) maxdiff=6.501383e-02 (FAIL)
> fl12.bit:       RMS=3.528948e-02 (FAIL) maxdiff=6.501538e-02 (FAIL)
> fl13.bit:       RMS=2.169086e-01 (FAIL) maxdiff=5.500084e-01 (FAIL)
> fl14.bit:       RMS=3.109485e-01 (FAIL) maxdiff=4.486248e-01 (FAIL)
> fl15.bit:       RMS=1.051756e-01 (FAIL) maxdiff=4.656556e-01 (FAIL)
> fl16.bit:       RMS=6.194990e-02 (FAIL) maxdiff=4.863935e-01 (FAIL)
> --> 32 bit integer output
> fl10.bit:       RMS=3.528939e-02 (FAIL) maxdiff=6.501251e-02 (FAIL)
> fl11.bit:       RMS=3.528947e-02 (FAIL) maxdiff=6.501383e-02 (FAIL)
> fl12.bit:       RMS=3.528948e-02 (FAIL) maxdiff=6.501538e-02 (FAIL)
> fl13.bit:       RMS=2.377473e-01 (FAIL) maxdiff=5.390697e-01 (FAIL)
> fl14.bit:       RMS=3.109485e-01 (FAIL) maxdiff=4.486248e-01 (FAIL)
> fl15.bit:       RMS=1.051756e-01 (FAIL) maxdiff=4.656556e-01 (FAIL)
> fl16.bit:       RMS=6.194990e-02 (FAIL) maxdiff=4.863935e-01 (FAIL)
> --> 24 bit integer output
> fl10.bit:       RMS=3.528939e-02 (FAIL) maxdiff=6.501251e-02 (FAIL)
> fl11.bit:       RMS=3.528947e-02 (FAIL) maxdiff=6.501383e-02 (FAIL)
> fl12.bit:       RMS=3.528948e-02 (FAIL) maxdiff=6.501538e-02 (FAIL)
> fl13.bit:       RMS=2.093457e-01 (FAIL) maxdiff=5.390697e-01 (FAIL)
> fl14.bit:       RMS=3.109485e-01 (FAIL) maxdiff=4.486248e-01 (FAIL)
> fl15.bit:       RMS=1.051756e-01 (FAIL) maxdiff=4.656556e-01 (FAIL)
> fl16.bit:       RMS=6.194990e-02 (FAIL) maxdiff=4.863935e-01 (FAIL)
> --> 32 bit floating point output
> fl10.bit:       RMS=3.528939e-02 (FAIL) maxdiff=6.501251e-02 (FAIL)
> fl11.bit:       RMS=3.528947e-02 (FAIL) maxdiff=6.501383e-02 (FAIL)
> fl12.bit:       RMS=3.528948e-02 (FAIL) maxdiff=6.501538e-02 (FAIL)
> fl13.bit:       RMS=1.850494e-01 (FAIL) maxdiff=5.390697e-01 (FAIL)
> fl14.bit:       RMS=3.109485e-01 (FAIL) maxdiff=4.486248e-01 (FAIL)
> fl15.bit:       RMS=1.051756e-01 (FAIL) maxdiff=4.656556e-01 (FAIL)
> fl16.bit:       RMS=6.194990e-02 (FAIL) maxdiff=4.863935e-01 (FAIL)
> 
> ==== Layer 3 ====
> --> 16 bit signed integer output
> compl.bit:      RMS=7.754415e-02 (FAIL) maxdiff=4.619989e-01 (FAIL)
> --> 32 bit integer output
> compl.bit:      RMS=9.465917e-02 (FAIL) maxdiff=5.095977e-01 (FAIL)
> --> 24 bit integer output
> compl.bit:      RMS=5.263265e-02 (FAIL) maxdiff=2.595977e-01 (FAIL)
> --> 32 bit floating point output
> compl.bit:      RMS=6.627619e-02 (FAIL) maxdiff=3.997431e-01 (FAIL)
> root at plugwash:/mpg123-test#
> 
> root at plugwash:/mpg123-test# LD_LIBRARY_PATH=/mpg123-20140220132548-generic_fpu/src/libmpg123/.libs/ \
>     perl compliance.pl /mpg123-20140220132548-generic_fpu/src/mpg123
> 
> ==== Layer 1 ====
> --> 16 bit signed integer output
> fl1.bit:        RMS=8.683659e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl2.bit:        RMS=8.686681e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl3.bit:        RMS=8.737660e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl4.bit:        RMS=8.806232e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl5.bit:        RMS=9.013229e-06 (LIMITED) maxdiff=1.525879e-05 (PASS)
> fl6.bit:        RMS=8.759997e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl7.bit:        RMS=7.289141e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl8.bit:        RMS=8.687595e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> --> 32 bit integer output
> fl1.bit:        RMS=1.972732e-08 (PASS) maxdiff=1.490116e-07 (PASS)
> fl2.bit:        RMS=1.977832e-08 (PASS) maxdiff=1.117587e-07 (PASS)
> fl3.bit:        RMS=2.009690e-08 (PASS) maxdiff=1.359731e-07 (PASS)
> fl4.bit:        RMS=1.908054e-08 (PASS) maxdiff=1.303852e-07 (PASS)
> fl5.bit:        RMS=3.893589e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl6.bit:        RMS=3.228863e-08 (PASS) maxdiff=1.788139e-07 (PASS)
> fl7.bit:        RMS=1.773698e-08 (PASS) maxdiff=8.381903e-08 (PASS)
> fl8.bit:        RMS=1.866009e-08 (PASS) maxdiff=8.381903e-08 (PASS)
> --> 24 bit integer output
> fl1.bit:        RMS=4.115537e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl2.bit:        RMS=4.173207e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl3.bit:        RMS=4.138772e-08 (PASS) maxdiff=1.788139e-07 (PASS)
> fl4.bit:        RMS=4.128280e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl5.bit:        RMS=4.807225e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl6.bit:        RMS=4.042650e-08 (PASS) maxdiff=1.788139e-07 (PASS)
> fl7.bit:        RMS=4.193615e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl8.bit:        RMS=4.116006e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> --> 32 bit floating point output
> fl1.bit:        RMS=1.971453e-08 (PASS) maxdiff=1.490116e-07 (PASS)
> fl2.bit:        RMS=1.977459e-08 (PASS) maxdiff=1.117587e-07 (PASS)
> fl3.bit:        RMS=2.010474e-08 (PASS) maxdiff=1.341105e-07 (PASS)
> fl4.bit:        RMS=1.907853e-08 (PASS) maxdiff=1.266599e-07 (PASS)
> fl5.bit:        RMS=3.920450e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl6.bit:        RMS=3.231139e-08 (PASS) maxdiff=1.788139e-07 (PASS)
> fl7.bit:        RMS=1.772345e-08 (PASS) maxdiff=8.195639e-08 (PASS)
> fl8.bit:        RMS=1.864753e-08 (PASS) maxdiff=8.376082e-08 (PASS)
> 
> ==== Layer 2 ====
> --> 16 bit signed integer output
> fl10.bit:       RMS=9.249037e-06 (LIMITED) maxdiff=1.525879e-05 (PASS)
> fl11.bit:       RMS=9.069406e-06 (LIMITED) maxdiff=1.525879e-05 (PASS)
> fl12.bit:       RMS=9.060801e-06 (LIMITED) maxdiff=1.525879e-05 (PASS)
> fl13.bit:       RMS=8.778473e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl14.bit:       RMS=8.688631e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl15.bit:       RMS=8.784508e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> fl16.bit:       RMS=7.982127e-06 (PASS) maxdiff=1.525879e-05 (PASS)
> --> 32 bit integer output
> fl10.bit:       RMS=1.787536e-08 (PASS) maxdiff=8.288771e-08 (PASS)
> fl11.bit:       RMS=1.796403e-08 (PASS) maxdiff=1.005828e-07 (PASS)
> fl12.bit:       RMS=1.791192e-08 (PASS) maxdiff=8.940697e-08 (PASS)
> fl13.bit:       RMS=1.770106e-08 (PASS) maxdiff=4.470348e-08 (PASS)
> fl14.bit:       RMS=3.900489e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl15.bit:       RMS=2.472882e-08 (PASS) maxdiff=1.490116e-07 (PASS)
> fl16.bit:       RMS=1.951551e-08 (PASS) maxdiff=1.788139e-07 (PASS)
> --> 24 bit integer output
> fl10.bit:       RMS=4.102704e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl11.bit:       RMS=4.111382e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl12.bit:       RMS=4.105048e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl13.bit:       RMS=4.127193e-08 (PASS) maxdiff=5.960464e-08 (PASS)
> fl14.bit:       RMS=4.979262e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl15.bit:       RMS=4.137607e-08 (PASS) maxdiff=1.788139e-07 (PASS)
> fl16.bit:       RMS=4.163997e-08 (PASS) maxdiff=1.788139e-07 (PASS)
> --> 32 bit floating point output
> fl10.bit:       RMS=1.787791e-08 (PASS) maxdiff=8.288771e-08 (PASS)
> fl11.bit:       RMS=1.796339e-08 (PASS) maxdiff=1.005828e-07 (PASS)
> fl12.bit:       RMS=1.790768e-08 (PASS) maxdiff=8.568168e-08 (PASS)
> fl13.bit:       RMS=1.769857e-08 (PASS) maxdiff=4.470348e-08 (PASS)
> fl14.bit:       RMS=3.885701e-08 (PASS) maxdiff=1.192093e-07 (PASS)
> fl15.bit:       RMS=2.477253e-08 (PASS) maxdiff=1.788139e-07 (PASS)
> fl16.bit:       RMS=1.951434e-08 (PASS) maxdiff=1.490116e-07 (PASS)
> 
> ==== Layer 3 ====
> --> 16 bit signed integer output
> compl.bit:      RMS=8.907547e-06 (LIMITED) maxdiff=1.531839e-05 (PASS)
> --> 32 bit integer output
> compl.bit:      RMS=2.152941e-08 (PASS) maxdiff=1.769513e-07 (PASS)
> --> 24 bit integer output
> compl.bit:      RMS=4.205970e-08 (PASS) maxdiff=1.788139e-07 (PASS)
> --> 32 bit floating point output
> compl.bit:      RMS=2.153062e-08 (PASS) maxdiff=1.769513e-07 (PASS)
> root at plugwash:/mpg123-test#
> 
> root at plugwash:/mpg123-test# LD_LIBRARY_PATH=/mpg123-20140220132548-neon/src/libmpg123/.libs/ \
>     perl compliance.pl /mpg123-20140220132548-neon/src/mpg123
> 
> ==== Layer 1 ====
> --> 16 bit signed integer output
> fl1.bit:        RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl2.bit:        RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl3.bit:        RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl4.bit:        RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl5.bit:        RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl6.bit:        RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl7.bit:        RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl8.bit:        RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> 
> ==== Layer 2 ====
> --> 16 bit signed integer output
> fl10.bit:       RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl11.bit:       RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl12.bit:       RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl13.bit:       RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl14.bit:       RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl15.bit:       RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> fl16.bit:       RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> 
> ==== Layer 3 ====
> --> 16 bit signed integer output
> compl.bit:      RMS=     nan (FAIL) maxdiff=0.000000e+00 (PASS)
> Illegal instruction
> root at plugwash:/mpg123-test#

I tried it on my ARM system.  Here is what I got:

root at rceng05:/mpg123-20140220132548# export LD_LIBRARY_PATH=/mpg123-20140220132548/src/libmpg123/.libs/
root at rceng05:/mpg123-20140220132548# perl scripts/benchmark-cpu.pl src/mpg123 /convergence_-_points_of_view/*mp3
Found 1 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
ARM     9.19    9.52

...rebuild with new configure option...

root at rceng05:/mpg123-20140220132548# perl scripts/benchmark-cpu.pl src/mpg123 /convergence_-_points_of_view/*mp3
Found 1 CPU optimizations to test...

#mpg123 benchmark (user CPU time in seconds for decoding)
#decoder        t_s16/s t_f32/s
generic 19.71   14.43
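To make the decoder comparison explicit, here is a small sketch that divides the generic decoder's user CPU times by the ARM decoder's times for both machines in this thread (the Cortex-A8 figures quoted earlier and the Cortex-A15 figures here). It assumes the unnamed "new configure option" corresponds to the generic_fpu build this bug is about, since the decoder is reported as "generic"; the `slowdown` function name is illustrative only.

```shell
#!/bin/sh
# slowdown: illustrative helper, not part of mpg123.
# Input columns: label arm_s16 arm_f32 generic_s16 generic_f32
# Output: how many times longer the generic decoder took than the ARM one.
slowdown() {
    awk '{ printf "%s s16 %.2f f32 %.2f\n", $1, $4 / $2, $5 / $3 }'
}

# User CPU times (seconds) copied from the benchmark output in this thread.
printf '%s\n' \
    "cortex-a8 30.36 34.26 148.66 138.49" \
    "cortex-a15 9.19 9.52 19.71 14.43" | slowdown
# cortex-a8 s16 4.90 f32 4.04
# cortex-a15 s16 2.14 f32 1.52
```

On these figures the generic decoder takes roughly 4 to 5 times as long as the ARM decoder on the Cortex-A8, and about 1.5 to 2 times as long on the Cortex-A15.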

Testing with the NEON build, I get a return code of 4, and it seems to
fail to run.  It was a pain to even get it to compile.  Using just the
configure option, the assembler complained that the NEON instructions
were invalid for the chosen CPU type.  Adding -mfpu=neon to the CFLAGS
let it compile, but it still crashes with an illegal instruction.  I
also tried with CFLAGS set to -mcpu=cortex-a15 -mfpu=neon, and that
still gives an illegal instruction when running it.

It might be a good idea to have the benchmark script check the return
code of system() to make sure the decoder actually ran successfully,
rather than reporting absurdly low numbers and pretending all was fine.
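A minimal sketch of that suggestion, written as a POSIX shell wrapper rather than as a patch to the Perl script (`run_checked` is a hypothetical name, not something shipped with mpg123). The idea is that a timing should only be recorded when the decoder exited with status 0. For reference, Perl's system() returns the raw wait status in `$?`, whose low 7 bits hold the signal number, so a raw value of 4 would mean the child was killed by signal 4, which is SIGILL on Linux and would match the "Illegal instruction" messages above. Common shells instead report a signal death as 128 plus the signal number, so the same crash would show up as 132 there.

```shell
#!/bin/sh
# run_checked: hypothetical helper, not part of mpg123's scripts.
# Runs a command and returns its status, printing a diagnostic when it did
# not exit cleanly, so a crashed decoder cannot silently produce a timing.
run_checked() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        return 0
    fi
    if [ "$status" -gt 128 ]; then
        # Shells such as dash and bash report death by signal N as 128+N,
        # e.g. 132 for SIGILL (signal 4 on Linux). This assumes the command
        # itself never exits with a plain code above 128.
        echo "error: $1 killed by signal $((status - 128))" >&2
    else
        echo "error: $1 exited with status $status" >&2
    fi
    return "$status"
}

# Example, reusing a command from this thread:
# run_checked src/mpg123 -t -e s16 test.mp3 || exit 1
```

In a benchmark loop, a failing decoder would then abort the run or be skipped, instead of producing an entry like the 0.03 s NEON row reported earlier.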

I was building and testing under Debian armhf sid.
gcc (Debian 4.8.2-16) 4.8.2

CPU is a dual Cortex-A15 1.5GHz (TI OMAP 57xx).

-- 
Len Sorensen



More information about the pkg-multimedia-maintainers mailing list