[Debian-med-packaging] Bug#776812: usage of -mtune=core2 ? (Was: Bug#776812: vsearch: FTBFS on non-x86: uses non-portable flags)

Mon Feb 2 10:53:38 UTC 2015

Hi Gert,

thanks for your helpful comments.

On Mon, Feb 02, 2015 at 11:38:20AM +0100, Gert Wollny wrote:
> Hello, 
> 
> On Mon, 2015-02-02 at 07:51 +0100, Andreas Tille wrote:
> > Hi Mentors,
> 
> > It is very important to build vsearch with the maximum optimisation for speed
> > and thus I wonder whether dropping this option is a good idea or whether
> > I should enable it on i386 and amd64 (the question extends also to
> > freebsd-i386/freebsd-amd64 once another issue in freebsd with this
> > package is solved).
> 
> On amd64, SSE/SSE2 are enabled by default.
> 
> Tuning the code for a specific processor (i.e. core2) might not be such
> a good idea. According to the GCC man page, one should use
> -mtune=generic instead:
> 
> "generic: 
> 
>  Produce code optimized for the most common IA32/AMD64/EM64T processors.
> If you know the CPU on which your code will run, then you should use the
> corresponding -mtune or -march option instead of -mtune=generic.  But,
> if you do not know exactly what CPU users of your application will have,
> then you should use this option.
> As new processors are deployed in the marketplace, the behavior of this
> option will change.  Therefore, if you upgrade to a newer version of
> GCC, code generation controlled by this option will change to reflect
> the processors that are most common at the time that version of GCC is
> released."

Tim, could you clarify with upstream whether they agree that -mtune=generic
is the option that should be used?  In that case the patch I prepared in
advance in svn (x86_spezific_opts.patch) should be dropped.

> In addition, with itksnap I saw that -funroll-loops and -ftree-vectorize
> improved performance a lot, and these are options that do not depend on
> the architecture, but are also not enabled by default.
> 
> -funroll-loops may also slow down the code, so you should check this. It
> is especially effective if there are many small loops of fixed size (as
> is the case with ITK's types that are templated over dimensions).
> 
> -ftree-vectorize may be useless on x86 without SSE but on amd64 it could
> give some speedups.

Tim, could you do some performance checks?  I have no idea whether the
usual upstream test suite is a proper check for this. 

Kind regards

      Andreas.

-- 
http://fam-tille.de