[Debian-med-packaging] Bug#813438: How to specify a generic architecture to GCC (Was: SSE3 issue with iqtree when trying to enable i386)

Tung Nguyen tung.nguyen at univie.ac.at
Wed Jun 29 12:36:17 UTC 2016


Dear Andreas and others,

Thanks a lot for your help in fixing the compatibility issues in IQ-TREE.
We will apply the patches to our GIT repository.

Cheers
Tung

On Wed, Jun 29, 2016 at 2:16 PM, Andreas Tille <andreas at an3as.eu> wrote:

> Hi Tung,
>
> thanks to the very helpful and detailed response from Christian Seiler,
> I was able to build the latest iqtree version.  Please read his message
> below in detail, since it explains the patches applied in the Debian
> packaging.  You can find all patches that are applied to iqtree
> here:
>
>
> https://anonscm.debian.org/cgit/debian-med/iqtree.git/tree/debian/patches
>
> Please also note that I have updated the spelling patch with some
> spelling mistakes (in code and comments).
>
> Kind regards and thanks for your cooperation
>
>      Andreas.
>
> On Tue, Jun 28, 2016 at 08:09:15PM +0200, Christian Seiler wrote:
> > On 06/28/2016 11:01 AM, Andreas Tille wrote:
> > > I admit I can not answer the question asked by upstream.  The package
> > > in question is iqtree[1] and they said that they have different
> > > computational kernels implemented to respect different hardware.
> > > Current Git[1] does not even build - maybe due to some fine tuning of
> > > gcc options needed???
> >
> > I've looked at this, and there are a couple of things going on here:
> >
> > 0. Debian's build flags by default assume a generic architecture, so
> > you don't have to do anything by yourself.
> >
> > 1. Upstream's build system supports multiple options for the entirety
> > of the code. So you can compile the entire code with the AVX or FMA
> > instruction set. You patch that out completely from the CMakeLists.txt
> > in sse.patch, but that isn't actually required. (IQTREE_FLAGS would
> > have to be explicitly set to enable this.)
> >
> > 2. Furthermore, upstream's build system provides SSE and AVX kernels
> > regardless of the build flags of the rest of the code, and they are
> > always compiled. (Well, you can disable compilation of the AVX kernel
> > if you add "novx" to IQTREE_FLAGS, but there's no reason to.) This
> > should work out of the box.
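A minimal sketch of the pattern described in point 2, where several kernels are always compiled and one is picked at run time. This is not IQ-TREE's actual code: the names `sum_kernel_sse`, `sum_kernel_avx`, `cpu_has_avx` and `select_sum_kernel` are hypothetical, both kernels are plain-C stand-ins, and the CPU check assumes GCC or Clang on x86 (elsewhere it simply reports no AVX).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: sums n doubles. */
typedef double (*sum_kernel_fn)(const double *v, size_t n);

/* Stand-in for a kernel that would be compiled with -msse2 in its own file. */
static double sum_kernel_sse(const double *v, size_t n) {
    assert(v != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

/* Stand-in for a kernel that would be compiled with -mavx in its own file.
   Two accumulators only hint at a wider vector; this is still plain C. */
static double sum_kernel_avx(const double *v, size_t n) {
    assert(v != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) { s0 += v[i]; s1 += v[i + 1]; }
    if (i < n) s0 += v[i];   /* odd tail element */
    return s0 + s1;
}

/* Run-time AVX detection; assumes the GCC/Clang x86 builtins. */
static int cpu_has_avx(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

/* Pick the kernel once; callers then go through the function pointer. */
sum_kernel_fn select_sum_kernel(void) {
    return cpu_has_avx() ? sum_kernel_avx : sum_kernel_sse;
}
```

Because only the AVX kernel's translation unit would get -mavx, the rest of the program can stay on the generic baseline flags.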
> >
> > That said, the code doesn't support non-SSE at all, because it
> > hardcodes at least SSE2 intrinsics in a lot of places (and the one part
> > where it hardcodes SSE3, you already have a patch for). The code can
> > therefore not be compiled without SSE support enabled, unfortunately,
> > even on i386. If you want to support non-SSE at all on i386, upstream
> > (or yourself) needs to implement the routines in the vectorclass/
> > directory (and possibly others) for non-SSE systems. (The kernel that
> > optionally uses AVX also exists in a non-SSE variant, so upstream is
> > not completely wrong about that, but there's a lot of _other_ code that
> > forces at least SSE2.)
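For the one SSE3 use mentioned above, the horizontal add can be expressed with SSE2 shuffles instead of `_mm_hadd_pd`. The following is a standalone sketch of that idea on plain arrays, with a scalar fallback for targets without SSE2; the function name `hadd_pairs` is made up for illustration and is not part of IQ-TREE or the vectorclass library.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Computes out = { a[0] + a[1], b[0] + b[1] }, which is what the SSE3
   instruction behind _mm_hadd_pd(a, b) produces. */
void hadd_pairs(const double a[2], const double b[2], double out[2]) {
    assert(a && b && out);
#if defined(__SSE2__)
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    /* Gather the low halves and the high halves, then add them. */
    __m128d lo = _mm_shuffle_pd(va, vb, _MM_SHUFFLE2(0, 0)); /* { a[0], b[0] } */
    __m128d hi = _mm_shuffle_pd(va, vb, _MM_SHUFFLE2(1, 1)); /* { a[1], b[1] } */
    _mm_storeu_pd(out, _mm_add_pd(lo, hi));
#else
    /* Scalar fallback for builds without SSE2. */
    out[0] = a[0] + a[1];
    out[1] = b[0] + b[1];
#endif
}
```

The SSE2 path costs two shuffles and one add instead of a single instruction, but it runs on every CPU that has SSE2.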
> >
> > 3. pll/ has a bug that it calls posix_memalign with PLL_BYTE_ALIGNMENT.
> > However, according to the manpage, the alignment must be a multiple of
> > sizeof(void *) for posix_memalign to work (and a power of 2), but
> > PLL_BYTE_ALIGNMENT is 1 if SSE3 is not used. If you explicitly set it
> > to 8 (to catch both 32bit and 64bit), posix_memalign will not fail and
> > the program won't segfault anymore. (posix_memalign with an invalid
> > alignment argument returns EINVAL without allocating a buffer; if that
> > return value is not checked, the pointer is left unset.)
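To make the alignment rule concrete, here is a small sketch that wraps `posix_memalign` and surfaces its return code. The wrapper name `aligned_alloc_checked` is hypothetical and not taken from pll; the only assumption is a POSIX system.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates size bytes aligned to alignment. On failure returns NULL and
   stores the posix_memalign error code (for example EINVAL) in *err. */
void *aligned_alloc_checked(size_t alignment, size_t size, int *err) {
    assert(err != NULL);
    void *p = NULL;
    /* alignment must be a power of two and a multiple of sizeof(void *) */
    *err = posix_memalign(&p, alignment, size);
    return *err == 0 ? p : NULL;
}
```

With an alignment of 1 the call is rejected with EINVAL; with 8 it succeeds on both 32-bit and 64-bit systems, since 8 is a power of two and a multiple of either pointer size.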
> >
> > Note though that if you don't compile with sse3 flags enabled, pll will
> > not use SSE code at all (other than that which the compiler generates),
> > which is probably slower. But it does work. (A grep for __SSE3
> > shows that porting this would be a LOT of work.)
> >
> > Irrespective of the SSE-stuff, three things:
> >
> > 1. Your debian/rules calls dh_auto_clean/configure/build in
> >    override_dh_auto_build to build two variants. This can be done in a
> >    more elegant way, because CMake does support out of tree builds, and
> >    you can have debhelper use a specific build directory by specifying
> >    -Bdirname to dh_auto_*.
> >
> > 2. You might want to add --parallel to your dh call in debian/rules.
> >    CMake-based projects tend to support parallel builds, and iqtree is
> >    no exception to that rule. Would speed up build times quite a bit.
> >
> > 3. If you want to test the -omp binary as you do in debian/rules
> >    currently, you have to pass -redo, otherwise the second call will
> >    simply fail.
> >
> > I've updated your sse.patch to include the SSE-related fixes, and have
> > updated debian/rules to incorporate the other points. Attached both
> > to this email. The package now builds on amd64 and i386 (and probably
> > will build on the kfreebsd and hurd variants thereof, though I haven't
> > checked) and the test suite runs. The AVX/FMA checks in CMakeLists.txt
> > are now not removed, because debian/rules never sets IQTREE_FLAGS to
> > fma or avx. (On amd64 the build system compiles the AVX kernel
> > separately with -mavx regardless, so that's also OK; and on my Haswell
> > system the AVX detection works. On i386 the AVX kernel is never built,
> > as per what the upstream build system decided.) However, even on i386,
> > SSE2 support is required for this to work, otherwise the program will
> > crash with either illegal instruction or a segfault at start. (I can
> > provide you with a preinst script that checks for SSE2 support to show
> > a nice error message at package installation time, if you so wish.)
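The SSE2 check offered above would be a shell preinst script; the sketch below only illustrates the matching logic in C and is not that script. It assumes the Linux /proc/cpuinfo format, where a "flags" line lists CPU features separated by spaces, and the helper name `has_cpu_flag` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if flag appears as a whole word in flags_line, else 0.
   Whole-word matching matters: "sse2" must not match inside "ssse3"
   or a longer name. */
int has_cpu_flag(const char *flags_line, const char *flag) {
    assert(flags_line && flag);
    size_t len = strlen(flag);
    if (len == 0) return 0;

    const char *p = flags_line;
    while ((p = strstr(p, flag)) != NULL) {
        /* A match counts only if delimited on both sides. */
        int starts = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t' || p[-1] == ':';
        char end = p[len];
        int ends = (end == '\0' || end == ' ' || end == '\t' || end == '\n');
        if (starts && ends) return 1;
        p += 1;   /* keep searching after this partial match */
    }
    return 0;
}
```

A caller would read /proc/cpuinfo line by line, pass the line starting with "flags" to this helper, and print an error if "sse2" is missing.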
> >
> > Additionally (what I've NOT done): please check the lintian info and
> > pedantic messages of the package:
> >
> >  - out-of-date-standards-version 3.9.7
> >  - a couple of spelling-error-in-binary
> >  - hardening-no-pie/hardening-no-bindnow: consider enabling all
> >    hardening flags (if that works, haven't checked)
> >  - copyright-refers-to-symlink-license: you should refer to
> >    /usr/share/common-licenses/GPL-3 and not .../GPL in the GPL-3
> >    block of debian/copyright
> >
> > Hope that helps.
> >
> > Regards,
> > Christian
>
> > #!/usr/bin/make -f
> >
> > # DH_VERBOSE := 1
> >
> > pkg := $(shell dpkg-parsechangelog | sed -n 's/^Source: //p')
> > version=$(shell dpkg-parsechangelog -ldebian/changelog | grep Version: | cut -f2 -d' ' | cut -f1 -d- )
> > mandir=$(CURDIR)/debian/$(pkg)/usr/share/man/man1/
> >
> > %:
> >       dh $@ --parallel
> >
> > VARIANTS = omp serial
> >
> > override_dh_auto_configure: $(foreach variant,$(VARIANTS),dh_auto_configure_$(variant))
> > override_dh_auto_build:     $(foreach variant,$(VARIANTS),dh_auto_build_$(variant))
> > override_dh_auto_install:   $(foreach variant,$(VARIANTS),dh_auto_install_$(variant))
> > override_dh_auto_clean:     $(foreach variant,$(VARIANTS),dh_auto_clean_$(variant))
> >
> > dh_auto_configure_omp:
> >       dh_auto_configure -Bbuild.omp -- -DIQTREE_FLAGS="omp"
> >
> > dh_auto_configure_serial:
> >       dh_auto_configure -Bbuild.serial -- -DIQTREE_FLAGS=""
> >
> > dh_auto_build_%:
> >       dh_auto_build -Bbuild.$(subst dh_auto_build_,,$@)
> >
> > dh_auto_install_%:
> >       dh_auto_install -Bbuild.$(subst dh_auto_install_,,$@)
> >
> > dh_auto_clean_%:
> >       dh_auto_clean -Bbuild.$(subst dh_auto_clean_,,$@)
> >
> > override_dh_installexamples:
> >       dh_installexamples
> >       # remove example files in unusual dir
> >       rm -f debian/*/usr/models.nex
> >       rm -f debian/*/usr/example.[np][eh][xy]
> >
> > override_dh_installman:
> >       mkdir -p $(mandir)
> >       help2man --no-info --no-discard-stderr --help-option="-h" \
> >           --name='efficient phylogenetic software by maximum likelihood' \
> >           --version-string="$(version)" $(CURDIR)/debian/$(pkg)/usr/bin/iqtree > $(mandir)/iqtree.1
> >       help2man --no-info --no-discard-stderr --help-option="-h" \
> >           --name='efficient phylogenetic software by maximum likelihood (multiprocessor version)' \
> >           --version-string="$(version)" $(CURDIR)/debian/$(pkg)/usr/bin/iqtree-omp > $(mandir)/iqtree-omp.1
> >
> > override_dh_auto_test:
> >       # use only the first example for build time test to save time on autobuilders
> > #     if [ "`find $(CURDIR) -name iqtree -type f -executable`" = "" ] ; then \
> > #             iqtreeomp=`find $(CURDIR) -name iqtree-omp -type f -executable` ; \
> > #             ln -s iqtree-omp `dirname $$iqtreeomp`/iqtree ; \
> > #     fi
> >       sed '/ myprefix/,$$d' debian/Documents_source/example.sh > example.short
> >       echo 'time $(CURDIR)/build.omp/iqtree-omp -s example.phy -omp 2 -redo' >> example.short
> >       time sh example.short
> >       rm example.short
>
> > Description: Do not use -m32 and -msse3 flags
> > Bug-Debian: https://bugs.debian.org/813436
> > Author: Andreas Tille <tille at debian.org>
> > Last-Update: Tue, 02 Feb 2016 08:41:45 +0100
> >
> > --- a/CMakeLists.txt
> > +++ b/CMakeLists.txt
> > @@ -1,4 +1,4 @@
> > -##################################################################
> > +     ##################################################################
> >  # IQ-TREE cmake build definition
> >  # Copyright (c) 2012-2015 Bui Quang Minh, Lam Tung Nguyen
> >  ##################################################################
> > @@ -172,7 +172,7 @@ if(CMAKE_SIZEOF_VOID_P EQUAL 4 OR IQTREE
> >       endif()
> >       SET(EXE_SUFFIX "${EXE_SUFFIX}32")
> >       if (GCC OR CLANG)
> > -             set(COMBINED_FLAGS "${COMBINED_FLAGS} -m32")
> > +             set(COMBINED_FLAGS "${COMBINED_FLAGS}")
> >       endif()
> >      add_definitions(-DBINARY32)
> >  else()
> > @@ -237,7 +237,7 @@ SET(SSE_FLAGS "")
> >  if (VCC)
> >       set(SSE_FLAGS "/arch:SSE2 -D__SSE3__")
> >  elseif (GCC OR CLANG)
> > -     set(SSE_FLAGS "-msse3")
> > +     set(SSE_FLAGS "-msse2")
> >  elseif (ICC)
> >       if (WIN32)
> >               set(SSE_FLAGS "/arch:SSE3")
> > @@ -273,8 +273,7 @@ elseif (IQTREE_FLAGS MATCHES "avx") # AV
> >
> >       SET(EXE_SUFFIX "${EXE_SUFFIX}-avx")
> >  else() #SSE intruction set
> > -     message("Vectorization : SSE3")
> > -     add_definitions(-D__SSE3)
> > +     message("Vectorization : SSE2")
> >
> >  endif()
> >
> > --- a/phylokernel.h
> > +++ b/phylokernel.h
> > @@ -15,6 +15,10 @@
> >  inline Vec2d horizontal_add(Vec2d x[2]) {
> >  #if  INSTRSET >= 3  // SSE3
> >      return _mm_hadd_pd(x[0],x[1]);
> > +#elif   INSTRSET >= 2  // SSE2
> > +    Vec2d help0 = _mm_shuffle_pd(x[0], x[1], _MM_SHUFFLE2(0,0));
> > +    Vec2d help1 = _mm_shuffle_pd(x[0], x[1], _MM_SHUFFLE2(1,1));
> > +    return _mm_add_pd(help0, help1);
> >  #else
> >  #error "You must compile with SSE3 enabled!"
> >  #endif
> > --- a/pll/pll.h
> > +++ b/pll/pll.h
> > @@ -82,7 +82,7 @@ extern "C" {
> >  #define PLL_VECTOR_WIDTH 2
> >
> >  #else
> > -#define PLL_BYTE_ALIGNMENT 1
> > +#define PLL_BYTE_ALIGNMENT 8
> >  #define PLL_VECTOR_WIDTH 1
> >  #endif
> >
>
>
> --
> http://fam-tille.de
>



-- 
Dipl.-Ing. Tung Nguyen, PhD
Center for Integrative Bioinformatics Vienna (CIBIV)
Max F. Perutz Laboratories (MFPL)
Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.4
A-1030 Wien, Austria
Phone: +43 +1 / 42777-24025
Handy: +4369911551566
Email:   tung.nguyen at univie.ac.at