[med-svn] [Git][med-team/tantan][master] 5 commits: New upstream version 31

Sat May 28 08:34:06 BST 2022


Nilesh Patra pushed to branch master at Debian Med / tantan


Commits:
8b9fdd19 by Nilesh Patra at 2022-05-28T12:52:56+05:30
New upstream version 31
- - - - -
ba34a247 by Nilesh Patra at 2022-05-28T12:52:57+05:30
Update upstream source from tag 'upstream/31'

Update to upstream version '31'
with Debian dir acdb49353d1c255231926e952a0ebc6d3b1c7c75
- - - - -
6d744e71 by Nilesh Patra at 2022-05-28T12:52:58+05:30
Bump Standards-Version to 4.6.1 (no changes needed)

- - - - -
bdd75ef1 by Nilesh Patra at 2022-05-28T13:00:50+05:30
Re-diff patch

- - - - -
c9b5da30 by Nilesh Patra at 2022-05-28T13:00:50+05:30
Upload to unstable

- - - - -


10 changed files:

- + .gitignore
- Makefile
- README.rst
- debian/changelog
- debian/control
- debian/patches/buildflags.patch
- src/Makefile
- + src/mcf_simd.hh
- src/tantan.cc
- test/tantan_test.sh


Changes:

=====================================
.gitignore
=====================================
@@ -0,0 +1,2 @@
+src/version.hh
+bin/tantan


=====================================
Makefile
=====================================
@@ -1,4 +1,4 @@
-CXXFLAGS = -O3
+CXXFLAGS = -msse4 -O3 -g
 all:
 	@cd src && ${MAKE} CXXFLAGS="${CXXFLAGS}"
 
@@ -7,7 +7,7 @@ exec_prefix = ${prefix}
 bindir = ${exec_prefix}/bin
 install: all
 	mkdir -p ${bindir}
-	cp src/tantan ${bindir}
+	cp bin/tantan ${bindir}
 
 clean:
 	@cd src && ${MAKE} clean


=====================================
README.rst
=====================================
@@ -1,41 +1,40 @@
 tantan
 ======
 
-Introduction
-------------
-
-tantan is a tool for masking simple regions (low complexity and
-short-period tandem repeats) in biological sequences.
-
-The aim of tantan is to prevent false predictions when searching for
-homologous regions between two sequences.  Simple repeats often align
-strongly to each other, causing false homology predictions.
+tantan identifies simple regions / low complexity / tandem repeats in
+DNA or protein sequences.  Its main aim is to prevent false homology
+predictions between sequences.  Simple repeats often align strongly to
+each other, causing false homology predictions.
 
 Setup
 -----
 
-You need to have a C++ compiler.  On Linux, you might need to install
-a package called "g++".  On Mac, you might need to install
-command-line developer tools.  On Windows, you might need to install
-Cygwin.
-
-Using the command line, go into the tantan directory.  To compile it,
-type::
+Please download the highest version number from
+https://gitlab.com/mcfrith/tantan/-/tags.  Using the command line, go
+into the downloaded directory and type::
 
   make
 
-Optionally, copy tantan to a standard "bin" directory (using "sudo" to
-request administrator permissions)::
+This assumes you have a C++ compiler.  On Linux, you might need to
+install a package called "g++".  On Mac, you might need to install
+command-line developer tools.  On Windows, you might need to install
+Cygwin.
+
+This puts ``tantan`` in a ``bin`` directory.  For convenient usage,
+set up your computer to find it automatically.  Some possible ways:
 
-  sudo make install
+* Copy ``tantan`` to a standard directory: ``sudo make install``
+  (using "sudo" to request administrator permissions).
 
-Or copy it to your personal bin directory::
+* Copy it to your personal bin directory: ``make install prefix=~``
 
-  make install prefix=~
+* Adjust your `PATH variable`_.
 
 You might need to log out and back in for your computer to recognize
 the new program.
 
+**Alternative:** Install tantan from bioconda_.
+
 Normal usage
 ------------
 
@@ -133,6 +132,8 @@ repeats, so it's easy to lift the masking after determining homology.
 Options
 -------
 
+-h, --help  just show a help message, with default option values
+--version   just show version information
 -p  interpret the sequences as proteins
 -x  letter to use for masking, instead of lowercase
 -c  preserve uppercase/lowercase in non-masked regions
@@ -149,8 +150,6 @@ Options
 -n  minimum copy number, affects -f4 only
 -f  output type: 0=masked sequence, 1=repeat probabilities,
                  2=repeat counts, 3=BED, 4=tandem repeats
--h, --help  show help message, then exit
---version   show version information, then exit
 
 Advanced issues
 ---------------
@@ -176,7 +175,7 @@ align it on the other strand::
 Finding straightforward tandem repeats
 --------------------------------------
 
-Option -f4 runs tantan in a different mode, where it finds
+Option ``-f4`` runs tantan in a different mode, where it finds
 straightforward tandem repeats only.  (Technically, it uses a Viterbi
 algorithm instead of a Forward-Backward algorithm.)  This is *not*
 recommended for avoiding false homologs!  But it might be useful for
@@ -187,16 +186,14 @@ studying tandem repeats.  The output looks like this::
   mySeq   1278353 1278369 3       6.5     TCA     TCA,TCA,TCA,TC-,TC,TC,T
   mySeq   3616084 3616100 3       5.33333 TGG     TGA,TGA,TGG,TGG,TGG,T
 
-The first 3 columns show the start and end coordinates of the
-repetitive region, in `BED
-<https://genome.ucsc.edu/FAQ/FAQformat.html#format1>`_ format.  Column
-4 shows the length of the repeating unit (which might vary due to
-insertions and deletions, so this column shows the most common
-length).  Column 5 shows the number of repeat units.  Column 6 shows
-the repeating unit (which again might vary, so this is just a
-representative).  Column 7 shows the whole repeat: lowercase letters
-are insertions relative to the previous repeat unit, and dashes are
-deletions relative to the previous repeat unit.
+The first 3 columns show the start and end coordinates of the repeat,
+in BED_ format.  Column 4 shows the length of the repeating unit
+(which might vary due to insertions and deletions, so this column
+shows the most common length).  Column 5 shows the number of repeat
+units.  Column 6 shows the repeating unit (which again might vary, so
+this is just a representative).  Column 7 shows the whole repeat:
+lowercase letters are insertions relative to the previous repeat unit,
+and dashes are deletions relative to the previous repeat unit.
 
 Miscellaneous
 -------------
@@ -208,3 +205,7 @@ details, see COPYING.txt.
 If you use tantan in your research, please cite:
 "A new repeat-masking method enables specific detection of homologous
 sequences", MC Frith, Nucleic Acids Research 2011 39(4):e23.
+
+.. _BED: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
+.. _PATH variable: https://en.wikipedia.org/wiki/PATH_(variable)
+.. _bioconda: https://bioconda.github.io/
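
Note on the README hunk above: the -f4 output it describes has seven columns per line. The sketch below shows one way a downstream tool might read such a line. It is only an illustration, not part of tantan: the TandemRepeat struct and parseTandemRepeat function are hypothetical names, and the code assumes the columns are separated by whitespace and that column 1 is the sequence name, as in BED.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for one line of "tantan -f4" output (not a tantan type).
struct TandemRepeat {
  std::string seqName;  // column 1: sequence name (BED convention)
  long start;           // column 2: start coordinate of the repeat
  long end;             // column 3: end coordinate of the repeat
  int unitLength;       // column 4: most common length of the repeating unit
  double copyNumber;    // column 5: number of repeat units
  std::string unit;     // column 6: representative repeating unit
  std::string repeat;   // column 7: whole repeat, units separated by commas
};

// Reads the seven columns from one line.  Assumes whitespace-separated
// columns.  Returns false if any column is missing or malformed.
bool parseTandemRepeat(const std::string &line, TandemRepeat &r) {
  std::istringstream in(line);
  in >> r.seqName >> r.start >> r.end >> r.unitLength
     >> r.copyNumber >> r.unit >> r.repeat;
  return !in.fail();
}
```

For the first sample line in the README, this would give start 1278353, end 1278369, unit length 3 and copy number 6.5.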


=====================================
debian/changelog
=====================================
@@ -1,3 +1,11 @@
+tantan (31-1) unstable; urgency=medium
+
+  * Team upload.
+  * New upstream version 31
+  * Bump Standards-Version to 4.6.1 (no changes needed)
+
+ -- Nilesh Patra <nilesh at debian.org>  Sat, 28 May 2022 12:53:07 +0530
+
 tantan (26-1) unstable; urgency=medium
 
   * Team Upload.


=====================================
debian/control
=====================================
@@ -4,7 +4,7 @@ Uploaders: Sascha Steinbiss <satta at debian.org>
 Section: science
 Priority: optional
 Build-Depends: debhelper-compat (= 13)
-Standards-Version: 4.5.1
+Standards-Version: 4.6.1
 Vcs-Browser: https://salsa.debian.org/med-team/tantan
 Vcs-Git: https://salsa.debian.org/med-team/tantan.git
 Homepage: https://gitlab.com/mcfrith/tantan


=====================================
debian/patches/buildflags.patch
=====================================
@@ -3,26 +3,27 @@ Description: add buildflags
 Author: Sascha Steinbiss <sascha at steinbiss.name> 
 --- a/src/Makefile
 +++ b/src/Makefile
-@@ -1,9 +1,12 @@
--CXXFLAGS = -O3 -Wall
-+#CXXFLAGS = -O3 -Wall
+@@ -1,10 +1,13 @@
+-CXXFLAGS = -msse4 -O3 -Wall -g
++#CXXFLAGS = -O3 -Wall -g
  
- all: tantan
+ all: ../bin/tantan
  
--tantan: *.cc *.hh version.hh Makefile
--	$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ *.cc
+-../bin/tantan: *.cc *.hh version.hh Makefile
 +CCSRCS = $(sort $(wildcard *.cc))
 +CCHDRS = $(sort $(wildcard *.hh))
 +
-+tantan: $(CCSRCS) $(CCHDRS) version.hh Makefile
++../bin/tantan: $(CCSRCS) $(CCHDRS) version.hh Makefile
+ 	mkdir -p ../bin
+-	$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ *.cc
 +	$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ $(CCSRCS)
  
  clean:
- 	rm -f tantan
+ 	rm -f ../bin/tantan
 --- a/Makefile
 +++ b/Makefile
 @@ -1,6 +1,5 @@
--CXXFLAGS = -O3
+-CXXFLAGS = -msse4 -O3 -g
  all:
 -	@cd src && ${MAKE} CXXFLAGS="${CXXFLAGS}"
 +	@cd src && ${MAKE}


=====================================
src/Makefile
=====================================
@@ -1,15 +1,16 @@
-CXXFLAGS = -O3 -Wall
+CXXFLAGS = -msse4 -O3 -Wall -g
 
-all: tantan
+all: ../bin/tantan
 
-tantan: *.cc *.hh version.hh Makefile
+../bin/tantan: *.cc *.hh version.hh Makefile
+	mkdir -p ../bin
 	$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(LDFLAGS) -o $@ *.cc
 
 clean:
-	rm -f tantan
+	rm -f ../bin/tantan
 
 VERSION1 = git describe --dirty
-VERSION2 = echo ' (HEAD -> main, tag: 26) ' | sed -e 's/.*tag: *//' -e 's/[,) ].*//'
+VERSION2 = echo ' (HEAD -> main, tag: 31) ' | sed -e 's/.*tag: *//' -e 's/[,) ].*//'
 
 VERSION = \"`test -e ../.git && $(VERSION1) || $(VERSION2)`\"
 


=====================================
src/mcf_simd.hh
=====================================
@@ -0,0 +1,538 @@
+// Author: Martin C. Frith 2019
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef MCF_SIMD_HH
+#define MCF_SIMD_HH
+
+#if defined __SSE4_1__
+#include <immintrin.h>
+#elif defined __ARM_NEON
+#include <arm_neon.h>
+#endif
+
+#include <stddef.h>  // size_t
+
+namespace mcf {
+
+#if defined __AVX2__
+
+typedef __m256i SimdInt;
+typedef __m256i SimdUint1;
+typedef __m256d SimdDbl;
+
+const int simdBytes = 32;
+
+static inline SimdInt simdZero() {
+  return _mm256_setzero_si256();
+}
+
+static inline SimdInt simdZero1() {
+  return _mm256_setzero_si256();
+}
+
+static inline SimdDbl simdZeroDbl() {
+  return _mm256_setzero_pd();
+}
+
+static inline SimdInt simdOnes1() {
+  return _mm256_set1_epi32(-1);
+}
+
+static inline SimdInt simdLoad(const void *p) {
+  return _mm256_loadu_si256((const SimdInt *)p);
+}
+
+static inline SimdInt simdLoad1(const void *p) {
+  return _mm256_loadu_si256((const SimdInt *)p);
+}
+
+static inline SimdDbl simdLoadDbl(const double *p) {
+  return _mm256_loadu_pd(p);
+}
+
+static inline void simdStore(void *p, SimdInt x) {
+  _mm256_storeu_si256((SimdInt *)p, x);
+}
+
+static inline void simdStore1(void *p, SimdInt x) {
+  _mm256_storeu_si256((SimdInt *)p, x);
+}
+
+static inline void simdStoreDbl(double *p, SimdDbl x) {
+  _mm256_storeu_pd(p, x);
+}
+
+static inline SimdInt simdOr1(SimdInt x, SimdInt y) {
+  return _mm256_or_si256(x, y);
+}
+
+static inline SimdInt simdBlend(SimdInt x, SimdInt y, SimdInt mask) {
+  return _mm256_blendv_epi8(x, y, mask);
+}
+
+const int simdLen = 8;
+const int simdDblLen = 4;
+
+static inline SimdInt simdSet(int i7, int i6, int i5, int i4,
+			      int i3, int i2, int i1, int i0) {
+  return _mm256_set_epi32(i7, i6, i5, i4, i3, i2, i1, i0);
+}
+
+static inline SimdInt simdSet1(char jF, char jE, char jD, char jC,
+			       char jB, char jA, char j9, char j8,
+			       char j7, char j6, char j5, char j4,
+			       char j3, char j2, char j1, char j0,
+			       char iF, char iE, char iD, char iC,
+			       char iB, char iA, char i9, char i8,
+			       char i7, char i6, char i5, char i4,
+			       char i3, char i2, char i1, char i0) {
+  return _mm256_set_epi8(jF, jE, jD, jC, jB, jA, j9, j8,
+			 j7, j6, j5, j4, j3, j2, j1, j0,
+			 iF, iE, iD, iC, iB, iA, i9, i8,
+			 i7, i6, i5, i4, i3, i2, i1, i0);
+}
+
+static inline SimdDbl simdSetDbl(double i3, double i2, double i1, double i0) {
+  return _mm256_set_pd(i3, i2, i1, i0);
+}
+
+static inline SimdInt simdFill(int x) {
+  return _mm256_set1_epi32(x);
+}
+
+static inline SimdInt simdFill1(char x) {
+  return _mm256_set1_epi8(x);
+}
+
+static inline SimdDbl simdFillDbl(double x) {
+  return _mm256_set1_pd(x);
+}
+
+static inline SimdInt simdGt(SimdInt x, SimdInt y) {
+  return _mm256_cmpgt_epi32(x, y);
+}
+
+static inline SimdInt simdGe1(SimdInt x, SimdInt y) {
+  return _mm256_cmpeq_epi8(_mm256_min_epu8(x, y), y);
+}
+
+static inline SimdInt simdAdd(SimdInt x, SimdInt y) {
+  return _mm256_add_epi32(x, y);
+}
+
+static inline SimdInt simdAdd1(SimdInt x, SimdInt y) {
+  return _mm256_add_epi8(x, y);
+}
+
+static inline SimdInt simdAdds1(SimdInt x, SimdInt y) {
+  return _mm256_adds_epu8(x, y);
+}
+
+static inline SimdDbl simdAddDbl(SimdDbl x, SimdDbl y) {
+  return _mm256_add_pd(x, y);
+}
+
+static inline SimdInt simdSub(SimdInt x, SimdInt y) {
+  return _mm256_sub_epi32(x, y);
+}
+
+static inline SimdInt simdSub1(SimdInt x, SimdInt y) {
+  return _mm256_sub_epi8(x, y);
+}
+
+static inline SimdDbl simdMulDbl(SimdDbl x, SimdDbl y) {
+  return _mm256_mul_pd(x, y);
+}
+
+static inline SimdInt simdQuadruple1(SimdInt x) {
+  return _mm256_slli_epi32(x, 2);
+}
+
+static inline SimdInt simdMax(SimdInt x, SimdInt y) {
+  return _mm256_max_epi32(x, y);
+}
+
+static inline SimdInt simdMin1(SimdInt x, SimdInt y) {
+  return _mm256_min_epu8(x, y);
+}
+
+static inline int simdHorizontalMax(SimdInt x) {
+  __m128i z = _mm256_castsi256_si128(x);
+  z = _mm_max_epi32(z, _mm256_extracti128_si256(x, 1));
+  z = _mm_max_epi32(z, _mm_shuffle_epi32(z, 0x4E));
+  z = _mm_max_epi32(z, _mm_shuffle_epi32(z, 0xB1));
+  return _mm_cvtsi128_si32(z);
+}
+
+static inline int simdHorizontalMin1(SimdInt x) {
+  __m128i z = _mm256_castsi256_si128(x);
+  z = _mm_min_epu8(z, _mm256_extracti128_si256(x, 1));
+  z = _mm_min_epu8(z, _mm_srli_epi16(z, 8));
+  z = _mm_minpos_epu16(z);
+  return _mm_extract_epi16(z, 0);
+}
+
+static inline double simdHorizontalAddDbl(SimdDbl x) {
+  __m128d z = _mm256_castpd256_pd128(x);
+  z = _mm_add_pd(z, _mm256_extractf128_pd(x, 1));
+  return _mm_cvtsd_f64(_mm_hadd_pd(z, z));
+}
+
+static inline SimdInt simdChoose1(SimdInt items, SimdInt choices) {
+  return _mm256_shuffle_epi8(items, choices);
+}
+
+#elif defined __SSE4_1__
+
+typedef __m128i SimdInt;
+typedef __m128i SimdUint1;
+typedef __m128d SimdDbl;
+
+const int simdBytes = 16;
+
+static inline SimdInt simdZero() {
+  return _mm_setzero_si128();
+}
+
+static inline SimdInt simdZero1() {
+  return _mm_setzero_si128();
+}
+
+static inline SimdDbl simdZeroDbl() {
+  return _mm_setzero_pd();
+}
+
+static inline SimdInt simdOnes1() {
+  return _mm_set1_epi32(-1);
+}
+
+static inline SimdInt simdLoad(const void *p) {
+  return _mm_loadu_si128((const SimdInt *)p);
+}
+
+static inline SimdInt simdLoad1(const void *p) {
+  return _mm_loadu_si128((const SimdInt *)p);
+}
+
+static inline SimdDbl simdLoadDbl(const double *p) {
+  return _mm_loadu_pd(p);
+}
+
+static inline void simdStore(void *p, SimdInt x) {
+  _mm_storeu_si128((SimdInt *)p, x);
+}
+
+static inline void simdStore1(void *p, SimdInt x) {
+  _mm_storeu_si128((SimdInt *)p, x);
+}
+
+static inline void simdStoreDbl(double *p, SimdDbl x) {
+  _mm_storeu_pd(p, x);
+}
+
+static inline SimdInt simdOr1(SimdInt x, SimdInt y) {
+  return _mm_or_si128(x, y);
+}
+
+static inline SimdInt simdBlend(SimdInt x, SimdInt y, SimdInt mask) {
+  return _mm_blendv_epi8(x, y, mask);  // SSE4.1
+}
+
+const int simdLen = 4;
+const int simdDblLen = 2;
+
+static inline SimdInt simdSet(int i3, int i2, int i1, int i0) {
+  return _mm_set_epi32(i3, i2, i1, i0);
+}
+
+static inline SimdInt simdSet1(char iF, char iE, char iD, char iC,
+			       char iB, char iA, char i9, char i8,
+			       char i7, char i6, char i5, char i4,
+			       char i3, char i2, char i1, char i0) {
+  return _mm_set_epi8(iF, iE, iD, iC, iB, iA, i9, i8,
+		      i7, i6, i5, i4, i3, i2, i1, i0);
+}
+
+static inline SimdDbl simdSetDbl(double i1, double i0) {
+  return _mm_set_pd(i1, i0);
+}
+
+static inline SimdInt simdFill(int x) {
+  return _mm_set1_epi32(x);
+}
+
+static inline SimdInt simdFill1(char x) {
+  return _mm_set1_epi8(x);
+}
+
+static inline SimdDbl simdFillDbl(double x) {
+  return _mm_set1_pd(x);
+}
+
+static inline SimdInt simdGt(SimdInt x, SimdInt y) {
+  return _mm_cmpgt_epi32(x, y);
+}
+
+static inline SimdInt simdGe1(SimdInt x, SimdInt y) {
+  return _mm_cmpeq_epi8(_mm_min_epu8(x, y), y);
+}
+
+static inline SimdInt simdAdd(SimdInt x, SimdInt y) {
+  return _mm_add_epi32(x, y);
+}
+
+static inline SimdInt simdAdd1(SimdInt x, SimdInt y) {
+  return _mm_add_epi8(x, y);
+}
+
+static inline SimdInt simdAdds1(SimdInt x, SimdInt y) {
+  return _mm_adds_epu8(x, y);
+}
+
+static inline SimdDbl simdAddDbl(SimdDbl x, SimdDbl y) {
+  return _mm_add_pd(x, y);
+}
+
+static inline SimdInt simdSub(SimdInt x, SimdInt y) {
+  return _mm_sub_epi32(x, y);
+}
+
+static inline SimdInt simdSub1(SimdInt x, SimdInt y) {
+  return _mm_sub_epi8(x, y);
+}
+
+static inline SimdDbl simdMulDbl(SimdDbl x, SimdDbl y) {
+  return _mm_mul_pd(x, y);
+}
+
+static inline SimdInt simdQuadruple1(SimdInt x) {
+  return _mm_slli_epi32(x, 2);
+}
+
+static inline SimdInt simdMax(SimdInt x, SimdInt y) {
+  return _mm_max_epi32(x, y);  // SSE4.1
+}
+
+static inline SimdInt simdMin1(SimdInt x, SimdInt y) {
+  return _mm_min_epu8(x, y);
+}
+
+static inline int simdHorizontalMax(SimdInt x) {
+  x = simdMax(x, _mm_shuffle_epi32(x, 0x4E));
+  x = simdMax(x, _mm_shuffle_epi32(x, 0xB1));
+  return _mm_cvtsi128_si32(x);
+}
+
+static inline int simdHorizontalMin1(SimdInt x) {
+  x = _mm_min_epu8(x, _mm_srli_epi16(x, 8));
+  x = _mm_minpos_epu16(x);  // SSE4.1
+  return _mm_extract_epi16(x, 0);
+}
+
+static inline double simdHorizontalAddDbl(SimdDbl x) {
+  return _mm_cvtsd_f64(_mm_hadd_pd(x, x));
+}
+
+static inline SimdInt simdChoose1(SimdInt items, SimdInt choices) {
+  return _mm_shuffle_epi8(items, choices);  // SSSE3
+}
+
+#elif defined __ARM_NEON
+
+typedef int32x4_t SimdInt;
+typedef uint32x4_t SimdUint;
+typedef uint8x16_t SimdUint1;
+typedef float64x2_t SimdDbl;
+
+const int simdBytes = 16;
+
+static inline SimdInt simdZero() {
+  return vdupq_n_s32(0);
+}
+
+static inline SimdUint1 simdZero1() {
+  return vdupq_n_u8(0);
+}
+
+static inline SimdDbl simdZeroDbl() {
+  return vdupq_n_f64(0);
+}
+
+static inline SimdUint1 simdOnes1() {
+  return vdupq_n_u8(-1);
+}
+
+static inline SimdInt simdLoad(const int *p) {
+  return vld1q_s32(p);
+}
+
+static inline SimdUint1 simdLoad1(const unsigned char *p) {
+  return vld1q_u8(p);
+}
+
+static inline SimdDbl simdLoadDbl(const double *p) {
+  return vld1q_f64(p);
+}
+
+static inline void simdStore(int *p, SimdInt x) {
+  vst1q_s32(p, x);
+}
+
+static inline void simdStore1(unsigned char *p, SimdUint1 x) {
+  vst1q_u8(p, x);
+}
+
+static inline void simdStoreDbl(double *p, SimdDbl x) {
+  vst1q_f64(p, x);
+}
+
+static inline SimdUint1 simdOr1(SimdUint1 x, SimdUint1 y) {
+  return vorrq_u8(x, y);
+}
+
+static inline SimdInt simdBlend(SimdInt x, SimdInt y, SimdUint mask) {
+  return vbslq_s32(mask, y, x);
+}
+
+const int simdLen = 4;
+const int simdDblLen = 2;
+
+static inline SimdInt simdSet(unsigned i3, unsigned i2,
+                              unsigned i1, unsigned i0) {
+  size_t lo = i1;
+  size_t hi = i3;
+  return
+    vcombine_s32(vcreate_s32((lo << 32) | i0), vcreate_s32((hi << 32) | i2));
+}
+
+static inline SimdUint1 simdSet1(unsigned char iF, unsigned char iE,
+				 unsigned char iD, unsigned char iC,
+				 unsigned char iB, unsigned char iA,
+				 unsigned char i9, unsigned char i8,
+				 unsigned char i7, unsigned char i6,
+				 unsigned char i5, unsigned char i4,
+				 unsigned char i3, unsigned char i2,
+				 unsigned char i1, unsigned char i0) {
+  size_t lo =
+    (size_t)i0       | (size_t)i1 <<  8 | (size_t)i2 << 16 | (size_t)i3 << 24 |
+    (size_t)i4 << 32 | (size_t)i5 << 40 | (size_t)i6 << 48 | (size_t)i7 << 56;
+
+  size_t hi =
+    (size_t)i8       | (size_t)i9 <<  8 | (size_t)iA << 16 | (size_t)iB << 24 |
+    (size_t)iC << 32 | (size_t)iD << 40 | (size_t)iE << 48 | (size_t)iF << 56;
+
+  return vcombine_u8(vcreate_u8(lo), vcreate_u8(hi));
+}
+
+static inline SimdDbl simdSetDbl(double i1, double i0) {
+  return vcombine_f64(vdup_n_f64(i0), vdup_n_f64(i1));
+}
+
+static inline SimdInt simdFill(int x) {
+  return vdupq_n_s32(x);
+}
+
+static inline SimdUint1 simdFill1(unsigned char x) {
+  return vdupq_n_u8(x);
+}
+
+static inline SimdDbl simdFillDbl(double x) {
+  return vdupq_n_f64(x);
+}
+
+static inline SimdUint simdGt(SimdInt x, SimdInt y) {
+  return vcgtq_s32(x, y);
+}
+
+static inline SimdUint1 simdGe1(SimdUint1 x, SimdUint1 y) {
+  return vcgeq_u8(x, y);
+}
+
+static inline SimdInt simdAdd(SimdInt x, SimdInt y) {
+  return vaddq_s32(x, y);
+}
+
+static inline SimdUint1 simdAdd1(SimdUint1 x, SimdUint1 y) {
+  return vaddq_u8(x, y);
+}
+
+static inline SimdUint1 simdAdds1(SimdUint1 x, SimdUint1 y) {
+  return vqaddq_u8(x, y);
+}
+
+static inline SimdDbl simdAddDbl(SimdDbl x, SimdDbl y) {
+  return vaddq_f64(x, y);
+}
+
+static inline SimdInt simdSub(SimdInt x, SimdInt y) {
+  return vsubq_s32(x, y);
+}
+
+static inline SimdUint1 simdSub1(SimdUint1 x, SimdUint1 y) {
+  return vsubq_u8(x, y);
+}
+
+static inline SimdDbl simdMulDbl(SimdDbl x, SimdDbl y) {
+  return vmulq_f64(x, y);
+}
+
+static inline SimdUint1 simdQuadruple1(SimdUint1 x) {
+  return vshlq_n_u8(x, 2);
+}
+
+static inline SimdInt simdMax(SimdInt x, SimdInt y) {
+  return vmaxq_s32(x, y);
+}
+
+static inline SimdUint1 simdMin1(SimdUint1 x, SimdUint1 y) {
+  return vminq_u8(x, y);
+}
+
+static inline int simdHorizontalMax(SimdInt x) {
+  return vmaxvq_s32(x);
+}
+
+static inline int simdHorizontalMin1(SimdUint1 x) {
+  return vminvq_u8(x);
+}
+
+static inline double simdHorizontalAddDbl(SimdDbl x) {
+  return vaddvq_f64(x);
+}
+
+static inline SimdUint1 simdChoose1(SimdUint1 items, SimdUint1 choices) {
+  return vqtbl1q_u8(items, choices);
+}
+
+#else
+
+typedef int SimdInt;
+typedef double SimdDbl;
+const int simdBytes = 1;
+const int simdLen = 1;
+const int simdDblLen = 1;
+static inline int simdZero() { return 0; }
+static inline double simdZeroDbl() { return 0; }
+static inline int simdSet(int x) { return x; }
+static inline double simdSetDbl(double x) { return x; }
+static inline int simdFill(int x) { return x; }
+static inline int simdLoad(const int *p) { return *p; }
+static inline double simdLoadDbl(const double *p) { return *p; }
+static inline void simdStore(int *p, int x) { *p = x; }
+static inline void simdStoreDbl(double *p, double x) { *p = x; }
+static inline double simdFillDbl(double x) { return x; }
+static inline int simdGt(int x, int y) { return x > y; }
+static inline int simdAdd(int x, int y) { return x + y; }
+static inline double simdAddDbl(double x, double y) { return x + y; }
+static inline int simdSub(int x, int y) { return x - y; }
+static inline double simdMulDbl(double x, double y) { return x * y; }
+static inline int simdMax(int x, int y) { return x > y ? x : y; }
+static inline int simdBlend(int x, int y, int mask) { return mask ? y : x; }
+static inline int simdHorizontalMax(int a) { return a; }
+static inline double simdHorizontalAddDbl(double x) { return x; }
+
+#endif
+
+}
+
+#endif
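
Note on simdGe1 in the header above: the AVX2 and SSE4.1 branches build an unsigned byte "x >= y" test from an unsigned minimum followed by an equality compare, whereas the NEON branch uses a direct compare (vcgeq_u8). The scalar sketch below models a single byte lane to show why the two forms agree. The function names are hypothetical and the code uses no intrinsics, so it checks the idea only, not the header itself.

```cpp
#include <algorithm>
#include <cstdint>

// Scalar model of one byte lane of the x86 simdGe1:
// cmpeq(min_unsigned(x, y), y) gives all-ones exactly when x >= y.
inline std::uint8_t geViaMin(std::uint8_t x, std::uint8_t y) {
  return std::min(x, y) == y ? 0xFF : 0x00;
}

// Exhaustively compares geViaMin with a plain unsigned x >= y
// over all 256 * 256 byte pairs.
inline bool geViaMinMatchesPlainCompare() {
  for (int x = 0; x < 256; ++x) {
    for (int y = 0; y < 256; ++y) {
      std::uint8_t expected = (x >= y) ? 0xFF : 0x00;
      std::uint8_t got = geViaMin(static_cast<std::uint8_t>(x),
                                  static_cast<std::uint8_t>(y));
      if (got != expected) return false;
    }
  }
  return true;
}
```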


=====================================
src/tantan.cc
=====================================
@@ -1,6 +1,7 @@
 // Copyright 2010 Martin C. Frith
 
 #include "tantan.hh"
+#include "mcf_simd.hh"
 
 #include <algorithm>  // fill, max
 #include <cassert>
@@ -14,6 +15,8 @@
 
 namespace tantan {
 
+using namespace mcf;
+
 void multiplyAll(std::vector<double> &v, double factor) {
   for (std::vector<double>::iterator i = v.begin(); i < v.end(); ++i)
     *i *= factor;
@@ -308,15 +311,37 @@ struct Tantan {
     }
 
     double b = backgroundProb;
-    double fromForeground = 0;
-    double *foregroundBeg = BEG(foregroundProbs);
+    const double *b2f = BEG(b2fProbs);
+    double *fp = BEG(foregroundProbs);
     const double *lrRow = likelihoodRatioMatrix[*seqPtr];
     int maxOffset = maxOffsetInTheSequence();
-
-    for (int i = 0; i < maxOffset; ++i) {
-      double f = foregroundBeg[i];
+    const uchar *sp = seqPtr;
+
+    SimdDbl bV = simdFillDbl(b);
+    SimdDbl tV = simdFillDbl(f2f0);
+    SimdDbl sV = simdZeroDbl();
+
+    int i = 0;
+    for (; i <= maxOffset - simdDblLen; i += simdDblLen) {
+      SimdDbl rV = simdSetDbl(
+#if defined __SSE4_1__ || defined __ARM_NEON
+#ifdef __AVX2__
+			      lrRow[sp[-i-4]],
+			      lrRow[sp[-i-3]],
+#endif
+			      lrRow[sp[-i-2]],
+#endif
+			      lrRow[sp[-i-1]]);
+      SimdDbl fV = simdLoadDbl(fp+i);
+      sV = simdAddDbl(sV, fV);
+      SimdDbl xV = simdMulDbl(bV, simdLoadDbl(b2f+i));
+      simdStoreDbl(fp+i, simdMulDbl(simdAddDbl(xV, simdMulDbl(fV, tV)), rV));
+    }
+    double fromForeground = simdHorizontalAddDbl(sV);
+    for (; i < maxOffset; ++i) {
+      double f = fp[i];
       fromForeground += f;
-      foregroundBeg[i] = (b * b2fProbs[i] + f * f2f0) * lrRow[seqPtr[-i-1]];
+      fp[i] = (b * b2f[i] + f * f2f0) * lrRow[sp[-i-1]];
     }
 
     backgroundProb = b * b2b + fromForeground * f2b;
@@ -330,15 +355,36 @@ struct Tantan {
     }
 
     double toBackground = f2b * backgroundProb;
-    double toForeground = 0;
-    double *foregroundBeg = BEG(foregroundProbs);
+    const double *b2f = BEG(b2fProbs);
+    double *fp = BEG(foregroundProbs);
     const double *lrRow = likelihoodRatioMatrix[*seqPtr];
     int maxOffset = maxOffsetInTheSequence();
-
-    for (int i = 0; i < maxOffset; ++i) {
-      double f = foregroundBeg[i] * lrRow[seqPtr[-i-1]];
-      toForeground += b2fProbs[i] * f;
-      foregroundBeg[i] = toBackground + f2f0 * f;
+    const uchar *sp = seqPtr;
+
+    SimdDbl bV = simdFillDbl(toBackground);
+    SimdDbl tV = simdFillDbl(f2f0);
+    SimdDbl sV = simdZeroDbl();
+
+    int i = 0;
+    for (; i <= maxOffset - simdDblLen; i += simdDblLen) {
+      SimdDbl rV = simdSetDbl(
+#if defined __SSE4_1__ || defined __ARM_NEON
+#ifdef __AVX2__
+			      lrRow[sp[-i-4]],
+			      lrRow[sp[-i-3]],
+#endif
+			      lrRow[sp[-i-2]],
+#endif
+			      lrRow[sp[-i-1]]);
+      SimdDbl fV = simdMulDbl(simdLoadDbl(fp+i), rV);
+      sV = simdAddDbl(sV, simdMulDbl(simdLoadDbl(b2f+i), fV));
+      simdStoreDbl(fp+i, simdAddDbl(bV, simdMulDbl(tV, fV)));
+    }
+    double toForeground = simdHorizontalAddDbl(sV);
+    for (; i < maxOffset; ++i) {
+      double f = fp[i] * lrRow[sp[-i-1]];
+      toForeground += b2f[i] * f;
+      fp[i] = toBackground + f2f0 * f;
     }
 
     backgroundProb = b2b * backgroundProb + toForeground;
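
Note on the two loops rewritten above: each now processes simdDblLen elements per iteration, keeps one partial sum per lane, reduces the lanes with simdHorizontalAddDbl, and finishes the leftover elements with the original scalar loop. The sketch below models that structure for the forward step in plain C++ without intrinsics. It is an assumption-laden illustration, not the tantan code: kLanes stands in for simdDblLen, the function names are hypothetical, and lr[i] stands for the already gathered value lrRow[sp[-i-1]].

```cpp
#include <cassert>
#include <vector>

const int kLanes = 4;  // hypothetical stand-in for simdDblLen

// Reference: the scalar loop, one element per iteration.
double forwardStepScalar(std::vector<double> &fp,
                         const std::vector<double> &b2f,
                         const std::vector<double> &lr,
                         double b, double f2f0) {
  double fromForeground = 0;
  for (int i = 0; i < static_cast<int>(fp.size()); ++i) {
    double f = fp[i];
    fromForeground += f;
    fp[i] = (b * b2f[i] + f * f2f0) * lr[i];
  }
  return fromForeground;
}

// Blocked version: kLanes elements per iteration, then a scalar tail.
double forwardStepBlocked(std::vector<double> &fp,
                          const std::vector<double> &b2f,
                          const std::vector<double> &lr,
                          double b, double f2f0) {
  assert(b2f.size() == fp.size() && lr.size() == fp.size());
  const int n = static_cast<int>(fp.size());
  double lanes[kLanes] = {};  // one partial sum per lane (models sV)
  int i = 0;
  for (; i <= n - kLanes; i += kLanes) {
    for (int k = 0; k < kLanes; ++k) {  // models one SIMD operation
      double f = fp[i + k];
      lanes[k] += f;
      fp[i + k] = (b * b2f[i + k] + f * f2f0) * lr[i + k];
    }
  }
  double fromForeground = 0;
  for (int k = 0; k < kLanes; ++k)  // models the horizontal add
    fromForeground += lanes[k];
  for (; i < n; ++i) {  // scalar tail for the leftover elements
    double f = fp[i];
    fromForeground += f;
    fp[i] = (b * b2f[i] + f * f2f0) * lr[i];
  }
  return fromForeground;
}
```

The updated array elements are computed by the same formula in both versions, but the sum is accumulated in a different order, so with general floating-point input the returned sums may differ in the last bits.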


=====================================
test/tantan_test.sh
=====================================
@@ -5,7 +5,7 @@
 cd $(dirname $0)
 
 # Make sure we use this version of tantan:
-PATH=../src:$PATH
+PATH=../bin:$PATH
 
 countLowercaseLetters () {
     grep -v '^>' "$@" | tr -cd a-z | wc -c | tr -d ' '



View it on GitLab: https://salsa.debian.org/med-team/tantan/-/compare/9bce2420a9e2f8e322dd685bfe73d82337b2d441...c9b5da3096c0e42186405a47d164108a4c976e72
