[med-svn] [Git][med-team/andi][master] 3 commits: New upstream version 0.14

Fabian Klötzl (@kloetzl-guest) gitlab at salsa.debian.org
Thu Dec 30 12:13:25 GMT 2021



Fabian Klötzl pushed to branch master at Debian Med / andi


Commits:
def4028e by Fabian Klötzl at 2021-12-30T13:04:50+01:00
New upstream version 0.14
- - - - -
76881b2e by Fabian Klötzl at 2021-12-30T13:04:51+01:00
Update upstream source from tag 'upstream/0.14'

Update to upstream version '0.14'
with Debian dir 34361e6eb5c06623340a3006545064809cf0a8a4
- - - - -
4a50cf38 by Fabian Klötzl at 2021-12-30T13:10:51+01:00
bump log for v0.14

- - - - -


18 changed files:

- .travis.yml
- README.md
- andi-manual.pdf
- configure.ac
- debian/changelog
- docs/andi.1.in
- docs/manual/andi-manual.tex
- docs/manual/references.bib
- libs/pfasta.c
- libs/pfasta.h
- scripts/_andi
- src/Makefile.am
- src/andi.c
- src/global.h
- src/io.c
- src/model.c
- src/model.h
- test/Makefile.am


Changes:

=====================================
.travis.yml
=====================================
@@ -2,6 +2,9 @@ language: cpp
 compiler:
   - gcc
   - clang
+arch:
+  - amd64
+  - ppc64le
 sudo: false
 addons:
   apt:


=====================================
README.md
=====================================
@@ -15,7 +15,7 @@ For Debian and Ubuntu:
 
     sudo apt-get install andi
 
-For OS X with Homebrew:
+For macOS with Homebrew:
 
     brew tap brewsci/bio
     brew install andi
@@ -68,7 +68,7 @@ The release of this software is accompanied by a paper from [Haubold et al.](htt
 
 ## License
 
-Copyright © 2014 - 2020 Fabian Klötzl  
+Copyright © 2014 - 2021 Fabian Klötzl  
 License GPLv3+: GNU GPL version 3 or later.
 
 This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. The full license text is available at <http://gnu.org/licenses/gpl.html>.


=====================================
andi-manual.pdf
=====================================
Binary files a/andi-manual.pdf and b/andi-manual.pdf differ


=====================================
configure.ac
=====================================
@@ -1,4 +1,4 @@
-AC_INIT([andi], [0.13])
+AC_INIT([andi], [0.14])
 AM_INIT_AUTOMAKE([-Wall foreign])
 
 AC_CONFIG_MACRO_DIR([m4])


=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+andi (0.14-1) unstable; urgency=medium
+
+  * New upstream version 0.14
+
+ -- Fabian Klötzl <kloetzl at evolbio.mpg.de>  Mon, 06 Apr 2020 09:14:39 +0200
+
 andi (0.13-3) unstable; urgency=medium
 
   * Add build-essential as autopkgtests dependency


=====================================
docs/andi.1.in
=====================================
@@ -23,7 +23,7 @@ Use this mode if each of your \fIFASTA\fR files represents one assembly with num
 In multithreaded mode, \fBandi\fR requires memory linear to the amount of threads. The low memory mode changes this to a constant demand independent from the used number of threads. Unfortunately, this comes at a significant runtime cost.
 .TP
 \fB\-m\fR \fIMODEL\fR, \fB\-\-model\fR=\fIMODEL\fR
-Set the nucleotide evolution model to one of 'Raw', 'JC', or 'Kimura'. By default the Jukes-Cantor correction is used.
+Set the nucleotide evolution model to one of 'Raw', 'JC', 'Kimura', or 'LogDet'. By default the Jukes-Cantor correction is used.
 .TP
 \fB\-p\fR \fIFLOAT\fR
 Significance of an anchor; default: 0.025.
@@ -48,7 +48,7 @@ Prints the synopsis and an explanation of available options.
 \fB\-\-version\fR
 Outputs version information and acknowledgments.
 .SH COPYRIGHT
-Copyright \(co 2014 - 2020 Fabian Klötzl
+Copyright \(co 2014 - 2021 Fabian Klötzl
 License GPLv3+: GNU GPL version 3 or later.
 .br
 This is free software: you are free to change and redistribute it.


=====================================
docs/manual/andi-manual.tex
=====================================
@@ -18,6 +18,7 @@
 \usepackage{caption}
 \usepackage{subcaption}
 \usepackage{xspace}
+\usepackage{microtype}
 
 \bibliographystyle{alpha}
 
@@ -134,16 +135,16 @@ To build \andi from source, download the latest \href{https://github.com/EvolBio
 Once you have downloaded the package, unzip it and change into the newly created directory. 
 
 \begin{lstlisting}
-~ %  tar -xzvf andi-0.12.tar.gz
-~ %  cd andi-0.12
+~ %  tar -xzvf andi-0.14.tar.gz
+~ %  cd andi-0.14
 \end{lstlisting}
 
 \noindent Now build and install \andi.
 
 \begin{lstlisting}
-~/andi-0.12 %  ./configure
-~/andi-0.12 %  make
-~/andi-0.12 %  sudo make install
+~/andi-0.14 %  ./configure
+~/andi-0.14 %  make
+~/andi-0.14 %  sudo make install
 \end{lstlisting}
 
 \noindent This installs \andi for all users on your system. If you do not have root privileges, you will find a working copy of \andi in the \lstinline$src$ subdirectory. For the rest of this documentation, it is assumed, that \andi is in your \textdollar\lstinline!PATH!.
@@ -151,15 +152,16 @@ Once you have downloaded the package, unzip it and change into the newly created
 Now \andi should be ready for use. Try invoking the help.
 
 \begin{lstlisting}
-~/andi-0.12 %  Usage: andi [OPTIONS...] FILES...
-  FILES... can be any sequence of FASTA files.
-  Use '-' as file name to read from stdin.
+~/andi-0.14 %  ~/andi
+Usage: andi [OPTIONS...] FILES...
+	FILES... can be any sequence of FASTA files.
+	Use '-' as file name to read from stdin.
 Options:
   -b, --bootstrap=INT  Print additional bootstrap matrices
       --file-of-filenames=FILE  Read additional filenames from FILE; one per line
   -j, --join           Treat all sequences from one file as a single genome
   -l, --low-memory     Use less memory at the cost of speed
-  -m, --model=MODEL    Pick an evolutionary model of 'Raw', 'JC', 'Kimura'; default: JC
+  -m, --model=MODEL    Pick an evolutionary model of 'Raw', 'JC', 'Kimura', 'LogDet'; default: JC
   -p FLOAT             Significance of an anchor; default: 0.025
       --progress=WHEN  Print a progress bar 'always', 'never', or 'auto'; default: auto
   -t, --threads=INT    Set the number of threads; by default, all processors are used
@@ -276,7 +278,7 @@ S2        0.0995 0.0000
 
 In the above examples the runtime dropped from \SI{0.613}{\second}, to \SI{0.362}{\second} using two threads. Giving \andi more threads than input genomes leads to no further speed improvement. \, The other important option is \lstinline$--join$ (see Section~\ref{sec:join}).
 
-By default, the distances computed by \andi are \emph{Jukes-Cantor} corrected \cite{jukescantor}. Other evolutionary models are also implemented (Kimura, raw). The \lstinline$--model$ parameter can be used to switch between them.
+By default, the distances computed by \andi are \emph{Jukes-Cantor} corrected \cite{jukescantor}. Other evolutionary models are also implemented (Kimura \cite{kimura}, LogDet \cite{logdet}, raw). The \lstinline$--model$ parameter can be used to switch between them.
 
 Since version 0.9.4 \andi includes a bootstrapping method. It can be activated via the \lstinline$--bootstrap$ or \lstinline$-b$ switch. This option takes a numeric argument representing the number of matrices to create. The output can then be piped into \algo{phylip}. For more information on computing support values from distance matrices see \cite{afra}.
 
@@ -363,7 +365,7 @@ Our models of genome evolution (JC, Kimura) only work on the four canonical type
 
 \subsection*{Too Short Sequence}
 
-\andi was designed for big data sets of whole genomes. On short sequences the distance estimates are inaccurate. Use an multiple sequence alignment instead.
+\andi was designed for big data sets of whole genomes. On short sequences the distance estimates are inaccurate. Use a multiple sequence alignment instead.
 
 \subsection*{Too Long Sequence}
 


=====================================
docs/manual/references.bib
=====================================
@@ -122,5 +122,25 @@
   DOI = {10.3390/life6010011}
 }
 
+ at article{logdet,
+  AUTHOR = {Lockhart, P.J. and M.A. Steel and M.D. Hendy and D. Penny},
+  TITLE = {Recovering Evolutionary Trees under a More Realistic Model of Sequence Evolution},
+  JOURNAL = {Molecular Biology and Evolution},
+  VOLUME = {11},
+  YEAR = {1994},
+  NUMBER = {4},
+  PAGES = {605-612},
+  DOI = {10.1093/oxfordjournals.molbev.a040136}
+}
 
+ at article{kimura,
+  AUTHOR = {Kimura, M.},
+  TITLE = {A Simple Method for Estimating Evolutionary Rate of Base Substitutions Through Comparative Studies of Nucleotide Sequences},
+  JOURNAL = {Journal of Molecular Evolution},
+  VOLUME = {16},
+  YEAR = {1980},
+  NUMBER = {2},
+  PAGES = {111-120},
+  DOI = {10.1007/BF01731581}
+}
 


=====================================
libs/pfasta.c
=====================================
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015-2018, Fabian Klötzl <fabian-pfasta at kloetzl.info>
+ * Copyright (c) 2015-2020, Fabian Klötzl <fabian-pfasta at kloetzl.info>
  *
  * Permission to use, copy, modify, and/or distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
@@ -27,18 +27,24 @@
 
 #include "pfasta.h"
 
-#define VERSION "v14"
+#define VERSION "v15"
 
 #ifdef __SSE2__
 #include <emmintrin.h>
 #endif
 
-#ifdef __STDC_NO_THREADS__
-#define thread_local
-#else
+#if __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
 #include <threads.h>
+#define PFASTA_THREADSAFE 1
+#else
+#define thread_local
+#define PFASTA_THREADSAFE 0
 #endif
 
+int pfasta_threadsafe() {
+	return PFASTA_THREADSAFE ;
+}
+
 /** The following is the maximum length of an error string. It has to be
  * carefully chosen, so that all calls to PF_FAIL_STR succeed. For instance,
  * the line number can account for up to 20 characters.
@@ -159,7 +165,8 @@ cleanup:
 }
 
 int buffer_peek(struct pfasta_parser *pp) {
-	return LIKELY(pp->read_ptr < pp->fill_ptr) ? *pp->read_ptr : EOF;
+	return LIKELY(pp->read_ptr < pp->fill_ptr) ? *(unsigned char *)pp->read_ptr
+	                                           : EOF;
 }
 
 char *buffer_begin(struct pfasta_parser *pp) { return pp->read_ptr; }
@@ -257,7 +264,8 @@ size_t count_newlines(const char *begin, const char *end) {
 static int copy_word(struct pfasta_parser *pp, dynstr *target) {
 	int return_code = 0;
 
-	while (LIKELY(!my_isspace(buffer_peek(pp)))) {
+	int c;
+	while (c = buffer_peek(pp), c != EOF && LIKELY(!my_isspace(c))) {
 		char *end_of_word = find_first_space(buffer_begin(pp), buffer_end(pp));
 		size_t word_length = end_of_word - buffer_begin(pp);
 
@@ -439,7 +447,9 @@ int pfasta_read_sequence(struct pfasta_parser *pp, struct pfasta_record *pr) {
 		PF_FAIL_STR(pp, "Empty sequence on line %zu.", pp->line_number);
 	PF_FAIL_BUBBLE_CHECK(pp, check);
 
-	while (LIKELY(isalpha(buffer_peek(pp)))) {
+	// Assume a line begins only with alpha, -, *, or more spaces
+	char c;
+	while (c = buffer_peek(pp), LIKELY(isalpha(c) || c == '-' || c == '*')) {
 		int check = copy_word(pp, &sequence);
 		if (UNLIKELY(check == E_EOF)) break;
 		PF_FAIL_BUBBLE_CHECK(pp, check);


=====================================
libs/pfasta.h
=====================================
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015-2018, Fabian Klötzl <fabian-pfasta at kloetzl.info>
+ * Copyright (c) 2015-2020, Fabian Klötzl <fabian-pfasta at kloetzl.info>
  *
  * Permission to use, copy, modify, and/or distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
@@ -83,11 +83,10 @@ void pfasta_free(struct pfasta_parser *pp);
  */
 const char *pfasta_version(void);
 
-#ifdef __STDC_NO_THREADS__
-/** If the preprocessor macro `PFASTA_NO_THREADS` is defined, the parser is not
- * fully thread safe. */
-#define PFASTA_NO_THREADS
-#endif
+/**
+ * Returns 0 iff pfasta is not threadsafe.
+ */
+int pfasta_threadsafe();
 
 #ifdef __cplusplus
 }


=====================================
scripts/_andi
=====================================
@@ -25,6 +25,7 @@ args+=(
 		Raw\:Uncorrected\ distances
 		JC\:Jukes\-Cantor\ corrected
 		Kimura\:Kimura\-two\-parameter
+		LogDet\:Logarithmic\ determinant
 	))'
 	"($info)-p+[Significance of an anchor; default\: 0.025]:float:"
 	"($info)--progress=[Show progress bar]:when:(always auto never)"


=====================================
src/Makefile.am
=====================================
@@ -4,7 +4,7 @@ andi_SOURCES = andi.c esa.c process.c sequence.c io.c global.h esa.h process.h s
 model.h model.c
 andi_CPPFLAGS = $(OPENMP_CFLAGS) -I$(top_srcdir)/libs -I$(top_srcdir)/opt -std=gnu99
 andi_CFLAGS = $(OPENMP_CFLAGS) -Wall -Wextra -Wno-missing-field-initializers
-andi_LDADD = $(PSUFSORT) $(top_builddir)/libs/libpfasta.a $(top_builddir)/opt/libcompat.a
+andi_LDADD = $(top_builddir)/libs/libpfasta.a $(top_builddir)/opt/libcompat.a
 
 .PHONY: perf
 perf: CFLAGS+= -g -O3 -ggdb -fno-omit-frame-pointer


=====================================
src/andi.c
=====================================
@@ -142,7 +142,7 @@ int main(int argc, char *argv[]) {
 
 				if (prop <= 0.0 || prop >= 1.0) {
 					soft_errx("A probability should be a value between 0 and "
-							  "1, exlusive; Ignoring -p %f argument.",
+							  "1, exclusive; Ignoring -p %f argument.",
 							  prop);
 					break;
 				}
@@ -199,17 +199,17 @@ int main(int argc, char *argv[]) {
 				break;
 			}
 			case 'm': {
-				// valid options are 'RAW' and 'JC'
 				if (strcasecmp(optarg, "RAW") == 0) {
 					MODEL = M_RAW;
 				} else if (strcasecmp(optarg, "JC") == 0) {
 					MODEL = M_JC;
 				} else if (strcasecmp(optarg, "KIMURA") == 0) {
 					MODEL = M_KIMURA;
+				} else if (strcasecmp(optarg, "LOGDET") == 0) {
+					MODEL = M_LOGDET;
 				} else {
-					soft_errx(
-						"Ignoring argument for --model. Expected Raw, JC or "
-						"Kimura");
+					soft_errx("Ignoring argument for --model. Expected Raw, "
+							  "JC, Kimura or LogDet");
 				}
 				break;
 			}
@@ -340,14 +340,13 @@ void usage(int status) {
 		"\tUse '-' as file name to read from stdin.\n"
 		"Options:\n"
 		"  -b, --bootstrap=INT  Print additional bootstrap matrices\n"
-		"      --file-of-filenames=FILE  Read additional filenames from "
-		"FILE; one per line\n"
+		"      --file-of-filenames=FILE  Read additional filenames from FILE; "
+		"one per line\n"
 		"  -j, --join           Treat all sequences from one file as a single "
 		"genome\n"
 		"  -l, --low-memory     Use less memory at the cost of speed\n"
 		"  -m, --model=MODEL    Pick an evolutionary model of 'Raw', 'JC', "
-		"'Kimura'; default: "
-		"JC\n"
+		"'Kimura', 'LogDet'; default: JC\n"
 		"  -p FLOAT             Significance of an anchor; default: 0.025\n"
 		"      --progress=WHEN  Print a progress bar 'always', 'never', or "
 		"'auto'; default: auto\n"


=====================================
src/global.h
=====================================
@@ -47,7 +47,7 @@ extern gsl_rng *RNG;
  */
 extern int MODEL;
 
-enum { M_RAW, M_JC, M_KIMURA };
+enum { M_RAW, M_JC, M_KIMURA, M_LOGDET };
 
 /**
  * This enum contains the available flags. Please note that all


=====================================
src/io.c
=====================================
@@ -227,7 +227,6 @@ void read_fasta(const char *file_name, dsa_t *dsa) {
 		pfasta_record_free(&pr);
 	}
 
-
 fail:
 	pfasta_free(&pp);
 	close(file_descriptor);
@@ -264,6 +263,7 @@ void print_distances(const struct model *D, const seq_t *sequences, size_t n,
 		 * compiler warnings. The real default is set in andi.c.*/
 		case M_JC: estimate = &estimate_JC; break;
 		case M_KIMURA: estimate = &estimate_KIMURA; break;
+		case M_LOGDET: estimate = &estimate_LOGDET; break;
 	}
 
 	for (i = 0; i < n; i++) {


=====================================
src/model.c
=====================================
@@ -80,7 +80,8 @@ double model_coverage(const model *MM) {
  */
 double estimate_RAW(const model *MM) {
 	size_t nucl = model_total(MM);
-	size_t SNPs = model_sum(MM, AtoC, AtoG, AtoT, CtoG, CtoT, GtoT);
+	size_t SNPs = model_sum(MM, AtoC, AtoG, AtoT, CtoA, CtoG, CtoT, GtoA, GtoC,
+							GtoT, TtoA, TtoC, TtoG);
 
 	// Insignificant results. All abort the fail train.
 	if (nucl <= 3) {
@@ -111,8 +112,9 @@ double estimate_JC(const model *MM) {
  */
 double estimate_KIMURA(const model *MM) {
 	size_t nucl = model_total(MM);
-	size_t transitions = model_sum(MM, AtoG, CtoT);
-	size_t transversions = model_sum(MM, AtoC, AtoT, GtoC, GtoT);
+	size_t transitions = model_sum(MM, AtoG, GtoA, CtoT, TtoC);
+	size_t transversions =
+		model_sum(MM, AtoC, CtoA, AtoT, TtoA, GtoC, CtoG, GtoT, TtoG);
 
 	double P = (double)transitions / (double)nucl;
 	double Q = (double)transversions / (double)nucl;
@@ -124,6 +126,77 @@ double estimate_KIMURA(const model *MM) {
 	return dist <= 0.0 ? 0.0 : dist;
 }
 
+/** @brief computes the evolutionary distance using LogDet.
+ *
+ * The LogDet distance between sequence X and and sequence Y
+ * is given as
+ *
+ * -(1 / K) * (log(det(Fxy)) - 0.5 * log(det(Fxx * Fyy)))
+ *
+ * Where K is the number of character states, Fxy is the site-pattern
+ * frequency matrix, and diagonal matrices Fxx and Fyy give the
+ * frequencies of the different character states in sequences X and Y.
+ *
+ * Each i,j-th entry in Fxy is the proportion of homologous sites
+ * where sequences X and Y have character states i and j, respectively.
+ *
+ * For our purposes, X is the Subject (From) sequence and Y is the
+ * Query (To) sequence and matrix Fxy looks like
+ *
+ *  To   A  C  G  T
+ * From
+ *  A  (            )
+ *  C  (            )
+ *  G  (            )
+ *  T  (            )
+ *
+ * @param MM - The mutation matrix.
+ * @returns The LogDet distance.
+ */
+double estimate_LOGDET(const model *MM) {
+
+	double nucl = (double)model_total(MM);
+	double P[MUTCOUNTS];
+	for (int i = 0; i < MUTCOUNTS; i++) {
+		P[i] = MM->counts[i] / nucl;
+	}
+
+	double logDetFxxFyy =
+		// log determinant of diagonal matrix of row sums
+		log(model_sum(MM, AtoA, AtoC, AtoG, AtoT) / nucl) +
+		log(model_sum(MM, CtoA, CtoC, CtoG, CtoT) / nucl) +
+		log(model_sum(MM, GtoA, GtoC, GtoG, GtoT) / nucl) +
+		log(model_sum(MM, TtoA, TtoC, TtoG, TtoT) / nucl) +
+		// log determinant of diagonal matrix of column sums
+		log(model_sum(MM, AtoA, CtoA, GtoA, TtoA) / nucl) +
+		log(model_sum(MM, AtoC, CtoC, GtoC, TtoC) / nucl) +
+		log(model_sum(MM, AtoG, CtoG, GtoG, TtoG) / nucl) +
+		log(model_sum(MM, AtoT, CtoT, GtoT, TtoT) / nucl);
+
+	// determinant of the site-pattern frequency matrix
+	double detFxy =
+		P[AtoA] * P[CtoC] * (P[GtoG] * P[TtoT] - P[TtoG] * P[GtoT]) -
+		P[AtoA] * P[CtoG] * (P[GtoC] * P[TtoT] - P[TtoC] * P[GtoT]) +
+		P[AtoA] * P[CtoT] * (P[GtoC] * P[TtoG] - P[TtoC] * P[GtoG]) -
+
+		P[AtoC] * P[CtoA] * (P[GtoG] * P[TtoT] - P[TtoG] * P[GtoT]) +
+		P[AtoC] * P[CtoG] * (P[GtoA] * P[TtoT] - P[TtoA] * P[GtoT]) -
+		P[AtoC] * P[CtoT] * (P[GtoA] * P[TtoG] - P[TtoA] * P[GtoG]) +
+
+		P[AtoG] * P[CtoA] * (P[GtoC] * P[TtoT] - P[TtoC] * P[GtoT]) -
+		P[AtoG] * P[CtoC] * (P[GtoA] * P[TtoT] - P[TtoA] * P[GtoT]) +
+		P[AtoG] * P[CtoT] * (P[GtoA] * P[TtoC] - P[TtoA] * P[GtoC]) -
+
+		P[AtoT] * P[CtoA] * (P[GtoC] * P[TtoG] - P[TtoC] * P[GtoG]) +
+		P[AtoT] * P[CtoC] * (P[GtoA] * P[TtoG] - P[TtoA] * P[GtoG]) -
+		P[AtoT] * P[CtoG] * (P[GtoA] * P[TtoC] - P[TtoA] * P[GtoC]);
+
+	double dist = -0.25 * (log(detFxy) - 0.5 * logDetFxxFyy);
+
+	// fix negative zero
+	return dist <= 0.0 ? 0.0 : dist;
+}
+
 /** @brief Bootstrap a mutation matrix.
  *
  * The classical bootstrapping process, as described by Felsenstein, resamples
@@ -132,25 +205,17 @@ double estimate_KIMURA(const model *MM) {
  * resample the elements of the mutation matrix. See Klötzl & Haubold (2016)
  * for details. http://www.mdpi.com/2075-1729/6/1/11/htm
  *
- * @param MM - The original mutation matrix.
+ * @param datum - The original mutation matrix.
  * @returns A bootstrapped mutation matrix.
  */
-model model_bootstrap(const model MM) {
-	model datum = MM;
-
-	size_t nucl = model_total(&MM);
+model model_bootstrap(model datum) {
+	size_t nucl = model_total(&datum);
 	double p[MUTCOUNTS];
 	for (size_t i = 0; i < MUTCOUNTS; ++i) {
-		p[i] = MM.counts[i] / (double)nucl;
+		p[i] = datum.counts[i] / (double)nucl;
 	}
 
-	unsigned int n[MUTCOUNTS];
-
-	gsl_ran_multinomial(RNG, MUTCOUNTS, nucl, p, n);
-
-	for (size_t i = 0; i < MUTCOUNTS; ++i) {
-		datum.counts[i] = n[i];
-	}
+	gsl_ran_multinomial(RNG, MUTCOUNTS, nucl, p, datum.counts);
 
 	return datum;
 }
@@ -216,7 +281,7 @@ void model_count_equal(model *MM, const char *S, size_t len) {
  * @param c - input nucleotide
  * @returns 2bit representation.
  */
-char nucl2bit(char c) {
+char nucl2bit(unsigned char c) {
 	c &= 6;
 	c ^= c >> 1;
 	return c >> 1;
@@ -244,25 +309,13 @@ void model_count(model *MM, const char *S, const char *Q, size_t len) {
 
 		// Pick the correct two bits representing s and q.
 		unsigned char foo = nucl2bit(s);
-		unsigned char baz = nucl2bit(q);
-
-		/*
-		 * The mutation matrix is symmetric. For convenience we define the
-		 * mutation XtoY with X > Y.
-		 */
-		if (baz > foo) {
-			int temp = foo;
-			foo = baz;
-			baz = temp;
-		}
+		unsigned char bar = nucl2bit(q);
 
 		/*
 		 * Finally, we want to map the indices to the correct mutation. This is
 		 * done by utilising the mutation types in model.h.
 		 */
-		static const unsigned int map4 = 0x9740;
-		unsigned int base = (map4 >> (4 * baz)) & 0xf;
-		unsigned int index = base + (foo - baz);
+		unsigned int index = (foo << 2) + bar;
 
 		local_counts[index]++;
 	}


=====================================
src/model.h
=====================================
@@ -8,8 +8,7 @@
 #include <stdlib.h>
 
 /**
- * This enum contains all possible mutations. Some of them are considered
- * symmetric (e.g. AtoT and TtoA) and thus have the same value. The total number
+ * This enum contains all possible mutations. The total number
  * of different possible mutations is MUTCOUNTS.
  */
 enum {
@@ -17,19 +16,19 @@ enum {
 	AtoC,
 	AtoG,
 	AtoT,
+	CtoA,
 	CtoC,
 	CtoG,
 	CtoT,
+	GtoA,
+	GtoC,
 	GtoG,
 	GtoT,
+	TtoA,
+	TtoC,
+	TtoG,
 	TtoT,
-	MUTCOUNTS,
-	CtoA = AtoC,
-	GtoA = AtoG,
-	GtoC = CtoG,
-	TtoA = AtoT,
-	TtoC = CtoT,
-	TtoG = GtoT
+	MUTCOUNTS
 };
 
 /** @brief The mutation matrix.
@@ -46,12 +45,15 @@ enum {
  *
  * The cells are absolute counts. Together with seq_len (the query length),
  * we can deduce the substitution rate and coverage.
+ *
+ * As libdivsufsort is 32 bit the sequence length is limited to (INT_MAX-1)/2.
+ * We can thus use the same limit for the counts.
  */
 typedef struct model {
 	/** The absolute counts of mutation types. */
-	size_t counts[MUTCOUNTS];
+	unsigned int counts[MUTCOUNTS];
 	/** The query length. */
-	size_t seq_len;
+	unsigned int seq_len;
 } model;
 
 void model_count_equal(model *, const char *, size_t);
@@ -61,4 +63,5 @@ double model_coverage(const model *);
 double estimate_RAW(const model *);
 double estimate_JC(const model *);
 double estimate_KIMURA(const model *);
-model model_bootstrap(const model);
+double estimate_LOGDET(const model *);
+model model_bootstrap(model);


=====================================
test/Makefile.am
=====================================
@@ -9,12 +9,12 @@ test_seq_LDADD = $(GLIB_LIBS) $(top_builddir)/opt/libcompat.a
 test_process_SOURCES = test_process.c $(top_srcdir)/src/esa.c $(top_srcdir)/src/io.c $(top_srcdir)/src/model.c $(top_srcdir)/src/process.c $(top_srcdir)/src/sequence.c $(top_srcdir)/src/global.h
 test_process_CPPFLAGS = $(OPENMP_CFLAGS) -I$(top_srcdir)/src -I$(top_srcdir)/opt -I$(top_srcdir)/libs -DDEBUG -std=gnu99
 test_process_CFLAGS = $(OPENMP_CFLAGS) -Wall -Wextra $(GLIB_CFLAGS) -Wno-missing-field-initializers
-test_process_LDADD = $(GLIB_LIBS) $(PSUFSORT) $(top_builddir)/opt/libcompat.a $(top_builddir)/libs/libpfasta.a
+test_process_LDADD = $(GLIB_LIBS) $(top_builddir)/opt/libcompat.a $(top_builddir)/libs/libpfasta.a
 
 test_esa_SOURCES = test_esa.c $(top_srcdir)/src/esa.c $(top_srcdir)/src/sequence.c $(top_srcdir)/src/esa.h
 test_esa_CPPFLAGS = $(OPENMP_CFLAGS) -I$(top_srcdir)/libs -I$(top_srcdir)/opt -I$(top_srcdir)/src -DDEBUG -std=gnu99
 test_esa_CFLAGS = $(OPENMP_CFLAGS) -Wall -Wextra $(GLIB_CFLAGS) -Wno-missing-field-initializers
-test_esa_LDADD = $(GLIB_LIBS) $(PSUFSORT) $(top_builddir)/opt/libcompat.a
+test_esa_LDADD = $(GLIB_LIBS) $(top_builddir)/opt/libcompat.a
 
 test_fasta_SOURCES = test_fasta.cxx
 



View it on GitLab: https://salsa.debian.org/med-team/andi/-/compare/ef56a146404fcdb0fd25f1c36f3d88e9230cc37d...4a50cf382857c38f2c84515855a562da5105a3fe

-- 
View it on GitLab: https://salsa.debian.org/med-team/andi/-/compare/ef56a146404fcdb0fd25f1c36f3d88e9230cc37d...4a50cf382857c38f2c84515855a562da5105a3fe
You're receiving this email because of your account on salsa.debian.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20211230/c6141e64/attachment-0001.htm>


More information about the debian-med-commit mailing list