[med-svn] [fasttree] 03/07: Imported Upstream version 2.1.9

Andreas Tille tille at debian.org
Wed Apr 20 20:20:32 UTC 2016


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository fasttree.

commit 520fc034ff09e0d8a0e476ed5553e4cf98278a90
Author: Andreas Tille <tille at debian.org>
Date:   Wed Apr 20 22:15:49 2016 +0200

    Imported Upstream version 2.1.9
---
 changelog  | 44 +++++++++++++++++++++-----------------
 fasttree.c | 72 +++++++++++++++++++++++++++++++++++++++++++-------------------
 2 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/changelog b/changelog
index d6f63f2..3e34002 100644
--- a/changelog
+++ b/changelog
@@ -1,3 +1,9 @@
+Version 2.1.9: March 29, 2016
+
+	Add the -lg option to use Le/Gascuel model for amino acid
+	substitutions. (Thanks to Doug Phelan and Ashley Superson at
+	Oakland University.)
+
 Version 2.1.8: March 24, 2015
 
 	To provide useful branch lengths for very wide alignments of very
@@ -34,9 +40,9 @@ Version 2.1.6: September 20, 2012
 Version 2.1.5: August 30, 2012
 
 	Added the -out option to meet popular demand
-	
+
 	Added a warning for Windows users to run it inside a command shell
-	
+
 Version 2.1.4: July 28, 2011
 
 	Added the -quote option (thanks to Samuel Shepard at the CDC)
@@ -47,7 +53,7 @@ Version 2.1.3: April 12, 2010
 
 	Corrected a bug that could lead to infinite recursion and stack
 	overflow within the NJ phase with -fastest
-	
+
 Version 2.1.2: March 24, 2010
 
 	Corrected a bug with printing out trees when sequence names
@@ -90,7 +96,7 @@ Version 2.1.0: October 15, 2009
 	distributions to ensure that we save space (realloc was not
 	shrinking them effectively). (2) Eliminate the 2nd copy of the
 	alignment.
-	
+
 	With -fastest, added a 2nd level of top-hits heuristic to reduce
 	memory usage and running time for the neighbor joining phase. Let
 	q = sqrt(m) = N**0.25. FastTree will store top-hits lists of size
@@ -102,7 +108,7 @@ Version 2.1.0: October 15, 2009
 	not affect accuracy in huge simulations, but may lead to slight
 	reductions in tree quality. Use -fastest -no2nd to turn off
 	2nd-level top-hits.
-	
+
 	Added the OpenMP version. Most steps are parallelized: key
 	exceptions are ML lengths, optimizing rate categoriess, ME NNIs
 	and ME SPRs. The parallelizing of the ML phase only uses 3
@@ -136,12 +142,12 @@ Version 2.0.0: July 22, 2009
 	phase, FastTree also reports any "bad" splits (NNIs that would
 	improve the likelihood) and whether these are due to constraints
 	or not.
-	
+
 	Implemented the generalized time-reversible nucleotide model of
 	evolution (-gtr option). The base frequencies are the same as in
 	the alignment; the rates are optimized just before selecting site
 	categories. Or use -gtrfreq and -gtrrates to set them yourself.
-	
+
 	Improvements to heuristics for maximum-likelihood NNIs:
 	(1) Fixed a bug in the code for deciding whether to skip the 2nd
 	round of optimizing branch lengths around a quartet.
@@ -168,7 +174,7 @@ Version 2.0.0: July 22, 2009
 
 	The -log option saves intermediate trees, settings, and model
 	details to the specified log file.
-	
+
 	Use SSE3 instructions if the compiler has set __SSE__ to indicate
 	that they are available. SSE3 improves performance of protein
 	maximum likelihood up to 50% and speeds up the protein
@@ -189,7 +195,7 @@ Version 2.0.0: July 22, 2009
 	a difference in log likelihoods.
 
 	Use exact posteriors for protein sequences by default.
-	
+
 Version 1.9.0: June 19, 2009
 
 	Added maximum-likelihood nearest-neighbor interchanges, with the
@@ -210,7 +216,7 @@ Version 1.9.0: June 19, 2009
 
 	Two heuristics to speed up ML NNIs: the star topology test and the
 	one-round heuristic.
-	
+
 	Star topology test: If ((A,B),(C,D)) is noticeably more likely
 	than the star topology (the log-likelihood with the optimal
 	internal branch length is at least 5 better), and there was no NNI
@@ -284,7 +290,7 @@ Version 1.1.0: May 11, 2009
 	number of active nodes. (The intuition behind this optimization is
 	that the average out-distance changes very little after a join if
 	we have 1,000 active nodes.)
-	
+
 	Optimized the constrained topology search to be faster. The
 	constraintWeight is now rather ad hoc, but in practice, a weight
 	of 10 means the constraints are always satisfied.
@@ -297,16 +303,16 @@ Version 1.1.0: May 11, 2009
 	Compact reporting of unexpected characters
 
 	Progress indicator (unless running with -quiet)
-	
+
 Version 1.0.5: April 8, 2009
 
 	Added -intree and -intree1 options to specify a starting tree
 	before doing NNIs and/or local bootstrap.
-	
+
 	Corrected a bug in estimating how many rounds of NNIs to do if
 	analyzing multiple alignments with the -n option (#NNIs was being
 	set only from the first alignment's size)
-	
+
 Version 1.0.4: February 4, 2009
 
         Corrected a bug in the local bootstrap that led to overly low
@@ -316,11 +322,11 @@ Version 1.0.4: February 4, 2009
 Version 1.0.3: January 7, 2009
 
 	Added constrained topology search
-	
+
 Version 1.0.2: December 15, 2008
 
 	Add the -pseudo option
-	
+
 Version 1.0.1: October 27, 2008
 
 	Report the total length of the tree (unless using -quiet).
@@ -331,7 +337,7 @@ Version 1.0.1: October 27, 2008
 
 	Comment out malloc.h, unless using TRACK_MEMORY, so that it
 	compiles on Macs.
-	
+
 Version 1.0.0: September 3, 2008
 
 	Added nearest-neighbor interchanges (NNIs) according to a
@@ -352,7 +358,7 @@ Version 1.0.0: September 3, 2008
 	Eliminated crashes on alignments with large numbers of
 	highly-gapped and non-overlapping sequences, by adding checks to
 	avoid divide-by-zero errors.
-	
+
 Version 0.9.1: April 22, 2008
 
 	Fixed bug that crashed windows version if using top-hits
@@ -360,5 +366,5 @@ Version 0.9.1: April 22, 2008
 	results under Linux.
 
 	Allow lower-case letters in input alignments.
-	
+
 Version 0.9: Initial release, April 15, 2008
diff --git a/fasttree.c b/fasttree.c
index e8adc7a..120477f 100644
--- a/fasttree.c
+++ b/fasttree.c
@@ -343,7 +343,7 @@ typedef float numeric_t;
 
 #endif /* USE_SSE3 */
 
-#define FT_VERSION "2.1.8"
+#define FT_VERSION "2.1.9"
 
 char *usage =
   "  FastTree protein_alignment > tree\n"
@@ -368,6 +368,7 @@ char *usage =
   "        (for faster global bootstrap on huge alignments)\n"
   "  -pseudo to use pseudocounts (recommended for highly gapped sequences)\n"
   "  -gtr -- generalized time-reversible model (nucleotide alignments only)\n"
+  "  -lg -- Le-Gascuel 2008 model (amino acid alignments only)\n"
   "  -wag -- Whelan-And-Goldman 2001 model (amino acid alignments only)\n"
   "  -quote -- allow spaces and other restricted characters (but not ' ) in\n"
   "           sequence names and quote names in the output tree (fasta input only;\n"
@@ -395,7 +396,7 @@ char *expertUsage =
   "           [-slow | -fastest] [-2nd | -no2nd] [-slownni] [-seed 1253] \n"
   "           [-top | -notop] [-topm 1.0 [-close 0.75] [-refresh 0.8]]\n"
   "           [-matrix Matrix | -nomatrix] [-nj | -bionj]\n"
-  "           [-wag] [-nt] [-gtr] [-gtrrates ac ag at cg ct gt] [-gtrfreq A C G T]\n"
+  "           [-lg] [-wag] [-nt] [-gtr] [-gtrrates ac ag at cg ct gt] [-gtrfreq A C G T]\n"
   "           [ -constraints constraintAlignment [ -constraintWeight 100.0 ] ]\n"
   "           [-log logfile]\n"
   "         [ alignment_file ]\n"
@@ -461,6 +462,7 @@ char *expertUsage =
   "       ML and ME NNIs)\n"
   "\n"
   "Maximum likelihood model options:\n"
+  "  -lg -- Le-Gascuel 2008 model instead of (default) Jones-Taylor-Thorton 1992 model (a.a. only)\n"
   "  -wag -- Whelan-And-Goldman 2001 model instead of (default) Jones-Taylor-Thorton 1992 model (a.a. only)\n"
   "  -gtr -- generalized time-reversible instead of (default) Jukes-Cantor (nt only)\n"
   "  -cat # -- specify the number of rate categories of sites (default 20)\n"
@@ -1632,6 +1634,10 @@ distance_matrix_t matrixBLOSUM45;
 double matrixJTT92[MAXCODES][MAXCODES];
 double statJTT92[MAXCODES];
 
+/* The Le-Gascuel 2008 amino acid transition matrix */
+double matrixLG08[MAXCODES][MAXCODES];
+double statLG08[MAXCODES];
+
 /* The WAG amino acid transition matrix (Whelan-And-Goldman 2001) */
 double matrixWAG01[MAXCODES][MAXCODES];
 double statWAG01[MAXCODES];
@@ -1655,6 +1661,7 @@ int main(int argc, char **argv) {
   int nRateCats = nDefaultRateCats;
   char *logfile = NULL;
   bool bUseGtr = false;
+  bool bUseLg = false;
   bool bUseWag = false;
   bool bUseGtrRates = false;
   double gtrrates[6] = {1,1,1,1,1,1};
@@ -1835,6 +1842,8 @@ int main(int argc, char **argv) {
       }
     } else if (strcmp(argv[iArg],"-nocat") == 0) {
       nRateCats = 1;
+    } else if (strcmp(argv[iArg], "-lg") == 0) {
+        bUseLg = true;
     } else if (strcmp(argv[iArg], "-wag") == 0) {
         bUseWag = true;
     } else if (strcmp(argv[iArg], "-gtr") == 0) {
@@ -1985,7 +1994,11 @@ int main(int argc, char **argv) {
       
       if (MLnni != 0 || MLlen) {
 	fprintf(fp, "ML Model: %s,",
-		(nCodes == 4) ? (bUseGtr ? "Generalized Time-Reversible" : "Jukes-Cantor") : (bUseWag ? "Whelan-And-Goldman" : "Jones-Taylor-Thorton"));
+		(nCodes == 4) ? 
+			(bUseGtr ? "Generalized Time-Reversible" : "Jukes-Cantor") : 
+			(bUseLg ? "Le-Gascuel 2008" : (bUseWag ? "Whelan-And-Goldman" : "Jones-Taylor-Thorton"))
+			
+	);
 	if (nRateCats == 1)
 	  fprintf(fp, " No rate variation across sites");
 	else
@@ -2133,7 +2146,9 @@ int main(int argc, char **argv) {
 
       transition_matrix_t *transmat = NULL;
       if (nCodes == 20) {
-	transmat = bUseWag? CreateTransitionMatrix(matrixWAG01,statWAG01) : CreateTransitionMatrix(matrixJTT92,statJTT92);
+			transmat = bUseLg? CreateTransitionMatrix(matrixLG08,statLG08) : 
+                          (bUseWag? CreateTransitionMatrix(matrixWAG01,statWAG01) :
+                           CreateTransitionMatrix(matrixJTT92,statJTT92));
       } else if (nCodes == 4 && bUseGtr && (bUseGtrRates || bUseGtrFreq)) {
 	transmat = CreateGTR(gtrrates,gtrfreq);
       }
@@ -4418,7 +4433,7 @@ void NormalizeFreq(/*IN/OUT*/numeric_t *freq, distance_matrix_t *dmat) {
 	freq[k] = 1.0/nCodes;
     } else {
       for (k = 0; k < nCodes; k++)
-	freq[k] = dmat->codeFreq[0][k];/*XXX gapFreq[k];*/
+	freq[k] = dmat->codeFreq[0][k];
     }
   }
 }
@@ -4834,13 +4849,6 @@ profile_t *PosteriorProfile(profile_t *p1, profile_t *p2,
   if (transmat == NULL) {	/* Jukes-Cantor */
     assert(nCodes == 4);
 
-    numeric_t fAll[128][4];
-    for (j = 0; j < 4; j++)
-      for (k = 0; k < 4; k++)
-	fAll[j][k] = (j==k) ? 1.0 : 0.0;
-    for (k = 0; k < 4; k++)
-      fAll[NOCODE][k] = 0.25;
-    
     double *PSame1 = PSameVector(len1, rates);
     double *PDiff1 = PDiffVector(PSame1, rates);
     double *PSame2 = PSameVector(len2, rates);
@@ -5170,7 +5178,7 @@ double PairLogLk(profile_t *pA, profile_t *pB, double length, int nPos,
 		 /*OPTIONAL IN/OUT*/double *site_likelihoods) {
   double lk = 1.0;
   double loglk = 0.0;		/* stores underflow of lk during the loop over positions */
-  int i,j,k;
+  int i,j;
   assert(rates != NULL && rates->nRateCategories > 0);
   numeric_t *expeigenRates = NULL;
   if (transmat != NULL)
@@ -5180,12 +5188,6 @@ double PairLogLk(profile_t *pA, profile_t *pB, double length, int nPos,
     assert (nCodes == 4);
     double *pSame = PSameVector(length, rates);
     double *pDiff = PDiffVector(pSame, rates);
-    numeric_t fAll[128][4];
-    for (j = 0; j < 4; j++)
-      for (k = 0; k < 4; k++)
-	fAll[j][k] = (j==k) ? 1.0 : 0.0;
-    for (k = 0; k < 4; k++)
-      fAll[NOCODE][k] = 0.25;
     
     int iFreqA = 0;
     int iFreqB = 0;
@@ -5277,6 +5279,7 @@ double PairLogLk(profile_t *pA, profile_t *pB, double length, int nPos,
       }
       /* SSE3 instructions do not speed this step up:
 	 numeric_t lkAB = vector_multiply3_sum(expeigen, fA, fB); */
+		// dsp this is where check for <=0 was added in 2.1.1.LG
       double lkAB = 0;
       for (j = 0; j < 4; j++)
 	lkAB += expeigen[j]*fA[j]*fB[j];
@@ -6133,9 +6136,6 @@ double RescaleGammaLogLk(int nPos, int nRateCats, /*IN*/numeric_t *rates, /*IN*/
 double MLPairOptimize(profile_t *pA, profile_t *pB,
 		      int nPos, /*OPTIONAL*/transition_matrix_t *transmat, rates_t *rates,
 		      /*IN/OUT*/double *branch_length) {
-  double len5[5];
-  int j;
-  for (j=0;j<5;j++) len5[j] = *branch_length;
   quartet_opt_t qopt = { nPos, transmat, rates,
 			 /*nEval*/0, /*pair1*/pA, /*pair2*/pB };
   double f2x,negloglk;
@@ -10129,3 +10129,31 @@ double matrixWAG01[MAXCODES][MAXCODES] = {
 	{0.008912, 0.014125, 0.040205, 0.012058, 0.020133, 0.008430, 0.007267, 0.003836, 0.143398, 0.015555, 0.014757, 0.004934, 0.015861, 0.238943, 0.007998, 0.029135, 0.010779, 0.092011, -0.726275, 0.011652},
 	{0.149259, 0.018739, 0.014602, 0.011335, 0.074565, 0.022417, 0.043805, 0.013932, 0.008807, 0.581952, 0.133956, 0.022726, 0.153161, 0.048356, 0.023429, 0.017317, 0.103293, 0.027186, 0.023418, -1.085487},
 };
+
+/* Le-Gascuel 2008 model data from Harry Yoo
+   https://github.com/hyoo/FastTree
+*/
+double statLG08[MAXCODES] = {0.079066, 0.055941, 0.041977, 0.053052, 0.012937, 0.040767, 0.071586, 0.057337, 0.022355, 0.062157, 0.099081, 0.0646, 0.022951, 0.042302, 0.04404, 0.061197, 0.053287, 0.012066, 0.034155, 0.069147};
+
+double matrixLG08[MAXCODES][MAXCODES] = {
+   {-1.08959879,0.03361031,0.02188683,0.03124237,0.19680136,0.07668542,0.08211337,0.16335306,0.02837339,0.01184642,0.03125763,0.04242021,0.08887270,0.02005907,0.09311189,0.37375830,0.16916131,0.01428853,0.01731216,0.20144931},
+   {0.02378006,-0.88334349,0.04206069,0.00693409,0.02990323,0.15707674,0.02036079,0.02182767,0.13574610,0.00710398,0.01688563,0.35388551,0.02708281,0.00294931,0.01860218,0.04800569,0.03238902,0.03320688,0.01759004,0.00955956},
+   {0.01161996,0.03156149,-1.18705869,0.21308090,0.02219603,0.07118238,0.02273938,0.06034785,0.18928374,0.00803870,0.00287235,0.09004368,0.01557359,0.00375798,0.00679131,0.16825837,0.08398226,0.00190474,0.02569090,0.00351296},
+   {0.02096312,0.00657599,0.26929909,-0.86328733,0.00331871,0.02776660,0.27819699,0.04482489,0.04918511,0.00056712,0.00079981,0.01501150,0.00135537,0.00092395,0.02092662,0.06579888,0.02259266,0.00158572,0.00716768,0.00201422},
+   {0.03220119,0.00691547,0.00684065,0.00080928,-0.86781864,0.00109716,0.00004527,0.00736456,0.00828668,0.00414794,0.00768465,0.00017162,0.01156150,0.01429859,0.00097521,0.03602269,0.01479316,0.00866942,0.01507844,0.02534728},
+   {0.03953956,0.11446966,0.06913053,0.02133682,0.00345736,-1.24953177,0.16830979,0.01092385,0.19623161,0.00297003,0.02374496,0.13185209,0.06818543,0.00146170,0.02545052,0.04989165,0.04403378,0.00962910,0.01049079,0.00857458},
+   {0.07434507,0.02605508,0.03877888,0.37538659,0.00025048,0.29554848,-0.84254259,0.02497249,0.03034386,0.00316875,0.00498760,0.12936820,0.01243696,0.00134660,0.03002373,0.04380857,0.04327684,0.00557310,0.00859294,0.01754095},
+   {0.11846020,0.02237238,0.08243001,0.04844538,0.03263985,0.01536392,0.02000178,-0.50414422,0.01785951,0.00049912,0.00253779,0.01700817,0.00800067,0.00513658,0.01129312,0.09976552,0.00744439,0.01539442,0.00313512,0.00439779},
+   {0.00802225,0.05424651,0.10080372,0.02072557,0.01431930,0.10760560,0.00947583,0.00696321,-1.09324335,0.00243405,0.00818899,0.01558729,0.00989143,0.01524917,0.01137533,0.02213166,0.01306114,0.01334710,0.11863394,0.00266053},
+   {0.00931296,0.00789336,0.01190322,0.00066446,0.01992916,0.00452837,0.00275137,0.00054108,0.00676776,-1.41499789,0.25764421,0.00988722,0.26563382,0.06916358,0.00486570,0.00398456,0.06425393,0.00694043,0.01445289,0.66191466},
+   {0.03917027,0.02990732,0.00677980,0.00149374,0.05885464,0.05771026,0.00690325,0.00438541,0.03629495,0.41069624,-0.79375308,0.01362360,0.62543296,0.25688578,0.02467704,0.01806113,0.03001512,0.06139358,0.02968934,0.16870919},
+   {0.03465896,0.40866276,0.13857164,0.01827910,0.00085698,0.20893479,0.11674330,0.01916263,0.04504313,0.01027583,0.00888247,-0.97644156,0.04241650,0.00154510,0.02521473,0.04836478,0.07344114,0.00322392,0.00852278,0.01196402},
+   {0.02579765,0.01111131,0.00851489,0.00058635,0.02051079,0.03838702,0.00398738,0.00320253,0.01015515,0.09808327,0.14487451,0.01506968,-1.54195698,0.04128536,0.00229163,0.00796306,0.04636929,0.01597787,0.01104642,0.04357735},
+   {0.01073203,0.00223024,0.00378708,0.00073673,0.04675419,0.00151673,0.00079574,0.00378966,0.02885576,0.04707045,0.10967574,0.00101178,0.07609486,-0.81061579,0.00399600,0.01530562,0.00697985,0.10394083,0.33011973,0.02769432},
+   {0.05186360,0.01464471,0.00712508,0.01737179,0.00331981,0.02749383,0.01847072,0.00867414,0.02240973,0.00344749,0.01096857,0.01718973,0.00439734,0.00416018,-0.41664685,0.05893117,0.02516738,0.00418956,0.00394655,0.01305787},
+   {0.28928853,0.05251612,0.24529879,0.07590089,0.17040121,0.07489439,0.03745080,0.10648187,0.06058559,0.00392302,0.01115539,0.04581702,0.02123285,0.02214217,0.08188943,-1.42842431,0.39608294,0.01522956,0.02451220,0.00601987},
+   {0.11400727,0.03085239,0.10660988,0.02269274,0.06093244,0.05755704,0.03221430,0.00691855,0.03113348,0.05508469,0.01614250,0.06057985,0.10765893,0.00879238,0.03045173,0.34488735,-1.23444419,0.00750412,0.01310009,0.11660005},
+   {0.00218053,0.00716244,0.00054751,0.00036065,0.00808574,0.00284997,0.00093936,0.00323960,0.00720403,0.00134729,0.00747646,0.00060216,0.00840002,0.02964754,0.00114785,0.00300276,0.00169919,-0.44275283,0.03802969,0.00228662},
+   {0.00747852,0.01073967,0.02090366,0.00461457,0.03980863,0.00878929,0.00409985,0.00186756,0.18125441,0.00794180,0.01023445,0.00450612,0.01643896,0.26654152,0.00306072,0.01368064,0.00839668,0.10764993,-0.71435091,0.00851526},
+   {0.17617706,0.01181629,0.00578676,0.00262530,0.13547871,0.01454379,0.01694332,0.00530363,0.00822937,0.73635171,0.11773937,0.01280613,0.13129028,0.04526924,0.02050210,0.00680190,0.15130413,0.01310401,0.01723920,-1.33539639}
+};

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/fasttree.git



More information about the debian-med-commit mailing list