[med-svn] [Git][med-team/paml][master] 5 commits: New upstream version
Steffen Möller
gitlab at salsa.debian.org
Sun Nov 10 18:02:40 GMT 2019
Steffen Möller pushed to branch master at Debian Med / paml
Commits:
eebeb88c by Steffen Moeller at 2019-11-10T17:54:42Z
New upstream version
- - - - -
cdd16ab7 by Steffen Moeller at 2019-11-10T17:54:43Z
New upstream version 4.9j+dfsg
- - - - -
88674894 by Steffen Moeller at 2019-11-10T17:54:48Z
Update upstream source from tag 'upstream/4.9j+dfsg'
Update to upstream version '4.9j+dfsg'
with Debian dir a049040a7b5f985546e72941417702fab607e106
- - - - -
5f904022 by Steffen Moeller at 2019-11-10T17:55:50Z
Upload to unstable
- - - - -
3eb60261 by Steffen Moeller at 2019-11-10T17:58:58Z
About to upload new version
- - - - -
12 changed files:
- debian/changelog
- debian/control
- debian/rules
- doc/pamlHistory.txt
- − doc/pamlHistory.txt~
- − paml4.9i.tgz
- src/mcmctree.c
- src/paml.h
- − src/pll_avx.c
- − src/pll_sse.c
- src/tools.c
- src/treesub.c
Changes:
=====================================
debian/changelog
=====================================
@@ -1,3 +1,11 @@
+paml (4.9j+dfsg-1) unstable; urgency=medium
+
+ * New upstream version
+ * Also cleaning binaries after build
+ * Standards-Version: 4.4.1
+
+ -- Steffen Moeller <moeller at debian.org> Sun, 10 Nov 2019 18:54:50 +0100
+
paml (4.9i+dfsg-1) unstable; urgency=medium
* New upstream version
=====================================
debian/control
=====================================
@@ -6,7 +6,7 @@ Uploaders: Pjotr Prins <pjotr.debian at thebird.nl>,
Section: science
Priority: optional
Build-Depends: debhelper-compat (= 12)
-Standards-Version: 4.4.0
+Standards-Version: 4.4.1
Vcs-Browser: https://salsa.debian.org/med-team/paml
Vcs-Git: https://salsa.debian.org/med-team/paml.git
Homepage: http://abacus.gene.ucl.ac.uk/software/paml.html
=====================================
debian/rules
=====================================
@@ -39,3 +39,8 @@ override_dh_link-arch:
override_dh_fixperms-arch:
dh_fixperms -a
find debian/*/usr/lib/paml/data -type f -exec chmod -x \{\} \;
+
+override_dh_auto_clean:
+ dh_auto_clean
+ rm -f src/baseml src/basemlg src/chi2 src/codeml src/evolver src/infinitesites src/mcmctree src/pamp src/yn00
+
=====================================
doc/pamlHistory.txt
=====================================
@@ -8,6 +8,31 @@ discussion site
https://groups.google.com/forum/#!forum/pamlsoftware.
+
+Version 4.9j, October 2019
+
+(*) mcmctree, initial values have ages that are too old, exceeding max
+bounds. In theory bounds in mcmctree are always soft so that the node
+ages will move to the area of posterior mode when burnin is long
+enough. In practice the poor starting values are problematic and
+requires long burn-in. I have rewritten the code for generating
+initial values to respect the min and max bounds specified in the
+fossil calibrations.
+
+(*) codeml, clade labels in the tree file. A bug introduced in 4.9i
+caused the clade labels ($) to be ignored. This affects the branch,
+branch-site and clade models. If your tree has branch labels (#) only
+and no clade models, everything will be fine. If you have used the
+clade label ($) in the tree with or without the branch label (#),
+either the program will abort or the results will be incorrect. The
+clade label ($) is supposed to label all branches in the clade as well
+as the branch itself, but all clade labels in the tree are ignored in
+4.9i. Earlier versions are correct.
+
+Thanks to Yonghua Wu for pointing out the bug.
+
+
+
Version 4.9i, March 2019
(*) mcmctree: I have added an option (duplication = 1) for dating a
=====================================
doc/pamlHistory.txt~ deleted
=====================================
@@ -1,1969 +0,0 @@
-History and bug fixes of PAML (Phylogenetic Analysis by Maximum Likelihood)
-
-Ziheng Yang
-
-
-If you find problems or have questions, please visit the PAML
-discussion site
-https://groups.google.com/forum/#!forum/pamlsoftware.
-
-
-Version 4.9i, February 2019
-
-(*) mcmctree: I have added an option (duplication = 1) for dating a
-tree with both speciations and gene duplications, so that some nodes
-on the tree share divergence times. Nodes sharing ages are identified
-using labels in the tree file: #1, #2, .... I have yet to update the
-document about specification of the model.
-
-(*) mcmctree: The TipDate option was written for one locus or
-partition and never worked for more than two loci/partitions. I have
-edited the code so that it works for multiple partitions, some of
-which may be molecular and the others morphological.
-
-(*) codeml: The option estFreq = 0 when codonFreq = 6 (FMutSel0) and 7
-(FMutSel) is not working in versions 4.9g and 4.9h. This is fixed
-now. This option uses the observed codon or amino acid frequencies
-for the mutation-selection models of codon usage. Instead the program
-estimates the frequencies using maximum likeihood, which is what the
-option estFreq = 1 does. Look at the README file in the
-examples/mtCDNAape/ folder.
-
-(*) codeml clade model D: The bounds for the w (dN/dS) ratios in the
-first site classes are set tp (0.0001, 0.5) for w0 and (0.5, 1.5) for
-w1, in versions 4.9b,c,d,e,f,g, since I added the BEB calculation for
-clade model D in 4.9b. The motivation for the bounds is that site
-class 0 represents strong purifying selection with a small w0, while
-site class 1 should include sites under weak purifying selection with
-a larger w1. However the bounds are arbitrary. In some datasets, the
-MLEs are found to be at the bounds, making the interpretation awkward.
-I have changed the bounds to the following:
-w0b[]={0.0001, 1.0}, w1b[]={0.01, 1.5}.
-This means that the user should swap the estimates of w0 and w2 if w0 > w1.
-
-
-
-Version 4.9h, March 2018
-
-(*) mcmctree: gamma-Dirichlet versus conditional i.i.d. priors for
-rates for loci. Since 4.9d, the program and the documentation are
-inconsistent about the two priors, and which value (0 or 1) means
-which prior. I have now checked the program and the documentation to
-make sure that they are consistent:
-
-prior = 0: gamma-Dirichlet (dos Reis 2014). This is the default.
-prior = 1: conditional i.i.d. prior (Zhu et al. 2015).
-
-I believe these two are similar especially if the number of loci
-(partitions) is large, but no serious comparisons between the two
-priors have been published.
-
-Thanks to Adnan Moussalli for pointing out the errors.
-
-(*) codeml. It was discovered that the mechanistic amino acid
-substitution model implemented in Yang et al. (1998; see table 3),
-specified by seqtype = 2 model = 6, has been broken for a long time,
-since version 3.0 (2000) at least. Version 2.0 (1999) seems to be
-correct. This means that the model become broken soon since it was
-published. I have now fixed this.
-
-This model of amino acid substitution starts from a Markov chain for
-codons and then aggregate the states and merge the synonymous codons
-into one state (the coded amino acid). This is an approximate
-formulation since the process after state aggregation is not Markovian
-anymore.
-
-I have now added another codon-based amino acid substitution model
-that treats amino acids as ambiguities codons. The model is specified
-by seqtype = 2 model = 5. This is an exact formulation.
-
-
-(*) codeml. The number of categories in the BEB calculation under M2
-and M8 is unintentionally set to 4 rather than 10. I have changed
-this back to 10. The details of this calculation are in Yang et
-al. 2005 MBE.
-
-
-Version 4.9g, December 2017
-
-(*) codeml. A bug caused the BEB calculation under the site model M8
-(NSsites = 8) to be incorrect, with the program printing out warming
-messages like "strange: f[ 5] = -0.0587063 very small." This bug was
-introduced in version 4.9b and affects versions 4.9b-f. A different
-bug was introduced in version 4.9f that causes the log likelihood
-function under the site model M8 (NSsites = 8) to be calculated
-incorrectly. These are now fixed.
-
-
-
-Version 4.9f, October 2017
-
-(*) baseml, nonhomogeneous models (nhomo & fix_kappa). Those models
-allow different branches on the tree to have different Q matrices.
-Roughly nhomo controls the base frequency parameters while gix_kappa
-controls kappa or the exchangeability parameters (a b c d e in
-GTR/REV, for example). I added the option (nhomo = 5, fix_kappa = 2),
-which lets the user to define branch types, so that branches of the
-same type have the same exchangeability parameters (a b c d e for GTR)
-and base composition parameters, while branches of different types
-have different parameters. Branch types are labeled (using # and $),
-0, 1, 2, .... The labels should be consecutive positive integers.
-The old options nhomo = 3 or 4 work for some models like GTR, but not
-some other models which also have base composition parameters. In
-this update, I think those options should work with all those models.
-I have also edited the documentation (look for option variable nhomo
-for baseml).
-
-
-(*) baseml & codeml. i added an option fix_blength = 3
-(proportional), which means that branch lengths will be proporational
-to those given in the tree file, and the proportionality factor is
-estimated by ML.
-
-(*) codeml. The program does not count the parameters correctly for
-model M0 when fix_kappa = 1. The bug was introduced in version 4.9c
-and affects versions 4.9c-e. This is now fixed.
-
-(*) codeml (seqtype = 2 model = 2). If you are analyzing multiple
-protein data sets (ndata > 1) under the empirical models such as wag,
-jtt, dayhoff. The results for the first data set are correct, but all
-later data sets are analyzed incorrectly under the corresponding +F
-models, that is, wag+F, jtt+F, dayhoff+F, etc. A bug in the program
-means that for the second and later data sets, the equilibrium amino
-acid frequencies are taken from the real data and not correctly set to
-those specified by the empirical models. I note that this bug was
-recorded in the update Version 3.14b, April 2005, but it was somehow
-not fixed, even in that version. This is now fixed. Thanks to Nick
-Goldman for reporting this again.
-
-(*) evolver (options 5, 6, 7 for simulating nucleotide, codon and
-amino acid alignments). If you choose the option of printing out the
-site pattern counts instead of the sequences (specified at the
-beginning of the control file such as MCbase.dat), and if you are
-simulating two or more alignments, the program crashes after finishing
-the first alignment. This is now fixed.
-
-(*) mcmctree. The program crashes if you have a mixture of
-morphological loci and molecular loci, if not all the morphological
-loci are before the molecular loci. I have now fixed this.
-I think this was never described anyway.
-
-
-
-Version 4.9e, March 2017
-
-(*) Edited the readme files to change the license to GPL.
-
-(*) mcmctree. A bug was introduced in version 4.9b which causes the
-program to read the fossil calibration information in the tree file
-incorrectly, if joint (minimum and maximum) bounds are specified using
-the symbol '<' and '>'. If you use the notation "B()", "L()", and
-'U()', the information is read correctly. This bug was introduced in
-version 4.9b and exists in 4.9c and 4.9d. Versions 4.9a and earlier
-were correct.
-
-
-
-Version 4.9d, February 2017
-
-(*) mcmctree. Changed the default prior for rates for loci to
-gamma-Dirichlet (dos Reis 2014), and updated the documentation as
-well. It was set to the conditional i.i.d. prior (Zhu et al. 2015).
-
-(*) mcmctree. Added Bayes factor calculation. A program called
-BFdriver is included in the release, as well as a pdf document in the
-folder examples/DatingSoftBound/BFdriverDOC.pdf. We suggest that you
-use the exact likelihood calculation when you use this option, since the
-normal approximation is unreliable when the power posterior is close to
-the prior (when beta is small).
-
-
-
-Version 4.9c, September 2016
-
-(*) Added GPL license statement in various places.
-
-(*) codeml. When two or more trees are in the tree file and the
-specified model is the branch model, the program counts the number of
-parameters correctly only for the first tree. For the second or later
-trees, it overcounts the parameters in the model. The lnL and
-parameter estimates are still correct, but the printed values for the
-extra parameters are more or less random numbers and vary among
-different runs of the same analysis. The bug was introduced in
-version 4.7a and affects the versions since then until 4.9b. It is
-now fixed. Thanks to Casey Bergman.
-
-
-
-Version 4.9b, March 2016
-
-(*) codeml. I have added the BEB calculation for clade model D. I
-deleted the option of having two site classes under model D so that
-both clade models C and D now assume three site classes. The
-difference between models C and D is that in C, w1 = 1 is fixed, while
-in D, w1 is a free parameter, just like w0. Both the ML iteration and
-the BEB calculation under models C and D allow two or more clades
-(branch types), but note that an increase in the number of clades by 1
-roughly increases the computation by 10 times. Also the BEB
-calculation under model D is about 10 times more expensive than that
-under model C. I updated the codeml section of the documentation to
-remove the description of the old models (like M1 and M2) and focus on
-the new models that are in the program (like M1a and M2a).
-
-baseml & codeml. If the tree has labels using the symbol #,
-the program will crash. This is caused by a bug in the routine for
-reading trees from the tree file. This is now fixed.
-
-
-
-Version 4.9a, September 2015
-
-codeml (seqtype = 2 for amino acid sequences) crashes when the dataset
-is large, with more than 2M sites, say. This is due to a stupid bug
-in an irrelevant part of the code. This is fixed in 4.9a.
-
-
-Version 4.9, January 2015
-
-(*) mcmctree. There are now two priors for locus rates (mu_i or rates
-for loci), as summarized in Zhu et al. (2014 Sys Biol). They are
-specified as follows
-
- rgene_gamma = alpha_u beta_u alpha prior * gamma prior for overall rates for genes
-
-where prior = 0 (default) is the conditional i.i.d. prior and prior =
-1 is the gamma-dirichlet prior.
-
-For example the following specifies the conditional i.i.d. prior with
-alpha_mu = 100, beta_mu = 1000, and alpha = 0.5. Here the average
-overall rate is 0100/1000 = .1 changes per site per time unit. alpha
-= 0.5 describes the extent of rate variation among loci.
-
- rgene_gamma = 100 1000 0.5 0 * conditional i.i.d. prior for locus rates
- sigma2_gamma = 4 100 0.5 * prior for sigma^2 (for clock=2 or 3)
-
-The sigma2_i prior has parameters alpha_u, beta_u, alpha, specified in
-the same format.
-
-(*) mcmctree. I have changed the option of setting the finetune step
-lengths for the MCMC algorithms so that each parameter in the model
-has its own step lengths. As a result, the step lengths are all
-automatically adjusted during the burn-in, and cannot be fixed or
-specified by the user. The line
- finetune = 1: .0040 0.030 0.015 0.03 .55 * times, rates, mixing, paras, RateParas
-in the control file, if present, is read but ignored.
-
-(*) mcmctree. The calibration density for S2N (skew 2 normals) is
-incorrectly implemented. While the correct formula should be
- p0 * f1(t) + (1 - p0) f2(t)
-the incorrect formula used is
- p0 * f1(t) + f2(t).
-Basically the two skew-normal densities was not given the correct weights.
-If you run the program without data to make sure that the effective prior
-is sensible, then this mistake should have little impact. The bug is now
-fixed. Thanks to Mario dos Reis.
-
-
-
-
-Version 4.8a, August 2014
-
-(*) baseml (nhomo = 3 or 4). I have added the nonhomogeneous GTR
-(REV) model with a set of abcde parameters for every branch. The
-specifications are as follows.
-
-model=7 nhomo=4 fix_kappa=0: one set of rate parameters abcde in GTR for the whole tree.
-model=7 nhomo=4 fix_kappa=1: one set of rate parameters abcde in GTR for every branch.
-
-In version 4.8, only the second option (one set of abcde parameters
-for the whole tree) is implemented, no matter whether you choose 0 or
-1 for fix_kappa. After this change, the use of fix_kappa in the nhomo
-models is consistent between HKY and GTR. For HKY, the specifications
-have always been as follows.
-
-model=4 nhomo=4 fix_kappa=0: one kappa in HKY for the whole tree.
-model=4 nhomo=4 fix_kappa=1: one kappa in HKY for every branch.
-
-Note that nhomo=4 means one set of frequency parameters for every
-branch on the tree.
-
-I have also added a method for calculating the expected number of
-changes from any nucleotide i to nucleotide j along every branch under
-the nonhomogeneous models (nhomo=3 or 4). This is called EMC, and
-accounts for the fact that the base frequencies are changing over
-time, and so does the rate of substitution. It is invoked by using
-RateAncestor = 1 and verbose = 1.
-
-I tested only the nhomo=4 option carefully and the options nhomo=3 and
-5 are not well tested.
-
-
-
-Version 4.8, March 2014
-
-(*) mcmctree. The prior for rates for loci (with ndata = 2 or more)
-has changed in this version, and the gamma-Dirichlet prior is used.
-The old i.i.d. prior is replaced and unavailable anymore. The new
-gamma-Dirichlet prior is decribed in a paper
-
-dos Reis, M., T. Zhu, and Z. Yang. 2014. The impact of the rate prior
-on Bayesian estimation of divergence times with multiple
-loci. Syst. Biol. 63:555-565.
-
-(*) codeml. Added a Bayesian method for estimating distance t and dN/dS
-ratio w in pairwise comparisons. The option is
- runmode = -3 * -3: pairwise Bayesian
- runmode = -3 1.1 1.1 1.1 2.2 * -3: pairwise Bayesian
-The four numbers are the parameters in the gamma priors for t and w (a_t b_t a_w b_w).
-The method is described in the following paper:
-
-Angelis, K., M. dos Reis, and Z. Yang. 2014. Bayesian estimation of
-nonsynonymous/synonymous rate ratios for pairwise sequence
-comparisons. Mol. Biol. Evol. 31:1902-1913.
-
-(*) baseml, codeml, mcmctree). In version 4.7a, I disabled the option
-of reading species indexes in the tree file (for example, the number
-19, instead of the sequence name, may be used in the tree file to mean
-the 19th sequence in the sequence alignment). I have added this back.
-Version 4.7 does not have this problem.
-
-(*) codeml in paml 4.6 and 4.7 has a bug if your sequences have stop
-codons. Stop codons have always been disallowed in the codon models
-implemented in codeml. In versions 4.6-4.7, if there is a stop codon
-in any sequence, the codons in all sequences at that site (column in
-alignment) are changed into ??? and then the sequences are processed
-in the usual way. The likelihood should then be the same as the
-likelihood if that column is removed from the alignment. However, the
-implementation of the idea was incorrect and gives wrong results if
-the sequences do not contain any ambiguity characters and if you
-choose cleandata = 0. I have now fixed this. The idea is not really
-useful, so the good advice is that you delete the column of stop
-codons before the data are analyzed using codeml.
-
-
-Version 4.7a, October 2013
-
-(*) codeml. The mutation-selection model of Yang and Nielsen (2008)
-is noted to work for the site and branch-site models but not for the
-branch models. This is now fixed.
-
-(*) mcmctree. I fixed and modified the infinitesites program, which
-generates the limiting distribution when the amount of sequence data
-approaches infinity. This works for both the clock (clock=1) and
-relaxed clock models (clock=2 or 3). The mcmctree tutorial explains
-how to run this program.
-
-
-Version 4.7, January 2013
-
-(*) mcmctree. This version includes a method for dating divergences
-using sampled tips, as in viral datasets. The prior for node ages in
-the given rooted tree is generated using a
-birth-death-sequential-sampling model, described in Stadler and Yang
-(2012 Syst Biol, submitted). The SIV/HIV-2 example dataset analyzed
-in the paper is in the examples/TipDate/ folder. Look at the readme
-file there.
-
-(*) codeml. Version 4.6 crashes if you have a lot of ambiguity codons
-in the alignment, or precisely if the total number of fully determined
-codons (61 in the standard code) and the ambiguous codons exceed 127.
-I was using "unsigned char" to represent the codons, which go from 0
-to 255 and can represent 256 distinct codons. Somehow I changed this
-to "char", which on some systems goes from -128 to 127. Can't
-remember why I made the change. Anyway this is now fixed. Versions
-4.5 or earlier do not have this problem. Thanks to David Yu for
-pointing out the problem.
-
-
-
-Version 4.6, September 2012
-
-(*) baseml. The model of autocorrelated-rates among sites (Yang 1995
-Genetics 139:993-1005) is broken in paml versions 4.3 to 4.5
-inclusive. Earlier versions were apparently fine. When you select
-the auto-discrete-gamma model (alpha > 0, rho > 0) or the
-nonparametric autocorrelated-rates model (fix_alpha = 0, ncatG = 3,
-nparK = 4), the program aborts with the following error message:
-
-"Error: use 10, 20, 32, 64, 128, 512, 1024 for npoints for legendre.."
-
-This is now fixed. The option of nparK = 3 seems to have problems in
-all versions, and the program prints out the following warning
-message.
-
-"nparK==3, double stochastic, may not work. Use nparK=4?"
-
-This option is supposed to fit a Markov-dependent model with equal
-proportions in the rate categies. I have not got time to investigate
-this option, but a glance at table 3 in that paper indicates that this
-model was not used in the table and probably never properly
-implemented, so please follow the advice and use nparK = 4 instead.
-
-(*) codeml. I have now added the site model M2a_rel of Weadick & Chang
-(2012 Mol. Biol. Evol.), in which w2 > 0. This is specified using
-NSsites = 22 while model M2a is specified as NSsites = 2 (with w2 >
-1). M2a_rel is the null model for the likelihood ratio test based on
-clade model C, suggested by Weadick & Chang (2012 Mol. Biol. Evol.).
-Note that the very old site model M2 from Nielsen & Yang (1998
-Genetics) assumes w0 = 0. Yang, Wong, Nielsen (2005 MBE) changed the
-model so that 0 < w0 < 1 and renamed the model M2a. My tests using
-the versions around that time reveals the following:
-
-3.13d (M2): w0 = 0, w2 > 0
-3.14a (M2a): w0 > 0, w2 >= 1
-3.14b (M2a): w0 > 0, w2 >= 1
-
-If seems that when we changed M2 into M2a, we also changed the
-constraint on w2. I don't have a copy of version 3.14 to check.
-
-(*) basseml & codeml. To following option
- bootstrap = 1000
-
-should produce 1000 bootstrap samples in the file boot.txt. A bug is
-introduced in versions 4.4 and 4.5, so that garbiage is created
-instead of bootstrap datasets. Versions 4.3 and earlier were correct.
-This bug is now fixed.
-
-(*) evolver. The simulation of codon sequences (evolver 6
-MCcodon.dat) is incorrect when a genetic code different from the
-universal code is used. This is a conceptual error I made and it has
-been in all previous versions when simulation of codon sequences is
-allowed. I have assumed incorrectly that setting the frequencies of
-stop codons to 0 should be sufficient to get correct results. The
-zero frequencies allow the program to identify the stop codons
-correctly, but one still needs the code table to know which changes
-are synonymous and which nonsynonymous. evolver has been using the
-universal code to simulate codon sequences. Simulation using the
-universal code has been correct, and simulation using the
-mitochondrial code (for example) is affcted and the results should be
-incorrect although the impact may be slight. This bug is now fixed.
-A variable is added in the control file MCcodon.dat right below the
-codon frequencies, which tells the program which genetic code to use.
-
-1 * genetic code (0:universal; 1:mammalian mt; 2-10:see below)
-
-The program also checks that the stop codons have frequency 0 and that
-the frequencies of the sense codons sum to 1. Thanks to Mario dos
-Reis for pointing this out.
-
-(*) mcmctree. A bug is found in the calculation of the prior on rates
-under the correlated-rates model (clock=3). This bug has been in
-version 4 (July 2007) to version 4.5 (December 2011), inclusive. The
-prior is calculated by multiplying the conditional densities of
-equation (7) in Rannala & Yand (2007) over all interior nodes on the
-master tree (see also Figure 1 in that paper). However, the rA used
-in the program was not for the current branch: it was instead for the
-ancestral branch. The correct formula used was correct for the root
-of the master tree, but incorrect for other interior nodes. The
-impact of the bug on posterior time estimates may be expected to be
-small, but if the clock is seriously violated and rates change over
-neighboring branches, this may be a concern.
-
-
-
-Version 4.5, December 2011
-
-baseml. I have added the joint ancestral reconstruction under the
-nonhomogeneous models (nhomo = 3 or 4). This works for the model of
-one rate for all sites, and does not work for the model of gamma rates
-for sites or the partition models. Also I implemented the joint
-reconstruction only and not the marginal reconstruction.
-
-mcmctree. The program now prints out a file called FigTree.tre in the
-current working directory that is readable by FigTree. The branch
-lengths show the posterior means of times while the bars generated by
-FigTree will indicate the 95% CIs. I have also added an option to
-deal with TipDate data, such as viral sequences with dates. See
-examples/TipDate/README.txt for more notes.
-
-
-
-Version 4.4e, May 2011
-
-mcmctree. I have added an option of automatically adjusting the step
-lengths for proposals in the MCMC algorithm. The program will use the
-burn in to automatically adjust the step lengths, using those values
-specified in the control file as initial values.
-
-codeml: The iteration method that updates one branch length at a time
-(method = 1) is broken for the free-ratios model (model = 1) from
-versions 4.3 to 4.4d inclusive. When codeml runs, you will see
-messages like the following on the screen:
-Warning rounding error? b=3 cycle=0 lnL=6036.9413757 != 6070.1069284
-Version 4.2 and earlier are correct. This bug is now fixed.
-Thanks to Tae-Kun Seo for reporting the bug.
-
-
-
-Version 4.4d, March 2011
-
-yn00. The option of using weighting (weighting = 1) in the
-calculation of dN and dS by the YN00 method (Yang and Nielsen 2000)
-has been broken since Version 4.4 (February 2010). A bug was
-introduced when I changed the character coding in Version 4.4, so that
-this bug affects Versions 4.4, 4.4a, 4.4b, and 4.4c. The bug
-typically causes a crash. The result for weighting = 0 is not
-affected. Versions before 4.4 are correct. This bug is now fixed.
-Thanks for Charlie
-(http://www.ucl.ac.uk/discussions/viewtopic.php?t=8726&highlight=yn00).
-
-
-
-Version 4.4c, August 2010
-
-mcmctree. A bug causes the exotic calibrations (gamma, skew normal
-and skew t) to be sometimes ignored, if you have multiple loci
-(partitions) and if some species/sequences are missing at some loci.
-If all calibrations are bounds (minimum, maximum, and joint), or if
-the partitions have the same number of species/sequences, the
-calculation is still correct. This bug is in versions 4.1-4.4b. This
-is now fixed. Thanks to Mike Steiper.
-
-
-
-Version 4.4b, July 2010
-
-mcmctree. A serious bug has been discovered which affects the option
-of approximate likelihood calculation (usedata = 2). If the first
-child of the root in the tree for any locus is a tip, the MLEs of
-branch lengths in the in.BV file are assigned to wrong branches, so
-that the results are entirely incorrect, and also very different from
-those obtained using the exact likelihood calculation (usedata = 1).
-Suppose the tree for a locus is (A, B), with a bifurcation at the
-root, and with A and B representing species or clades. If A is a tip,
-the program is incorrect, while if A is an internal node, the program
-is correct. This bug is in versions 4.2b - 4.4a. Hopefully this is
-now corrected. Thanks to Mario dos Reis.
-
-mcmctree. I also changed the specification on L, U, and B bounds to
-allow the user to specify the tail probabilities (which used to be
-fixed at 2.5%) for each fossil calibration. See table 8 in
-pamlDOC.pdf for details. You will still have to run the program
-without data to get the effective prior of times used by the program.
-
-
-
-Version 4.4a, June 2010
-
-codeml (RateAncestor=1). For ancestral reconstruction under codon
-models, the program prints out the translated amino acids when it
-prints the codons. A bug caused the translation to be incorrect,
-so instead of the amino acid, the program prints something like h or
-H. This bug was introduced in Version 4.4, February 2010 when I
-changed the way that the characters are coded. This is fixed now.
-
-baseml (clock = 1, 2, or 3 Mgene option). If you assign different
-rates for different genes (or site parations) and the model assumes
-global clock or local clock, the program fails to print out the
-relative rates for genes. This bug seems to be introduced between
-3.13 and 3.14 and has since. The log likelihood value and estimates
-of parameters are all correct, and the problem is only that MLEs of
-rates for genes are not printed in the output file properly.
-
-
-
-Version 4.4a, April 2010
-
-In versions 4.2, 4.3, 4.4, Clade model D does not print the w
-estimates for the site classes correctly. For the example run in
-examples/datasets2/, the incorrect output looks like the following
-
-site class 0 1 2
-proportion 0.48590 0.10718 0.40692
-branch type 0: 0.10557 4.17186 4.17186
-branch type 1: 0.10557 4.17186 3.05340
-
-while the correct results (from version 4.1) are
-
-site class 0 1 2
-proportion 0.48590 0.10718 0.40692
-background w 0.10557 4.17186 3.05340
-foreground w 0.10557 4.17186 0.95081
-
-The ratios w2 and w3 for site class 2 are incorrectly printed, with w2
-equal to w1. The branch lengths and lnL values etc. are all correct.
-This bug affects versions 4.2, 4.3, and 4.4, and version 4.1 and
-earlier versions appear to be correct. Also the bug affects clade
-model D, and clade model C appears to be fine. This is fixed now.
-Thanks to cajawe for reporting the bug (see
-http://www.ucl.ac.uk/discussions/viewtopic.php?p=28240#28240).
-
-
-
-Version 4.4, February 2010
-
-I changed the way that the characters are coded in the programs, which
-means that essentially every program in the package is modified. It
-is likely that some bugs are introduced during the process.
-
-codeml. The iteration algorithm specified by method = 1 when applied
-to amino acid model of gamma rates among sites (seqtype = 2, alpha >
-0, method = 1) is broken in versions 4.2-4.3b. The bug causes the
-iteration to fail to converge, with messages like the following
-printed on the monitor:
-"Warning rounding error? b=5 cycle=0 lnL=1718.5667685 != 1749.4724545".
-
-mcmctree. If the gamma prior for rates has alpha <1/3, the initial
-values are generated incorrectly. This is now fixed.
-
-
-
-Version 4.3b, November 2009
-
-mcmctree. A bug was introduced in version 4.2, which causes the
-program to ignore the minimum age constraint on the root age and uses
-the maximum bound specified by RootAge in the control file, if there
-is a calibration on the root and it is a minimum-age constraint. If
-you have no calibration on the root, or if the calibration is an
-maximum bound, both minimum and maximum bound or gamma distribution,
-there is no problem. The error affects versions 4.2 and 4.3, and
-version 4.1 is correct. This is now fixed. Thanks to Mathieu
-Groussin.
-
-evolver. I have now added back the option of printing out the site
-pattern counts instead of the sequences (specified at the beginning of
-the control file such as MCbase.dat). Use of pattern counts takes
-less disk space. The option was removed in version 4.2b after the
-algorithm for collaspsing sites into patterns was rewritten.
-
-
-
-Version 4.3a, October 2009
-
-mcmctree. A bug was introduced in version 4.3, which causes the root
-age to be proposed incorrectly. The effect of the bug appears to be
-subtle and is hard to assess. It means that the results from all
-three clock models (clock = 0, 1, 2) are incorrect. Here is a bit more
-detail about the bug. The internal maximum node age was set too high,
-and caused loss of significant digits in a smart reflection algorithm
-which reflects proposed values into the feasible interval. This
-update fixes the bug. Please use this version to reanalyze your data
-if you have used version 4.3. Version 4.2 does not have this bug.
-
-mcmctree. I now supplies compiling options so that you can use hard
-minimum hard maximum, or hard minimum soft maximum. The executables
-are called mcmctreeHH and mcmctreeHS. See the commands for compiling
-at the beginning of the file mcmctree.c.
-
-
-
-Version 4.3, August 2009
-
-mcmctree. The approximate likelihood calculation was unreliable,
-because the Hessian matrix was not calculated reliably, especially if
-the maximum likelihood estimates of some branch lengths are 0. I have
-now implemented the approach of Seo, Kishino and Thorne (2004), which
-appears to be more stable.
-
-baseml & codeml: I changed these two programs to read fasta alignments
-directly, working out the number of sequences and the sequence length
-automatically. However, this does not work when you have multiple
-alignments in one file, that is, if ndata = 2 or higher.
-
-
-
-Version 4.2b, April 2009
-
-mcmctree. There is a bug that leads to crash when usedata = 3 and when
-the data at some loci have ambiguity characters while other loci do
-not. usedata = 3 means that the program invokes baseml or codeml to
-calculate the maximum likelihood estimates (MLEs) of branch lengths
-and the variances of those MLEs. The temporary sequence alignment
-files for the loci (tmp?.txt) generated by mcmctree have garbiage and
-are unreadable by baseml (or codeml). This bug is now corrected.
-(You can replace the wrong alignment file for the locus by your
-original alignment and run baseml outside mcmctree.)
-
-
-All programs. I rewrote the code for collapsing sites into patterns.
-This should have no impact on user files, except that evolver does not
-produce data files in the "pattern-count" format anymore. Now two
-formats (paml/phylip format and PAUP nexus format) are used, specified
-by the first line of the control data file (e.g., MCbase.dat,
-MCcodon.dat, MCaa.dat). This pattern-count format is still accepted
-as input by baseml, codeml, etc. (See the section on "Site pattern
-counts" in the documentation).
-
-baseml & codeml. A bug was introduced to the local clock model
-(clock=6), so that the program generates the following error message:
-"need fossils for this locus". For example, running codeml on the
-control file /examples/MouseLemurs/codonml2.ctl will do so. This bug
-is now fixed.
-
-codeml: The bug introduced in version 4.2 concerning the free-ratios
-model (Yang 1998 MBE) was not fixed properly in version 4.2a (see the
-notes for 4.2a below). The results are incorrect when there are
-exactly 7 branch types (for example when you run the free-ratios model
-on a 5-species tree) but are correct for other numbers of branch
-types. I hope this is now finally fixed.
-
-
-
-Version 4.2a, February 2009
-
-
-codeml: Two bugs were known to have been introduced in version 4.2
-when I rewrote the code for the branch-site and clade models. The
-first is the null hypothesis in the branch-site test (model = 2
-NSsites = 2, fix_omega = 1 omega = 1). The reference for this model
-is Yang, Wong & Nielsen (2005), and Zhang et al. (2005). The
-alternative model (fix_omega = 0) is fine.
-
-The second bug is in the free-ratios model (Yang 1998 MBE).
-
-The results from those two models are incorrect and should be ignored
-if you used version 4.2. Both mistakes are fixed in this update.
-
-Thanks to W.P. Zhou and BYLee.
-
-
-
-Version 4.2, December 2008
-
-MCMCtree: The implementation of the lower bound and of the upper bound
-is changed. The improper density of equation (15) of Yang and Rannala
-(2006) for the lower bound is noted to push up divergence times, a
-feature we consider undesirable. Now a truncated Cauchy distribution
-is used instead. This will be described in more detail somewhere.
-
-The default file name for the MCMC samples is mcmc.out. To change
-this, add the following line in the control file mcmctree.ctl:
- mcmcfile = mcmc.out
-
-codeml: Clade models C and D are extended to accommodate more than two
-branch types. The old models accept two branch types only, referred
-to as foreground and background branches. The structure of the new
-models are below. You use branch/node numbers to specify the branch
-types (with #0 to be the default, followed by #1, #2, #3, ...). As
-before, the BEB calculation is implemented for clade model C only, and
-is not available for clade model D. The BEB calculation is expensive,
-so I expect clade model C will work if you have only a few branch
-types. For each additional branch type, the BEB computation is 10
-times more expensive.
-
-Site class Proportion w ratio
-
- 0 p0 0 < w0 < 1
- 1 p1 w1 (w1 = 1 is fixed in model C)
- 2 p2 w2, w3, w4, ...
-
-I also added an option of fixing the last w to 1 in clade model C,
-specified in the control file as follows. For example, if there are
-three branch types in the tree (see the table above), the last w fixed
-will be w4.
-
- fix_omega = 1
- omega = 1
-
-The two versions of the clade model can be compared to test whether
-the last w is > 1.
-
-
-codeml: branch model with amino acid chemical properties, specified
-using model = 2 and aaDist = 7. Under this model, the types of
-branches are specified just like the branch model (by labelling
-branches). For each branch type, the program estimates a few w's,
-depending on specifications in the file OmegaAA.dat. This model seems
-to be broken for a long time since its implementation. Look at the
-readme.txt file in examples/mtCDNAape/.
-
-
-
-Version 4.1, August 2008
-
-MCMCtree: A few changes have been introduced to this program. The
-format for specifying fossil calibrations in the tree file has been
-changed, so that the gamma distribution is specified as 'G(alpha,
-beta)' and the old format '>0.6=0.7<0.8' does not work anymore. The
-bounds can be specified using either the old form '<0.8' or the new
-form 'U(0.8)'. Two exotic distributions, skew normal and skew t, are
-added, using the format 'SN(location, scale, shape)' and 'ST(location,
-scale, shape, df)'. I edited the documentation pamlDOC.pdf and
-rewrote the part for mcmctree. The format of the MCMC output file
-mcmc.out has been modified so that it is readable by Andrew Rambaut's
-Tracer program. Note that MCMCtree starts taking samples and writing
-into the file only after burnin.
-
-codeml. The mutation-selection models of Yang and Nielsen (2008
-Mol. Biol. Evol. 25, 568-579) are specified using the options
-
- CodonFreq = 6 or 7 * 6:FMutSel0, 7:FMutSel
- estFreq = 1
-
-Use examples/mtCDNAape/codeml.HC.ctl to duplicate results in table 1
-in the paper. It seems that some results are in the mlc file while
-some other results are in the rst and rst1 files.
-
-
-codeml (branch model): When codon sequences are analyzed under the
-branch model (model = 1 or 2, NSsites = 0), the main output file
-includes a tree under the line "w ratios as labels for TreeView:".
-This line was introduced in version 3.15. However, the branch lengths
-in the tree had the dN values, while I intended them to be the
-original branch lengths (which are t, the number of nucleotide
-substitutions per codon). This bug may also affect the results of
-ancestral reconstruction under the same branch model. This bug is
-fixed now.
-
-
-baseml and codeml (discrete-gamma model): My intention has been to use
-the mean and not the median to represent all rates in each category
-when the discrete-gamma model is implemented. See Yang (1994
-J. Mol. Evol. 39:306-314) for the distinction. However, a bug caused
-a few recent versions to use the median instead. This affects baseml
-versions 3.14c and 3.15, while version 3.14b or earlier and version 4
-or later are correct. codeml in version 4a & 4b were incorrect while
-other versions are correct. I hope both baseml and codeml are correct
-from version 4c.
-
-
-baseml and codeml (branch and clade labels): The local clock models
-(clock = 2) in the two programs and also the branch or branch-site
-models in codeml use labels to identify branches. When nested clade
-label ($) are used, the program may get confused so that the branches
-are not labeled correctly. I think this is not fixed in version 4c.
-
-
-Version 4b, January 2008
-
-mcmctree. Changed a variable name in the control file from MaxRootAge
-to RootAge. This should be specified in any of the following formats:
-
- RootAge = <1.0 * constraint on root age, used if no fossil for root.
- RootAge = '<1.0' * constraint on root age, used if no fossil for root.
- RootAge = <1.0>0.8 * constraint on root age, used if no fossil for root.
- RootAge = '<1.0>0.8' * constraint on root age, used if no fossil for root.
-
-Added some text around the output, to make the results more comprehensible.
-
-evolver (option 8). option 8 for calculating bipartition distances
-between trees is broken. I changed the program to prepare for a
-figure and forgot to remove the changes afterwards. This is now
-fixed.
-
-
-Version 4, July 2007
-
-mcmctree. The MCMC algorithm of Rannala and Yang (2007) dealing with
-rate drift is added. The section of manual on mcmctree is now
-rewritten. An example data set is included in the
-examples/DatingSoftBound/ folder.
-
-codeml. This now implements the mutation-selection models of codon
-substitution of Yang and Nielsen (2008). I should include more
-details after the models are better tested, for example, after the
-paper is written.
-
-baseml. The log likelihood under the gamma-rates model (that is, when
-alpha > 0 in the control file) is not calculated correctly, or not as
-intended. Version 3.14a is correct, but versions 3.14c and 3.15 are
-incorrect. I am not sure about version 3.14b, as I have not kept a
-copy. The bug causes the median to be used to represent all rates in
-each category of the discre-gamma model whereas the mean was intended.
-See the discussion around equation (10) and the paragraph below it in
-Yang (1994. J. Mol. Evol. 39:306-314). The bug is not expected to
-have much effect on tree topologies or branch lengths. Thanks to
-Simon Whelan for reporting the bug.
-
-
-
-Version 3.15, March 2006
-
-evolverNSbranches (simulation program to generate data under the
-branch models of codon substitution). A bug was introduced after
-version 3.14a so that versions 3.14b (?), 3.14c, and 3.15 have a bug
-that makes the program abort prematurely. This is now corrected.
-Thanks to N. Clark for reporting the bug.
-
-
-Version 3.15, January 2006
-
-evolver. The simulation option of the evolver program in version
-3.14b (?), 3.14c, and 3.15 has a serious bug that makes the program
-generate nonsensible sequences when the user tree has 0 internal
-branch lengths. The bug was introduced between versions 3.14a (which
-is correct) and 3.14c (which is wrong). I don't seem to have kept
-3.14b to check whether it is in error. In this case, the smart idea
-that led to the bug was the thought that no evolution occurs if the
-branch length is 0 (or less than 1e-20, to be more precise). If the
-branch length is 0, there is no need to evolve the sequence along the
-branch, so I decided to copy the sequence at the start of the branch
-to the node at the end of the branch. This would be fine, except that
-I introduced a bug at the same time that caused the program to skip
-generating sequences at all nodes that are descendents of the
-zero-length branch. As a result, sequences at tips that are
-descendents of the zero-length branch have no data. On some systems
-the uninitialized space is initialized as 0, and then the sequences
-will be printed out as consisting of all Ts (T is the first nucleotide
-in my program).
-
-
-At the same time, I also turned on the option of simulating data on
-random trees in evolver. You change fixtree=1 into 0, and recompile.
-Then use MCbaseRandomTree.dat to simulate data instead of MCbase.dat:
-
- evolver 5 MCbaseRandomTree.dat
-
-
-
-Version 3.15, November 2005
-
-mcmctree: This implements the MCMC algorithm of Yang and Rannala (2005
-MBE) for estimating species divergence times using soft bounds. The
-example file is in examples/SoftBound/. Look at the readme file and
-see whether you can duplicate our results in the paper. Note that
-this program works with DNA sequence data only, and assumes the
-molecular clock (clock = 1). Note that assumption of the clock can
-lead to seriously biased results if the clock is violated. Work is
-under way to relax the clock assumption.
-
-yn00: Added three methods for estimating dS and dN: LWL85 (Li, Wu &
-Luo 1985), LPB93 (Li 1993; Pamilo and Bianchi 1993), and LWL85m (Yang
-2006 Computaional Molecular Evolution, Oxford University Press. Eqs.
-2.12 & 2.13).
-
-codonml. Added example files for clade model C, in folder
-examples/CladeModelC. Thanks for Joe Bielawski for preparing these.
-
-Added a sequence data format, in which the numbers or frequencies of
-site patterns are inputed instead of the sequences. The format is
-indicated by the option variable P on the first line of the data file,
-used in the same way as option variables I (for interleaved format)
-and S (for sequential format, which is the default). See the section
-on sequence data format in the paml documentation for details. The
-old control variable readpattf for baseml and codeml is now deleted.
-You can also use evolver to simulate data using this format. Look at
-the control files MCbase.dat, MCcodon.dat, and MCaa.dat. This format
-is useful for very large data sets with >1M sites, when collapsing
-sites into patterns can take a long time.
-
-
-
-Version 3.14b, May 2005
-
-pamp crashes when there are more than 127 or 255 (depending on the
-system) species in the data. The bug is due to my declaration of the
-variable visit[] in the routine PathwayMP1 in pamp.c as char rather
-than int. Another problem is that the formula used to calculate
-distances under REV (GTR) is sometimes inapplicable and the program
-prints out messages like "err: DistanceREV". I introduced some ad hoc
-way of dealing with the problem.
-
-
-
-Version 3.14b, April 2005
-
-aaml (codeml with seqtype = 2):
-
-Under the REV model (model = 9), codeml failed to print out the
-estimated amino acid substitution rate matrix. This is fixed.
-
-
-aaml (codeml with seqtype = 2 or 3):
-
-If you have multiple data sets (ndata > 1) and analysing protein data
-sets under the empirical models such as wag, jtt, dayhoff. The
-results for the first data set are correct, but all later data sets
-are analyzed incorrectly under the corresponding +F models, that is,
-wag+F, jtt+F, dayhoff+F. A bug in the program means that for the
-second and later data sets, the equilibrium amino acid frequencies are
-from the real data and not correctly set to those specified by the
-empirical models. Thanks to Ben Evans.
-
-
-Version 3.14b, February 2005
-
-codonml and aaml:
-
-There is a bug that affects the joint reconstruction of ancestral
-sequences when some branch lengths are equal to each other and when
-those branches are visited one after the other in the program. For
-example, if some branch lengths are estimated to be 0, such a problem
-might occur. I think this bug is in the program for a long time,
-although I note that joint ancestral reconstruction is turned off in
-some versions for other reasons. The marginal reconstruction is not
-affected by this bug. The update fixes this bug.
-
-
-baseml and codeml:
-
-The program crashes on some systems for the local clock models (clock
-= 5 and 6) after the calculation is finished. If you manage to run
-the model, the results should be fine. There are bugs in the program
-when freeing memory at the end of the calculation. They are fixed in
-this update. See the discussion site at
-
-http://www.rannala.org/phpBB2/viewtopic.php?t=228
-
-Thanks to fjvera.
-
-
-
-Version 3.14a, December 2004
-
-codonml (codeml for codons):
-
-The mechanisctic amino acid substitution models (table 3 in Yang et
-al. 1998 MBBE 15: 1600-1611) appear to be broken in paml/codeml
-versions 3.13 and 3.14. They were correct in version 3.12. This is
-now fixed. If you use those models, please run the examples in the
-folder examples/mtCDNA/ and compare results with those published in
-the paper to confirm the program. The results in the paper are
-correct, but the program implementation may be broken due to lack of
-maintenance.
-
-codonml (codeml for codons)
-
-When you fit the branch models with three or more branch types (omega
-ratios), the program aborts with an error message saying that only two
-omega ratios are allowed. This is due to a bug in the program and is
-now fixed.
-
-
-Version 3.14, September 2004
-
-
-1. codonml (codeml for codons):
-
-(1a) NSsites models M1 (neutral), M2 (selection), and M8 (beta& ) have
-been changed. The references are two manuscripts that are not yet
-published (Wong, Yang, Goldman & Nielsen, 2004 Genetics 168:
-1041-1051; Yang, Wong, and Nielsen 2005 MBE). The details are as in the
-following table.
-
-[table missing or never written.]
-
-The old models M1 and M2 are noted to be rather unrealistic as they do
-not account for sites with 0 < w0 < 1. Thus in the new models 0 < w0 < 1
-is estimated from the data. Also insisting on a class of sites with 0
-= 1 appears to help avoid false positives as sites under weak
-selection will be absorbed in this neutral class rather than being
-claimed to be under positive selection. Under M8, the constraint s >
-1 is now automatically enforced to help avoid multiple local peaks and
-also to make the model a positive-selection model. Note that the
-newer models replace the old ones, which are available in older
-versions of paml/codeml only.
-
-After the changes, the models are nested as follows: M0 (one ratio) <
-M1a (NearlyNeutral) < M2a (PositiveSelection) < M3 (discrete), and M7
-(beta) < M8 (beta& ), where "<" means the model on the left is a
-special case of the model on the right:. Two LRTs can be constructed
-using those models. The first one compares M1a against M2a, using a
-chi square with df 2, and the second one compares M7 against M8, using
-a chi square with d.f. = 2.
-
-(1b) Similarly branch-site model A (Yang and Nielsen 2002) is changed.
-The old model assumes 0 = 0 and is unrealistic. This is replaced by 0
-< 0 < 1, estimated from the data. The new model is still called
-branch-site model A. It can be compared with the new M1a
-(NearlyNeutral) to form a likelihood ratio test, with d.f. 2. This
-is called test 1. Another test, called test 2, uses a null model A
-but with 2 = 1 fixed (use fix_omega = 1 and omega = 1). Test 2 is
-very conservative, but test 1 might mistake relaxed selective
-constraint on the foreground branch as positive selection.
-
-
-(1c) Similarly the clade model C ( see also Forsberg and Christiansen
-2003; Bielawski and Yang 2004) is changed as follows. The new model C
-replaces 0 = 0 by 0 < 0 < 1 and has five parameters in the
-distribution: p0, p1, 0, 2, and 3. The new model C can be compared
-with the new M1a (NearlyNeutral), with d.f. 3.
-
-(1d) The empirical Bayes procedure for calculating posterior
-probabilities of site classes under the site models, branch-site
-models, and clade models are called naïve empirical Bayes (NEB). They
-are naïve because they use the MLEs of parameters without accounting
-for sampling errors in them. This problem is now fixed using a
-procedure called Bayes empirical Bayes (BEB) (Yang, Wong, and Nielsen
-2005). The BEB procedure is implemented for the site models M2a and
-M8, the (modified) branch-site model A, and the (modified) clade model
-C. The old NEB calculation is kept in the output and the new BEB
-results follow the NEB result. Perhaps I will remove the NEB
-calculation in later versions.
-
-
-baseml/codeml: rewrote likelihood clock and local clock models.
-Implemented models for combined analysis of multiple genes
-incorporating multiple calibration points (Yang and Yoder 2003). The
-ad hoc rate smoothing procedure of Yang (2004) is implemented for
-nucleotide, amino acid, and codon substitution models, and the
-implementation also deals with missing species at some loci. The
-option variable clock in the control files is now used differently
-from before. See later in this documentation and also the readme and
-readme2 files in the folder examples/MouseLemurs/.
-
-baseml/codeml local clock models. clock = 5 and 6 are added in
-3.14beta5.
-
-codeml: added branch-site models C and D (Bielawski and Yang in press).
-
-mcmctree: this program is disabled in this release. The old program
-died and the new program is still under construction.
-
-The main body of this documentation may not be up to date.
-
-
-
-Bug fixes and minor corrections
-
-
-baseml/codeml. The output in the file rates under the gamma model
-lists inferred rate for each site. The output is incorrect if the
-tree is large and scaling is used to avoid underflow. The use of
-scaling is indicated by two lines of output on the monitor like the
-following:
-
-"2 node(s) used for scaling (Yang 2000 J Mol Evol 51:423-432):
- 155 350".
-
-The results are clearly wrong as the probabilities are much greater
-than 1, and the rates are many orders of magnitude too large, etc.
-Also under the same problematic condition the expected numbers of
-sites with certain site patterns in the file lnf are incorrect. When
-scaling is not used, the results should be o.k. and look reasonable.
-This affects versions 3.13 and 3.14beta1-3. This is fixed in version
-3.14. Thanks for Nick Goldman for pointing out the error.
-
-yn00. The program crashes for large datasets with many codons due to
-a memory allocation error. This affects versions 3.13 and some
-versions of 3.14beta1. The problem is fixed in version 3.14.
-
-aaml (codeml for amino acids). Model REVaa_0 is broken due to lack of
-maintenance in versions 3.1, 3.11, 3,12, 3,13, and 3,14beta1-4. It
-seems to be working in version 3. This is fixed in 3.14beta5. 2
-April 2004.
-
-codonml (codeml for codons) in branch-site models (model = 2 NSsites =
-2 or 3) prior to 3.14beta3 have a scaling problem, which makes the
-length of the foreground branch to be incorrectly estimated. Other
-branch lengths and substitution parameters, such as the parameters in
-the distribution, are corrected calculated, as well as the log
-likelihood values and the posterior probabilities. 27 March 2004
-
-pamp: The gamma parameter for variable rates among sites using the
-method of Yang & Kumar (1996) is not estimated correctly in 3.14beta1,
-beta2, beta3. Versions 3.13d and earlier seem fine.
-
-codonml (codeml for codons) in version 3.14beta and 3.14beta1 does not
-work when fix_omega = 1 is used for branch models. The results are
-incorrect. Please use newer versions 3.14beta3 or later. Also
-version 3.13 is fine.
-
-baseml: Models TN93 and REV are wrong when used with Mgene = 3 or 4.
-This seems correct in version 3.12 but went wrong in 3.13 and 3.14
-since I inserted TN92 between HKY85 and TN93. Thanks for Lee Bofkin
-for reporting the error.
-
-codonml (codeml for codons): When codon models are used to reconstruct
-ancestral sequences (with RateAncestor = 1), the program lists
-synonymous and nonsynonymous changes at each site under the heading
-"Changes at sites (syn nonsyn)." This listing is incorrect due to a
-bug in the program. This bug affects versions 3.12, 3.13 and 3.14,
-and version 3.11 seems correct. Thanks to Joe Bielawski.
-
-baseml/codeml: The SE's for divergence times under the clock models
-are calculated incorrectly. This happens when you use clock = 1 or 2,
-supply fossil date to calculate absolute times, and request standard
-errors for times. The estimates of times themselves are correct, but
-standard errors for times are wrong. The SEs for times and for the
-rate under the TipDate model (clock = 3) are wrong as well. The
-programs print out the variances after the +-, instead of their square
-roots. This error was introduced in version 3.13. Versions prior to
-3.13 are correct. I posted an update 3.13d to fix this bug.
-
-evolver: the simulation program can now accept species names in the
-tree. Note that the data file formats for evolver (MCbase.dat,
-MCcodon.dat, MCaa.dat) have changed and you might have to change your
-own data files. Look at the files included in the package.
-
-The documentation in paml v3.13 from August 2002 - 12 December 2002
-had a mistake about the critical values for the newer test using a
-modified M8. The critical values for the test are 2.71 at the 5%
-significance level and 5.41 at the 1% level, rather than 1.95 and 3.32
-as in the documentation.
-
-
-
-
-
-Version 3.14beta2, November 2003
-
-(1) baseml: Models TN93 and REV are wrong when used with Mgene = 3 or
- 4. This seems correct in version 3.12 but went wrong in 3.13 and
- 3.14 since I inserted TN92 between HKY85 and TN93. Thanks for Lee
- Bofkin for reporting the error.
-
-(2) codonml (codeml for codons): When codon models are used to
- reconstruct ancestral sequences (with RateAncestor = 1), the
- program lists synonymous and nonsynonymous changes at each site
- under the heading "Changes at sites (syn nonsyn)." This listing
- is incorrect due to a bug in the program. This bug affects
- versions 3.12, 3.13 and 3.14, and version 3.11 seems correct.
- Thanks to Joe Bielawski.
-
-
-
-Version 3.14beta, 20 July 2003
-
-(1) baseml/codeml: rewrote likelihood clock and local clock models.
- Implemented models for combined analysis of multiple genes using
- multiple calibration points (Yang and Yoder 2003). The option
- variable clock in the control files is now used differently from
- before. See documentation below and also the readme and example
- files in the folder paml/examples/MouseLemurs/.
-
-(2) codeml: added branch-site models C and D (Bielawski and Yang
- submitted).
-
-(3) baseml/codeml: The SE's for divergence times under the clock
- models are calculated incorrectly. This happens when you use clock
- = 1 or 2, supply fossil date to calculate absolute times, and
- request standard errors for times. The estimates of times
- themselves are correct, but standard errors for times are wrong.
- The SEs for times and for the rate under the TipDate model (clock =
- 3) are wrong as well. The programs print out the variances after
- the +-, instead of their square roots. This error was introduced
- in version 3.13. Versions prior to 3.13 are correct. I posted an
- update 3.13d to fix this bug.
-
-(4) mcmctree: this program is decommissioned in this release. The old
- program died and the new program is still under construction.
-
-(5) evolver: the simulation program can now accept species names in
- the tree. Note that the data file formats for evolver (MCbase.dat,
- MCcodon.dat, MCaa.dat) have changed and you might have to change
- your own data files. Look at the files included in the package.
-
-(6) Improved the iteration algorithm a little. No change to the
- interface.
-
-(7) The documentation in paml v3.13 from August 2002 - 12 December
- 2002 had a mistake about the critical values for the newer test
- using a modified M8. The critical values for the test are 2.71 at
- the 5% significance level and 5.41 at the 1% level, rather than
- 1.95 and 3.32 as in the documentation.
-
-(8) Edited the manual pamlDOC.pdf.
-
-
-
-
-15 May 2003
-paml3.13d
-
-baseml/codeml: The SE's for times under the clock models are
-calculated incorrectly. This happens when you use clock = 1 or 2,
-supply fossil date to calculate absolute times, and request standard
-errors for times. The estimates of times themselves are correct, but
-standard errors for times are wrong. The SEs for times and for the
-rate under the TipDate model (clock = 3) are wrong as well. The
-programs print out the variances after the +-, instead of their square
-roots.
-
-This error was introduced in version 3.13. Versions prior to 3.13 are
-correct.
-
-
-12 December 2002 correction of documentation
-
-The paml documentation in paml v3.13 from August 2002 - 12 December 2002
-has a mistake about the critical values for the newer test using a
-modified M8.
-
-The critical values for the test are 2.71 at the 5% significance level
-and 5.41 at the 1% level, rather than 1.95 and 3.32 as in the
-documentation. I made a mistake and quickly discovered it. I thought
-I corrected this a few months ago, but apparently the document
-included in the paml release had this mistake. I am sorry. I'll
-correct it now.
-
-
-
-Version 3.13, August 2002
-
-(1) baseml: nhomo = 1 and model = 6 (for both the one-rate and gamma-rates
- models) in version 3.12 gives incorrect results. The error was introduced
- in version 3.12 and is now fixed. Sorry for the trouble and thanks to
- Joshua Scott Rest.
-
-(2) baseml: Tamura's (1992) model is inserted between HKY85 and TN93.
- Unfortunately this means that the model code for TN93 and REV are shifted,
- and you have to change your control file for those two models. I also
- added user-specified REV or UNREST models.
- 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85, 5:T92, 6:TN93,
- 7:REV, 8:UNREST, 9:REVu; 10:UNRESTu
-
-(4) I implemented the nonhomogeneous models of Galtier and Gouy
- (1998). The options are model = 5 (T92) and nhomo = 3 (N1) or 4
- (N2) or 5 (user). nhomo = 5 is added for the model of Yang and
- Roberts (1995) as well. It means that the user specifies, in the
- tree, how many sets of base frequencies (or how many GC content
- parameters in the model of Galtier and Gouy 1998) should be used
- and which branches has which sets.
-
-(5) codeml: For pairwise amino acid sequence comparison (seqtype = 2
- and runmode = -2), the program crashes. This is fixed now.
- Thanks to Dennis Paul Wall.
-
-(6) codeml: The mechanistic models of amino acid substitution (model = 6 or 7)
- are broken for quite some time due to lack of maintenance. I have
- now fixed them. See the file examples/mtCDNA/readme.txt.
-
-(7) codemlsites is now removed, which is for batch run of multiple NSsites
- models. This is now done by specifying several NSsites models in
- the control file codeml.ctl for codeml, for example: NSsites = 0 1
- 7 8.
-
-(8) baseml & codeml used to ask the user what to do if the tree has branch
- lengths. I have now introduced a variable fix_blength in the control
- file that does the same thing.
- fix_blength = -1 * 0: ignore, -1: random, 1: initial, 2: fixed
-
-
-
-Version 3.12, March 2002
-
-(1) baseml and codeml, Mgene=2 or 4 (different pi's) and RateAncestor
- = 1 Ancestral reconstruction does not work properly although the
- ML iteration is fine. The output likelihood values (in the form
- "Node 20: lnL = -1047.965624") are different from the optimum
- likelihood already calculated. The option was fine in version
- 3.0e and 3.1, and was broken in version 3.11.
-
-(2) codeml stops with an error message "pi=0" when some amino acids are
- missing in the sequences. Earlier versions of the program are correct.
-
-(3) yn00 now accepts multiple data sets (specified by ndata).
-
-
-Version 3.11, September 2001
-
-(1) baseml and codeml (clock = 1 or clock = 2) sometimes print out
- ridiculous (e.g., large negative) tree lengths and branch lengths.
- I thought this was fixed in version 3.1, but it was not.
- Hopefully it is now fixed.
-
-(2) baseml and codeml: Malpha (a separate alpha for each site
- partition) does not work with Mgene = 0. The output will give you
- the same alpha value for each gene as you specify in the control
- file. This is now fixed.
-
-(3) baseml: When the REV or UNREST models (model = 6 or 7) are used
- together with Mgene = 2, 3, 4, the program outputs one Q matrix,
- when each site partition should have a different Q matrix. The ML
- parameter estimates are correct, but I did not bother to format
- the output correctly. The single Q matrix in the output is the
- one for the first partition. I have added the output for the
- other matrices now.
-
-(4) baseml and codeml. Ancestral reconstruction was disabled for
- large trees with more than 100 or 200 sequences in previous
- versions. It is now allowed. On large trees, the multiplication
- of transition probabilities across branches in the tree cause the
- product to become too small to be represented in the computer.
- This is known as underflows. I implemented a scaling algorithm to
- avoid underflows some time ago but did not bother to do it with
- the ancestral reconstruction algorithm at the time. The algorithm
- should now work, up to the upper limit set in the program, that
- is, 1000 or 2000 sequences.
-
-
-
-Version 3.1, June 2001
-
-Changed the way of counting base or amino acid frequencies so that
-this version of baseml or codeml might generate different results from
-previous versions under the following model specifications.
-
- . baseml & aaml: if you have multiple genes and the model uses the same
- frequencies in the model for the different site partitions.
-
- . codonml:
- if you have ambiguity characters at all. Ambiguity characters were
- ignored when counting frequencies in previous versions but are used now.
-
-There is no difference if your data do not contain any alignment gaps
-or ambiguity characters or if you choose to remove them before any
-analysis (cleandata = 1). The early versions used two ad hoc methods
-in dealing with such ambiguity characters in different places of the
-programs. The new version uses another ad hoc method and in all
-places. I think it will be fine which ever version you use, but do
-not use different versions in the same analysis.
-
-codonml (codeml with seqtype = 1) with Mgene = 1 does not work
-correctly if the sequences have alignment gaps or ambiguity
-characters. Mgene = 1 means separate analysis. The bug causes data
-for the genes except the first to be read out of frames, so that most
-likely you will see a lot of messages "stop codon XXX" before the
-program aborts. This is now fixed.
-
-
-codonml (codeml with seqtype = 1) with fix_alpha=0: In the final
-output for the codon + gamma model, parameter alpha is not correctly
-identified. If you have fix_kappa = 0, fix_omega =0, and fix_alpha=0,
-the last three numbers should be kappa, omega (dn/ds), and alpha. The
-detailed output did not identify alpha correctly, although the
-likelihood value and the outputted rates and frequencies (r and f) are
-correct. I believe this is now corrected.
-
-
-
-Version 3.0e, 14 May 2001
-
-codonml (codeml with seqtype = 1) with model = 1: The program crashes
-when you have a small tree with <= 5 branches. This is a new error
-introduced in version 3.0c. It is now fixed.
-
-baseml and codeml (clock = 1 or clock = 2) sometimes print out ridiculous
-tree lengths and branch lengths.
-
-yn00 in version 3.0c ignores icode and uses the universal code only.
-It was correct in vesion 3.0b. This is fixed in version 3.0d.
-
-
-Versions 3.0c-3.0d, 20 February 2001
-
-We bought a Pentium III cluster running redhat linux 6.2. The gcc
-compiler 2.61 seems to have bugs and codeml does not run for NSsites =
-7 or 8. The program might output "points out of order" or "Det goes
-to 0" error messages. 3.0d seems to be safer for these models.
-
-codonml (codeml with seqtype = 1) with NSsites = 3 might generate a
-message "error: sortwM3." and then stop with no output after the
-iteration. This is now fixed. During the iteration, there was no
-restriction on the w values, so that w0 can be greater than w1 and so
-on. At the end of the iteration, I sort the three w values so that w0
-< w1 < w2 before outputting. 3.0c has a bug at sorting. If you don't
-see the error message, everything is fine.
-
-
-10 November 2000
-
-The probabilities for Joint ancestral reconstruction are not correct.
-Many values are larger than 1. Fixed in version 3.0c.
-
-
-
-Version 3.0c, 30 October 2000
-
-(1) Changed parameterization of clock models (clock = 1, 2, or 3), so that
-the output now lists node ages, and the SEs are for node ages as well.
-Node ages are measured by the expected number of nucleotide or amino
-acid substitutions per site (nucleotide, amino acid, or codon) from
-the node to the present time, and are proportional to the divergence
-times. In earlier versions, the output list proportions.
-
-(2) codonml: problem with NSsites model with multiple gene data
-(Mgene=1) is now fixed(?). In the earlier version, the likelihood
-calculation was o.k., but identification of sites under positive
-selection was incorrect as the program attempted to list sites in
-another gene when it was analyzing data for one gene. (1) Mgene
-options for codons and amino acids:
-
-
-
-Version 3.0b, 6 October 2000
-
-The local clock models (clock = 2) are not implemented correctly in paml3.0 or 3.0a.
-
-Yoder, A. D., and Z. Yang. 2000. Estimation of primate speciation
-dates using local molecular clocks. Molecular Biology and Evolution
-17 (7): 1081-1090.
-
-I introduced a serious error before I made the models available to the
-public, in ver3.0 (May 2000). Two symptoms are noted right now: (1)
-crashes (which are good as they make you realise that something is
-wrong immediately) and (2) the local clock model (clock=2) having a
-worse likelihood than the global clock model (clock = 1). The results
-in the paper are correct except for a typo ("37.30 - 48.48" in table 3
-for the cetacean-whale divergence using the C3 calibration should be
-"57.30 - 48.48"). I now include the example data sets and results
-from that paper to help you prepare your own data files.
-
-Sorry for any damages done.
-
-Many thanks to Toby Johnson and Philip Awadalla for spotting the error.
-
-
-
-Version 3, 4 February 2000
-
-The program evolver may not generate sequences correctly if branch
-lengths are very large, say >10.
-
-Scaling by nodes to avoid underflow. When there are many sequences in
-the data (say > 200 or 300), the probability of observing data at a
-site can become too small to represent in the machine. This is
-because the probabilities are multiplied along branches in the tree
-and a large tree has many branches. Since version 2.0, this underflow
-is dealt with by scaling, and it works with both the one-rate and
-gamma-rates models. It did not work with the NSsites models in
-codonml, but probably nobody has analysed large data sets to notice
-this. Anyway, this version hopefully fixes that problem, when I was
-analysing the data set of Bush et al. (1999 MBE 16:1457-1465), which
-has about 350 sequences. If you use the program on very large data
-sets, watch out for anything strange and let me know. I suppose with
-the option of using supplied branch lengths, data sets of such sizes
-will become manageable, and this scaling will become important.
-
-
-
-Version 2.0h, Oct 14, 1999
-
-Fixed the following errors, all simple errors.
-
-The RemoveIndel routine in 2.0e and 20.f has a bug that removes most
-sites in the sequence except those with all T's and C's for baseml.
-The same routine in 2.0g has a different bug that removes most sites
-except those will all T's and C's in the sequence for aaml. In either
-case, the results will be grossly wrong. The bugs affect the analysis
-only if you have gaps or ambiguity characters in the data and use
-runmode = 1 for aaml in 2.0e and 2.0f, and runmode = 1 for codonml for
-2.0g.
-
-The newly introduced option for amino acid reconstruction by baseml
-(icode) does not work. The program aborts with an error message.
-This is now fixed.
-
-
-
-
-Version 2.0g, Sept 21, 1999
-
-Fixed a bug in 2.0f that causes crash in codeml when getSE = 1.
-
-If the tree has branch lengths, codeml and baseml in 2.0f will simply
-use the branch lengths rather than estimating them by iteration. I
-have added a question for the user to choose to ignore the branch
-lengths, use them as fixed or use them as initial values for the
-iteration.
-
-
-
-Version 2.0d, June 30, 1999
-Fixed some convergence problems in ML pairwise estimation of dS and
-dN, in particular, in presence of data partitions (the G option).
-
-Fixed a bug in evolver. The bug means that no sensible data are
-generated if the tree has branch lengths smaller than 1e-7. At some
-point, I did something "clever" (taking advantage of the fact that at
-such small branch lengths, no evolution would occur), which destroyed
-the calculation.
-
-
-
-Version 2.0c, May 12, 1999
-Corrected a few problems with the evolver program, and with running
-multiple data sets using baseml. Added REV in MCbase.dat.
-Thanks to John Mercer.
-
-Note--Uncomment ndata in baseml or codeml to analyse multiple
-data sets generated from evolver.
-
-
-
-April 16, 1999
-
-Although posted only a few days ago, paml2.0 has an error in the
-program codeml, which invalidates most analyses based on codon
-substitution models. The symptom is relatively easy to spot and is
-indicated by one of the parameters (say omega) not being estimated at
-all and in the output you will have exactly the same number as you
-specified in the control file.
-
-Amino acid-based analysis or pairwise comparison of codon sequences
-are not affected. Use paml2.0a to redo the analysis. Apologies for
-the trouble.
-
-
-
-version 2.0, 31 March 1999
-Added back the faster calculation when there is no missing data.
-Enabled the programs/analysis that broke in the temporary versions,
-such as pamp, joint reconstruction.
-
-Renamed listtree as evolver, and added simulation options.
-
-
-
-PAML 1.4 Temporary versions
-Dec98 & Jan99
-
- pamlTMP.Jan99.notes
- ===================
- Ziheng Yang
- Fri Jan 15 10:01:42 GMT 1999
-
- .Type make to compile the programs. If you got error messages, see
- whether you know how to change the file Makefile. If it does not work
- either, look at the files paml.cc, paml.gcc, etc.
-
- .At the suggestion of David Posada, I added ML estimation of pairwise
- distances between amino acid sequences in the program codeml.
-
- The relavant variables in the control file codeml.ctl are
-
- runmode = -2 for pairwise distance calculation
- aaRatefile = jones.dat * only used for aa seqs with model=empirical(_F)
- * dayhoff.dat, jones.dat, mtmam.dat, or your own
- model = 3
- alpha = 0
-
- model specifies the amino acid substitution model. You should choose
- a value from 0, 1, 2, 3. If you want to estimate the rate matrix from
- your data, you should do that using many sequences and then move the
- estimated substitution rate matrix (in the output file mlc) into a
- file and change the variable aaRatefile.
-
- Choosing alpha>0 means using gamma distance with that specified alpha
- parameter. If you see anything strange, please let me know.
-
- .I have included only three programs baseml, codeml, and listtree.
- I have not tested the others and so they are not included.
-
-
-
- pamlTMP.Dec98.notes
- ===================
- Ziheng Yang
- Wed Dec 9 10:44:37 GMT 1998
-
-
- This temporary version of paml has involved a number of changes, and
- is quite likely to contain errors. Please let me know if you notice
- anything strange. Some of the options do not work for now. Details
- follow. I will compile the Win32 and PowerMac versions after
- everything is fixed and tested.
-
- . Missing data: baseml and codeml now handle missing data. Choose
- DelMissing = 1 in the control file to remove sites with ambiguity
- characters or alignment gaps, as did previous versions.
-
- . Clock: The clock is now fixed. It may still crash, but the option
- is not worse than the no-clock options. The definitions of the node
- times or branch lengths are now different from the documentation
- (pamlDOC.html), and so check the branch lengths in the output tree
- topology. Thanks to Jeff Thorne.
-
- . Nexus file formats: Paml programs now read sequence files and tree
- structure files in the Nexus format used by paup and MacClade. Only
- the data or tree are read, and everything else in the file is ignored.
-
- . Ancestral reconstruction: Marginal ancestral reconstructions are
- now done under the gamma-rates model as well as the constant-rate
- model. For protein-coding genes, reconstructions can now be done at
- the nucleotide level with baseml (in particular, by using models that
- account for differences at the three codon positions), at the amino
- acid level with aaml, and at the codon level with aaml. According to
- Belinda Chang, those different reconstructions tend to be similar.
- Results for ancestral reconstruction are now listed by site, and not
- by site pattern as in earlier versions. Missing data are now handled
- in those analysis. See instructions in this doc about baseml.ctl for
- more options. Thanks to Belinda Chang. I would like to take this
- opportunity to point out that the reconstructed ancestral sequences
- are not the real observed sequences and may contain errors.
-
- . I have disabled the option for variable rates among sites in
- combination with variable dN/dS rate ratios among lineages in codonml.
- This option has never been implemented in the program but was not
- disabled.
-
- . Distance matrices from codonml. The program codonml outputs
- matrices of synonymous and nonsynonymous rates in all pairwise
- comparisons using the method of Nei & Gojobori (1986) into two files
- named DistanceNG.dN & DistanceNG.dN. The matrices are lower-diagonal
- and can be used with programs, such as neighbor and fitch, in phylip
- (and possibly paup* too) for tree reconstruction or branch length
- estimation by distance methods. If you choose runmode = -2 for ML
- pairwise comparison, the program will also output two matrices of ML
- estimates into two files named DistanceML.dS & DistanceML.dN, for
- synonymous and nonsynonymous rates. Since many users seem interested
- in looking at dN/dS rate ratios among lineages, examination of the
- tree shapes indicated by branch lengths calculated from the two rates
- may be interesting although the analysis is intuitive. Obviously, you
- should NOT name your own data files with those names; otherwise they
- will get overwritten. If your species name has more than 10
- characters, you will have to edit the files and cut the names short
- before you can use Phylip programs. I have decided to leave this work
- to the user. I plan to create some other distance matrix files for
- nucleotides and amino acids as well.
-
- . An minor prompt error when running codonml with model = 2 (variable
- dN/dS ratios among lineages) is fixed. The example data set
- lysozymeSmall.nuc is included for the user to duplicate my results in
- Yang (1998 Molecular Biology and Evolution 15: 568-573). The control
- file codemlLysozyme.ctl explains in more detail how to specify the
- options. I would like to remind you that it is not correct to use the
- option model = 1 to estimate dN/dS ratios for branches to find out
- which ratios are greater than one, and then to use model = 2 to test
- whether that ratio is significantly greater than one. The problem
- with such an analysis is that the hypothesis being tested is derived
- from the data which you use to test the hypothesis, and as a result,
- you tend to get significantly results too often. Strictly speaking,
- the hypothesis should be formulated before the data are analysed.
-
- Things that do not work in this version include
-
- . All parsimony-based analyses are now broken. The program pamp is
- not included. Programs baseml and codeml used to calculate the MP
- score for each tree before doing an ML analysis on that tree. Now a
- -1 is used to indicate this is not possible.
-
-
-
-Version 1.4 (UCL)
-July 1998
-
- 1. Most of the changes are in the program codeml (aaml for amino
- acid sequences and codonml for codon sequences). A few new models of
- codon substitution are implemented (see Yang and Nielsen 1998; Nielsen
- and Yang 1998; Yang 1998).
- 2. Some problems relating to ancestral sequence reconstruction (in
- baseml and codeml) are fixed. The earlier algorithm does not work
- properly if the data contain more than several sequences. The
- algorithm makes a guess at the likely character states at the interior
- nodes of the tree, and then uses those to generate reconstructions
- (joint reconstructions, see eq. 2 in Yang et al. 1995) to be
- evaluated. Sometimes this strategy misses important reconstructions,
- and as a result, the probabilities for reconstructed characters at
- specific interior nodes (marginal reconstructions; see eq. 4 in Yang
- et al. 1995) are substantially underestimated if the number of
- sequences is not very small. I have now written separate codes to do
- the marginal reconstruction, evaluating all possible character states
- for each interior node. This works also under the gamma model of
- substitution rates among sites, a feature that was not implemented
- before. The joint reconstruction works with one rate for all sites
- only and has the old problem of possibly missing important
- reconstructions. However, the posterior probabilities for those joint
- reconstructions that do get evaluated are accurate. I have added an
- option (choose RateAncestor = 2 rather than 1) for the user to specify
- the reconstruction to be evaluated. The two approaches are expected to
- produce very similar results, and since the marginal reconstruction is
- always reliable, perhaps it can always be used for data analysis.
- (Thanks to Belinda Chang.)
- 3. The output file for estimated rates for sites under the
- variable-rates models is now "Rates.out" instead of "rst".
- 4. I have implemented ancestral sequence reconstructions based on
- codon-substitution models (codonml). This has not been tested
- carefully, and I would appreciate your comments if you use it.
- 5. A number of minor changes have been made. I have fixed several
- floating exception errors that seem to occur on DEC Alpha machines
- only.
- 6. The documentation is now changed into the html format.
-
-
-
-Version 1.3a,b,c (UCL)
-June 1998
-
- 1. Basemlg was put back at the request of a user.
-
- 2. Some changes are made to codeml concerning codon-substitution
- models (Yang and Nielsen 1998 JME; Yang 1998 MBE; Nielsen and Yang
- 1998 Genetics)
-
-
-
-PAML Version 1.3: (UC Berkeley)
-July 1997
-
- 1. Starting with this version, the program basemlg will not be
- included in the package. This program implements the (continuous)
- gamma model of substitution rate variation among nucleotide sites
- (Yang 1993). It involves intensive computation and has been
- superseded by the discrete-gamma model implemented in the program
- baseml. If you want to use this program, you should use previous
- versions of paml, which can be obtained from me if nowhere else.
-
- 2. The simulation program mcml.c is removed from the package. This is
- now superced by the program listtree.c, which does different things
- besides simulating data sets.
-
- 3. I have implemented the stepwise addition algorithm in the programs
- baseml and codeml. This algorithm is faster than the
- star-decomposition algorithm implemented before, but my implementation
- is crude and inefficient. The NNI perturbation is implemented too.
- These work without the molecular clock.
-
- 4. Many changes are made with the program codonml (codeml with
- seqtype=1). A few new codon-based models are implemented. One
- assumes different nonsynonymous/synonymous (dN/dS) rate ratios among
- branches in the phylogeny, and may be useful for detecting episodic
- positive selection. Another allows the dN/dS ratio to vary among
- sites and may be useful for testing neutrality and detecting
- positively-selected sites in the sequence. These are based on my
- collaborative work with Rasmus Nielsen. I made some changes to the
- output of estimates of dS and dN in pairwise comparisons (codonml with
- runmode=-2), so that the results are easy to understand. Another
- change with this program is in the codon-substitution model of Goldman
- and Yang (1994). We used the amino acid distance matrix of Grantham
- (1974) to modify the rate of nonsynonymous substitutions, but
- unfortunately the model does not fit data well. It does not always
- fit the data better than if we simply assume equal distance between
- any two amino acids. So I have added the option of ignoring the amino
- acid differences. Goldman and Yang's original formula is still
- available but not recommended for use. As a result of these changes,
- the formulation of the Goldman and Yang model is now somewhat
- different from the paper and previous versions of the program. See
- Section 6.2 for details.
-
- 5. I have again made some changes with the user interface, based on my
- guess of what the "user" would like to see.
-
-
-
-Version 1.1a-d: (University of California, Berkeley)
-1995-1996
-
-1.1a No records of changes remain.
-
-1.1b November, 1995:
-
-The following line at the beginning of the file tools.h
- #define __cplusplus
-is removed. I added this line to avoid some (unjustified) warning
-messages from a compiler I used to use and forgot to remove it when I
-was preparing paml1.1. This line made some compilers unable to
-compile.
-
-A bug that caused baseml and basemlg to give wrong estimates of kappa
-under the F84 model is now fixed. Estimates of kappa under the F84
-models listed in the paml 1.0 and 1.1 documents are all incorrect and
-are a bit too small. For example, the estimate of kappa under the F84
-model for the mtDNA data of Brown et al. (1982) should be 4.344 while
-the wrong estimate given in the large table of the document is 3.420.
-Likelihood value, branch lengths and other parameters seem to have
-been correctly calculated by paml 1.0 and 1.1. Versions prior to 1.0
-do not have this bug, and results in Yang (J. Mol. Evol. 1994,
-39:306-314) and Yang, Lauder, and Lin (J. Mol. Evol. 1995,
-41:587-596), where the F84 model was used, are correct.
-
-1.1c March 1996
-
-Representation of tree topologies using branches (see the document)
-are now allowed (restored) in the tree structure file. This is for
-coping with cases where some taxa are ancestors of others. If this
-representation is used, the line starts with a dollar sign. So the
-following line in the file trees.5s (the tree file for 5 species)
-
-$ 5 6 3 6 4 6 5 3 1 3 2
-
-represents the tree ((12)34), with 5 to be the ancestor of 1 and 2.
-
-Alignment gaps are now allowed when option G is used (for multiple
-genes). The gaps are removed before analysis.
-
-A small program listtree lists all the bifurcating trees for a given
-(small) number of species.
-
-codonml (codeml.c with seqtype=1) now has an option for calculating
-the number of synonymous substitutions per synonymous site (Ks) and
-the number of nonsynonymous substitutions per nonsynonymous site (Ka)
-and their ratio for each pairwise comparison, using the method of
-Goldman and Yang (1994). This is implemented with
-runmode=-2. Estimates of Ks and Ka are collected in the file rst and
-estimates of model parameters in the file mlc. This option produces
-table 2 in the Goldman & Yang paper.
-
-The interior nodes were incorrectly identified when the ancestral
-sequences were listed in the file rst. They were all one less than
-their correct numbers. Nodes 7, 8, 9, 10 for the stewart.aa data were
-incorrectly identified as nodes 6, 7, 8, 9. This is corrected now.
-
-1.1d. July 1996
-
-An unintended version of codeml.c was included in version 1.1c, which
-causes the program to produce different results under the codon-based
-model of Goldman and Yang (1994). The small number in baseml.c was
-set too small, making the program do unnecessary iterations; this is
-fixed now. I also changed the program pamp so that it will estimate
-the substitution rate matrix without using a tree.
-
-I have included a few other genetic code tables in this version of
-codeml; see the file codeml.ctl.
-
-
-PAML Version 1.2: November 1996 (UC Berkeley)
-
- 1. I have done some changes with the user interface. Name of files
-(sequence data file, control file, tree structure file) are now
-specified in the control files (The default are baseml.ctl,
-codeml.ctl, pamp.ctl, mcmctree.ctl). The command line for executing a
-program is now ProgramName ControlFileName with ControlFileName
-optional. I have added some text in the output file of baseml so that
-the results are eassier to understand. Choose verbose=0 in the
-control file if you do not want the detailed output.
-
- 2. A new program mcmctree is included, which performs Bayesian
-estimation of phylogenies using Markov chain Monte Carlo methods
-(Rannala and Yang 1996; Yang and Rannala 1997).
-
- 3. A number of minor changes are made, such as using species names
-in the parenthesis tree representation.
-
-
-
-July 1995
-Version 1.1: (IMEG PennState)
-
- Models for analyzing data of multiple genes (Yang 1996 JME) are
- implemented in baseml and codeml. Two more programs are included:
- pamp for parsimony-based analysis implementing the methods of Yang and
- Kumar (1996 MBE) and mcml for Monte Carlo simulation (prepared for
- Arndt von Haeseler but apparently not used).
-
-
-February 1995
-Version 1.0 (IMEG, PennState)
-
- This is the first official version of paml. Three main programs are
- included: baseml, basemlg and codeml. Control files are used for all
- of them. The program codeml is formed by merging two programs codonml
- and aaml, for codon sequences and amino acid sequences, respectively.
- One rate, gamma rates, and auto-discrete-gamma rates for sites are
- implemented for nucleotide, amino acid, and codon-based models. The
- document is provided as postscript files (paml1_3.ps).
-
-
-October 1994
-Version 0.12, no name for package (BAU and IMEG PennState)
-
- The bracket notation of tree topologies is introduced. Tree
- search by star decomposition is implemented in baseml and codonml.
- The options for baseml are moved into an option file baseml.ctl. The
- molecular clock seems to work with the automatic tree search model
- (runmode=1 or 2). The old readme file is renamed manual.
-
-
-March 1994
-Version 0.10, no name for package: (NHM, England)
-
- Programs tripml and comptree are renamed codonml and rell. The readme
- file is expanded and the numerical optimization routine, ming1, used
- by dnaml and codonml, is improved. The discrete-gamma model of Yang
- (1994 JME) is implemented in both baseml and codonml.
-
- Preliminary version 0.11, no name for package: April 1994 (NHM,
- England) Programs dnaml and dnamls are renamed baseml and basemlg to
- avoid confusion with Joe Felsenstein's programs. The
- auto-discrete-gamma model and the nonparametric models of rate
- variation and correlation mong sites of Yang (1995 Genetics) are added
- in baseml. Nonhomogeneous-process models of Yang and Roberts (1995
- MBE) are implemented in baseml.
-
-
-April 1993
-Version 0.01, no name for package (Cambridge). Three programs are included in
-the package:
-
- 1. dnamls implements the model of gamma rates among sites of Yang
- (1993) and works with arbitrary topologies and substitution models
- (JC69, K80, F81, HKY85).
-
- 2. dnaml implements the model of a single rate for all sites. (The
- same name as Joe Felsenstein's program was used.)
-
- 3. tripml implements the codon-based model of Goldman and Yang (1994).
- Trees are represented by the "branch notation".
-
-// end of file
=====================================
paml4.9i.tgz deleted
=====================================
Binary files a/paml4.9i.tgz and /dev/null differ
=====================================
src/mcmctree.c
=====================================
@@ -272,7 +272,7 @@ int main(int argc, char *argv[])
if ((k = stree.nodes[i].fossil) == 0) continue;
printf("Node %3d: %3s ( ", i + 1, fossils[k]);
for (j = 0; j < npfossils[k]; j++) {
- printf("%6.4f", stree.nodes[i].pfossil[j + (k == UPPER_F)]);
+ printf("%7.4f", stree.nodes[i].pfossil[j + (k == UPPER_F)]);
printf("%s", (j == npfossils[k] - 1 ? " )\n" : ", "));
}
}
@@ -1038,7 +1038,7 @@ int GetOptions(char *ctlf)
{
int transform0 = ARCSIN_B; /* default transform: SQRT_B, LOG_B, ARCSIN_B */
int iopt, i, j, nopt = 31, lline = 4096;
- char line[4096], *pline, *peq, opt[33], *comment = "*#";
+ char line[4096], *pline, opt[33], *comment = "*#";
char *optstr[] = { "seed", "seqfile","treefile", "outfile", "mcmcfile", "BayesFactorBeta",
"seqtype", "aaRatefile", "icode", "noisy", "usedata", "ndata", "duplication", "model", "clock",
"TipDate", "RootAge", "fossilerror", "alpha", "ncatG", "cleandata",
@@ -1595,128 +1595,213 @@ double Infinitesites(FILE *fout)
int GetInitialsTimes(void)
{
-/* This sets calibration node ages and root age first, and then populates other node ages, by
- a break-stick strategy.
- It ensures that each node age is younger than its parent's age, but does not check
- the consistency of ages against the fossil constraints. The use of soft bounds means that
- any ages are possible, even if the chain may start from a poor place.
- In the case of shared/mirrored node ages due to gene duplication, it is not so clear whether
- conflicts may still arise.
-*/
- int s = stree.nspecies, i, j, k, k1, ir, ncorrections, driver;
- int maxdepth = 0, from, to, depth;
- double *p, a, b, d, *r, agefrom, ageto;
+ /* This sets ages for nodes with bounds first. Then it goes through a loop looking for nodes
+ with the largest min calibration, and another loop looking for the longest path, each
+ case populating node ages on the path until reaching an ancestral node with assigned age or
+ with max bound.
+ In the case of shared/mirrored node ages due to gene duplication, conflicts may still arise,
+ so that a loop of corrections is used.
+
+ stree.duplication:
+ stree.nodes[i].label = 0: usual node, which may and may not have calibration.
+ stree.nodes[j].label =-1: node i is the driver node, whose age is shared by other nodes.
+ It may and may not have calibration.
+ stree.nodes[j].label = i: node j shares the age of node i, and doesn't have calibration.
+ */
+ int s = stree.nspecies, s21 = s * 2 - 1, root = stree.root;
+ int i, j, k, k1, ir, ncorrections, driver, from, to, depth, maxdepth;
+ double *p, a, b, d, tbpath[2], *tmin, *tmax, *r;
+ char *pptable, *flag;
+ int debug = 1;
if (com.TipDate && stree.nodes[stree.root].fossil != BOUND_F)
error2("\nTipDate model requires bounds on the root age..\n");
- /* set up ages of calibration nodes */
- for (i = s; i < s * 2 - 1; i++) stree.nodes[i].age = -1;
- for (i = s; i < s*2-1; i++) {
- if (stree.nodes[i].fossil == 0) continue;
- p = stree.nodes[i].pfossil;
+ /* set up pop-pop table of ancestor-descendent relationships. */
+ pptable = (char*)malloc(s21*(s21 + 1) * sizeof(char));
+ if (pptable == NULL) error2("oom pptable");
+ flag = pptable + s21*s21;
+ memset(pptable, 0, s21*(s21 + 1) * sizeof(char));
+ tmin = (double*)malloc(s21 * 2 * sizeof(double));
+ if (tmin == NULL) error2("oom tmin");
+ memset(tmin, 0, s21 * 2 * sizeof(double));
+ tmax = tmin + s21;
+ r = (double*)malloc(s * sizeof(double));
+ if (r == NULL) error2("oom r");
- switch(stree.nodes[i].fossil) {
- case (LOWER_F): stree.nodes[i].age = p[0] * (1.05 + 0.1*rndu()); break;
- case (UPPER_F): stree.nodes[i].age = p[1] * (0.6 + 0.4*rndu()); break;
- case (BOUND_F): stree.nodes[i].age = p[0] + (p[1] - p[0])*(0.2 + rndu()*1.6); break;
- case (GAMMA_F): stree.nodes[i].age = p[0] / p[1] * (0.7 + rndu()*0.6); break;
- case (SKEWN_F): case(SKEWT_F):
- d = p[2] / sqrt(1 + p[2] * p[2]);
- a = p[0] + p[1] * d*sqrt(2 / Pi);
- stree.nodes[i].age = a * (0.6 + 0.4*rndu());
- break;
- case (S2N_F):
- d = p[3] / sqrt(1 + p[3] * p[3]);
- a = (p[1] + p[2] * d*sqrt(2 / Pi)); /* mean of SN 1 */
- d = p[6] / sqrt(1 + p[6] * p[6]);
- b = (p[4] + p[5] * d*sqrt(2 / Pi)); /* mean of SN 2 */
- stree.nodes[i].age = (p[0] * a + b) * (0.6 + 0.4*rndu());
- break;
+ for (i = 0; i < s21; i++) pptable[i*s21 + i] = pptable[i*s21 + root] = (char)1;
+ for (i = 0; i < s21; i++) {
+ for (j = i; j != root; j = stree.nodes[j].father)
+ pptable[i*s21 + j] = (char)1;
+ }
+ for (i = s; i < s21; i++) {
+ if (stree.nodes[i].label != -1) continue; /* -1 means driver */
+ for (j = s; j < s21; j++) {
+ if (stree.nodes[j].label == i) /* if j mirrors i, ancestors of j are also ancestors of i. */
+ for (k = j; k != root && pptable[i*s21 + k] == 0; k = stree.nodes[k].father) {
+ pptable[i*s21 + k] = (char)1;
+ }
}
-
- /* copy nodes[i].age to all mirrored nodes */
- driver = (stree.nodes[i].label >= s ? stree.nodes[i].label : (stree.nodes[i].label == -1 ? i : 0));
- if (driver) {
- for (k = s; k < s * 2 - 1; k++) {
- if (k!=i && (k==driver || stree.nodes[k].label == driver))
- stree.nodes[k].age = stree.nodes[i].age;
+ /* if j mirrors i, copy ancestors of i to j */
+ for (j = s; j < s21; j++) {
+ if (stree.nodes[j].label == i)
+ memcpy(pptable + j*s21 + s, pptable + i*s21 + s, (s - 1) * sizeof(char));
+ }
+ }
+ if (debug && s<20) {
+ printf("\n\nSetting up initial node ages...\npop-pop table:\n ");
+ for (j = 0; j < s21; j++) printf(" %2d", j);
+ for (i = 0; i < s21; i++) {
+ printf("\n%2d: ", i);
+ for (j = 0; j < s21; j++) printf(" %2d", pptable[i*s21 + j]);
+ if (i >= s) {
+ printf(" %3s ", (stree.nodes[i].fossil ? fossils[stree.nodes[i].fossil] : ""));
+ printf(" label = %2d, ", stree.nodes[i].label);
+ if (stree.nodes[i].label == -1) printf("driver");
+ else if (stree.nodes[i].label > 0) printf("mirror");
}
}
}
- /* check for consistency among calibration and mirrored nodes, and correct if needed. */
- for (ir = 0; ir < 100; ir++) {
- ncorrections = 0;
- for (i = 0; i < s * 2 - 1; i++) {
- if (stree.nodes[i].age == -1) continue;
- /* j is calibration node ancestral to i. */
- for (j = stree.nodes[i].father; j != -1; j = stree.nodes[j].father)
- if (stree.nodes[j].age > 0) break;
- if (j != -1 && stree.nodes[j].age < stree.nodes[i].age) {
- stree.nodes[j].age = stree.nodes[i].age * (1.001 + 0.01*rndu());
- ncorrections++;
- /* copy nodes[j].age to all mirrored nodes */
- driver = (stree.nodes[j].label >= s ? stree.nodes[j].label : (stree.nodes[j].label == -1 ? j : 0));
- if (driver) {
- for (k = s; k < s * 2 - 1; k++) {
- if (k != j && (k == driver || stree.nodes[k].label == driver))
- stree.nodes[k].age = stree.nodes[j].age;
- }
- }
+ /* retrieve tmin & rmax from calibrations */
+ for (i = s; i < s21; i++) stree.nodes[i].age = -1;
+ for (i = s; i < s21; i++) {
+ if (stree.nodes[i].fossil == 0) continue;
+ p = stree.nodes[i].pfossil;
+
+ switch (stree.nodes[i].fossil) {
+ case (LOWER_F): tmin[i] = p[0]; break;
+ case (UPPER_F): tmax[i] = p[1]; break;
+ case (BOUND_F): tmin[i] = p[0]; tmax[i] = p[1]; break;
+ case (GAMMA_F):
+ tmin[i] = QuantileGamma(0.025, p[0], p[1]); tmax[i] = QuantileGamma(0.975, p[0], p[1]);
+ break;
+ case (SKEWN_F):
+ case (SKEWT_F):
+ case (S2N_F):
+ error2("implement SkewN, Skewt, S2N now");
+ }
+ }
+ for (i = s; i < s21; i++) {
+ for (j = s; j < s21; j++)
+ if (j != i && pptable[i*s21 + j] && tmax[j] > 0 && tmin[i] > tmax[j]) {
+ printf("\n\ndaughter node %3d minimun age %9.4f \nmother node %3d max age %9.4f\n", i + 1, tmin[i], j + 1, tmax[j]);
+ error2("strange calibration?");
+ }
+ }
+ for (i = s; i < s21; i++) {
+ if (tmin[i]) /* remove duplicated min bounds for ancestors of i */
+ for (j = s; j < s21; j++)
+ if (j != i && pptable[i*s21 + j] && tmin[j] <= tmin[i])
+ tmin[j] = 0;
+ if (tmax[i]) /* remove duplicated max bounds for descendents of i */
+ for (j = s; j < s21; j++) {
+ if (j != i && pptable[j*s21 + i] && tmax[j] >= tmax[i])
+ tmax[j] = 0;
}
+ }
+ if (debug) {
+ printf("\n\nbounds on nodes after removing duplicated min and max bounds");
+ for (i = s; i < s21; i++) {
+ if(tmin[i] || tmax[i])
+ printf("\n%2d: (%8.4f, %8.4f)", i+1, tmin[i], tmax[i]);
}
- if (ncorrections == 0) break;
}
- if (ir == 100)
- puts("i did 100 rounds, but we need more!");
+ /* generate ages for bounded nodes if possible. */
+ for (i = s; i < s21; i++) {
+ if (tmin[i] && tmax[i])
+ stree.nodes[i].age = tmin[i] + (tmax[i] - tmin[i])*rndu();
+ }
+ if (tmax[root] == 0) error2("we need a maximum-age bound on root??");
- /* look for the deepest path from a node to its ancestor, and assign ages to all nodes inbetween. */
- r = (double*)malloc(s*sizeof(double));
- if (r == NULL) error2("oom r");
+ /* look for largest unused tmin, and assign ages on path from node to ancestor with age or with tmax. */
+ for (ir = 0; ir < s; ir++) {
+ tbpath[0] = 0;
+ for (i = s; i < s21; i++) { /* find max tmin and store in tbpath[0]. */
+ if (stree.nodes[i].age == -1 && tbpath[0] < tmin[i]) {
+ tbpath[0] = tmin[i]; from = i;
+ }
+ }
+ if (tbpath[0] == 0) break;
+ maxdepth = 1;
+ for (j = stree.nodes[from].father; j != -1; j = stree.nodes[j].father, maxdepth++) {
+ if (stree.nodes[j].age > 0)
+ break;
+ else if (tmax[j] > 0) {
+ maxdepth++;
+ break;
+ }
+ }
+ to = j;
+ tbpath[1] = (stree.nodes[to].age > 0 ? stree.nodes[to].age : tmax[to]);
- for (ir = 0; ir < s * 2 - 1; ir++) {
- for (i = 0, maxdepth = 0; i < s * 2 - 1; i++) {
- if (stree.nodes[i].age == -1) continue;
- depth = 0;
- for (j = stree.nodes[i].father; j != -1; j = stree.nodes[j].father) {
- if (stree.nodes[j].age > 0) break;
- depth++;
+ for (j = 0; j < maxdepth; j++) r[j] = rndu();
+ if (maxdepth>1) qsort(r, (size_t)maxdepth, sizeof(double), comparedouble);
+ for (j = from, k = 0; k<maxdepth; j = stree.nodes[j].father, k++) {
+ stree.nodes[j].age = tbpath[0] + (tbpath[1] - tbpath[0])*r[k];
+ driver = (stree.nodes[j].label >= s ? stree.nodes[j].label : (stree.nodes[j].label == -1 ? j : 0));
+ if (driver) {
+ for (k1 = s; k1 < s21; k1++) {
+ if (k1 != j && (k1 == driver || stree.nodes[k1].label == driver))
+ stree.nodes[k1].age = stree.nodes[j].age;
+ }
+ }
+ }
+ printf("\n*** min-age round %2d: from nodes %3d (%9.5f) to %3d, %3d nodes ", ir+1, from+1, stree.nodes[from].age, to+1, maxdepth);
+ if(debug && s<200) printStree();
+ }
+
+ /* look for longest path, and assign ages on path from node to ancestor with age or with tmax. */
+ for (ir = 0; ir < s; ir++) {
+ maxdepth = 0;
+ for (i = s; i < s21; i++) {
+ if (stree.nodes[i].age > 0) continue;
+ for (j = stree.nodes[i].father, depth = 1; j != -1; j = stree.nodes[j].father, depth++) {
+ if (stree.nodes[j].age > 0)
+ break;
+ else if (tmax[j] > 0) {
+ depth++;
+ break;
+ }
}
if (maxdepth < depth) {
maxdepth = depth; from = i; to = j;
}
}
if (maxdepth == 0) break; /* all ages are assigned. */
+ int *sons = stree.nodes[from].sons;
+ tbpath[0] = max2(stree.nodes[sons[0]].age, stree.nodes[sons[1]].age);
+ tbpath[1] = (stree.nodes[to].age > 0 ? stree.nodes[to].age : tmax[to]);
+
for (j = 0; j < maxdepth; j++) r[j] = rndu();
- qsort(r, (size_t)maxdepth, sizeof(double), comparedouble);
- agefrom = stree.nodes[from].age;
- ageto = stree.nodes[to].age;
- for (j = stree.nodes[from].father, k = 0; j != to; j = stree.nodes[j].father, k++) {
- stree.nodes[j].age = agefrom + (ageto - agefrom)*r[k];
+ if(maxdepth>1) qsort(r, (size_t)maxdepth, sizeof(double), comparedouble);
+ for (j = from, k = 0; k<maxdepth; j = stree.nodes[j].father, k++) {
+ stree.nodes[j].age = tbpath[0] + (tbpath[1] - tbpath[0])*r[k];
driver = (stree.nodes[j].label >= s ? stree.nodes[j].label : (stree.nodes[j].label == -1 ? j : 0));
if (driver) {
- for (k1 = s; k1 < s * 2 - 1; k1++) {
+ for (k1 = s; k1 < s21; k1++) {
if (k1 != j && (k1 == driver || stree.nodes[k1].label == driver))
stree.nodes[k1].age = stree.nodes[j].age;
}
}
}
+ printf("\n*** longest-path round %2d: from nodes %3d (%9.5f) to %3d, %3d nodes ", ir + 1, from + 1, stree.nodes[from].age, to + 1, maxdepth);
+ if (debug && s<200) printStree();
}
-
/* check for consistency for all nodes, and correct if needed. */
for (ir = 0; ir < 100; ir++) {
ncorrections = 0;
- for (i = 0; i < s * 2 - 1; i++) {
+ for (i = 0; i < s21; i++) {
j = stree.nodes[i].father;
if (j == -1 || stree.nodes[i].age < stree.nodes[j].age) continue;
+ printf("\n*** correction round %2d nodes %2d & %2d, ages %9.5f %9.5f", ir+1, i+1, j+1, stree.nodes[i].age, stree.nodes[j].age);
ncorrections++;
stree.nodes[j].age = stree.nodes[i].age * (1.001 + 0.01*rndu());
/* copy nodes[j].age to all mirrored nodes */
driver = (stree.nodes[j].label >= s ? stree.nodes[j].label : (stree.nodes[j].label == -1 ? j : 0));
if (driver) {
- for (k = s; k < s * 2 - 1; k++) {
+ for (k = s; k < s21; k++) {
if (k != j && (k == driver || stree.nodes[k].label == driver))
stree.nodes[k].age = stree.nodes[j].age;
}
@@ -1724,10 +1809,8 @@ int GetInitialsTimes(void)
}
if (ncorrections == 0) break;
}
- if (ir == 100)
- puts("i did 100 rounds, but we need more!");
- free(r);
+ free(pptable); free(tmin); free(r);
return (0);
}
@@ -3808,14 +3891,14 @@ int DescriptiveStatisticsSimpleMCMCTREE(FILE *fout, char infile[], int nbin)
}
}
- printf("\n\nPosterior mean (95%% Equal-tail CI) (95%% HPD CI) HPD-CI-width\n\n");
+ printf("\n\nPosterior means (95%% Equal-tail CI) (95%% HPD CI) HPD-CI-width\n\n");
for (j = SkipC1; j < p; j++) {
printf("%-10s ", varstr[j]);
- printf("%10.4f (%6.4f, %6.4f) (%6.4f, %6.4f) %6.4f", mean[j], x025[j], x975[j], xHPD025[j], xHPD975[j], xHPD975[j] - xHPD025[j]);
+ printf("%10.4f (%7.4f, %7.4f) (%7.4f, %7.4f) %7.4f", mean[j], x025[j], x975[j], xHPD025[j], xHPD975[j], xHPD975[j] - xHPD025[j]);
if (j < SkipC1 + stree.nspecies - 1) {
printf(" (Jnode %2d)", 2 * stree.nspecies - 1 - 1 - j + SkipC1);
if (com.TipDate)
- printf(" time: %6.2f (%5.2f, %5.2f)", com.TipDate - mean[j] * com.TipDate_TimeUnit,
+ printf(" time: %7.3f (%6.3f, %6.3f)", com.TipDate - mean[j] * com.TipDate_TimeUnit,
com.TipDate - x975[j] * com.TipDate_TimeUnit, com.TipDate - x025[j] * com.TipDate_TimeUnit);
}
printf("\n");
@@ -3827,11 +3910,11 @@ int DescriptiveStatisticsSimpleMCMCTREE(FILE *fout, char infile[], int nbin)
fprintf(fout, "\n\nPosterior mean (95%% Equal-tail CI) (95%% HPD CI) HPD-CI-width\n\n");
for (j = SkipC1; j < p; j++) {
fprintf(fout, "%-10s ", varstr[j]);
- fprintf(fout, "%10.4f (%6.4f, %6.4f) (%6.4f, %6.4f) %6.4f", mean[j], x025[j], x975[j], xHPD025[j], xHPD975[j], xHPD975[j] - xHPD025[j]);
+ fprintf(fout, "%10.4f (%7.4f, %7.4f) (%7.4f, %7.4f) %7.4f", mean[j], x025[j], x975[j], xHPD025[j], xHPD975[j], xHPD975[j] - xHPD025[j]);
if (j < SkipC1 + stree.nspecies - 1) {
fprintf(fout, " (Jnode %2d)", 2 * stree.nspecies - 1 - 1 - j + SkipC1);
if (com.TipDate)
- fprintf(fout, " time: %6.2f (%5.2f, %5.2f)", com.TipDate - mean[j] * com.TipDate_TimeUnit,
+ fprintf(fout, " time: %7.3f (%6.3f, %6.3f)", com.TipDate - mean[j] * com.TipDate_TimeUnit,
com.TipDate - x975[j] * com.TipDate_TimeUnit, com.TipDate - x025[j] * com.TipDate_TimeUnit);
}
fprintf(fout, "\n");
@@ -3867,7 +3950,7 @@ int MCMC(FILE* fout)
printf("\n%d burnin, sampled every %d, %d samples\n",
mcmc.burnin, mcmc.sampfreq, mcmc.nsample);
if (mcmc.usedata) puts("Approximating posterior");
- else puts("Approximating prior");
+ else puts("Approximating prior");
printf("(Settings: cleandata=%d print=%d saveconP=%d)\n",
com.cleandata, mcmc.print, mcmc.saveconP);
@@ -3877,9 +3960,10 @@ int MCMC(FILE* fout)
for (j = 0; j < mcmc.nsteplength; j++)
mcmc.steplength[j] = 0.001 + 0.1*rndu();
+ /*
printf("\nsteplength values: ");
matout2(F0, mcmc.steplength, 1, mcmc.nsteplength, 9, 5);
-
+ */
printStree();
x = (double*)malloc(com.np * 2 * sizeof(double));
=====================================
src/paml.h
=====================================
@@ -19,6 +19,14 @@
#define FOR(i,n) for(i=0; i<n; i++)
#define FPN(file) fputc('\n', file)
#define F0 stdout
+#if !defined(MAX)
+#define MAX(a,b) (((a) > (b)) ? (a) : (b))
+#endif
+#if !defined(MIN)
+#define MIN(a,b) (((a) < (b)) ? (a) : (b))
+#endif
+#define SQUARE(a) ((a)*(a))
+
#define min2(a,b) ((a)<(b)?(a):(b))
#define max2(a,b) ((a)>(b)?(a):(b))
#define swap2(a,b,y) { y=a; a=b; b=y; }
@@ -386,6 +394,6 @@ enum {PrBranch=1, PrNodeNum=2, PrLabel=4, PrNodeStr=8, PrAge=16, PrOmega=32} Out
#define PAML_RELEASE 0
-#define pamlVerStr "paml version 4.9i, September 2018"
+#define pamlVerStr "paml version 4.9j, October 2019"
#endif
=====================================
src/pll_avx.c deleted
=====================================
@@ -1,444 +0,0 @@
-/*
- Copyright (C) 2015 Tomas Flouri
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU Affero General Public License as
- published by the Free Software Foundation, either version 3 of the
- License, or (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Affero General Public License for more details.
-
- You should have received a copy of the GNU Affero General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>.
-
- Contact: Tomas Flouri <Tomas.Flouri at h-its.org>,
- Exelixis Lab, Heidelberg Instutute for Theoretical Studies
- Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany
-*/
-
-#if defined(_MSC_VER)
-static __declspec(align(32)) double lmatrix[4*4];
-static __declspec(align(32)) double rmatrix[4*4];
-#else
-static double lmatrix[4*4] __attribute__ ((aligned (32)));
-static double rmatrix[4*4] __attribute__ ((aligned (32)));
-#endif
-
-void pmat_jc69(double * pmatrix, double t)
-{
- if (t < -0.0001)
- printf ("\nt = %.5f in pijJC69", t);
-
- if (t < 1e-100)
- {
- pmatrix[0] = 1;
- pmatrix[1] = 0;
- pmatrix[2] = 0;
- pmatrix[3] = 0;
-
- pmatrix[4] = 0;
- pmatrix[5] = 1;
- pmatrix[6] = 0;
- pmatrix[7] = 0;
-
- pmatrix[8] = 0;
- pmatrix[9] = 0;
- pmatrix[10] = 1;
- pmatrix[11] = 0;
-
- pmatrix[12] = 0;
- pmatrix[13] = 0;
- pmatrix[14] = 0;
- pmatrix[15] = 1;
- }
- else
- {
- double a = (1 + 3*exp(-4*t/3) ) / 4;
- double b = (1 - a) / 3;
-
- pmatrix[0] = a;
- pmatrix[1] = b;
- pmatrix[2] = b;
- pmatrix[3] = b;
-
- pmatrix[4] = b;
- pmatrix[5] = a;
- pmatrix[6] = b;
- pmatrix[7] = b;
-
- pmatrix[8] = b;
- pmatrix[9] = b;
- pmatrix[10] = a;
- pmatrix[11] = b;
-
- pmatrix[12] = b;
- pmatrix[13] = b;
- pmatrix[14] = b;
- pmatrix[15] = a;
- }
-}
-
-static void tip_tip(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int n,i;
-
- __m256d ymm0,ymm1,ymm2;
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- for (n = 0; n < sites; ++n)
- {
- double * lmat = lmatrix + com.z[lchild][n] * states;
- double * rmat = rmatrix + com.z[rchild][n] * states;
-
- // for (i = 0; i < states; ++i)
- // clv[i] = lmat[i] * rmat[i];
-
- ymm0 = _mm256_load_pd(lmat);
- ymm1 = _mm256_load_pd(rmat);
- ymm2 = _mm256_mul_pd(ymm0,ymm1);
-
- _mm256_store_pd(clv,ymm2);
-
- clv += states;
- }
-}
-
-static void tip_tip_amb(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int n,k;
-
- double lvec[4];
- double rvec[4];
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- __m256d xmm0,xmm1,ymm0,ymm1;
-
- for (n = 0; n < sites; ++n)
- {
- double * lmat = lmatrix + (CharaMap[com.z[lchild][n]][0] << 2); // *states;
- double * rmat = rmatrix + (CharaMap[com.z[rchild][n]][0] << 2); // *states;
-
- xmm0 = _mm256_load_pd(lmat);
- for (k = 1; k < nChara[com.z[lchild][n]]; ++k)
- {
- lmat = lmatrix + (CharaMap[com.z[lchild][n]][k] << 2); // *states;
- xmm1 = _mm256_load_pd(lmat);
- xmm0 = _mm256_add_pd(xmm0,xmm1);
- }
-
- ymm0 = _mm256_load_pd(rmat);
- for (k = 1; k < nChara[com.z[rchild][n]]; ++k)
- {
- rmat = rmatrix + (CharaMap[com.z[rchild][n]][k] << 2); // *states;
- ymm1 = _mm256_load_pd(rmat);
- ymm0 = _mm256_add_pd(ymm0,ymm1);
- }
-
- xmm1 = _mm256_mul_pd(xmm0,ymm0);
- _mm256_store_pd(clv,xmm1);
-
- clv += states;
- }
-}
-
-static void tip_inner(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int h;
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- double * rclv = nodes[rchild].conP;
-
- __m256d ymm0,ymm1,ymm2,ymm3,ymm4,ymm5,ymm6,ymm7;
- __m256d xmm0,xmm4;
-
- for (h = 0; h < sites; ++h)
- {
- double * lmat = lmatrix + com.z[lchild][h] * states;
- double * rmat = rmatrix;
-
- xmm4 = _mm256_load_pd(lmat);
-
- ymm4 = _mm256_load_pd(rmat);
- ymm5 = _mm256_load_pd(rclv);
- ymm0 = _mm256_mul_pd(ymm4,ymm5);
-
- rmat += states;
-
- ymm4 = _mm256_load_pd(rmat);
- ymm1 = _mm256_mul_pd(ymm4,ymm5);
-
- rmat += states;
-
- ymm4 = _mm256_load_pd(rmat);
- ymm2 = _mm256_mul_pd(ymm4,ymm5);
-
- rmat += states;
-
- ymm4 = _mm256_load_pd(rmat);
- ymm3 = _mm256_mul_pd(ymm4,ymm5);
-
- /* compute y */
- ymm4 = _mm256_unpackhi_pd(ymm0,ymm1);
- ymm5 = _mm256_unpacklo_pd(ymm0,ymm1);
-
- ymm6 = _mm256_unpackhi_pd(ymm2,ymm3);
- ymm7 = _mm256_unpacklo_pd(ymm2,ymm3);
-
- ymm0 = _mm256_add_pd(ymm4,ymm5);
- ymm1 = _mm256_add_pd(ymm6,ymm7);
-
- ymm2 = _mm256_permute2f128_pd(ymm0,ymm1, _MM_SHUFFLE(0,2,0,1));
- ymm3 = _mm256_blend_pd(ymm0,ymm1,12);
- ymm4 = _mm256_add_pd(ymm2,ymm3);
-
- /* compute x*y */
- xmm0 = _mm256_mul_pd(xmm4,ymm4);
-
- _mm256_store_pd(clv, xmm0);
-
- clv += states;
- rclv += states;
- }
-}
-
-static void tip_inner_amb(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int h,j,k;
- double y;
- double x;
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- __m256d ymm0,ymm1,ymm2,ymm3,ymm4,ymm5,ymm6,ymm7;
- __m256d xmm0,xmm1;
-
- double * rclv = nodes[rchild].conP;
-
- for (h = 0; h < sites; ++h)
- {
- double * lmat = lmatrix + (CharaMap[com.z[lchild][h]][0] << 2); // *states;
- double * rmat = rmatrix;
-
- /* compute term for left child (tip) */
- xmm0 = _mm256_load_pd(lmat);
- for (k = 1; k < nChara[com.z[lchild][h]]; ++k)
- {
- lmat = lmatrix + (CharaMap[com.z[lchild][h]][k] << 2); // *states;
- xmm1 = _mm256_load_pd(lmat);
- xmm0 = _mm256_add_pd(xmm0,xmm1);
- }
-
- /* compute term for right child (inner) */
- ymm4 = _mm256_load_pd(rmat);
- ymm5 = _mm256_load_pd(rclv);
- ymm0 = _mm256_mul_pd(ymm4,ymm5);
-
- rmat += states;
-
- ymm4 = _mm256_load_pd(rmat);
- ymm1 = _mm256_mul_pd(ymm4,ymm5);
-
- rmat += states;
-
- ymm4 = _mm256_load_pd(rmat);
- ymm2 = _mm256_mul_pd(ymm4,ymm5);
-
- rmat += states;
-
- ymm4 = _mm256_load_pd(rmat);
- ymm3 = _mm256_mul_pd(ymm4,ymm5);
-
- ymm4 = _mm256_unpackhi_pd(ymm0,ymm1);
- ymm5 = _mm256_unpacklo_pd(ymm0,ymm1);
-
- ymm6 = _mm256_unpackhi_pd(ymm2,ymm3);
- ymm7 = _mm256_unpacklo_pd(ymm2,ymm3);
-
- ymm0 = _mm256_add_pd(ymm4,ymm5);
- ymm1 = _mm256_add_pd(ymm6,ymm7);
-
- ymm2 = _mm256_permute2f128_pd(ymm0,ymm1, _MM_SHUFFLE(0,2,0,1));
- ymm3 = _mm256_blend_pd(ymm0,ymm1,12);
- ymm4 = _mm256_add_pd(ymm2,ymm3);
-
- ymm0 = _mm256_mul_pd(xmm0,ymm4);
-
- _mm256_store_pd(clv,ymm0);
-
- clv += states;
- rclv += states;
- }
-
-}
-
-static void inner_inner(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int h;
-
- __m256d ymm0,ymm1,ymm2,ymm3,ymm4,ymm5,ymm6,ymm7;
- __m256d xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7;
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- double * lclv = nodes[lchild].conP;
- double * rclv = nodes[rchild].conP;
-
- for (h = 0; h < sites; ++h)
- {
- double * lmat = lmatrix;
- double * rmat = rmatrix;
-
- /* compute vector of x */
- xmm4 = _mm256_load_pd(lmat);
- xmm5 = _mm256_load_pd(lclv);
- xmm0 = _mm256_mul_pd(xmm4,xmm5);
-
- ymm4 = _mm256_load_pd(rmat);
- ymm5 = _mm256_load_pd(rclv);
- ymm0 = _mm256_mul_pd(ymm4,ymm5);
-
- lmat += states;
- rmat += states;
-
- xmm4 = _mm256_load_pd(lmat);
- xmm1 = _mm256_mul_pd(xmm4,xmm5);
-
- ymm4 = _mm256_load_pd(rmat);
- ymm1 = _mm256_mul_pd(ymm4,ymm5);
-
- lmat += states;
- rmat += states;
-
- xmm4 = _mm256_load_pd(lmat);
- xmm2 = _mm256_mul_pd(xmm4,xmm5);
-
- ymm4 = _mm256_load_pd(rmat);
- ymm2 = _mm256_mul_pd(ymm4,ymm5);
-
- lmat += states;
- rmat += states;
-
- xmm4 = _mm256_load_pd(lmat);
- xmm3 = _mm256_mul_pd(xmm4,xmm5);
-
- ymm4 = _mm256_load_pd(rmat);
- ymm3 = _mm256_mul_pd(ymm4,ymm5);
-
- /* compute x */
- xmm4 = _mm256_unpackhi_pd(xmm0,xmm1);
- xmm5 = _mm256_unpacklo_pd(xmm0,xmm1);
-
- xmm6 = _mm256_unpackhi_pd(xmm2,xmm3);
- xmm7 = _mm256_unpacklo_pd(xmm2,xmm3);
-
- xmm0 = _mm256_add_pd(xmm4,xmm5);
- xmm1 = _mm256_add_pd(xmm6,xmm7);
-
- xmm2 = _mm256_permute2f128_pd(xmm0,xmm1, _MM_SHUFFLE(0,2,0,1));
- xmm3 = _mm256_blend_pd(xmm0,xmm1,12);
- xmm4 = _mm256_add_pd(xmm2,xmm3);
-
- /* compute y */
- ymm4 = _mm256_unpackhi_pd(ymm0,ymm1);
- ymm5 = _mm256_unpacklo_pd(ymm0,ymm1);
-
- ymm6 = _mm256_unpackhi_pd(ymm2,ymm3);
- ymm7 = _mm256_unpacklo_pd(ymm2,ymm3);
-
- ymm0 = _mm256_add_pd(ymm4,ymm5);
- ymm1 = _mm256_add_pd(ymm6,ymm7);
-
- ymm2 = _mm256_permute2f128_pd(ymm0,ymm1, _MM_SHUFFLE(0,2,0,1));
- ymm3 = _mm256_blend_pd(ymm0,ymm1,12);
- ymm4 = _mm256_add_pd(ymm2,ymm3);
-
- /* compute x*y */
- xmm0 = _mm256_mul_pd(xmm4,ymm4);
-
- _mm256_store_pd(clv, xmm0);
-
- clv += states;
- lclv += states;
- rclv += states;
- }
-}
-
-int ConditonalPNode (int inode)
-{
- int states = 4;
- int sites = com.npatt;
- int i;
- int child, lchild, rchild;
-
- /* recursive call the ConditionalPNode on all children of the current node */
- for (i = 0; i < nodes[inode].nson; ++i)
- {
- /* get id of i-th child of current node */
- child = nodes[inode].sons[i];
-
- /* if child is an inner node then recursively compute its conditional
- probabilities vector */
- if (nodes[child].nson > 0 && (!mcmc.saveconP || !com.oldconP[child]))
- ConditonalPNode(nodes[inode].sons[i]);
- }
-
- /* initialize CLV entries of current node to 1 */
- int n = sites * states;
- for (i = 0; i < n; ++i)
- nodes[inode].conP[i] = 1;
-
- if (nodes[inode].nson == 0) return (0);
-
- lchild = nodes[inode].sons[0];
- rchild = nodes[inode].sons[1];
-
- int ltip = (nodes[lchild].nson == 0);
- int rtip = (nodes[rchild].nson == 0);
-
- if (ltip && rtip)
- {
- if (com.cleandata)
- tip_tip(lchild,rchild,nodes[inode].conP);
- else
- tip_tip_amb(lchild,rchild,nodes[inode].conP);
- }
- else if (ltip)
- {
- if (com.cleandata)
- tip_inner(lchild,rchild,nodes[inode].conP);
- else
- tip_inner_amb(lchild,rchild,nodes[inode].conP);
- }
- else if (rtip)
- {
- if (com.cleandata)
- tip_inner(rchild,lchild,nodes[inode].conP);
- else
- tip_inner_amb(rchild,lchild,nodes[inode].conP);
- }
- else
- inner_inner(lchild,rchild,nodes[inode].conP);
-
- return (0);
-}
=====================================
src/pll_sse.c deleted
=====================================
@@ -1,608 +0,0 @@
-/*
- Copyright (C) 2015 Tomas Flouri
-
- This program is free software: you can redistribute it and/or modify
- it under the terms of the GNU Affero General Public License as
- published by the Free Software Foundation, either version 3 of the
- License, or (at your option) any later version.
-
- This program is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU Affero General Public License for more details.
-
- You should have received a copy of the GNU Affero General Public License
- along with this program. If not, see <http://www.gnu.org/licenses/>.
-
- Contact: Tomas Flouri <Tomas.Flouri at h-its.org>,
- Exelixis Lab, Heidelberg Instutute for Theoretical Studies
- Schloss-Wolfsbrunnenweg 35, D-69118 Heidelberg, Germany
-*/
-
-#if defined(_MSC_VER)
-static __declspec(align(16)) double lmatrix[4*4];
-static __declspec(align(16)) double rmatrix[4*4];
-#else
-static double lmatrix[4*4] __attribute__ ((aligned (16)));
-static double rmatrix[4*4] __attribute__ ((aligned (16)));
-#endif
-
-void pmat_jc69(double * pmatrix, double t)
-{
- if (t < -0.0001)
- printf ("\nt = %.5f in pijJC69", t);
-
- if (t < 1e-100)
- {
- pmatrix[0] = 1;
- pmatrix[1] = 0;
- pmatrix[2] = 0;
- pmatrix[3] = 0;
-
- pmatrix[4] = 0;
- pmatrix[5] = 1;
- pmatrix[6] = 0;
- pmatrix[7] = 0;
-
- pmatrix[8] = 0;
- pmatrix[9] = 0;
- pmatrix[10] = 1;
- pmatrix[11] = 0;
-
- pmatrix[12] = 0;
- pmatrix[13] = 0;
- pmatrix[14] = 0;
- pmatrix[15] = 1;
- }
- else
- {
- double a = (1 + 3*exp(-4*t/3) ) / 4;
- double b = (1 - a) / 3;
-
- pmatrix[0] = a;
- pmatrix[1] = b;
- pmatrix[2] = b;
- pmatrix[3] = b;
-
- pmatrix[4] = b;
- pmatrix[5] = a;
- pmatrix[6] = b;
- pmatrix[7] = b;
-
- pmatrix[8] = b;
- pmatrix[9] = b;
- pmatrix[10] = a;
- pmatrix[11] = b;
-
- pmatrix[12] = b;
- pmatrix[13] = b;
- pmatrix[14] = b;
- pmatrix[15] = a;
- }
-}
-
-static void tip_tip(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int n,i;
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- __m128d xmm0,xmm1,xmm2;
-
- for (n = 0; n < sites; ++n)
- {
- double * lmat = lmatrix + (com.z[lchild][n] << 2); // *states;
- double * rmat = rmatrix + (com.z[rchild][n] << 2); // *states;
-
- /*
-
- Vectorization of the following code snippet
- (only applicable to JC69)
-
- for (i = 0; i < states; ++i)
- clv[i] = lmat[i] * rmat[i];
-
- */
-
- xmm0 = _mm_load_pd(lmat);
- xmm1 = _mm_load_pd(rmat);
- xmm2 = _mm_mul_pd(xmm0,xmm1);
-
- _mm_store_pd(clv,xmm2);
-
- xmm0 = _mm_load_pd(lmat+2);
- xmm1 = _mm_load_pd(rmat+2);
- xmm2 = _mm_mul_pd(xmm0,xmm1);
-
- _mm_store_pd(clv+2,xmm2);
-
- clv += states;
- }
-}
-
-static void tip_tip_amb(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int n,k;
-
- double lvec[4];
- double rvec[4];
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- __m128d xmm0,xmm1,xmm2,xmm3;
- __m128d ymm0,ymm1,ymm2,ymm3;
-
- for (n = 0; n < sites; ++n)
- {
- double * lmat = lmatrix + (CharaMap[com.z[lchild][n]][0] << 2); // *states;
- double * rmat = rmatrix + (CharaMap[com.z[rchild][n]][0] << 2); // *states;
-
- xmm0 = _mm_load_pd(lmat);
- xmm1 = _mm_load_pd(lmat+2);
- for (k = 1; k < nChara[com.z[lchild][n]]; ++k)
- {
- lmat = lmatrix + (CharaMap[com.z[lchild][n]][k] << 2); // *states;
- xmm2 = _mm_load_pd(lmat);
- xmm3 = _mm_load_pd(lmat+2);
- xmm0 = _mm_add_pd(xmm0,xmm2);
- xmm1 = _mm_add_pd(xmm1,xmm3);
- }
-
- ymm0 = _mm_load_pd(rmat);
- ymm1 = _mm_load_pd(rmat+2);
- for (k = 1; k < nChara[com.z[rchild][n]]; ++k)
- {
- rmat = rmatrix + (CharaMap[com.z[rchild][n]][k] << 2); // *states;
- ymm2 = _mm_load_pd(rmat);
- ymm3 = _mm_load_pd(rmat+2);
- ymm0 = _mm_add_pd(ymm0,ymm2);
- ymm1 = _mm_add_pd(ymm1,ymm3);
- }
-
- xmm2 = _mm_mul_pd(xmm0,ymm0);
- xmm3 = _mm_mul_pd(xmm1,ymm1);
-
- _mm_store_pd(clv,xmm2);
- _mm_store_pd(clv+2,xmm3);
-
- clv += states;
- }
-}
-
-static void tip_inner(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int h,j,k;
- double y;
-
- __m128d xmm0,xmm1,xmm2,xmm3,xmm4;
- __m128d xmm5,xmm6,xmm7,xmm8,xmm9;
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- double * rclv = nodes[rchild].conP;
-
- for (h = 0; h < sites; ++h)
- {
- double * lmat = lmatrix + (com.z[lchild][h] << 2); // *states;
- double * rmat = rmatrix;
-
- /*
-
- Vectorization of the following code snippet
-
- for (j = 0; j < states; ++j)
- {
- y = 0;
- for (k = 0; k < states; ++k)
- y += rmat[k] * rclv[k];
-
- rmat += states;
-
- clv[j] = y*lmat[j];
- }
- */
-
- xmm0 = _mm_load_pd(rclv); /* needed */
- xmm2 = _mm_load_pd(rmat);
- xmm3 = _mm_mul_pd(xmm0,xmm2);
-
- xmm1 = _mm_load_pd(rclv+2); /* needed */
- xmm2 = _mm_load_pd(rmat+2);
- xmm4 = _mm_mul_pd(xmm1,xmm2);
-
- /* calculate (b1*d1 + b2*d2 | b3*d3 + b4*d4) */
- xmm2 = _mm_hadd_pd(xmm3,xmm4); /* needed (1) */
-
- rmat += states;
-
- xmm3 = _mm_load_pd(rmat);
- xmm4 = _mm_mul_pd(xmm0,xmm3);
-
- xmm5 = _mm_load_pd(rmat+2);
- xmm6 = _mm_mul_pd(xmm1,xmm5);
-
- /* calculate (b5*d1 + b6*d2 | b7*d3 + b8*d4) */
- xmm3 = _mm_hadd_pd(xmm4,xmm6); /* needed (2) */
-
- rmat += states;
-
- xmm4 = _mm_load_pd(rmat);
- xmm5 = _mm_mul_pd(xmm0,xmm4);
-
- xmm6 = _mm_load_pd(rmat+2);
- xmm7 = _mm_mul_pd(xmm1,xmm6);
-
- /* calculate (b9*d1 + b10*d2 | b11*d3 + b12*d4) */
- xmm4 = _mm_hadd_pd(xmm5,xmm7); /* needed (3) */
-
- rmat += states;
-
- xmm5 = _mm_load_pd(rmat);
- xmm6 = _mm_mul_pd(xmm0,xmm5);
-
- xmm7 = _mm_load_pd(rmat+2);
- xmm8 = _mm_mul_pd(xmm1,xmm7);
-
- /* calculate (b13*d1 + b14*d2 | b15*d3 + b16*d4) */
- xmm5 = _mm_hadd_pd(xmm6,xmm8); /* needed (4) */
-
- /* calculate (b1*d1 + b2*d2 + b3*d3 + b4*d3 |
- b5*d1 + b6*d2 + b7*d3 + b8*d4 ) */
- xmm0 = _mm_hadd_pd(xmm2,xmm3);
-
- /* calculate (b9*d1 + b10*d2 + b11*d3 + b12*d3 |
- b13*d1 + b14*d2 + b15*d3 + b16*d4 ) */
- xmm1 = _mm_hadd_pd(xmm4,xmm5);
-
- xmm2 = _mm_load_pd(lmat);
- xmm3 = _mm_load_pd(lmat+2);
- xmm4 = _mm_mul_pd(xmm0,xmm2);
- xmm5 = _mm_mul_pd(xmm1,xmm3);
-
- _mm_store_pd(clv,xmm4);
- _mm_store_pd(clv+2,xmm5);
-
- clv += states;
- rclv += states;
- }
-}
-
-static void tip_inner_amb(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int h,k;
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- __m128d xmm0,xmm1,xmm2,xmm3;
- __m128d ymm2,ymm3,ymm4,ymm5,ymm6,ymm7,ymm8,ymm9;
-
- double * rclv = nodes[rchild].conP;
-
- for (h = 0; h < sites; ++h)
- {
- double * lmat = lmatrix + (CharaMap[com.z[lchild][h]][0] << 2); // *states;
- double * rmat = rmatrix;
-
- /* compute term for left child (tip) */
- xmm0 = _mm_load_pd(lmat);
- xmm1 = _mm_load_pd(lmat+2);
- for (k = 1; k < nChara[com.z[lchild][h]]; ++k)
- {
- lmat = lmatrix + (CharaMap[com.z[lchild][h]][k] << 2); // *states;
- xmm2 = _mm_load_pd(lmat);
- xmm3 = _mm_load_pd(lmat+2);
- xmm0 = _mm_add_pd(xmm0,xmm2);
- xmm1 = _mm_add_pd(xmm1,xmm3);
- }
-
- /* compute term for right child (inner) */
- ymm3 = _mm_load_pd(rclv); /* needed */
- ymm4 = _mm_load_pd(rmat);
- ymm6 = _mm_mul_pd(ymm3,ymm4);
-
- ymm4 = _mm_load_pd(rclv+2); /* needed */
- ymm7 = _mm_load_pd(rmat+2);
- ymm8 = _mm_mul_pd(ymm4,ymm7);
-
- /* calculate (b1*d1 + b2*d2 | b3*d3 + b4*d4) */
- ymm5 = _mm_hadd_pd(ymm6,ymm8); /* needed (2) */
-
- rmat += states;
-
- ymm7 = _mm_load_pd(rmat);
- ymm8 = _mm_mul_pd(ymm3,ymm7);
-
- ymm7 = _mm_load_pd(rmat+2);
- ymm9 = _mm_mul_pd(ymm4,ymm7);
-
- /* calculate (b5*d1 + b6*d2 | b7*d3 + b8*d4) */
- ymm7 = _mm_hadd_pd(ymm8,ymm9); /* needed (4) */
-
- ymm2 = _mm_hadd_pd(ymm5,ymm7);
- ymm9 = _mm_mul_pd(xmm0,ymm2);
- _mm_store_pd(clv, ymm9);
-
- rmat += states;
-
- ymm6 = _mm_load_pd(rmat);
- ymm7 = _mm_mul_pd(ymm3,ymm6);
-
- ymm6 = _mm_load_pd(rmat+2);
- ymm8 = _mm_mul_pd(ymm4,ymm6);
-
- /* calculate (b9*d1 + b10*d2 | b11*d3 + b12*d4) */
- ymm5 = _mm_hadd_pd(ymm7,ymm8); /* needed (2) */
-
- rmat += states;
-
- ymm7 = _mm_load_pd(rmat);
- ymm8 = _mm_mul_pd(ymm3,ymm7);
-
- ymm7 = _mm_load_pd(rmat+2);
- ymm9 = _mm_mul_pd(ymm4,ymm7);
-
- /* calculate (b13*d1 + b14*d2 | b15*d3 + b16*d4) */
- ymm7 = _mm_hadd_pd(ymm8,ymm9); /* needed (4) */
-
- ymm2 = _mm_hadd_pd(ymm5,ymm7);
- ymm9 = _mm_mul_pd(xmm1,ymm2);
- _mm_store_pd(clv+2, ymm9);
-
- clv += states;
- rclv += states;
- }
-}
-
-static void inner_inner(int lchild, int rchild, double * clv)
-{
- int sites = com.npatt;
- int states = 4;
- int h,j,k;
- double x,y;
-
- pmat_jc69(lmatrix, nodes[lchild].branch);
- pmat_jc69(rmatrix, nodes[rchild].branch);
-
- double * lclv = nodes[lchild].conP;
- double * rclv = nodes[rchild].conP;
-
- __m128d xmm0,xmm1,xmm2,xmm3,xmm4;
- __m128d xmm5,xmm6,xmm7,xmm8,xmm9;
-
- /*
-
- +-----+-----+-----+-----+ +-----+-----+-----+-----+
- | a1 | a2 | a3 | a4 | | b1 | b2 | b3 | b4 |
- +-----+-----+-----+-----+ +-----+-----+-----+-----+
- | a5 | a6 | a7 | a8 | | b5 | b6 | b7 | b8 |
- +-----+-----+-----+-----+ +-----+-----+-----+-----+
- | a9 | a10 | a11 | a12 | | b9 | b10 | b11 | b12 |
- +-----+-----+-----+-----+ +-----+-----+-----+-----+
- | a13 | a14 | a15 | a16 | | b13 | b14 | b15 | b16 |
- +-----+-----+-----+-----+ +-----+-----+-----+-----+
-
- x x
-
- +----+----+----+----+ +----+----+----+----+
- | c1 | c2 | c3 | c4 | | d1 | d2 | d3 | d4 |
- +----+----+----+----+ +----+----+----+----+
-
- */
-
- for (h = 0; h < sites; ++h)
- {
- double * lmat = lmatrix;
- double * rmat = rmatrix;
-
-
- /*
-
- Vectorization of the following code snippet
-
- for (j = 0; j < states; ++j)
- {
- x = 0;
- y = 0;
- for (k = 0; k < states; ++k)
- {
- x += lmat[k] * lclv[k];
- y += rmat[k] * rclv[k];
- }
-
- lmat += states;
- rmat += states;
-
- clv[j] = x*y;
- }
-
- */
-
- /* compute left */
- xmm0 = _mm_load_pd(lclv); /* needed */
- xmm2 = _mm_load_pd(lmat);
- xmm3 = _mm_mul_pd(xmm0,xmm2);
-
- xmm1 = _mm_load_pd(lclv+2); /* needed */
- xmm2 = _mm_load_pd(lmat+2);
- xmm4 = _mm_mul_pd(xmm1,xmm2);
-
- /* calculate (a1*c1 + a2*c2 | a3*c3 + a4*c4) */
- xmm2 = _mm_hadd_pd(xmm3,xmm4); /* needed (1) */
-
- /* compute right */
- xmm3 = _mm_load_pd(rclv); /* needed */
- xmm4 = _mm_load_pd(rmat);
- xmm6 = _mm_mul_pd(xmm3,xmm4);
-
- xmm4 = _mm_load_pd(rclv+2); /* needed */
- xmm7 = _mm_load_pd(rmat+2);
- xmm8 = _mm_mul_pd(xmm4,xmm7);
-
- /* calculate (b1*d1 + b2*d2 | b3*d3 + b4*d4) */
- xmm5 = _mm_hadd_pd(xmm6,xmm8); /* needed (2) */
-
- rmat += states;
- lmat += states;
-
- /* compute left */
- xmm6 = _mm_load_pd(lmat);
- xmm7 = _mm_mul_pd(xmm0,xmm6);
-
- xmm6 = _mm_load_pd(lmat+2);
- xmm8 = _mm_mul_pd(xmm1,xmm6);
-
- /* calculate (a5*c1 + a6*c2 | a7*c3 + a8*c4) */
- xmm6 = _mm_hadd_pd(xmm7,xmm8); /* needed (3) */
-
- /* compute right */
- xmm7 = _mm_load_pd(rmat);
- xmm8 = _mm_mul_pd(xmm3,xmm7);
-
- xmm7 = _mm_load_pd(rmat+2);
- xmm9 = _mm_mul_pd(xmm4,xmm7);
-
- /* calculate (b5*d1 + b6*d2 | b7*d3 + b8*d4) */
- xmm7 = _mm_hadd_pd(xmm8,xmm9); /* needed (4) */
-
- xmm8 = _mm_hadd_pd(xmm2,xmm6);
- xmm2 = _mm_hadd_pd(xmm5,xmm7);
- xmm9 = _mm_mul_pd(xmm8,xmm2);
- _mm_store_pd(clv, xmm9);
-
- rmat += states;
- lmat += states;
-
- /* compute left */
- xmm5 = _mm_load_pd(lmat);
- xmm6 = _mm_mul_pd(xmm0,xmm5);
-
- xmm5 = _mm_load_pd(lmat+2);
- xmm7 = _mm_mul_pd(xmm1,xmm5);
-
- /* calculate (a9*c1 + a10*c2 | a11*c3 + a12*c4) */
- xmm2 = _mm_hadd_pd(xmm6,xmm7); /* needed (1) */
-
- /* compute right */
- xmm6 = _mm_load_pd(rmat);
- xmm7 = _mm_mul_pd(xmm3,xmm6);
-
- xmm6 = _mm_load_pd(rmat+2);
- xmm8 = _mm_mul_pd(xmm4,xmm6);
-
- /* calculate (b9*d1 + b10*d2 | b11*d3 + b12*d4) */
- xmm5 = _mm_hadd_pd(xmm7,xmm8); /* needed (2) */
-
- rmat += states;
- lmat += states;
-
- /* compute left */
- xmm6 = _mm_load_pd(lmat);
- xmm7 = _mm_mul_pd(xmm0,xmm6);
-
- xmm6 = _mm_load_pd(lmat+2);
- xmm8 = _mm_mul_pd(xmm1,xmm6);
-
- /* calculate (a13*c1 + a14*c2 | a15*c3 + a16*c4) */
- xmm6 = _mm_hadd_pd(xmm7,xmm8); /* needed (3) */
-
- /* compute right */
- xmm7 = _mm_load_pd(rmat);
- xmm8 = _mm_mul_pd(xmm3,xmm7);
-
- xmm7 = _mm_load_pd(rmat+2);
- xmm9 = _mm_mul_pd(xmm4,xmm7);
-
- /* calculate (b13*d1 + b14*d2 | b15*d3 + b16*d4) */
- xmm7 = _mm_hadd_pd(xmm8,xmm9); /* needed (4) */
-
- xmm8 = _mm_hadd_pd(xmm2,xmm6);
- xmm2 = _mm_hadd_pd(xmm5,xmm7);
- xmm9 = _mm_mul_pd(xmm8,xmm2);
- _mm_store_pd(clv+2, xmm9);
-
- clv += states;
- lclv += states;
- rclv += states;
- }
-}
-
-int ConditonalPNode (int inode)
-{
- int states = 4;
- int sites = com.npatt;
- int i;
- int child, lchild, rchild;
-
- /* recursive call the ConditionalPNode on all children of the current node */
- for (i = 0; i < nodes[inode].nson; ++i)
- {
- /* get id of i-th child of current node */
- child = nodes[inode].sons[i];
-
- /* if child is an inner node then recursively compute its conditional
- probabilities vector */
- if (nodes[child].nson > 0 && (!mcmc.saveconP || !com.oldconP[child]))
- ConditonalPNode(nodes[inode].sons[i]);
- }
-
- /* initialize CLV entries of current node to 1 */
- int n = sites * states;
- for (i = 0; i < n; ++i)
- nodes[inode].conP[i] = 1;
-
- if (nodes[inode].nson == 0) return (0);
-
- lchild = nodes[inode].sons[0];
- rchild = nodes[inode].sons[1];
-
- int ltip = (nodes[lchild].nson == 0);
- int rtip = (nodes[rchild].nson == 0);
-
- if (ltip && rtip)
- {
- if (com.cleandata)
- tip_tip(lchild,rchild,nodes[inode].conP);
- else
- {
- //assert(0);
- tip_tip_amb(lchild,rchild,nodes[inode].conP);
- }
- }
- else if (ltip)
- {
- if (com.cleandata)
- tip_inner(lchild,rchild,nodes[inode].conP);
- else
- {
- tip_inner_amb(lchild,rchild,nodes[inode].conP);
- //assert(0);
- }
- }
- else if (rtip)
- {
- if (com.cleandata)
- tip_inner(rchild,lchild,nodes[inode].conP);
- else
- {
- //assert(0);
- tip_inner_amb(rchild,lchild,nodes[inode].conP);
- }
- }
-
- else
- inner_inner(lchild,rchild,nodes[inode].conP);
-
- return (0);
-}
=====================================
src/tools.c
=====================================
@@ -1756,7 +1756,8 @@ double rndLaplace(void) {
double rndt2(void)
{
- /* Standard Student's t_2 variate, with d.f. = 2. t2 has mean 0 and variance infinity. */
+ /* Standard Student's t_2 variate, with d.f. = 2. t2 has mean 0 and variance infinity.
+ */
double u, t2;
u = 2 * rndu() - 1;
@@ -1772,9 +1773,9 @@ double rndt4(void)
This has variance 1, and is the standard t4 variate divided by sqrt(2).
The standard t4 variate has variance 2.
*/
- double u, v, w, c2, r2, t4, sqrt2 = 0.707106781;
+ double u, v, w, c2, r2, t4, sqrt2 = 0.7071067811865475244;
- for (; ; ) {
+ for ( ; ; ) {
u = 2 * rndu() - 1;
v = 2 * rndu() - 1;
w = u*u + v*v;
@@ -4525,7 +4526,7 @@ double Binomial(double n, int k, double *scale)
/* calculates (n choose k), where n is any real number, and k is integer.
If(*scale!=0) the result should be c+exp(*scale).
*/
- double c = 1, i, large = 1e99;
+ double c = 1, i, large = 1e200;
*scale = 0;
if ((int)k != k)
@@ -5676,12 +5677,12 @@ int DescriptiveStatistics(FILE *fout, char infile[], int nbin, int propternary,
for (j = 0; j < p; j++)
fscanf(fin, "%lf", &data[j*n + i]);
fclose(fin);
-
+/*
if(p>1) {
printf("\nGreat offer! I can smooth a few 2-D densities for free. How many do you want? ");
scanf("%d", &nf2d);
}
-
+*/
if (nf2d > MAXNF2D) error2("I don't want to do that many!");
for (i = 0; i < nf2d; i++) {
printf("pair #%d (e.g., type 1 3 to use variables #1 and #3)? ", i + 1);
@@ -5735,6 +5736,8 @@ int DescriptiveStatistics(FILE *fout, char infile[], int nbin, int propternary,
fprintf(fout, "\n\n");
fflush(fout);
+return(0);
+
fprintf(fout, "\nCorrelation matrix");
for (j = SkipColumns; j < p; j++) {
fprintf(fout, "\n%-8s ", varstr[j]);
=====================================
src/treesub.c
=====================================
@@ -936,7 +936,7 @@ int ReadSeq(FILE *fout, FILE *fseq, int cleandata, int locus)
/***** GIBBON, 2017.10.8 **/
- //return(0);
+ /* return(0); */
if (!com.readpattern) {
PatternWeight();
@@ -2907,9 +2907,9 @@ int GetTreeFileType(FILE *ftree, int *ntree, int *pauptree, int shortform)
int PopPaupTree(FILE *ftree);
int PopPaupTree(FILE *ftree)
{
- /* This reads out the string in front of the tree in the nexus format,
- typically "tree PAUP_1 = [&U]" with "[&U]" optional
- */
+/* This reads out the string in front of the tree in the nexus format,
+ typically "tree PAUP_1 = [&U]" with "[&U]" optional
+*/
int ch;
for (; ;) {
@@ -2927,12 +2927,12 @@ int PopPaupTree(FILE *ftree)
void DownTreeCladeLabel(int inode, int label)
{
- /* This goes down the tree to change $ labels (stored in nodes[].label2) into
- # labels (stored in nodes[].label). To deal with nested clade labels,
- branches within a clade are labeled by negative numbers initially, and
- converted to positive labels at the end of the algorithm.
- nodes[].label and nodes[].label2 are initialized to -1 before this routine is called.
- */
+/* This goes down the tree to change $ labels (stored in nodes[].label2) into
+ # labels (stored in nodes[].label). To deal with nested clade labels,
+ branches within a clade are labeled by negative numbers initially, and
+ converted to positive labels at the end of the algorithm.
+ nodes[].label and nodes[].label2 are initialized to -1 before this routine is called.
+*/
int i;
if (nodes[inode].label2 != -1)
@@ -3130,7 +3130,7 @@ int ReadTreeN(FILE *ftree, int *haslength, int copyname, int popline)
else { /* label or annotation (node can be tip or internal node) */
if (strchr(quote[0], ch)) {
k = ReadUntil(ftree, line, quote[1], lline);
- ch = fgetc(ftree); /* pop off '" */
+ ch = fgetc(ftree); /* pop off ' " */
}
else {
ungetc(ch, ftree);
@@ -3140,8 +3140,10 @@ int ReadTreeN(FILE *ftree, int *haslength, int copyname, int popline)
if (nodes[inode].annotation == NULL) error2("oom annotation");
strcpy(nodes[inode].annotation, line);
/* This line is used in MCcoal. ProcessNodeAnnotation is used to process node annotation elsewhere. */
- if (line[0] == '#')
+ if (line[0] == '#') /* branch labels used in baseml and codeml */
sscanf(line + 1, "%lf", &nodes[inode].label);
+ else if (line[0] == '$') /* clade labels used to label branches in baseml and codeml */
+ sscanf(line + 1, "%lf", &nodes[inode].label2);
}
} /* for( ; ; ) */
@@ -3195,9 +3197,15 @@ int OutSubTreeN(FILE *fout, int inode, int spnames, int printopt)
}
if ((printopt & PrNodeNum) && nodes[inode].nson)
fprintf(fout, " %d ", inode + 1);
+
+#if (defined BPP)
+ if ((printopt & PrNodeStr) && inode < com.ns && nodes[inode].label > 0)
+ fprintf(fout, " [&theta=%.6f]", nodes[inode].label);
+#else
if ((printopt & PrLabel) && nodes[inode].label > 0)
fprintf(fout, " #%.6f", nodes[inode].label);
/* fprintf(fout, " [&label=%.6f]", nodes[inode].label); */
+#endif
if ((printopt & PrAge) && nodes[inode].age)
fprintf(fout, " @%.6f", nodes[inode].age);
@@ -8544,10 +8552,9 @@ int ProcessNodeAnnotation(int *haslabel)
stree.nodes[i].fossil = j;
pch2 = strchr(pch, braces[0]);
if (pch2 == NULL) pch2 = strchr(pch, braces[1]);
- if (pch2)
- pch = pch2 + 1;
- else
- error2("did not find left brace");
+ if (pch2 == NULL)
+ error2("did not find left brace");
+ pch = pch2 + 1;
switch (j) {
case (LOWER_F):
@@ -8693,7 +8700,8 @@ int ProcessNodeAnnotation(int *haslabel)
for (i = 0, com.nbtype = 0; i < tree.nnode; i++) {
if (i == tree.root) continue;
j = (int)nodes[i].label;
- if (j + 1 > com.nbtype) com.nbtype = j + 1;
+ if (j + 1 > com.nbtype)
+ com.nbtype = j + 1;
if (j<0 || j>tree.nbranch - 1)
error2("branch label in the tree (note labels start from 0 and are consecutive)");
}
@@ -8953,13 +8961,17 @@ void copySptree(void)
void printStree(void)
{
- int i, j, k;
+ int i, j, k, maxlname=10;
+
+ for (i = 0; i < stree.nspecies; i++)
+ if ((k = (int)strlen(stree.nodes[i].name)) > maxlname)
+ maxlname = k;
printf("\n************\nSpecies tree\nns = %d nnode = %d", stree.nspecies, stree.nnode);
- printf("\n%7s%7s %-16s %8s %8s%16s\n", "father", "node", "name", "time", "sons", "fossil");
+ printf("\n%7s%7s %-*s %9s %8s%16s\n", "father", "node", maxlname+2, "name", "time", "sons", "fossil");
for (i = 0; i < stree.nnode; i++) {
- printf("%7d%7d %-20s %7.5f",
- stree.nodes[i].father + 1, i + 1, stree.nodes[i].name, stree.nodes[i].age);
+ printf("%7d%7d %-*s %8.5f",
+ stree.nodes[i].father + 1, i + 1, maxlname+6, stree.nodes[i].name, stree.nodes[i].age);
if (stree.nodes[i].nson)
printf(" (%2d %2d)", stree.nodes[i].sons[0] + 1, stree.nodes[i].sons[1] + 1);
View it on GitLab: https://salsa.debian.org/med-team/paml/compare/99905a9940d96e9f7193661cc705ffb23d41ac77...3eb602618251ca7de5fea6e246c37ad51357515a
--
View it on GitLab: https://salsa.debian.org/med-team/paml/compare/99905a9940d96e9f7193661cc705ffb23d41ac77...3eb602618251ca7de5fea6e246c37ad51357515a
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20191110/a7ff10ce/attachment-0001.html>
More information about the debian-med-commit
mailing list