[med-svn] [bppsuite] 01/03: New upstream version 2.3.2
Julien Dutheil
jdutheil-guest at moszumanska.debian.org
Wed Feb 7 11:05:30 UTC 2018
This is an automated email from the git hooks/post-receive script.
jdutheil-guest pushed a commit to branch master
in repository bppsuite.
commit f51eb03ed045b68ef9f112a0920aeba6b482b452
Author: Julien Y. Dutheil <dutheil at evolbio.mpg.de>
Date: Wed Feb 7 11:16:04 2018 +0100
New upstream version 2.3.2
---
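Note on the CodonSiteStatistics change in bppSuite/bppPopStats.cpp: the new per-site loop tallies allele counts directly, skips gaps and unresolved states, and reports the minor and major allele with their raw counts. A minimal standalone sketch of that tally follows, assuming plain integer state codes. The names `SiteSummary`, `summarizeSite`, `gapState` and `unknownState` are hypothetical and not part of Bio++; the committed code uses `SymbolListTools::getCounts` together with the alphabet's `isGap` and `isUnresolved` instead.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical summary of one alignment column (not a Bio++ type).
struct SiteSummary {
  std::size_t nbAlleles;  // number of distinct resolved states
  std::size_t nbMissing;  // number of gap or unresolved entries
  std::size_t minFreq;    // count of the least frequent resolved state
  std::size_t maxFreq;    // count of the most frequent resolved state
  int minState;           // least frequent resolved state, -1 if none
  int maxState;           // most frequent resolved state, -1 if none
};

// Tally the states of one site. gapState and unknownState are assumed
// integer codes standing in for the alphabet's gap/unresolved tests.
SiteSummary summarizeSite(const std::vector<int>& site,
                          int gapState, int unknownState) {
  std::map<int, std::size_t> counts;
  for (int state : site) ++counts[state];

  // minFreq starts above any possible count, as in the committed loop.
  SiteSummary r{0, 0, site.size() + 1, 0, -1, -1};
  for (const auto& kv : counts) {
    if (kv.first == gapState || kv.first == unknownState) {
      r.nbMissing += kv.second;  // missing data is counted, not an allele
      continue;
    }
    ++r.nbAlleles;
    if (kv.second < r.minFreq) {
      r.minFreq = kv.second;
      r.minState = kv.first;
    }
    if (kv.second > r.maxFreq) {
      r.maxFreq = kv.second;
      r.maxState = kv.first;
    }
  }
  return r;
}
```

For the states {0, 0, 0, 1, -1} with -1 as the gap code, the sketch reports two alleles, one missing entry, state 1 as the minor allele (seen once) and state 0 as the major allele (seen three times). On a monomorphic site the minor and major allele coincide, which is what the "minState and maxState are identical" comment in the diff relies on.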
ChangeLog | 3 +
Examples/PopStats/PopStats.bpp | 1 +
Examples/PopStats/alignment.phy | 11 +
README.md | 82 +++++++
bppSuite/CMakeLists.txt | 2 +-
bppSuite/bppDist.cpp | 74 +++----
bppSuite/bppML.cpp | 49 ++++-
bppSuite/bppPopStats.cpp | 253 ++++++++++++++++++++--
bppsuite.spec | 2 +-
buildBin.sh | 2 +-
doc/bppsuite.texi | 462 +++++++++++++++++++---------------------
11 files changed, 636 insertions(+), 305 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 69e7267..9245387 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,6 @@
+06/11/17 Julien Dutheil
+* Added estimation of kappa to bppPopStats + bugs fixed
+
06/06/17 -*- Version 2.3.1 -*-
10/05/17 -*- Version 2.3.0 -*-
diff --git a/Examples/PopStats/PopStats.bpp b/Examples/PopStats/PopStats.bpp
index 9a08535..a97ceee 100644
--- a/Examples/PopStats/PopStats.bpp
+++ b/Examples/PopStats/PopStats.bpp
@@ -1,5 +1,6 @@
input.sequence.file.ingroup = alignment.phy
input.sequence.format.ingroup = Phylip(order=interleaved, type=classic)
+estimate.kappa = yes
pop.stats= \
SiteFrequencies,\
Watterson75,\
diff --git a/Examples/PopStats/alignment.phy b/Examples/PopStats/alignment.phy
new file mode 100644
index 0000000..b129d3e
--- /dev/null
+++ b/Examples/PopStats/alignment.phy
@@ -0,0 +1,11 @@
+ 10 40
+3 TCTGATAGTGACGTCACCGGCGTTTATTACCCTACGTTCA
+4 TATGGCAGTGACGTCATCGGCGTCAATTCCCCGACATTCA
+9 TATGGCAGTGACGTCATCGGCGTCAATTCCCCGACATTCA
+6 TATGGCAGTGACGTCATCGGCGTCAATTCCCCGACATTCA
+10 TATGGCAGTGACGTCATCGGCGTCAATTCCCCGACCTTCA
+1 TATGGCAGTGACGTCATCGGCGTCAATTCCCCGACCTTCA
+5 TATGGCAGTGACGTCATCGGCGTCAATTCCCCGACCTTCA
+8 TATGGCAGTGACGTCATCGGCGTCAATTCCCCGACCTTCA
+2 GATGGCAGTGACGTCTTCGGCGTCAATTCCCCGACATTGA
+7 GATGGCAGTGACGTCTTCGGCGTCAATTCCCCGACATTCA
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..520e0ab
--- /dev/null
+++ b/README.md
@@ -0,0 +1,82 @@
+# BppSuite presentation
+
+BppSuite is a suite of ready-to-use programs for phylogenetic and sequence analysis.
+
+## Installation
+
+### Standalone executables
+
+Standalone executables are available for [linux64](http://biopp.univ-montp2.fr/repos/exe/lin64/).
+
+[//]: [win32](http://biopp.univ-montp2.fr/repos/exe/win32/), [win64](http://biopp.univ-montp2.fr/repos/exe/win64/) and [Mac](http://biopp.univ-montp2.fr/repos/exe/mac/)
+
+### From source files
+
+#### Get the sources
+
+This is done with <tt>git</tt>, for example in directory <tt>$bpp_dir</tt>:
+
+<h5>
+<pre>
+cd $bpp_dir
+git clone https://github.com/BioPP/bppsuite
+</pre>
+</h5>
+
+#### Compiling
+
+Bio++ libraries need to be installed beforehand, for example in <tt>$bpp_dir</tt>. The needed libraries are [bpp-core](https://github.com/BioPP/bpp-core), [bpp-seq](https://github.com/BioPP/bpp-seq), [bpp-phyl](https://github.com/BioPP/bpp-phyl), [bpp-popgen](https://github.com/BioPP/bpp-popgen).
+
+After, you proceed:
+
+<h5>
+<pre>
+cd bppsuite
+cmake -DCMAKE_INSTALL_PREFIX=$bpp_dir ./ # prepare compilation
+make # compile
+make install # move files to the installation directory (this will create a $bpp_dir/bin/ directory)
+</pre>
+</h5>
+
+That's it ! The executables are now installed in <tt>$bpp_dir/bin</tt>.
+
+Without the option <tt>-DCMAKE_INSTALL_PREFIX=$bpp_dir</tt>, the standard <tt>/usr/local</tt> directory will be used, and the executables installed in <tt>/usr/local/bin</tt>, a location which requires superuser access rights.
+
+## Usage
+
+Bppsuite executables should know where the dynamic libraries are. A way to check it is the command:
+
+<h5>
+<pre>
+ldd $bpp_dir$/bin/bppml
+</pre>
+</h5>
+
+To configure this, set in the shell environment variable :
+
+<h5>
+<pre>
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$bpp_dir
+</pre>
+</h5>
+
+(and source the configuration file or relog).
+
+## Documentation
+
+You can also generate the pdf documentation by typing :
+
+<h5>
+<pre>
+make pdf
+</pre>
+</h5>
+
+### Examples
+
+Many examples are available in the subdirectory of <tt>Examples</tt>.
+
+### Documentation
+
+Documentation can be found at http://github.com/bppsuite/releases in pdf or html.
+
\ No newline at end of file
diff --git a/bppSuite/CMakeLists.txt b/bppSuite/CMakeLists.txt
index ca6cb7d..62be041 100644
--- a/bppSuite/CMakeLists.txt
+++ b/bppSuite/CMakeLists.txt
@@ -38,7 +38,7 @@ foreach (target ${bppsuite-targets})
# Link (static or shared)
if (BUILD_STATIC)
target_link_libraries (${target} ${BPP_LIBS_STATIC})
- set_target_properties (${target} LINK_SEARCH_END_STATIC TRUE)
+ set_target_properties (${target} PROPERTIES LINK_SEARCH_END_STATIC TRUE)
else (BUILD_STATIC)
target_link_libraries (${target} ${BPP_LIBS_SHARED})
endif (BUILD_STATIC)
diff --git a/bppSuite/bppDist.cpp b/bppSuite/bppDist.cpp
index 079dd49..22126a0 100644
--- a/bppSuite/bppDist.cpp
+++ b/bppSuite/bppDist.cpp
@@ -129,7 +129,7 @@ int main(int args, char ** argv)
TransitionModel* model = PhylogeneticsApplicationTools::getTransitionModel(alphabet, gCode.get(), sites, bppdist.getParams());
- DiscreteDistribution* rDist = 0;
+ DiscreteDistribution* rDist = 0;
if (model->getNumberOfStates() > model->getAlphabet()->getSize())
{
//Markov-modulated Markov model!
@@ -137,7 +137,7 @@ int main(int args, char ** argv)
}
else
{
- rDist = PhylogeneticsApplicationTools::getRateDistribution(bppdist.getParams());
+ rDist = PhylogeneticsApplicationTools::getRateDistribution(bppdist.getParams());
}
DistanceEstimation distEstimation(model, rDist, sites, 1, false);
@@ -177,33 +177,33 @@ int main(int args, char ** argv)
else if (type == "iterations") type = OptimizationTools::DISTANCEMETHOD_ITERATIONS;
else throw Exception("Unknown parameter estimation procedure '" + type + "'.");
- unsigned int optVerbose = ApplicationTools::getParameter<unsigned int>("optimization.verbose", bppdist.getParams(), 2);
-
- string mhPath = ApplicationTools::getAFilePath("optimization.message_handler", bppdist.getParams(), false, false);
- OutputStream* messenger =
- (mhPath == "none") ? 0 :
- (mhPath == "std") ? ApplicationTools::message :
- new StlOutputStream(new ofstream(mhPath.c_str(), ios::out));
- ApplicationTools::displayResult("Message handler", mhPath);
+ unsigned int optVerbose = ApplicationTools::getParameter<unsigned int>("optimization.verbose", bppdist.getParams(), 2);
+
+ string mhPath = ApplicationTools::getAFilePath("optimization.message_handler", bppdist.getParams(), false, false);
+ OutputStream* messenger =
+ (mhPath == "none") ? 0 :
+ (mhPath == "std") ? ApplicationTools::message :
+ new StlOutputStream(new ofstream(mhPath.c_str(), ios::out));
+ ApplicationTools::displayResult("Message handler", mhPath);
- string prPath = ApplicationTools::getAFilePath("optimization.profiler", bppdist.getParams(), false, false);
- OutputStream* profiler =
- (prPath == "none") ? 0 :
- (prPath == "std") ? ApplicationTools::message :
- new StlOutputStream(new ofstream(prPath.c_str(), ios::out));
- if(profiler) profiler->setPrecision(20);
- ApplicationTools::displayResult("Profiler", prPath);
+ string prPath = ApplicationTools::getAFilePath("optimization.profiler", bppdist.getParams(), false, false);
+ OutputStream* profiler =
+ (prPath == "none") ? 0 :
+ (prPath == "std") ? ApplicationTools::message :
+ new StlOutputStream(new ofstream(prPath.c_str(), ios::out));
+ if(profiler) profiler->setPrecision(20);
+ ApplicationTools::displayResult("Profiler", prPath);
- // Should I ignore some parameters?
+ // Should I ignore some parameters?
ParameterList allParameters = model->getParameters();
allParameters.addParameters(rDist->getParameters());
- ParameterList parametersToIgnore;
+ ParameterList parametersToIgnore;
string paramListDesc = ApplicationTools::getStringParameter("optimization.ignore_parameter", bppdist.getParams(), "", "", true, false);
- bool ignoreBrLen = false;
+ bool ignoreBrLen = false;
StringTokenizer st(paramListDesc, ",");
- while (st.hasMoreToken())
+ while (st.hasMoreToken())
{
- try
+ try
{
string param = st.nextToken();
if (param == "BrLen")
@@ -217,19 +217,19 @@ int main(int args, char ** argv)
}
else ApplicationTools::displayWarning("Parameter '" + param + "' not found.");
}
- }
+ }
catch (ParameterNotFoundException& pnfe)
{
- ApplicationTools::displayError("Parameter '" + pnfe.getParameter() + "' not found, and so can't be ignored!");
- }
- }
-
- unsigned int nbEvalMax = ApplicationTools::getParameter<unsigned int>("optimization.max_number_f_eval", bppdist.getParams(), 1000000);
- ApplicationTools::displayResult("Max # ML evaluations", TextTools::toString(nbEvalMax));
-
- double tolerance = ApplicationTools::getDoubleParameter("optimization.tolerance", bppdist.getParams(), .000001);
- ApplicationTools::displayResult("Tolerance", TextTools::toString(tolerance));
-
+ ApplicationTools::displayError("Parameter '" + pnfe.getParameter() + "' not found, and so can't be ignored!");
+ }
+ }
+
+ unsigned int nbEvalMax = ApplicationTools::getParameter<unsigned int>("optimization.max_number_f_eval", bppdist.getParams(), 1000000);
+ ApplicationTools::displayResult("Max # ML evaluations", TextTools::toString(nbEvalMax));
+
+ double tolerance = ApplicationTools::getDoubleParameter("optimization.tolerance", bppdist.getParams(), .000001);
+ ApplicationTools::displayResult("Tolerance", TextTools::toString(tolerance));
+
//Here it is:
ofstream warn("warnings", ios::out);
ApplicationTools::warning = new StlOutputStreamWrapper(&warn);
@@ -277,18 +277,18 @@ int main(int args, char ** argv)
ParameterList parameters = model->getParameters();
for (unsigned int i = 0; i < parameters.size(); i++)
{
- ApplicationTools::displayResult(parameters[i].getName(), TextTools::toString(parameters[i].getValue()));
+ ApplicationTools::displayResult(parameters[i].getName(), TextTools::toString(parameters[i].getValue()));
}
parameters = rDist->getParameters();
for (unsigned int i = 0; i < parameters.size(); i++)
{
- ApplicationTools::displayResult(parameters[i].getName(), TextTools::toString(parameters[i].getValue()));
+ ApplicationTools::displayResult(parameters[i].getName(), TextTools::toString(parameters[i].getValue()));
}
// Write parameters to file:
- string parametersFile = ApplicationTools::getAFilePath("output.estimates", bppdist.getParams(), false, false);
+ string parametersFile = ApplicationTools::getAFilePath("output.estimates", bppdist.getParams(), false, false);
if (parametersFile != "none")
{
- ofstream out(parametersFile.c_str(), ios::out);
+ ofstream out(parametersFile.c_str(), ios::out);
parameters = model->getParameters();
for (unsigned int i = 0; i < parameters.size(); i++)
{
diff --git a/bppSuite/bppML.cpp b/bppSuite/bppML.cpp
index 7d4a1fd..e43f571 100644
--- a/bppSuite/bppML.cpp
+++ b/bppSuite/bppML.cpp
@@ -130,6 +130,9 @@ int main(int args, char** argv)
gCode.reset(SequenceApplicationTools::getGeneticCode(codonAlphabet->getNucleicAlphabet(), codeDesc));
}
+ //////////////////////////////////////////////
+ // DATA
+
VectorSiteContainer* allSites = SequenceApplicationTools::getSiteContainer(alphabet, bppml.getParams());
VectorSiteContainer* sites = SequenceApplicationTools::getSitesToAnalyse(*allSites, bppml.getParams(), "", true, false);
@@ -138,6 +141,10 @@ int main(int args, char** argv)
ApplicationTools::displayResult("Number of sequences", TextTools::toString(sites->getNumberOfSequences()));
ApplicationTools::displayResult("Number of sites", TextTools::toString(sites->getNumberOfSites()));
+
+ /////////////////////////////////////////
+ // TREE
+
// Get the initial tree
Tree* tree = 0;
string initTreeOpt = ApplicationTools::getStringParameter("init.tree", bppml.getParams(), "user", "", false, 1);
@@ -159,16 +166,6 @@ int main(int args, char** argv)
// but allow to check file existence before running optimization!
PhylogeneticsApplicationTools::writeTree(*tree, bppml.getParams());
- bool computeLikelihood = ApplicationTools::getBooleanParameter("compute.likelihood", bppml.getParams(), true, "", false, 1);
- if (!computeLikelihood)
- {
- delete alphabet;
- delete sites;
- delete tree;
- cout << "BppML's done. Bye." << endl;
- return 0;
- }
-
// Setting branch lengths?
string initBrLenMethod = ApplicationTools::getStringParameter("init.brlen.method", bppml.getParams(), "Input", "", true, 1);
string cmdName;
@@ -238,6 +235,24 @@ int main(int args, char** argv)
exit(0);
}
+
+ /////////////////////////
+ // MODEL & LIKELIHOOD
+
+ // Check if likelihood
+
+ bool computeLikelihood = ApplicationTools::getBooleanParameter("compute.likelihood", bppml.getParams(), true, "", false, 1);
+ if (!computeLikelihood)
+ {
+ delete alphabet;
+ delete sites;
+ delete tree;
+ cout << "BppML's done. Bye." << endl;
+ return 0;
+ }
+
+
+
DiscreteRatesAcrossSitesTreeLikelihood* tl;
string nhOpt = ApplicationTools::getStringParameter("nonhomogeneous", bppml.getParams(), "no", "", true, 1);
ApplicationTools::displayResult("Heterogeneous model", nhOpt);
@@ -250,6 +265,9 @@ int main(int args, char** argv)
SubstitutionModelSet* modelSet = 0;
DiscreteDistribution* rDist = 0;
+ ////////////
+ // If optimize topology
+
if (optimizeTopo || nbBS > 0)
{
if (nhOpt != "no")
@@ -270,6 +288,12 @@ int main(int args, char** argv)
else
throw Exception("Topology estimation with Mixed model not supported yet, sorry :(");
}
+
+ //////////////////////
+ // If not topology optimization
+
+
+ ///// homogeneous modeling
else if (nhOpt == "no")
{
model = PhylogeneticsApplicationTools::getTransitionModel(alphabet, gCode.get(), sites, bppml.getParams());
@@ -312,6 +336,9 @@ int main(int args, char** argv)
}
else throw Exception("Unknown recursion option: " + recursion);
}
+
+
+ ///// one per branch modeling
else if (nhOpt == "one_per_branch")
{
model = PhylogeneticsApplicationTools::getTransitionModel(alphabet, gCode.get(), sites, bppml.getParams());
@@ -381,6 +408,8 @@ int main(int args, char** argv)
}
else throw Exception("Unknown recursion option: " + recursion);
}
+
+ /////// hand made modeling
else if (nhOpt == "general")
{
modelSet = PhylogeneticsApplicationTools::getSubstitutionModelSet(alphabet, gCode.get(), sites, bppml.getParams());
diff --git a/bppSuite/bppPopStats.cpp b/bppSuite/bppPopStats.cpp
index b0e2589..237f871 100644
--- a/bppSuite/bppPopStats.cpp
+++ b/bppSuite/bppPopStats.cpp
@@ -56,8 +56,20 @@ using namespace std;
#include <Bpp/Seq/SiteTools.h>
#include <Bpp/Seq/CodonSiteTools.h>
#include <Bpp/Seq/Alphabet/Alphabet.h>
+#include <Bpp/Seq/Alphabet/AlphabetTools.h>
#include <Bpp/Seq/App/SequenceApplicationTools.h>
+// From bpp-phyl:
+#include <Bpp/Phyl/Tree.h>
+#include <Bpp/Phyl/Model/Nucleotide/K80.h>
+#include <Bpp/Phyl/Model/Codon/YN98.h>
+#include <Bpp/Phyl/Model/RateDistribution/ConstantRateDistribution.h>
+#include <Bpp/Phyl/Distance/DistanceEstimation.h>
+#include <Bpp/Phyl/Distance/BioNJ.h>
+#include <Bpp/Phyl/Likelihood/DRHomogeneousTreeLikelihood.h>
+#include <Bpp/Phyl/Likelihood/MarginalAncestralStateReconstruction.h>
+#include <Bpp/Phyl/App/PhylogeneticsApplicationTools.h>
+
// From bpp-popgen
#include <Bpp/PopGen/PolymorphismSequenceContainer.h>
#include <Bpp/PopGen/PolymorphismSequenceContainerTools.h>
@@ -182,7 +194,104 @@ int main(int args, char** argv)
}
ApplicationTools::displayResult("Number of sequences in ingroup", pscIn->getNumberOfSequences());
ApplicationTools::displayResult("Number of sequences in outgroup", pscOut.get() ? pscOut->getNumberOfSequences() : 0);
+
+ // Shall we estimate some parameters first?
+
+ bool estimateTsTv = ApplicationTools::getBooleanParameter("estimate.kappa", bpppopstats.getParams(), false, "", false, 1);
+ double kappa = 1;
+ double omega = -1;
+ bool estimateAncestor = ApplicationTools::getBooleanParameter("estimate.ancestor", bpppopstats.getParams(), false, "", false, 1);
+ if (estimateAncestor & ! pscOut)
+ throw Exception("Error: an outgroup sequence is needed for estimating ancestral states.");
+
+ bool fitModel = estimateTsTv || estimateAncestor;
+
+ // Fit a model for later use:
+ unique_ptr<Tree> tree;
+ unique_ptr<SubstitutionModel> model;
+ unique_ptr<DiscreteDistribution> rDist;
+ DRTreeLikelihood* treeLik = nullptr;
+ unique_ptr<Sequence> ancestralSequence;
+ if (fitModel) {
+ // Get the alignment:
+ bool sampleIngroup = ApplicationTools::getBooleanParameter("estimate.sample_ingroup", bpppopstats.getParams(), true);
+ size_t sampleIngroupSize = 0;
+ if (sampleIngroup) {
+ sampleIngroupSize = ApplicationTools::getParameter<size_t>("estimate.sample_ingroup.size", bpppopstats.getParams(), 10);
+ if (sampleIngroupSize > pscIn->getNumberOfSequences()) {
+ ApplicationTools::displayWarning("Sample size higher than number of sequence. No sampling performed.");
+ sampleIngroup = false;
+ }
+ }
+ unique_ptr<AlignedSequenceContainer> aln;
+ if (sampleIngroup) {
+ ApplicationTools::displayResult("Nb of ingroup sequences for model fitting", sampleIngroupSize);
+ aln.reset(new AlignedSequenceContainer(pscIn->getAlphabet()));
+ vector<string> selection(sampleIngroupSize);
+ RandomTools::getSample(pscIn->getSequencesNames(), selection, false);
+ SequenceContainerTools::getSelectedSequences(*pscIn, selection ,*aln);
+ } else {
+ aln.reset(new AlignedSequenceContainer(*pscIn));
+ }
+ if (pscOut) {
+ aln->addSequence(pscOut->getSequence(0)); //As for now, we only consider one sequence as outgroup, the first one.
+ }
+
+ // Get a tree:
+ string treeOpt = ApplicationTools::getStringParameter("input.tree.method", bpppopstats.getParams(), "bionj", "");
+ if (codonAlphabet) {
+ unique_ptr<FrequenciesSet> freqSet(new FixedCodonFrequenciesSet(gCode.get()));
+ model.reset(new YN98(gCode.get(), freqSet.release()));
+ } else {
+ model.reset(new K80(&AlphabetTools::DNA_ALPHABET));
+ } //Note: proteins not supported!
+ rDist.reset(new ConstantRateDistribution());
+ if (treeOpt == "user") {
+ tree.reset(PhylogeneticsApplicationTools::getTree(bpppopstats.getParams()));
+ } else if (treeOpt == "bionj") {
+ ApplicationTools::displayTask("Estimating distance matrix", true);
+ //DistanceEstimation distEstimation(model->clone(), rDist->clone(), aln.get(), 1, false);
+ //distEstimation.computeMatrix();
+ //unique_ptr<DistanceMatrix> matrix(distEstimation.getMatrix());
+ unique_ptr<DistanceMatrix> matrix(SiteContainerTools::computeSimilarityMatrix(*aln, true, SiteContainerTools::SIMILARITY_NOGAP, true));
+
+ ApplicationTools::displayTaskDone();
+ ApplicationTools::displayTask("Computing BioNJ tree", true);
+ BioNJ bionj(false, true);
+ bionj.setDistanceMatrix(*matrix);
+ bionj.computeTree();
+ ApplicationTools::displayTaskDone();
+ tree.reset(bionj.getTree());
+ } else {
+ throw Exception("Invalid input.tree.method. Should be either 'user' or 'bionj'.");
+ }
+
+ // Create a likelihood object:
+ treeLik = new DRHomogeneousTreeLikelihood(*tree, *aln, model.get(), rDist.get());
+ treeLik->initialize();
+ if (isinf(treeLik->getValue()))
+ throw Exception("Error: null likelihood. Possible cause: stop codon or numerical underflow (too many sequences).");
+ // Optimize parameters:
+ treeLik = dynamic_cast<DRTreeLikelihood*>(PhylogeneticsApplicationTools::optimizeParameters(treeLik, treeLik->getParameters(), bpppopstats.getParams(), "", true, true, 2));
+
+ // Get kappa:
+ if (estimateTsTv) {
+ kappa = model->getParameter("kappa").getValue();
+ ApplicationTools::displayResult("Transition / transversions ratio", kappa);
+ }
+ if (estimateAncestor) {
+ MarginalAncestralStateReconstruction asr(treeLik);
+ int outgroupId = tree->getLeafId(pscOut->getSequence(0).getName());
+ ancestralSequence.reset(asr.getAncestralSequenceForNode(tree->getFatherId(outgroupId)));
+ }
+ if (codonAlphabet) {
+ omega = model->getParameter("omega").getValue();
+ }
+ }
+ if (treeLik)
+ delete treeLik; //Not needed anymore.
+
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
// Compute statistics
@@ -312,7 +421,6 @@ int main(int args, char** argv)
*cLog << "fuLiFstarSegSit" << (toolCounter[cmdName] > 1 ? TextTools::toString(toolCounter[cmdName]) : "") << " = " << flFstar << endl;
}
}
-
// +-----------+
// | PiN / PiS |
// +-----------+
@@ -331,12 +439,53 @@ int main(int args, char** argv)
ApplicationTools::displayResult("#N:", nbN);
ApplicationTools::displayResult("#S:", nbS);
ApplicationTools::displayResult("PiN / PiS (corrected for #N and #S):", r);
+ if (fitModel) {
+ ApplicationTools::displayResult("Omega (YN98 model):", omega);
+ }
+
if (logFile != "none") {
*cLog << "# PiN and PiS" << endl;
*cLog << "PiN" << (toolCounter[cmdName] > 1 ? TextTools::toString(toolCounter[cmdName]) : "") << " = " << piN << endl;
*cLog << "PiS" << (toolCounter[cmdName] > 1 ? TextTools::toString(toolCounter[cmdName]) : "") << " = " << piS << endl;
*cLog << "NbN" << (toolCounter[cmdName] > 1 ? TextTools::toString(toolCounter[cmdName]) : "") << " = " << nbN << endl;
*cLog << "NbS" << (toolCounter[cmdName] > 1 ? TextTools::toString(toolCounter[cmdName]) : "") << " = " << nbS << endl;
+ if (fitModel) {
+ *cLog << "Omega" << (toolCounter[cmdName] > 1 ? TextTools::toString(toolCounter[cmdName]) : "") << " = " << omega << endl;
+ }
+ }
+ }
+
+ // +---------+
+ // | dN / dS |
+ // +---------+
+ else if (cmdName == "dN_dS")
+ {
+ if (!codonAlphabet) {
+ throw Exception("dN_dS can only be used with a codon alignment. Check the input alphabet!");
+ }
+ //Get consensus sequences:
+ unique_ptr<SiteContainer> alnIn(pscIn->toSiteContainer());
+ unique_ptr<SiteContainer> alnOut(pscOut->toSiteContainer());
+ unique_ptr<Sequence> consensusIn(SiteContainerTools::getConsensus(*alnIn, "consIn", true, false));
+ unique_ptr<Sequence> consensusOut(SiteContainerTools::getConsensus(*alnOut, "consOut", true, false));
+ unique_ptr<AlignedSequenceContainer> alnCons(new AlignedSequenceContainer(codonAlphabet));
+ alnCons->addSequence(*consensusIn);
+ alnCons->addSequence(*consensusOut);
+ unique_ptr<FrequenciesSet> freqSetDiv(new FixedCodonFrequenciesSet(gCode.get()));
+ YN98* modelDiv = new YN98(gCode.get(), freqSetDiv.release());
+ DiscreteDistribution* rDistDiv = new ConstantRateDistribution();
+ DistanceEstimation distEstimation(modelDiv, rDistDiv, alnCons.get(), 0, false);
+ distEstimation.setAdditionalParameters(modelDiv->getIndependentParameters());
+ distEstimation.computeMatrix();
+ unique_ptr<DistanceMatrix> matrix(distEstimation.getMatrix());
+ ApplicationTools::displayResult("Yang and Nielsen's Omega (dN/dS):", modelDiv->getParameter("omega").getValue());
+ ApplicationTools::displayResult("Yang and Nielsen's Kappa:", modelDiv->getParameter("kappa").getValue());
+ ApplicationTools::displayResult("Yang and Nielsen's Distance:", (*matrix)(1,0));
+ if (logFile != "none") {
+ *cLog << "# dN and dS (Yang and Nielsen's 1998 substitution model)" << endl;
+ *cLog << "OmegaDiv" << (toolCounter[cmdName] > 1 ? TextTools::toString(toolCounter[cmdName]) : "") << " = " << modelDiv->getParameter("omega").getValue() << endl;
+ *cLog << "KappaDiv" << (toolCounter[cmdName] > 1 ? TextTools::toString(toolCounter[cmdName]) : "") << " = " << modelDiv->getParameter("kappa").getValue() << endl;
+ *cLog << "DistanceDiv" << (toolCounter[cmdName] > 1 ? TextTools::toString(toolCounter[cmdName]) : "") << " = " << (*matrix)(1,0) << endl;
}
}
@@ -348,7 +497,7 @@ int main(int args, char** argv)
if (!codonAlphabet) {
throw Exception("MacDonald-Kreitman test can only be performed on a codon alignment. Check the input alphabet!");
}
- if (!pscOut.get()) {
+ if (!pscOut) {
throw Exception("MacDonald-Kreitman test requires at least one outgroup sequence.");
}
vector<unsigned int> mktable = SequenceStatistics::mkTable(*pscIn, *pscOut, *gCode);
@@ -375,30 +524,104 @@ int main(int args, char** argv)
if (path == "none") throw Exception("You must specify an ouptut file for CodonSiteStatistics");
ApplicationTools::displayResult("Site statistics output to:", path);
ofstream out(path.c_str(), ios::out);
- out << "Site\tIsComplete\tNbAlleles\tMinorAlleleFrequency\tMajorAlleleFrequency\tMinorAllele\tMajorAllele";
- bool outgroup = (psc->hasOutgroup() && pscOut->getNumberOfSequences() == 1);
+ out << "Site\tMissingDataFrequency\tNbAlleles\tMinorAlleleFrequency\tMajorAlleleFrequency\tMinorAllele\tMajorAllele";
+ out << "\tMeanNumberSynPos\tIsSynPoly\tIs4Degenerated\tPiN\tPiS";
+ bool outgroup = (pscOut && pscOut->getNumberOfSequences() == 1);
if (outgroup) {
out << "\tOutgroupAllele";
}
- out << "\tMeanNumberSynPos\tIsSynPoly\tIs4Degenerated\tPiN\tPiS" << endl;
+ if (estimateAncestor) {
+ out << "\tAncestralAllele";
+ }
+ if (outgroup) {
+ out << "\tMeanNumberSynPosDiv\tdN\tdS";
+ }
+ out << endl;
+
unique_ptr<SiteContainer> sites(pscIn->toSiteContainer());
for (size_t i = 0; i < sites->getNumberOfSites(); ++i) {
const Site& site = sites->getSite(i);
+ map<int, size_t> counts;
+ SymbolListTools::getCounts(site, counts);
+ size_t minFreq = site.size() + 1;
+ size_t maxFreq = 0;
+ int minState = -1;
+ int maxState = -1;
+ size_t nbAlleles = 0;
+ size_t nbMissing = 0;
+ for (map<int, size_t>::iterator it = counts.begin(); it != counts.end(); it++)
+ {
+ if (!alphabet->isUnresolved(it->first)
+ && !alphabet->isGap(it->first)) {
+ nbAlleles++;
+ if (it->second != 0) {
+ if (it->second < minFreq) {
+ minFreq = it->second;
+ minState = it->first;
+ }
+ if (it->second > maxFreq) {
+ maxFreq = it->second;
+ maxState = it->first;
+ }
+ }
+ } else {
+ nbMissing += it->second;
+ }
+ }
+
out << site.getPosition() << "\t";
- out << SiteTools::isComplete(site) << "\t";
- out << SiteTools::getNumberOfDistinctCharacters(site) << "\t";
- out << SiteTools::getMinorAlleleFrequency(site) << "\t";
- out << SiteTools::getMajorAlleleFrequency(site) << "\t";
- out << alphabet->intToChar(SiteTools::getMinorAllele(site)) << "\t";
- out << alphabet->intToChar(SiteTools::getMajorAllele(site)) << "\t";
- if (outgroup) {
- out << pscOut->getSequence(0).getChar(i) << "\t";
+ out << nbMissing << "\t";
+ out << nbAlleles << "\t";
+ out << minFreq << "\t";
+ out << maxFreq << "\t";
+ out << alphabet->intToChar(minState) << "\t";
+ out << alphabet->intToChar(maxState) << "\t";
+ if (estimateAncestor) {
+ out << CodonSiteTools::numberOfSynonymousPositions(ancestralSequence->getValue(i), *gCode, kappa) << "\t";
+ } else {
+ out << CodonSiteTools::meanNumberOfSynonymousPositions(site, *gCode, kappa) << "\t";
}
- out << CodonSiteTools::meanNumberOfSynonymousPositions(site, *gCode) << "\t";
out << CodonSiteTools::isSynonymousPolymorphic(site, *gCode) << "\t";
out << CodonSiteTools::isFourFoldDegenerated(site, *gCode) << "\t";
out << CodonSiteTools::piNonSynonymous(site, *gCode) << "\t";
- out << CodonSiteTools::piSynonymous(site, *gCode) << endl;
+ out << CodonSiteTools::piSynonymous(site, *gCode);
+ if (outgroup) {
+ out << "\t" << pscOut->getSequence(0).getChar(i);
+ }
+ if (estimateAncestor) {
+ out << "\t" << ancestralSequence->getChar(i);
+ }
+ if (outgroup) {
+ //Add divergence
+ int outgroupState = pscOut->getSequence(0)[i];
+ if (codonAlphabet->isUnresolved(outgroupState) || codonAlphabet->isGap(outgroupState)) {
+ out << "\tNA\tNA\tNA";
+ } else {
+ if (estimateAncestor) {
+ //This is the same value as for polymorphism, we add it for having consistent output format
+ out << "\t" << CodonSiteTools::numberOfSynonymousPositions(ancestralSequence->getValue(i), *gCode, kappa);
+ } else {
+ //Also average over outgroup (Note: minState and maxState are identical in this case)
+ out << "\t" << (CodonSiteTools::numberOfSynonymousPositions(outgroupState, *gCode, kappa) +
+ CodonSiteTools::numberOfSynonymousPositions(minState, *gCode, kappa)) / 2.;
+ }
+ if (nbAlleles == 1) {
+ //Compare with outgroup:
+ if (site[0] == outgroupState) {
+ out << "\t0\t0" << endl;
+ } else {
+ //This is a real substitution:
+ int nt = (int)CodonSiteTools::numberOfDifferences(outgroupState, minState, *codonAlphabet);
+ double ns = CodonSiteTools::numberOfSynonymousDifferences(outgroupState, minState, *gCode);
+ out << "\t" << ns << "\t" << (nt - ns);
+ }
+ } else {
+ //Site is polymorphic, this is not a substitution
+ out << "\t0\t0" << endl;
+ }
+ }
+ }
+ out << endl;
}
}
diff --git a/bppsuite.spec b/bppsuite.spec
index a10b3d2..a360e01 100644
--- a/bppsuite.spec
+++ b/bppsuite.spec
@@ -1,5 +1,5 @@
%define _basename bppsuite
-%define _version 2.3.1
+%define _version 2.3.2
%define _release 1
%define _prefix /usr
diff --git a/buildBin.sh b/buildBin.sh
index ad31af0..f28ce69 100755
--- a/buildBin.sh
+++ b/buildBin.sh
@@ -1,6 +1,6 @@
#! /bin/sh
arch=x86_64 #i686
-version=1.3.0a-1
+version=2.3.2
strip bppSuite/bppdist
strip bppSuite/bpppars
diff --git a/doc/bppsuite.texi b/doc/bppsuite.texi
index 313132f..1d94d96 100644
--- a/doc/bppsuite.texi
+++ b/doc/bppsuite.texi
@@ -1,7 +1,7 @@
\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename bppsuite.info
-@settitle BppSuite Manual 2.3.1
+@settitle BppSuite Manual 2.3.2
@documentencoding UTF-8
@afourpaper
@dircategory Science Biology Genetics
@@ -21,7 +21,7 @@
@c %**end of header
@copying
-This is the manual of the Bio++ Program Suite, version 2.3.1.
+This is the manual of the Bio++ Program Suite, version 2.3.2.
Copyright @copyright{} 2007-2017 Bio++ development team
@end copying
@@ -57,23 +57,23 @@ Copyright @copyright{} 2007-2017 Bio++ development team
Common options encountered in several programs.
-* Alphabet:: Alphabets and genetic codes
-* Sequences:: Loading sequences/alignments
-* Tree:: Loading trees
-* AlphabetIndex:: Setting biochemical properties and distances
-* Process:: Specifying the substitution process
-* Distribution:: Setting of the discrete distributions
-* Estimation:: Estimating parameters by maximizing a likelihood function
-* WritingSequences:: Writing sequences/alignments to files
-* WritingTrees:: Writing trees to files
+* Alphabet:: Alphabets and genetic codes.
+* Sequences:: Loading sequences/alignments.
+* Tree:: Loading trees.
+* AlphabetIndex:: Setting biochemical properties and distances.
+* Process::
+* Distribution:: Setting of the discrete distributions.
+* Estimation:: Estimating parameters by maximizing a likelihood function.
+* WritingSequences:: Writing sequences/alignments to files.
+* WritingTrees:: Writing trees to files.
Process specification
-* Model:: Substitution process
-* Non-homogeneity:: Specific declaration of non-homogeneous modelling
+* Model::
+* Non-homogeneity:: Specific declaration of non-homogeneous modelling.
* FrequenciesSet:: Frequencies
* Rates:: Rates across sites
-* Linking:: Aliasing parameters
+* Linking::
Setting up the substitution model
@@ -84,21 +84,22 @@ Setting up the substitution model
* Multiple:: General multiple site models
* Meta:: Meta models
* Mixture:: Mixture of models
+* Conditioned:: Models conditioned by events
Bio++ Program Suite Reference
-* bppml:: Bio++ Maximum Likelihood
-* bppseqgen:: Bio++ Sequence Generator
-* bppancestor:: Bio++ Ancestral Sequences and Rates reconstruction
-* bppmixedlikelihoods:: Bio++ Site-Likelihoods Inside Mixed Models
-* bppdist:: Bio++ Distance Methods
-* bpppars:: Bio++ Maximum Parsimony
-* bppconsense:: Bio++ Consensus Trees
-* bppreroot:: Bio++ Serial Tree Re-rooting
-* bppseqman:: Bio++ Sequences Manipulation
+* bppml:: Bio++ Maximum Likelihood.
+* bppseqgen:: Bio++ Sequence Generator.
+* bppancestor:: Bio++ Ancestral Sequences and Rates reconstruction.
+* bppmixedlikelihoods:: Bio++ Site-Likelihoods Inside Mixed Models.
+* bppdist:: Bio++ Distance Methods.
+* bpppars:: Bio++ Maximum Parsimony.
+* bppconsense:: Bio++ Consensus Trees.
+* bppreroot:: Bio++ Serial Tree Re-rooting.
+* bppseqman:: Bio++ Sequences Manipulation.
* bppalnscore:: Bio++ Alignment Scoring
-* bpppopstats:: Bio++ Population Genetics
-* bpptreedraw:: Bio++ Tree Drawing
+* bpppopstats::
+* bpptreedraw:: Bio++ Tree Drawing.
@end detailmenu
@end menu
@@ -268,9 +269,9 @@ Space characters are allowed around the '=' and ',' ponctuations.
It is possible to recall anywhere the value of an option by using $(parameter).
@cartouche
@example
-optimization.topology.algorithm = NNI
-optimization.topology.algorithm_nni.method = phyml
-output.tree.file = MyData_$(optimization.topology.algorithm)_$(optimization.topology.algorithm_nni.method).dnd
+topo.algo = NNI
+topo.algo_nni.method = phyml
+output.tree.file = MyData_$(topo.algo)_$(topo.algo_nni.method).dnd
@end example
@end cartouche
You can use this syntax to define global variables:
@@ -326,10 +327,9 @@ data=LSU
@section Setting alphabet and genetic code
@table @command
-@item alphabet =
-@{DNA|RNA|Protein|Binary|Word(letter=@{DNA|RNA|Protein@},length=@{int@})|
-Codon(letter=@{DNA|RNA@}, type=@{Standard|EchinodermMitochondrial|InvertebrateMitochondrial|\
-VertebrateMitochondrial@})@}
+@item alphabet = @{DNA|RNA|Protein|Binary|Word(letter=@{DNA|RNA|Protein@},length=@{int@})|
+Codon(letter=@{DNA|RNA@})@}
+
The alphabet to use when reading sequences. DNA and RNA alphabet can in addition take an argument:
@table @command
@@ -338,8 +338,10 @@ Tell is exclamation mark should be considered as a gap character. The default is
@end table
@item genetic_code = @{translation table@}
-Where 'translation table' specifies the code to use, either as a text description, or as the NCBI number.
-The following table give the currently implemented codes with their corresponding names:
+The genetic code used for codon alphabet, where 'translation table'
+specifies the code to use, either as a text description, or as the
+NCBI number. The following table give the currently implemented codes
+with their corresponding names:
@multitable @columnfractions 0.5 0.5
@item Standard @tab 1
@@ -350,6 +352,7 @@ The following table give the currently implemented codes with their correspondin
@item EchinodermMitochondrial @tab 9
@item AscidianMitochondrial @tab 13
@end multitable
+
@end table
The states of the alphabets are in alphabetical order. For the proteic
@@ -394,17 +397,17 @@ the rest of the header being considered as comments.
The Mase format (as read by Seaview and Phylo_win for instance), with
an optional site selection name.
-@item Phylip(order=@{interleaved|sequential@}, type=@{classic|extended@}, split=@{spaces|tab@})
-The Phylip format, with several variations.
-The argument @command{order} distinguishes between sequential and
-interleaved format, while the option @command{type} distinguished
-between the plain old Phylip format and the more recent extension
-allowing for sequence names longer than 10 characters, as understood
-by PAML and PhyML.
-Finally, the @command{split} argument specifies the type of character
-that separates the sequence name from the sequence content. The
-conventional option is to use one (classic) or more (extended) spaces,
-but tabs can also be used instead.
+@item Phylip(order=@{interleaved|sequential@}, type=@{classic|extended@}, split=@{spaces|tab@})
+
+The Phylip format, with several variations. The argument
+@command{order} distinguishes between sequential and interleaved
+format, while the option @command{type} distinguished between the
+plain old Phylip format and the more recent extension allowing for
+sequence names longer than 10 characters, as understood by PAML and
+PhyML. Finally, the @command{split} argument specifies the type of
+character that separates the sequence name from the sequence content.
+The conventional option is to use one (classic) or more (extended)
+spaces, but tabs can also be used instead.
@item Clustal(extraSpaces=@{int@})
The Clustal format.
@@ -611,6 +614,7 @@ of site-specific rate.
* Multiple:: General multiple site models
* Meta:: Meta models
* Mixture:: Mixture of models
+* Conditioned:: Models conditioned by events
@end menu
@table @command
@@ -658,70 +662,70 @@ the frequencies are computed from observed data.
@item JC69
The Jukes and Cantor model. This model has no additional parameter.
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1JCnuc.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1JCnuc.html#details, Bio++ description, Bio++ description}.
@item K80([kappa=@{real>0@}])
The Kimura 2 parameters model. @var{kappa} is the transition over
transversion ratio. Default: @var{kappa}=1. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1K80.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1K80.html#details, Bio++ description, Bio++ description}.
-@item F84([kappa=@{real>0@}, theta=@{real]0,1[@}, theta1=@{real]0,1[@},theta2=@{real]0,1[@} ,"equilibrium frequencies"] )
+@item F84([kappa=@{real>0@}, theta=@{real]0,1[@}, theta1=@{real]0,1[@},theta2=@{real]0,1[@}, "equilibrium frequencies"] )
Felsenstein's 1984 substitution model, with transition/transversion
ratio and 4 distinct equilibrium frequencies, set using three
independent parameters: @var{theta} is the GC content, @var{theta1} is
the proportion of G / (G + C) and @var{theta2} is the proportion of A
/ (A + T or U). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1F84.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1F84.html#details, Bio++ description, Bio++ description}.
-@item HKY85([kappa=@{real>0@}, theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@} ,"equilibrium frequencies"])
+@item HKY85([kappa=@{real>0@}, theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@}, "equilibrium frequencies"])
Hasegawa, Kishino and Yano 1985's substitution model. The model is
similar to @command{F84}, but with a different implementation. The
@var{kappa} parameter used here is comparable to the one in
@command{K80}. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1HKY85.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1HKY85.html#details, Bio++ description, Bio++ description}.
@item T92([kappa=@{real>0@}, theta=@{real]0,1[@} ,"equilibrium frequencies"])
Tamura 1992's model for nucleotides, similar to @command{HKY85}, yet
assuming that the frequencies of A = T/U and G = C. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1T92.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1T92.html#details, Bio++ description, Bio++ description}.
-@item TN93([kappa1=@{real>0@}, kappa2=@{real>0@}, theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@} ,"equilibrium frequencies"])
+@item TN93([kappa1=@{real>0@}, kappa2=@{real>0@}, theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@}, "equilibrium frequencies"])
Tamura and Nei 1993's model, similar to @command{HKY85}, but allowing
for two distinct transition/transversion ratios. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1TN93.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1TN93.html#details, Bio++ description, Bio++ description}.
@item GTR([a=@{real>0@}, b=@{real>0@}, c=@{real>0@}, d=@{real>0@}, e=@{real>0@}, theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@} ,"equilibrium frequencies"])
The General Time-Reversible substitution model. Parameters @var{a},
@var{b}, @var{c}, @var{d}, @var{e} are the entries of the
exchangeability matrix. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1GTR.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1GTR.html#details, Bio++ description, Bio++ description}.
@item L95([beta=@{real>0@}, gamma=@{real>0@}, delta=@{real>0@}, theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@} ,"equilibrium frequencies"])
The strand-symmetric model of Lobry 1995, for nucleotides. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1L95.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1L95.html#details, Bio++ description, Bio++ description}.
@item SSR([beta=@{real>0@}, gamma=@{real>0@}, delta=@{real>0@}, theta=@{real]0,1[@}])
The strand-symmetric reversible model, for nucleotides. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1SSR.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1SSR.html#details, Bio++ description, Bio++ description}.
@item RN95([thetaR=@{real]0,1[@}, thetaC=@{real]0,1[@}, thetaG=@{real]0,1[@}, kappaP=@{real[0,1[@}, gammaP=@{real[0,1[@}, sigmaP=@{real>1@}, alphaP=@{real>1@}])
The model described by Rhetsky and Nei, where the only hypothesis is
that the transversion rates are only dependent of the target
nucleotide. This model is not reversible. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1RN95.html#_details,Bio++
-description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1RN95.html#details,Bio++
+description, Bio++ description}.
@item RN95s([thetaA=@{real]0,0.5[@}, gamma=@{real]0,0.5[@}, alphaP=@{real>1@}])
The instersection of models RN95 and L95. The two hypotheses are that
the transversion rates are only dependent of the target nucleotide,
and strand symmetry. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1RN95s.html#_details,Bio++
-description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1RN95s.html#details,Bio++
+description, Bio++ description}.
@end table
@@ -732,69 +736,69 @@ description}.
@item JC69
The Jukes and Cantor model. This model has no additional parameter. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1JCprot.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1JCprot.html#details, Bio++ description, Bio++ description}.
@item DSO78
Protein substitution model, using the dcmutt implementation of Kosiol
and Goldman 2005. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1DSO78.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1DSO78.html#details, Bio++ description, Bio++ description}.
@item JTT92
Protein substitution model, using the dcmutt implementation of Kosiol
and Goldman 2005. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1JTT92.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1JTT92.html#details, Bio++ description, Bio++ description}.
@item WAG01
Protein substitution model, from Whelan & Goldman 2001. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1WAG01.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1WAG01.html#details, Bio++ description, Bio++ description}.
@item LG08
Protein substitution model, from Le & Gascuel 2008. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1LG08.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1LG08.html#details, Bio++ description, Bio++ description}.
@item LLG08_EX2([relrate1=@{real]0,1[@}, relproba1=@{real]0,1[@}])
Protein substitution model, from Le, Lartillot & Gascuel 2008.
@xref{Mixture}, for the meaning of the variables. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1LLG08__EX2.html#_details,
-Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1LLG08__EX2.html#details,
+Bio++ description, Bio++ description}.
@item LLG08_EX3([relrate1=@{real]0,1[@}, relrate2=@{real]0,1[@}, relproba1=@{real]0,1[@}, relproba2=@{real]0,1[@}])
Protein substitution model, from Le, Lartillot & Gascuel 2008. @xref{Mixture}, for the meaning of the variables. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1LLG08__EX3.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1LLG08__EX3.html#details, Bio++ description, Bio++ description}.
@item LLG08_EHO([relrate1=@{real]0,1[@}, relrate2=@{real]0,1[@}, relproba1=@{real]0,1[@}, relproba2=@{real]0,1[@}])
Protein substitution model, from Le, Lartillot & Gascuel 2008. @xref{Mixture}, for the meaning of the variables. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1LLG08__EHO.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1LLG08__EHO.html#details, Bio++ description, Bio++ description}.
@item LLG08_UL2([relrate1=@{real]0,1[@}, relproba1=@{real]0,1[@}])
Protein substitution model, from Le, Lartillot & Gascuel 2008. @xref{Mixture}, for the meaning of the variables. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1LLG08__UL2.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1LLG08__UL2.html#details, Bio++ description, Bio++ description}.
@item LLG08_UL3([relrate1=@{real]0,1[@}, relrate2=@{real]0,1[@}, relproba1=@{real]0,1[@}, relproba2=@{real]0,1[@}])
Protein substitution model, from Le, Lartillot & Gascuel 2008. @xref{Mixture}, for the meaning of the variables. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1LLG08__UL3.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1LLG08__UL3.html#details, Bio++ description, Bio++ description}.
@item LGL08_CAT(nbCat=@{[10,20,30,40,50,60]@}, [relrate1=@{real]0,1[@}, relrate2=@{real]0,1[@}, ..., relproba1=@{real]0,1[@}, relproba2=@{real]0,1[@}, ...] ))
CAT protein substitution model, from Le, Gascuel & Lartillot 2008, with a
given number (@var{nbCat}) of profiles. @xref{Mixture}, for the meaning
of the variables. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1LGL08__CAT.html#_details,
-Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1LGL08__CAT.html#details,
+Bio++ description, Bio++ description}.
@item LGL08_CAT_C@{[1,...,nbCat]@}(nbCat=@{[10,20,30,40,50,60]@})
Submodel of a given CAT Protein substitution model, from Le, Gascuel &
Lartillot 2008, with a given number (@var{nbCat}) of profiles. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1LGL08__CAT.html#_details,
-Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1LGL08__CAT.html#details,
+Bio++ description, Bio++ description}.
@item DSO78+F([theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@}, ... ,"equilibrium frequencies"])
Protein substitution model, using the dcmutt implementation of Kosiol
@@ -802,25 +806,25 @@ and Goldman 2005 and free equilibrium frequencies. The @var{thetaX}
are frequencies parameters, where X is 1 to 19. Parameter @var{theta1}
is the proportion of A, @var{theta2} is the proportion of R over
(1-A), @var{theta3} the proportion of N over (1-A-R), etc. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1DSO78.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1DSO78.html#details, Bio++ description, Bio++ description}.
@item JTT92+F([theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@}, ..., "equilibrium frequencies"])
Protein substitution model, using the dcmutt implementation of Kosiol
and Goldman 2005 and free equilibrium frequencies. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1JTT92.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1JTT92.html#details, Bio++ description, Bio++ description}.
@item WAG01+F([theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@}, ..., "equilibrium frequencies"])
Protein substitution model, from Whelan & Goldman 2001, and free
equilibrium frequencies. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1WAG01.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1WAG01.html#details, Bio++ description, Bio++ description}.
@item LG08+F([theta=@{real]0,1[@}, theta1=@{real]0,1[@}, theta2=@{real]0,1[@}, ..., "equilibrium frequencies"])
Protein substitution model, from Le & Gascuel 2008, and free
equilibrium frequencies. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1LG08.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1LG08.html#details, Bio++ description, Bio++ description}.
@item Empirical(name=@{chars@}, file=@{path@})
@@ -842,7 +846,7 @@ namespace, including for frequencies.
Build the model on binary alphabet, where @var{kappa} is the relative
proportion of 1 over 0 in the equilibrium distribution. Default:
@var{kappa}=1. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1BinarySubstitutionModel.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1BinarySubstitutionModel.html#details, Bio++ description, Bio++ description}.
@end table
@@ -850,10 +854,7 @@ proportion of 1 over 0 in the equilibrium distribution. Default:
@node Codon, Multiple, Miscellaneous, Model
@subsubsection Codon models
-Standard codon models: the global @var{genetic_code} argument
-describes the genetic code and has to be specified.
-
-Codon models also take as argument a @var{frequencies} option
+Some codon models also take as argument a @var{frequencies} option
specifying the equilibrium frequencies of the model. Any frequencies
description can be used here, but the syntax also supports options
similar to the ones used in the PAML software:
@@ -884,46 +885,46 @@ models, in the case of non stationarity.
@table @command
-@item GY94([genetic_code=@{genetic code description@}, kappa=@{real>0@}, V=@{real>0@}, "equilibrium frequencies"])
+@item GY94([kappa=@{real>0@}, V=@{real>0@}, "equilibrium frequencies"])
Goldman and Yang (1994) substitution model for codons (default values:
@var{kappa}=1 and @var{V}=10000). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1GY94.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1GY94.html#details, Bio++ description, Bio++ description}.
-@item MG94([genetic_code=@{genetic code descrition@}, rho=@{real>0@}, "equilibrium frequencies"])
+@item MG94([rho=@{real>0@}, "equilibrium frequencies"])
Muse and Gaut (1994) substitution model for codons (default values:
@var{rho}=1). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1MG94.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1MG94.html#details, Bio++ description, Bio++ description}.
-@item YN98([genetic_code=@{genetic code description@}, kappa=@{real>0@}, omega=@{real>0@}, "equilibrium frequencies"])
+@item YN98([kappa=@{real>0@}, omega=@{real>0@}, "equilibrium frequencies"])
Yang and Nielsen (1998) substitution model for codons (default values:
@var{kappa}=1 and @var{omega}=1). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1YN98.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1YN98.html#details, Bio++ description, Bio++ description}.
-@item YNGP_M0([genetic_code=@{genetic code description@}, kappa=@{real>0@}, omega=@{real>0@}, "equilibrium frequencies"])
+@item YNGP_M0([kappa=@{real>0@}, omega=@{real>0@}, "equilibrium frequencies"])
The M0 model of PAML, ie the same as YN98. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1YN98.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1YN98.html#details, Bio++ description, Bio++ description}.
-@item YNGP_M1([genetic_code=@{genetic code description@},kappa=@{real>0@}, omega=@{real>0@}, p0=@{real>0 and <1 @}, "equilibrium frequencies"])
+@item YNGP_M1([kappa=@{real>0@}, omega=@{real>0@}, p0=@{real>0 and <1 @}, "equilibrium frequencies"])
The M1a model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M.
K. Pedersen (2000) (default values: @var{kappa}=1, @var{p0}=0.5,
@var{omega}=0.5). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1YNGP__M1.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1YNGP__M1.html#details, Bio++ description, Bio++ description}.
-@item YNGP_M2([genetic_code=@{genetic code description@},kappa=@{real>0@}, omega0=@{real>0 and <1@}, theta1=@{real>0 and <1 @}], omega1=@{real>1@}, theta2=@{real>0 and <1 @}, "equilibrium frequencies"])
+@item YNGP_M2([kappa=@{real>0@}, omega0=@{real>0 and <1@}, theta1=@{real>0 and <1 @}], omega1=@{real>1@}, theta2=@{real>0 and <1 @}, "equilibrium frequencies"])
The M2a model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M.
K. Pedersen (2000), with p0=theta1 and
p1=(1-theta1)*theta2 (default values: @var{kappa}=1, @var{theta1}=0.33333,
@var{theta2}=0.5, @var{omega0}=0.5, @var{omega2}=0.5). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1YNGP__M2.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1YNGP__M2.html#details, Bio++ description, Bio++ description}.
-@item YNGP_M3([genetic_code=@{genetic code description@}, n=@{integer>0@}, kappa=@{real>0@}, omega0=@{real>0 and <1@}, delta1=@{real>0@}, ..., delta@var{n-1}=@{real>0@}, theta1=@{real>0 and <1 @}, ..., theta@var{n-1}1=@{real>0 and <1 @}, "equilibrium frequencies"])
+@item YNGP_M3([n=@{integer>0@}, kappa=@{real>0@}, omega0=@{real>0 and <1@}, delta1=@{real>0@}, ..., delta@var{n-1}=@{real>0@}, theta1=@{real>0 and <1 @}, ..., theta@var{n-1}1=@{real>0 and <1 @}, "equilibrium frequencies"])
The M3 model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M.
K. Pedersen (2000), with @var{n} discrete values, with p0=theta1
@@ -931,31 +932,32 @@ and pk=(1-theta1)*...*(1-thetak)*theta(k+1), and
omegak=omega0+delta1+....+deltak (default values: @var{n}=3,
@var{kappa}=1, @var{thetak}=1/(n-k+1), @var{omega0}=0.5,
@var{deltak}=0.5). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1YNGP__M3.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1YNGP__M3.html#details, Bio++ description, Bio++ description}.
-@item YNGP_M7(n=@{integer>0@}, genetic_code=@{genetic code description@},kappa=@{real>0@}, p=@{real>1@}, q=@{real>1 @}, "equilibrium frequencies"])
+@item YNGP_M7(n=@{integer>0@}, kappa=@{real>0@}, p=@{real>1@}, q=@{real>1 @}, "equilibrium frequencies"])
The M7 model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M.
K. Pedersen (2000), with the Beta distribution discretized in @var{n}
classes (default values: @var{kappa}=1, @var{p}=2, @var{q}=2). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1YNGP__M7.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1YNGP__M7.html#details, Bio++ description, Bio++ description}.
-@item YNGP_M8(n=@{integer>0@}, [genetic_code=@{genetic code description@},kappa=@{real>0@}, omegas=@{real>1@}, p0=@{real>0@},p=@{real>1@}, q=@{real>1 @}, "equilibrium frequencies"])
+@item YNGP_M8(n=@{integer>0@}, [kappa=@{real>0@}, omegas=@{real>1@}, p0=@{real>0@},p=@{real>1@}, q=@{real>1 @}, "equilibrium frequencies"])
The M8 model of PAML, see Yang, Z., R. Nielsen, N. Goldman, and A.-M.
K. Pedersen (2000), with the Beta distribution discretized in @var{n}
classes (default values: @var{kappa}=1, @var{p}=2, @var{q}=2,
@var{p0}=0.5, @var{omegas}=2). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1YNGP__M8.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1YNGP__M8.html#details, Bio++ description, Bio++ description}.
@end table
It is also possible to setup more specific models, by specifying a
nucleotide model for each position. Model parameters names then take
-the form of <codon model name>.<position set>_<position model name>.<position specific parameter name>.
+the form of <codon model name>.<position set>_<position model
+name>.<position specific parameter name>.
In the following models, the arguments @var{model} and
@var{model@{i@}} are for descriptions of models on bases.
@@ -977,13 +979,20 @@ the whole triplet alphabet, and then the substitution rates to and
from stop codons are set to zero and the generator is normalized with
this modification.
-@table @command
+The model names est defined through several words that can be mixed
+together to build models at hand. Some words are exclusive. The
+model description must begin with @var{Codon}.
-@item CodonRate(model=@{model name@} [, relrate1=@{real>0@}, relrate2=@{real>0@}, "equilibrium frequencies"])
-or
+@var{Rate} and @var{Prot} and @var{Dist} words define how the models
+are mixed, either with specific rates, or using proteic models, or
+with non-synonymous vs synonymous substitution rates. They are
+exclusive, and one of the three must be used. The default model is
+@var{Rate}.
+
+@table @command
+@item Rate(model... [, relrate1=@{real>0@}, relrate2=@{real>0@}])
-@item CodonRate(model1=@{model name@}, model2=@{model name@}, model3=@{model name@}[, relrate1=@{real>0@}, relrate2=@{real>0@}, "equilibrium frequencies"])
Substitution model on codons with position specific evolution rates.
@@ -992,7 +1001,8 @@ of the sites. Default: @var{relrate@{i@}=1/@{4-i@}}, such that the rate
of each site is 1/3.
@example
-alphabet=Codon(letter=DNA, type=Standard)
+alphabet=Codon(letter=DNA)
+genetic_code=Standard
model=CodonRate(model=T92)
@end example
builds a model on codons, such all sites follow the same T92 model.
@@ -1000,7 +1010,8 @@ The parameters names are @var{CodonRate.123_T92.kappa},
@var{CodonRate.relrate1}, @var{CodonRate.relrate2}.
@example
-alphabet=Codon(letter=DNA, type=Standard)
+alphabet=Codon(letter=DNA)
+genetic_code=Standard
model=CodonRate(model1=T92, model2=T92, model3=JC69)
@end example
builds a model on codons, such that first and second sites follow
@@ -1015,13 +1026,10 @@ model=CodonRate(model1=T92(theta=0.5, kappa=2), \
@end example
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1CodonRateSubstitutionModel.html#_details, Bio++ description}.
-
-@item CodonDist(model=@{model name@}[, genetic_code=@{genetic code description@}, beta=@{real>0@}, "equilibrium frequencies"])
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1CodonRateSubstitutionModel.html#details, Bio++ description, Bio++ description}.
-or
-@item CodonDist(model1=@{model name@}, model2=@{model name@}, model3=@{model name@}[, geneticcode=@{genetic code description@}, beta=@{real>0@}, "equilibrium frequencies"])
+@item Dist(model...[, beta=@{real>0@}])
Substitution model on codons that takes into account the difference
between synonymous and non-synonymous substitutions.
@@ -1030,7 +1038,8 @@ Optional argument @var{beta} is the ratio between non-synonymous
substitution rate and synonymous substitution rate. Default value: 1.
@example
-alphabet=Codon(letter=DNA, type=Standard)
+alphabet=Codon(letter=DNA)
+
model=CodonDist(model=T92)
@end example
builds a model on codons, such all sites follow the same T92 model.
@@ -1047,70 +1056,51 @@ parameters names are @var{CodonDist.1_T92.kappa},
@var{CodonDist.2_T92.kappa}, @var{CodonDist.beta}.
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1CodonDistanceSubstitutionModel.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1CodonDistanceSubstitutionModel.html#details, Bio++ description, Bio++ description}.
-@item CodonRateFreq(model=@{model name@}, frequencies=@{frequencies set description@}[, relrate1=@{real>0@}, relrate2=@{real>0@}, "equilibrium frequencies"])
-or
-@item CodonRateFreq(model1=@{model name@}, model2=@{model name@}, model3=@{model name@}, frequencies=@{frequencies set description@} [, relrate1=@{real>0@}, relrate2=@{real>0@}, "equilibrium frequencies"])
-
+@item Prot(model..., protmodel=@{proteic model name@}[, beta=@{real>0@}])
-Substitution model on codons with position specific evolution rates,
-where the sustitution rates are multiplied by the frequency of the
-target codon in the given frequencies set.
+Substitution model on codons that takes into account the substitution
+rates in a protein model. Those rates are multiplied by a
+non-synonymous susbtitution factor, aka @var{beta}.
-This model should be used with nucleotidic models which equilibrium
-distribution is fixed, ans does not depend on the parameters.
-Otherwise there may be problems of identifiability of the parameters.
-
-The multiplicative distribution of the model is described by the
-@var{frequencies} argument. See the description of the Frequencies Set
-below.
+@var{Prot} and @var{Dist} words are exclusive.
-Each single site model is normalized and the substitution rates
-between codons that differ on more than one letter are null.
-Arguments @var{relrate@{i@}} stands for the relative substitution rates
-of the sites. Default: @var{relrate@{i@}=1/@{4-i@}}, such that the rate
-of each site is 1/3.
+Optional argument @var{beta} is the ratio between average substitution
+rate between amino-acids and synonymous substitution rate. Default
+value: 1.
@example
-alphabet=Codon(letter=DNA, type=Standard)
-model=CodonRateFreq(frequencies=Full())
+alphabet=Codon(letter=DNA)
+genetic_code=Standard
+model=CodonProt(model=T92, protmodel=LG08)
@end example
-has parameters @var{CodonRateFreq.123_K80.kappa},
-@var{CodonRateFreq.Full.theta_1}, ...,
-@var{CodonRateFreq.Full.theta_60},
-@var{CodonRateFreq.relrate1},
-@var{CodonRateFreq.relrate2}.
+builds a model on codons, such all sites follow the same T92 model,
+and amino-acid rates are proportional to LG08 substition matrice.
+The parameters names are @var{CodonProt.123_T92.kappa} and
+@var{CodonProt.beta}.
-See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1CodonRateFrequenciesSubstitutionModel.html#_details, Bio++ description}.
-
-@item CodonDistFreq(model=@{model name@}, frequencies=@{frequencies set description@} [geneticcode=@{genetic code description@}, beta=@{real>0@}, "equilibrium frequencies"])
-
-or
-
-@item CodonDistFreq(model1=@{model name@}, model2=@{model name@}, model3=@{model name@}, frequencies=@{frequencies set description@} [geneticcode=@{genetic code description@}, beta=@{real>0@}, "equilibrium frequencies"])
-
-Substitution model on codons that takes into account the difference
-between synonymous and non-synonymous substitutions. Moreover, the
-sustitution rates are multiplied by the frequency of the target codon
-in the given frequencies set.
+@end table
-This model should be used with nucleotidic models which equilibrium
-distribution is fixed, ans does not depend on the parameters.
+Optional words to describe the use of equilibrium frequencies
+sets. This word should be used with nucleotidic models which
+equilibrium distribution is fixed, ans does not depend on parameters.
Otherwise there may be problems of identifiability of the parameters.
-The multiplicative distribution of the model is described by the
+@table @command
+@item Freq(frequencies=@{frequencies set description@})
+
+Substitution rates are multiplied by the frequency of the target codon
+in the given frequencies set. This factor is described by the
@var{frequencies} argument. See the description of the Frequencies Set
below.
-Optional argument @var{beta} is the ratio between non-synonymous
-substitution rate and synonymous substitution rate. Default value: 1.
@example
-alphabet=Codon(letter=DNA, type=Standard)
+alphabet=Codon(letter=DNA)
+genetic_code=Standard
model=CodonDistFreq(frequencies=Full())
@end example
has parameters @var{CodonDistFreq.012_T92.kappa},
@@ -1119,14 +1109,35 @@ has parameters @var{CodonDistFreq.012_T92.kappa},
@var{CodonDistFreq.beta}.
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1CodonDistanceFrequenciesSubstitutionModel.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1CodonDistanceFrequenciesSubstitutionModel.html#details, Bio++ description, Bio++ description}.
+
+@item PhasFreq(frequencies=@{frequencies set description@})
+
+The substitution rates are multiplied by the product of the frequencies
+of the changed nucleotides -- conditioned on the phase -- in the given
+frequencies set. This factor is described by the @var{frequencies}
+argument. See the description of the Frequencies Set below.
+
+
+For example, see the
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1CodonDistancePhaseFrequenciesSubstitutionModel.html#details, Bio++ description, Bio++ description}.
+
+@end table
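As an illustration of the two multiplicative factors described for Freq and PhasFreq, here is a minimal Python sketch. It is not the Bio++ implementation: the function names are hypothetical, stop codons are not excluded, and the per-phase nucleotide frequencies are assumed to be the marginals of the codon frequency set.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 64 triplets; stop codons are kept for simplicity (an assumption of this sketch).
CODONS = ["".join(c) for c in product(NUCLEOTIDES, repeat=3)]


def phase_frequencies(codon_freqs):
    """Marginal nucleotide frequencies at each of the three codon positions."""
    phases = [dict.fromkeys(NUCLEOTIDES, 0.0) for _ in range(3)]
    for codon, freq in codon_freqs.items():
        for position, nucleotide in enumerate(codon):
            phases[position][nucleotide] += freq
    return phases


def freq_factor(source, target, codon_freqs):
    """Freq-style factor: the frequency of the target codon."""
    return codon_freqs[target]


def phasfreq_factor(source, target, codon_freqs):
    """PhasFreq-style factor: product, over the changed positions only,
    of the frequency of the new nucleotide at that position (phase)."""
    phases = phase_frequencies(codon_freqs)
    factor = 1.0
    for position, (old, new) in enumerate(zip(source, target)):
        if old != new:
            factor *= phases[position][new]
    return factor


# Hypothetical frequency set: uniform over the 64 triplets.
uniform = {codon: 1.0 / len(CODONS) for codon in CODONS}
```

With the uniform set, a single-nucleotide change gets a Freq-style factor of 1/64 but a PhasFreq-style factor of 1/4, since only the changed position contributes.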
+
+
+
+In addition, some models are defined that allow multiple substitutions,
+following a similar logic of included words. These models are prefixed by @var{Kron}.
+
+
+@table @command
@item KronDistFreq(model=@{model name@} [,positions=pos1*pos2*...*posn + posx*...*posm + ...])
@item KronDistFreq(model1=@{model name@}, model2=@{model name@}, ..., modeln=@{model name@}[,positions=pos1*pos2*...*posn + posx*...*posm + ...])
-substitution model on codons as @var{CodonDistFreq} above,
-allowing simultaneous substitutions.
+substitution model on codons as @var{CodonDistFreq} above, allowing
+simultaneous substitutions.
Optional argument @var{positions} can be used to describe which
substitutions are allowed. See model @xref{Kron}.
@@ -1147,62 +1158,9 @@ Kronecker Codon Model based on a unique (KCM7) or one per position
(KCM19) GTR model. From Zaheri \& al, MBE, 2014.
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1KCM.html#_details, Bio++ description}.
-
-
-@item CodonDistPhasFreq(model=@{model name@}, frequencies=@{frequencies set description@} [, geneticcode=@{genetic code description@}, beta=@{real>0@}])
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1KCM.html#details, Bio++ description, Bio++ description}.
-or
-
-@item CodonDistPhasFreq(model1=@{model name@}, model2=@{model name@}, model3=@{model name@}, frequencies=@{frequencies set description@} [, geneticcode=@{genetic code description@}, beta=@{real>0@}])
-Substitution model on codons that takes into account the difference
-between synonymous and non-synonymous substitutions. Moreover, the
-sustitution rates are multiplied by the product of the frequencies of
-the changed nucleotides -- conditioned on the phase -- in the given
-frequencies set.
-
-This model should be used with nucleotidic models in which equilibrium
-distribution is fixed, ans does not depend on the parameters.
-Otherwise there may be problems of identifiability of the parameters.
-
-The multiplicative distribution of the model is described by the
-@var{frequencies} argument. See the description of the Frequencies Set
-below.
-
-Optional argument @var{beta} is the ratio between non-synonymous
-substitution rate and synonymous substitution rate. Default value: 1.
-
-See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1CodonDistancePhaseFrequenciesSubstitutionModel.html#_details, Bio++ description}.
-
-@item CodonDistFitPhasFreq(model=@{model name@}, frequencies=@{frequencies set description@}, fitness=@{frequencies set description@} [, geneticcode=@{genetic code description@}, beta=@{real>0@}])
-
-or
-
-@item CodonDistFitPhasFreq(model1=@{model name@}, model2=@{model name@}, model3=@{model name@}, frequencies=@{frequencies set description@}, fitness=@{frequencies set description@} [, geneticcode=@{genetic code description@}, beta=@{real>0@}])
-
-Substitution model on codons that takes into account the difference
-between synonymous and non-synonymous substitutions and the difference
-between synonymous codons, in the same manner as in Yang and Nielsen's
-2008 substitution model. The sustitution rates are multiplied by the
-product of the frequencies of the changed nucleotides -- conditioned
-on the phase -- in the given frequencies set, and by ratios of
-fitnesses of the codons.
-
-This model should be used with nucleotidic models in which equilibrium
-distribution is fixed, ans does not depend on the parameters.
-Otherwise there may be problems of identifiability of the parameters.
-
-The multiplicative distribution of the model is described by the
-@var{frequencies} and @var{fitness} arguments. See the description of
-the Frequencies Set below.
-
-Optional argument @var{beta} is the ratio between non-synonymous
-substitution rate and synonymous substitution rate. Default value: 1.
-
-See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1CodonDistanceFitnessPhaseFrequenciesSubstitutionModel.html#_details, Bio++ description}.
@end table
@@ -1259,7 +1217,7 @@ site follows a HKY85 model. Then the parameters names are
@var{Word.relrate1}, @var{Word.relrate2}, @var{Word.relrate3}.
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1WordSubstitutionModel.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1WordSubstitutionModel.html#details, Bio++ description, Bio++ description}.
@item Kron(model=@{model name@} [,positions=pos1*pos2*...*posn + posx*...*posm + ...])
@anchor{Kron}
@@ -1319,7 +1277,7 @@ model.
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1KronSubstitutionModel.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1KronSubstitutionModel.html#details, Bio++ description, Bio++ description}.
@item Triplet(model=@{model description@} [, relrate1=@{real>0@}, relrate2=@{real>0@}])
@@ -1347,7 +1305,8 @@ of the sites. Default: @var{relrate@{i@}=1/@{4-i@}}, such that the rate
of each site is 1/3.
@example
-alphabet=Codon(letter=DNA, type=Standard)
+alphabet=Codon(letter=DNA)
+genetic_code=Standard
model=Triplet(model=T92)
@end example
builds a model on codons, such that all sites follow the same T92 model.
@@ -1365,14 +1324,14 @@ parameters names are @var{Triplet.1_T92.kappa},
@var{Triplet.relrate2}.
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1TripletSubstitutionModel.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1TripletSubstitutionModel.html#details, Bio++ description, Bio++ description}.
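The stated Triplet default, relrate@{i@}=1/@{4-i@} giving each site a rate of 1/3, can be checked with a short calculation. The sketch below assumes a "stick-breaking" reading of the relative rates (each relrate takes its share of whatever rate is left over); this parametrization is an assumption made for illustration, and the function name is hypothetical.

```python
def site_rates(relrates):
    """Convert n-1 relative rates into n per-site rates that sum to 1.

    Assumed parametrization: site i receives relrate_i times the share
    not yet assigned to earlier sites; the last site receives the rest.
    """
    rates = []
    remaining = 1.0
    for relrate in relrates:
        rates.append(remaining * relrate)
        remaining *= 1.0 - relrate
    rates.append(remaining)
    return rates


# Default for a triplet: relrate1 = 1/3, relrate2 = 1/2.
default_relrates = [1.0 / (4 - i) for i in (1, 2)]
default_site_rates = site_rates(default_relrates)
```

Under this reading, the first site takes 1/3, the second takes half of the remaining 2/3, and the third takes what is left, so the three rates are equal.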
@item YpR_Sym(model=@{model description@}, [rCgT=@{real>=0@}, rTgC=@{real>=0@}, rCaT=@{real>=0@}, rTaC=@{real>=0@}])
substitution model on quotiented triplets to handle strand symmetric
neighbour-dependency inside dinucleotides YpR (see Bérard and Guéguen
2012). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1YpR_SymSubstitutionModel.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1YpR_SymSubstitutionModel.html#details, Bio++ description, Bio++ description}.
@item YpR_Gen(model=@{model description@}, [rCgT=@{real>=0@}, rcGA=@{real>=0@}, rTgC=@{real>=0@}, rtGA=@{real>=0@}, rCaT=@{real>=0@}, rcAG=@{real>=0@}, rTaC=@{real>=0@}, rtAG=@{real>=0@}])
@@ -1380,7 +1339,7 @@ neighbour-dependency inside dinucleotides YpR (see Bérard and Guéguen
substitution model on quotiented triplets to handle general symmetric
neighbour-dependency inside dinucleotides YpR (see Bérard and Guéguen
2012). See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1YpR_GenSubstitutionModel.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1YpR_GenSubstitutionModel.html#details, Bio++ description, Bio++ description}.
@end table
@@ -1396,7 +1355,7 @@ These substitution models take as argument another substitution model, and add s
Tuffley and Steel 1998's 'covarion' model, taking a nested
substitution model as argument for @var{model}. The nested model can
be any substitution model for any alphabet. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1TS98.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1TS98.html#details, Bio++ description, Bio++ description}.
@item G01(model=@{model description@}, rdist=@{rate distribution description@}, mu=@{real>0@} [, "equilibrium frequencies"])
@@ -1404,19 +1363,19 @@ Galtier 2001's 'covarion' model, taking a nested substitution model as
argument for @var{model} and a rate distribution for parameter
@var{rdist} (see below). The nested model can be any substitution
model for any alphabet. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1G01.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1G01.html#details, Bio++ description, Bio++ description}.
@item RE08(model=@{model description@}, lambda=@{real>0@}, mu=@{real>0@} [, "equilibrium frequencies"])
Rivas and Eddy 2008's substitution model with gaps, taking a nested
substitution model as argument for @var{model}. Parameter @var{lambda}
is the insertion rate, while @var{mu} is the deletion rate. See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1RE08.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1RE08.html#details, Bio++ description, Bio++ description}.
@end table
-@node Mixture, , Meta, Model
+@node Mixture, Conditioned , Meta, Model
@subsubsection Mixture of models
@table @command
@@ -1480,7 +1439,7 @@ has parameters @var{TN93.kappa1_Gamma.alpha},
@var{TN93.theta2}.
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1MixtureOfASubstitutionModel.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1MixtureOfASubstitutionModel.html#details, Bio++ description, Bio++ description}.
@item Mixture(model1=@{model description@},..., modeln=@{model description@} [, relrate1=@{1>real>0@},..., relrate@{n-1@}=@{1>real>0@}, relproba1=@{1>real>0@}, ..., relproba@{n-1@}=@{1>real>0@}, "equilibrium frequencies"])
Mixture model built from several @var{models}: each model has its own
@@ -1500,13 +1459,34 @@ has parameters @var{Mixture.relrate1}, @var{Mixture.relproba1},
@var{Mixture.2_YN98.kappa}, @var{Mixture.2_YN98.omega}.
See the
-@uref{http://biopp.univ-montp2.fr/Documents/ClassDocumentation/bpp-phyl/html/classbpp_1_1MixtureOfSubstitutionModels.html#_details, Bio++ description}.
+@uref{http://bioweb.me/bpp-phyl-doc/classbpp_1_1MixtureOfSubstitutionModels.html#details, Bio++ description, Bio++ description}.
@end table
+@node Conditioned, , Mixture, Model
+@subsubsection Conditioned models
+The transition probabilities on the branches are conditioned by the
+occurrence of given events. The model is then not Markovian, but
+semi-Markovian. The sets of considered events follow those (i.e.
+registers) defined for substitution mapping (see the testnh manual).
+@table @command
+@item OneChange(model=@{model description@})
+The transition probabilities along each branch are conditioned by the
+fact that there has been at least one substitution on this branch with
+this model.
+
+@item OneChange(model=@{model description@}, register=@{register
+name@}, numReg=num1+num2+...)
+
+The transition probabilities along each branch are conditioned by the
+fact that there has been on this branch at least one substitution
+of the specific types in the register. The "+" permits the declaration
+of several types.
+
+@end table
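To make the conditioning behind OneChange concrete, here is a minimal Python sketch using a Jukes-Cantor (JC69) chain, chosen only because its transition probabilities have a closed form. For a continuous-time Markov chain, the probability of staying in state i with no substitution over a branch of length t is exp(q_ii t), so the conditional probabilities follow by removing that term from the diagonal and renormalizing. Whether Bio++ computes OneChange exactly this way is an assumption; the function names and the rate parameter alpha are hypothetical.

```python
import math


def jc69_transition(t, alpha=1.0):
    """JC69 transition matrix P(t); every off-diagonal rate equals alpha."""
    decay = math.exp(-4.0 * alpha * t)
    same = 0.25 + 0.75 * decay
    different = 0.25 - 0.25 * decay
    return [[same if i == j else different for j in range(4)]
            for i in range(4)]


def one_change_transition(t, alpha=1.0):
    """P(t) conditioned on at least one substitution on the branch (t > 0)."""
    unconditioned = jc69_transition(t, alpha)
    # Probability that no substitution happened: exp(q_ii * t), q_ii = -3 * alpha.
    no_change = math.exp(-3.0 * alpha * t)
    return [[(unconditioned[i][j] - (no_change if i == j else 0.0))
             / (1.0 - no_change)
             for j in range(4)]
            for i in range(4)]


conditioned = one_change_transition(0.1)
```

On a short branch, most of the unconditioned diagonal probability comes from the "no substitution" event, so after conditioning the diagonal entries become small: returning to the starting state then requires at least two substitutions.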
@node Non-homogeneity, FrequenciesSet, Model, Process
@subsection Setting up non-stationary / non-homogeneous models
@@ -1718,18 +1698,18 @@ frequency set parameters are position dependent.
@example
alphabet=Word(letter=DNA,length=4)
-Word(frequency=GC())
+nonhomogeneous.root_freq=Word(frequency=GC())
@end example
-builds a frequency set on 4 bases words, such that all sites
+builds a root frequency set on 4 bases words, such that all sites
frequencies follow the same GC frequency set model. The parameter
name is @var{1234_GC.theta}.
@example
alphabet=Word(letter=DNA,length=4)
-Word(frequency1=GC(),frequency2=GC(),frequency3=Fixed(),\
- frequency4=Full())
+nonhomogeneous.root_freq=Word(frequency1=GC(),frequency2=GC(),\
+ frequency3=Fixed(),frequency4=Full())
@end example
-builds a frequency set on 4 bases words, such first and second sites
+builds a root frequency set on 4 bases words, such that the first and second sites
follow independent GC frequency sets, third site follows a Fixed
frequency set, and fourth site follows a Full frequency set. Then the
parameters names are @var{1_GC.theta},
@@ -1757,16 +1737,18 @@ that case, all single site frequency set parameters are position
dependent.
@example
-alphabet=Codon(letter=DNA, type=Standard)
-Codon(frequency=GC())
+alphabet=Codon(letter=DNA)
+genetic_code=Standard
+nonhomogeneous.root_freq=Codon(frequency=GC())
@end example
builds a frequency set on codons, such that all sites frequencies
follow the same GC frequency set model. The parameter name is
@var{123_GC.theta}.
@example
-alphabet=Codon(letter=DNA, type=Standard)
-Codon(frequency1=GC(),frequency2=GC(),frequency3=Fixed())
+alphabet=Codon(letter=DNA)
+genetic_code=Standard
+nonhomogeneous.root_freq=Codon(frequency1=GC(),frequency2=GC(),frequency3=Fixed())
@end example
builds a frequency set on codons, such that first and second sites
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/bppsuite.git