[med-svn] [Git][med-team/python-treetime][upstream] New upstream version 0.5.2
Andreas Tille
gitlab at salsa.debian.org
Wed Dec 5 11:56:02 GMT 2018
Andreas Tille pushed to branch upstream at Debian Med / python-treetime
Commits:
4cf4459f by Andreas Tille at 2018-12-05T11:00:06Z
New upstream version 0.5.2
- - - - -
15 changed files:
- README.md
- bin/treetime
- doc/source/conf.py
- doc/source/index.rst
- doc/source/treetime.rst
- setup.py
- treetime/__init__.py
- treetime/aa_models.py
- treetime/clock_tree.py
- treetime/gtr.py
- treetime/treeanc.py
- treetime/treeregression.py
- treetime/treetime.py
- treetime/utils.py
- treetime/wrappers.py
Changes:
=====================================
README.md
=====================================
@@ -11,10 +11,12 @@ To optimize the likelihood of time-scaled phylogenies, TreeTime uses an iterativ
The only topology optimization are (optional) resolution of polytomies in a way that is most (approximately) consistent with the sampling time constraints on the tree.
The package is designed to be used as a stand-alone tool on the command-line or as a library used in larger phylogenetic analysis work-flows.
-In addition to scripting TreeTime or using it via the command-line, there is also a small web server at [treetime.ch](https://treetime.ch).
+In addition to scripting TreeTime or using it via the command-line, there is also a small web server at [treetime.ch](https://treetime.biozentrum.unibas.ch/).
![Molecular clock phylogeny of 200 NA sequences of influenza A H3N2](https://raw.githubusercontent.com/neherlab/treetime_examples/master/figures/tree_and_clock.png)
+Have a look at our [examples and tutorials](https://github.com/neherlab/treetime_examples).
+
#### Features
* ancestral sequence reconstruction (marginal and joint maximum likelihood)
* molecular clock tree inference (marginal and joint maximum likelihood)
@@ -30,6 +32,7 @@ In addition to scripting TreeTime or using it via the command-line, there is als
+ [Ancestral sequence reconstruction](#ancestral-sequence-reconstruction)
+ [Homoplasy analysis](#homoplasy-analysis)
+ [Mugration analysis](#mugration-analysis)
+ + [Metadata and date format](#metadata-and-date-format)
* [Example scripts](#example-scripts)
* [Related tools](#related-tools)
* [Projects using TreeTime](#projects-using-treetime)
@@ -56,6 +59,10 @@ You may install TreeTime and its dependencies by running
pip install .
```
within this repository.
+You can also install TreeTime from PyPi via
+```bash
+ pip install phylo-treetime
+```
You might need root privileges for system wide installation. Alternatively, you can simply use it TreeTime locally without installation. In this case, just download and unpack it, and then add the TreeTime folder to your $PYTHONPATH.
@@ -63,7 +70,8 @@ You might need root privileges for system wide installation. Alternatively, you
### Command-line usage
TreeTime can be used as part of python programs that create and interact with tree time objects. How TreeTime can be used to address typical questions like ancestral sequence reconstruction, rerooting, timetree inference etc is illustrated by a collection of example scripts described below.
-In addition, we provide scripts that can be run from the command line with arguments specifying input data and parameters.
+In addition, TreeTime can be used from the command line with arguments specifying input data and parameters.
+Trees can be read as newick, nexus and phylip files; fasta and phylip are supported alignment formats; metadata and dates can be provided as csv or tsv files, see [below](#metadata-and-date-format) for details.
#### Timetrees
The to infer a timetree, i.e. a phylogenetic tree in which branch length reflect time rather than divergence, TreeTime offers implements the command:
@@ -113,6 +121,25 @@ where `<field>` is the relevant column in the csv file specifying the metadata `
The full list if options is available by typing `treetime mugration -h`.
Please see [treetime_examples/mugration.md](https://github.com/neherlab/treetime_examples/blob/master/mugration.md) for examples and more documentation.
+#### Metadata and date format
+Several of TreeTime commands require the user to specify a file with dates and/or other meta data.
+TreeTime assumes these files to by either comma (csv) or tab-separated (tsv) files.
+The first line of these files is interpreted as header line specifying the content of the columns.
+Each file needs to have at least one column that is named `name`, `accession`, or `strain`.
+This column needs to contain the names of each sequence and match the names of taxons in the tree if one is provided.
+If more than one of `name`, `accession`, or `strain` is found, TreeTime will use the first.
+
+If the analysis requires dates, at least one column name needs to contain `date` (i.e. `sampling date` is fine).
+Again, if multiple hits are found, TreeTime will use the first.
+TreeTime will attempt to parse dates in the following way and order
+
+| order | type/format | example | description|
+| --- |-------------|---------|------------|
+| 1| float | 2017.56 | decimal date |
+| 2| [float:float] | [2013.45:2015.56] | decimal date range |
+| 3| %Y-%m-%d | 2017-08-25 | calendar date in ISO format |
+| 4| %Y-XX-XX | 2017-XX-XX | calendar date missing month and/or day |
+
### Example scripts
The following scripts illustrate how treetime can be used to solve common problem with short python scripts. They are meant to be used in an interactive ipython environment and run as `run examples/ancestral_inference.py`.
=====================================
bin/treetime
=====================================
@@ -132,6 +132,7 @@ def add_reroot_group(parser):
def add_gtr_arguments(parser):
parser.add_argument('--gtr', default='infer', help=gtr_description)
parser.add_argument('--gtr-params', nargs='+', help=gtr_params_description)
+ parser.add_argument('--aa', action='store_true', help="use aminoacid alphabet")
def add_anc_arguments(parser):
@@ -162,15 +163,19 @@ if __name__ == '__main__':
add_reroot_group(t_parser)
add_gtr_arguments(t_parser)
t_parser.add_argument('--clock-rate', type=float, help="if specified, the rate of the molecular clock won't be optimized.")
+ t_parser.add_argument('--clock-std-dev', type=float, help="standard deviation of the provided clock rate estimate")
t_parser.add_argument('--branch-length-mode', default='auto', type=str, choices=['auto', 'input', 'joint', 'marginal'],
help="If set to 'input', the provided branch length will be used without modification. "
"Note that branch lengths optimized by treetime are only accurate at short evolutionary distances.")
t_parser.add_argument('--confidence', action='store_true', help="estimate confidence intervals of divergence times.")
t_parser.add_argument('--keep-polytomies', default=False, action='store_true',
help="Don't resolve polytomies using temporal information.")
- t_parser.add_argument('--relax',nargs='*', default=False,
- help='use an autocorrelated molecular clock. Prior strength and coupling of parent '
- 'and offspring rates can be specified e.g. as --relax 1.0 0.5')
+ t_parser.add_argument('--relax',nargs=2, type=float,
+ help='use an autocorrelated molecular clock. Strength of the gaussian priors on'
+ ' branch specific rate deviation and the coupling of parent and offspring'
+ ' rates can be specified e.g. as --relax 1.0 0.5. Values around 1.0 correspond'
+ ' to weak priors, larger values constrain rate deviations more strongly.'
+ ' Coupling 0 (--relax 1.0 0) corresponds to an un-correlated clock.')
t_parser.add_argument('--max-iter', default=2, type=int,
help='maximal number of iterations the inference cycle is run. Note that for polytomy resolution and coalescence models max_iter should be at least 2')
t_parser.add_argument('--coalescent', default="0.0", type=str,
@@ -181,6 +186,10 @@ if __name__ == '__main__':
t_parser.add_argument('--plot-rtt', default="root_to_tip_regression.pdf",
help = "filename to save the plot to. Suffix will determine format"
" (choices pdf, png, svg, default=pdf)")
+ t_parser.add_argument('--tip-labels', action='store_true',
+ help = "add tip labels (default for small trees with <30 leaves)")
+ t_parser.add_argument('--no-tip-labels', action='store_true',
+ help = "don't show tip labels (default for small trees with >=30 leaves)")
add_anc_arguments(t_parser)
add_common_args(t_parser)
=====================================
doc/source/conf.py
=====================================
@@ -69,7 +69,7 @@ master_doc = 'index'
# General information about the project.
project = u'TreeTime'
-copyright = u'2017, Pavel Sagulenko and Richard Neher'
+copyright = u'2017-2018, Pavel Sagulenko and Richard Neher'
author = u'Pavel Sagulenko and Richard Neher'
# The version info for the project you're documenting, acts as replacement for
@@ -77,9 +77,9 @@ author = u'Pavel Sagulenko and Richard Neher'
# built documents.
#
# The short X.Y version.
-version = u'0.4.0'
+version = u'0.5.0'
# The full version, including alpha/beta/rc tags.
-release = u'0.4.0'
+release = u'0.5.0'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
=====================================
doc/source/index.rst
=====================================
@@ -48,41 +48,40 @@ Utility code
Command-line functions
======================
-TreeTime is designed to be part of python workflows, but for a number of standard
-tasks we have implemented scripts that can be called from the command line like
-regular linux programs.
+TreeTime is designed to be part of python workflows, but we have exposed a number of standard
+tasks via a command-line interface.
+The TreeTime command-line tool is called :code:`treetime`.
+Examples and documentation of the command-line interface can be found in the github repo https://github.com/neherlab/treetime_examples.
+In its standard mode, it will take a tree, an alignment, and file with dates as input and estimate a time-scaled phylogeny.
+The full set of options are available via :code:`treetime -h`.
+
+
+Subcommand :code:`treetime ancestral`
+-------------------------------------
+This subcommand reconstructs ancestral sequences and maps mutations to the tree.
+It produces an alignment file containing inferred ancestral sequences and a tree file
+with mutations included as comments. The inferred GTR model is written to stdout.
-homoplasy_scanner
------------------
+Subcommand :code:`treetime homoplasy`
+-------------------------------------
Reconstructs ancestral sequences and maps mutations to the tree.
The tree is then scanned for homoplasies. An excess number of homoplasies
might suggest contamination, recombination, culture adaptation or similar.
Results are printed to stdout.
-ancestral_reconstruction
-------------------------
-Reconstructs ancestral sequences and maps mutations to the tree.
-Produces an alignment file containing inferred ancestral sequences and a tree file
-with mutations included as comments. The inferred GTR model is written to stdout.
-temporal_signal
+Subcommand :code:`treetime clock`
---------------
Calculates the root-to-tip regression and quantifies the 'clock-i-ness' of the tree.
It will reroot the tree to maximize the clock-like
-signal and recalculate branch length unless run with --keep_root.
+signal and recalculate branch length unless run with :code:`--keep_root`.
-timetree_inference
-------------------
-Reconstructs ancestral sequences and infers a molecular clock tree. Produces
-an alignment file containing inferred ancestral sequences and a tree file with
-mutations included as comments, and prints the molecular clock and inferred
-GTR model.
-
-mugration
----------
+Subcommand :code:`treetime mugration`
+-------------------------------------
Reconstructs discrete ancestral states, for example geographic location, host, or similar.
+
Indices and tables
==================
=====================================
doc/source/treetime.rst
=====================================
@@ -29,8 +29,6 @@ Additional functionality
.. automethod:: treetime.TreeTime.reroot_to_best_root
-.. automethod:: treetime.TreeTime.find_best_root_and_regression
-
.. automethod:: treetime.TreeTime.plot_root_to_tip
.. automethod:: treetime.TreeTime.print_lh
=====================================
setup.py
=====================================
@@ -27,7 +27,7 @@ setup(
packages=['treetime'],
install_requires = [
'biopython>=1.66',
- 'matplotlib>=2.0',
+ 'matplotlib>=2.0, ==2.*',
'numpy>=1.10.4',
'pandas>=0.17.1',
'scipy>=0.16.1'
=====================================
treetime/__init__.py
=====================================
@@ -6,4 +6,4 @@ from .treetime import ttconf as treetime_conf
from .gtr import GTR
from .merger_models import Coalescent
from .treeregression import TreeRegression
-version="0.5.0"
+version="0.5.2"
=====================================
treetime/aa_models.py
=====================================
@@ -4,7 +4,7 @@ from .seq_utils import alphabets
def JTT92(mu=1.0):
- from gtr import GTR
+ from .gtr import GTR
# stationary concentrations:
pis = np.array([
0.07674789,
=====================================
treetime/clock_tree.py
=====================================
@@ -69,8 +69,10 @@ class ClockTree(TreeAnc):
self.rel_tol_refine = ttconf.REL_TOL_REFINE
self.branch_length_mode = branch_length_mode
self.clock_model=None
+ self.use_covariation=True # if false, covariation will be ignored in rate estimates.
self._set_precision(precision)
- self._assign_dates()
+ if self._assign_dates()==ttconf.ERROR:
+ raise ValueError("ClockTree requires date constraints!")
def _assign_dates(self):
@@ -78,23 +80,30 @@ class ClockTree(TreeAnc):
Returns
-------
- TYPE
- Description
+ str
+ success/error code
"""
if self.tree is None:
self.logger("ClockTree._assign_dates: tree is not set, can't assign dates", 0)
return ttconf.ERROR
+ bad_branch_counter = 0
for node in self.tree.find_clades(order='postorder'):
if node.name in self.date_dict:
- try:
- tmp = np.mean(self.date_dict[node.name])
- node.raw_date_constraint = self.date_dict[node.name]
- node.bad_branch = False
- except:
- self.logger("WARNING: ClockTree.init: node %s has a bad date: %s"%(node.name, str(self.date_dict[node.name])), 2, warn=True)
+ tmp_date = self.date_dict[node.name]
+ if np.isscalar(tmp_date) and np.isnan(tmp_date):
+ self.logger("WARNING: ClockTree.init: node %s has a bad date: %s"%(node.name, str(tmp_date)), 2, warn=True)
node.raw_date_constraint = None
node.bad_branch = True
+ else:
+ try:
+ tmp = np.mean(tmp_date)
+ node.raw_date_constraint = tmp_date
+ node.bad_branch = False
+ except:
+ self.logger("WARNING: ClockTree.init: node %s has a bad date: %s"%(node.name, str(tmp_date)), 2, warn=True)
+ node.raw_date_constraint = None
+ node.bad_branch = True
else: # nodes without date contraints
node.raw_date_constraint = None
@@ -107,6 +116,13 @@ class ClockTree(TreeAnc):
# this node, the branch is marked as 'bad'
node.bad_branch = np.all([x.bad_branch for x in node])
+ if node.is_terminal() and node.bad_branch:
+ bad_branch_counter += 1
+
+ if bad_branch_counter>self.tree.count_terminals()-3:
+ self.logger("ERROR: ALMOST NO VALID DATE CONSTRAINTS, EXITING", 1, warn=True)
+ return ttconf.ERROR
+
return ttconf.SUCCESS
@@ -205,7 +221,7 @@ class ClockTree(TreeAnc):
def get_clock_model(self, covariation=True, slope=None):
Treg = self.setup_TreeRegression(covariation=covariation)
self.clock_model = Treg.regression(slope=slope)
- if not Treg.valid_confidence:
+ if not Treg.valid_confidence or (slope is not None):
if 'cov' in self.clock_model:
self.clock_model.pop('cov')
self.clock_model['valid_confidence']=False
@@ -273,7 +289,7 @@ class ClockTree(TreeAnc):
node.branch_length_interpolator.gamma = gamma
# use covariance in clock model only after initial timetree estimation is done
- use_cov = (len(has_clock_length) - np.sum(has_clock_length))<3
+ use_cov = (np.sum(has_clock_length) > len(has_clock_length)*0.7) and self.use_covariation
self.get_clock_model(covariation=use_cov, slope=clock_rate)
# make node distribution objects
@@ -312,7 +328,7 @@ class ClockTree(TreeAnc):
Key word arguments to initialize dates constraints
'''
- self.logger("ClockTree: Maximum likelihood tree optimization with temporal constraints:",1)
+ self.logger("ClockTree: Maximum likelihood tree optimization with temporal constraints",1)
self.init_date_constraints(clock_rate=clock_rate, **kwargs)
@@ -688,12 +704,101 @@ class ClockTree(TreeAnc):
n.branch_length = n.numdate - n.up.numdate
+ def calc_rate_susceptibility(self, rate_std=None, params=None):
+ """return the time tree estimation of evolutionary rates +/- one
+ standard deviation form the ML estimate.
+
+ Returns
+ -------
+ TreeTime.return_code : str
+ success or failure
+ """
+ params = params or {}
+ if rate_std is None:
+ if not (self.clock_model['valid_confidence'] and 'cov' in self.clock_model):
+ self.logger("ClockTree.calc_rate_susceptibility: need valid standard deviation of the clock rate to estimate dating error.", 1, warn=True)
+ return ttconf.ERROR
+ rate_std = np.sqrt(self.clock_model['cov'][0,0])
+
+ current_rate = self.clock_model['slope']
+ upper_rate = self.clock_model['slope'] + rate_std
+ lower_rate = max(0.1*current_rate, self.clock_model['slope'] - rate_std)
+ for n in self.tree.find_clades():
+ if n.up:
+ n.branch_length_interpolator.gamma*=upper_rate/current_rate
+
+ self.logger("###ClockTree.calc_rate_susceptibility: run with upper bound of rate estimate", 1)
+ self.make_time_tree(**params)
+ self.logger("###ClockTree.calc_rate_susceptibility: rate: %f, LH:%f"%(upper_rate, self.tree.positional_joint_LH), 2)
+ for n in self.tree.find_clades():
+ n.numdate_rate_variation = [(upper_rate, n.numdate)]
+ if n.up:
+ n.branch_length_interpolator.gamma*=lower_rate/upper_rate
+
+ self.logger("###ClockTree.calc_rate_susceptibility: run with lower bound of rate estimate", 1)
+ self.make_time_tree(**params)
+ self.logger("###ClockTree.calc_rate_susceptibility: rate: %f, LH:%f"%(lower_rate, self.tree.positional_joint_LH), 2)
+ for n in self.tree.find_clades():
+ n.numdate_rate_variation.append((lower_rate, n.numdate))
+ if n.up:
+ n.branch_length_interpolator.gamma*=current_rate/lower_rate
+
+ self.logger("###ClockTree.calc_rate_susceptibility: run with central rate estimate", 1)
+ self.make_time_tree(**params)
+ self.logger("###ClockTree.calc_rate_susceptibility: rate: %f, LH:%f"%(current_rate, self.tree.positional_joint_LH), 2)
+ for n in self.tree.find_clades():
+ n.numdate_rate_variation.append((current_rate, n.numdate))
+ n.numdate_rate_variation.sort(key=lambda x:x[1]) # sort estimates for different rates by numdate
+
+ return ttconf.SUCCESS
+
+
+ def date_uncertainty_due_to_rate(self, node, interval=(0.05, 0.095)):
+ """use previously calculated variation of the rate to estimate
+ the uncertainty in a particular numdate due to rate variation.
+
+ Parameters
+ ----------
+ node : PhyloTree.Clade
+ node for which the confidence interval is to be calculated
+ interval : tuple, optional
+ Array of length two, or tuple, defining the bounds of the confidence interval
+
+ """
+ if hasattr(node, "numdate_rate_variation"):
+ from scipy.special import erfinv
+ nsig = [np.sqrt(2.0)*erfinv(-1.0 + 2.0*x) if x*(1.0-x) else 0
+ for x in interval]
+ l,c,u = [x[1] for x in node.numdate_rate_variation]
+ return np.array([c + x*np.abs(y-c) for x,y in zip(nsig, (l,u))])
+
+ else:
+ return None
+
+ def combine_confidence(self, center, limits, c1=None, c2=None):
+ if c1 is None and c2 is None:
+ return np.array(limits)
+ elif c1 is None:
+ min_val,max_val = c2
+ elif c2 is None:
+ min_val,max_val = c1
+ else:
+ min_val = center - np.sqrt((c1[0]-center)**2 + (c2[0]-center)**2)
+ max_val = center + np.sqrt((c1[1]-center)**2 + (c2[1]-center)**2)
+
+ return np.array([max(limits[0], min_val),
+ min(limits[1], max_val)])
+
+
+
def get_confidence_interval(self, node, interval = (0.05, 0.95)):
'''
If temporal reconstruction was done using the marginal ML mode, the entire distribution of
times is available. This function determines the 90% (or other) confidence interval, defined as the
range where 5% of probability is below and above. Note that this does not necessarily contain
the highest probability position.
+ In absense of marginal reconstruction, it will return uncertainty based on rate
+ variation. If both are present, the wider interval will be returned.
Parameters
----------
@@ -711,21 +816,28 @@ class ClockTree(TreeAnc):
Array with two numerical dates delineating the confidence interval
'''
+ rate_contribution = self.date_uncertainty_due_to_rate(node, interval)
+
if hasattr(node, "marginal_inverse_cdf"):
+ min_date, max_date = [self.date2dist.to_numdate(x) for x in
+ (node.marginal_pos_LH.xmax, node.marginal_pos_LH.xmin)]
if node.marginal_inverse_cdf=="delta":
return np.array([node.numdate, node.numdate])
else:
- return self.date2dist.to_numdate(node.marginal_inverse_cdf(np.array(interval))[::-1])
+ mutation_contribution = self.date2dist.to_numdate(node.marginal_inverse_cdf(np.array(interval))[::-1])
else:
- return np.array([np.nan, np.nan])
+ min_date, max_date = [-np.inf, np.inf]
+ return self.combine_confidence(node.numdate, (min_date, max_date),
+ c1=rate_contribution, c2=mutation_contribution)
def get_max_posterior_region(self, node, fraction = 0.9):
'''
If temporal reconstruction was done using the marginal ML mode, the entire distribution of
- times is available. This function determines the 95% confidence interval defines as the
- range where 5% of probability are below and above. Note that this does not necessarily contain
- the highest probability position.
+ times is available. This function determines the interval around the highest
+ posterior probability region that contains the specified fraction of the probability mass.
+ In absense of marginal reconstruction, it will return uncertainty based on rate
+ variation. If both are present, the wider interval will be returned.
Parameters
----------
@@ -743,18 +855,18 @@ class ClockTree(TreeAnc):
Array with two numerical dates delineating the high posterior region
'''
- if not hasattr(node, "marginal_inverse_cdf"):
- return np.array([np.nan, np.nan])
-
if node.marginal_inverse_cdf=="delta":
return np.array([node.numdate, node.numdate])
+
min_max = (node.marginal_pos_LH.xmin, node.marginal_pos_LH.xmax)
+ min_date, max_date = [self.date2dist.to_numdate(x) for x in min_max][::-1]
if node.marginal_pos_LH.peak_pos == min_max[0]: #peak on the left
return self.get_confidence_interval(node, (0, fraction))
elif node.marginal_pos_LH.peak_pos == min_max[1]: #peak on the right
return self.get_confidence_interval(node, (1.0-fraction ,1.0))
else: # peak in the center of the distribution
+ rate_contribution = self.date_uncertainty_due_to_rate(node, ((1-fraction)*0.5, 1.0-(1.0-fraction)*0.5))
# construct height to position interpolators left and right of the peak
# this assumes there is only one peak --- might fail in odd cases
@@ -775,9 +887,12 @@ class ClockTree(TreeAnc):
# minimza and determine success
sol = minimize(func, bracket=[0,10], args=(fraction,))
if sol['success']:
- return self.date2dist.to_numdate(np.array([right(sol['x']), left(sol['x'])]).squeeze())
+ mutation_contribution = self.date2dist.to_numdate(np.array([right(sol['x']), left(sol['x'])]).squeeze())
else: # on failure, return standard confidence interval
- return self.get_confidence_interval(node, (0.5*(1-fraction), 1-0.5*(1-fraction)))
+ mutation_contribution = None
+
+ return self.combine_confidence(node.numdate, (min_date, max_date),
+ c1=rate_contribution, c2=mutation_contribution)
if __name__=="__main__":
=====================================
treetime/gtr.py
=====================================
@@ -3,7 +3,6 @@ from collections import defaultdict
import numpy as np
from treetime import config as ttconf
from .seq_utils import alphabets, profile_maps, alphabet_synonyms
-from .aa_models import JTT92
class GTR(object):
@@ -334,6 +333,7 @@ class GTR(object):
"""
from .nuc_models import JC69, K80, F81, HKY85, T92, TN93
+ from .aa_models import JTT92
if model.lower() in ['jc', 'jc69', 'jukes-cantor', 'jukes-cantor69', 'jukescantor', 'jukescantor69']:
return JC69(**kwargs)
=====================================
treetime/treeanc.py
=====================================
@@ -10,6 +10,7 @@ from .gtr import GTR
string_types = [str] if sys.version_info[0]==3 else [str, unicode]
+
class TreeAnc(object):
"""
Class defines simple tree object with basic interface methods: reading and
@@ -82,24 +83,27 @@ class TreeAnc(object):
self.log=log
self.logger("TreeAnc: set-up",1)
self._internal_node_count = 0
+ self.additional_constant_sites = 0 # sites not part of the alignment but assumed constant
self.use_mutation_length=False
- # if not specified, this will be set as 1/alignment_length
+ # if not specified, this will be set as the alignment_length or reference length
self._seq_len = None
self.seq_len = kwargs['seq_len'] if 'seq_len' in kwargs else None
self.fill_overhangs = fill_overhangs
self.is_vcf = False #this is set true when aln is set, if aln is dict
+ # if sequences represent multiple samples, this can be added as multiplicity here
self.seq_multiplicity = {} if seq_multiplicity is None else seq_multiplicity
- self.multiplicity = None
self.ignore_gaps = ignore_gaps
- self._gtr = None
- self.set_gtr(gtr or 'JC69', **kwargs)
self._tree = None
self.tree = tree
if tree is None:
raise AttributeError("TreeAnc: tree loading failed! exiting")
+ # set up GTR model
+ self._gtr = None
+ self.set_gtr(gtr or 'JC69', **kwargs)
+
# will be None if not set
self.ref = ref
@@ -111,7 +115,9 @@ class TreeAnc(object):
# otherwise self.aln will be None
self._aln = None
self.reduced_to_full_sequence_map = None
+ self.multiplicity = None
self.aln = aln
+
if self.aln and self.tree:
if len(self.tree.get_terminals()) != len(self.aln):
print("**WARNING: Number of sequences in tree differs from number of sequences in alignment!**")
@@ -322,10 +328,29 @@ class TreeAnc(object):
if (not self.is_vcf) and self.convert_upper:
self._aln = MultipleSeqAlignment([seq.upper() for seq in self._aln])
- if self.is_vcf:
- self.seq_len = len(self.ref)
+ if self.seq_len:
+ if self.is_vcf and self.seq_len!=len(self.ref):
+ self.logger("TreeAnc.aln: specified sequence length doesn't match reference length, ignoring sequence length.", 1, warn=True)
+ self._seq_len = len(self.ref)
+ else:
+ self.logger("TreeAnc.aln: specified sequence length doesn't match alignment length. Treating difference as constant sites.", 2, warn=True)
+ self.additional_constant_sites = max(0, self.seq_len - self.aln.get_alignment_length())
else:
- self.seq_len = self.aln.get_alignment_length()
+ if self.is_vcf:
+ self.seq_len = len(self.ref)
+ else:
+ self.seq_len = self.aln.get_alignment_length()
+
+
+ # check whether the alignment is consistent with a nucleotide alignment.
+ likely_alphabet = self._guess_alphabet()
+ from .seq_utils import alphabets
+ # if likely alignment is not nucleotide but the gtr alignment is, WARN
+ if likely_alphabet=='aa' and len(self.gtr.alphabet)==len(alphabets['nuc']) and np.all(self.gtr.alphabet==alphabets['nuc']):
+ self.logger('WARNING: small fraction of ACGT-N in alignment. Really a nucleotide alignment? if not, rerun with --aa', 1, warn=True)
+ # conversely, warn if alignment is consistent with nucleotide but gtr has a long alphabet
+ if likely_alphabet=='nuc' and len(self.gtr.alphabet)>10:
+ self.logger('WARNING: almost exclusively ACGT-N in alignment. Really a protein alignment?', 1, warn=True)
if hasattr(self, '_tree') and (self.tree is not None):
self._attach_sequences_to_nodes()
@@ -396,6 +421,26 @@ class TreeAnc(object):
"""
self._ref = in_ref
+ def _guess_alphabet(self):
+ if self.aln:
+ if self.is_vcf and self.ref:
+ total=self.seq_len
+ nuc_count = 0
+ for n in 'ACGT-N':
+ nuc_count += self.ref.upper().count(n)
+ else:
+ total=self.seq_len*len(self.aln)
+ nuc_count = 0
+ for seq in self.aln:
+ for n in 'ACGT-N':
+ nuc_count += seq.seq.upper().count(n)
+ if nuc_count>0.9*total:
+ return 'nuc'
+ else:
+ return 'aa'
+ else:
+ return 'nuc'
+
def _attach_sequences_to_nodes(self):
'''
@@ -504,7 +549,7 @@ class TreeAnc(object):
return
else:
aln_transpose = np.array(seqs).T
- positions = range(self.seq_len)
+ positions = range(aln_transpose.shape[0])
for pi in positions:
if self.is_vcf:
@@ -539,6 +584,31 @@ class TreeAnc(object):
# sequence to the reduced aln<->sequence_pos_indexes map
alignment_patterns[str_pat][1].append(pi)
+ # add constant alignment column not in the alignment. We don't know where they
+ # are, so just add them to the end. First, determine sequence composition.
+ if self.additional_constant_sites:
+ character_counts = {c:np.sum(aln_transpose==c) for c in self.gtr.alphabet
+ if c not in [self.gtr.ambiguous, '-']}
+ total = np.sum(list(character_counts.values()))
+ additional_columns = [(c,int(np.round(self.additional_constant_sites*n/total)))
+ for c, n in character_counts.items()]
+ columns_left = self.additional_constant_sites
+ pi = len(positions)
+ for c,n in additional_columns:
+ if c==additional_columns[-1][0]: # make sure all additions add up to the correct number to avoid rounding
+ n = columns_left
+ str_pat = c*len(self.aln)
+ pos_list = list(range(pi, pi+n))
+
+ if str_pat in alignment_patterns:
+ alignment_patterns[str_pat][1].extend(pos_list)
+ else:
+ alignment_patterns[str_pat] = (len(tmp_reduced_aln), pos_list)
+ tmp_reduced_aln.append(np.array(list(str_pat)))
+ pi += n
+ columns_left -= n
+
+
# count how many times each column is repeated in the real alignment
self.multiplicity = np.zeros(len(alignment_patterns))
for p, pos in alignment_patterns.values():
@@ -796,13 +866,14 @@ class TreeAnc(object):
self.logger("TreeAnc.infer_gtr: counting mutations...done", 3)
if print_raw:
print('alphabet:',alpha)
- print('n_ij:', nij)
- print('T_i:', Ti)
+ print('n_ij:', nij, nij.sum())
+ print('T_i:', Ti, Ti.sum())
root_state = np.array([np.sum((self.tree.root.cseq==nuc)*self.multiplicity) for nuc in alpha])
self._gtr = GTR.infer(nij, Ti, root_state, fixed_pi=fixed_pi, pc=pc,
alphabet=self.gtr.alphabet, logger=self.logger,
prof_map = self.gtr.profile_map)
+
if normalized_rate:
self.logger("TreeAnc.infer_gtr: setting overall rate to 1.0...", 2)
self._gtr.mu=1.0
@@ -958,7 +1029,7 @@ class TreeAnc(object):
return mut_matrix_stack
- def expanded_sequence(self, node):
+ def expanded_sequence(self, node, include_additional_constant_sites=False):
"""
Get node's compressed sequence and expand it to the real sequence
@@ -968,13 +1039,18 @@ class TreeAnc(object):
Tree node
Returns
- -------
+ -------f
seq : np.array
Sequence as np.array of chars
"""
- seq = np.zeros_like(self.full_to_reduced_sequence_map, dtype='U1')
+ if include_additional_constant_sites:
+ L = self.seq_len
+ else:
+ L = self.seq_len - self.additional_constant_sites
+ seq = np.zeros_like(self.full_to_reduced_sequence_map[:L], dtype='U1')
for pos, state in enumerate(node.cseq):
- seq[self.reduced_to_full_sequence_map[pos]] = state
+ full_pos = self.reduced_to_full_sequence_map[pos]
+ seq[full_pos[full_pos<L]] = state
return seq
=====================================
treetime/treeregression.py
=====================================
@@ -245,7 +245,7 @@ class TreeRegression(object):
a vector of length 6 containing the updated quantities
"""
if n.is_terminal() and outgroup==False:
- if tv is None:
+ if tv is None or np.isinf(tv) or np.isnan(tv):
res = np.array([0, 0, 0, 0, 0, 0])
elif var==0:
res = np.array([np.inf, np.inf, np.inf, np.inf, np.inf, np.inf])
@@ -332,7 +332,6 @@ class TreeRegression(object):
bv = self.branch_value(n)
var = self.branch_variance(n)
x, chisq = self._optimal_root_along_branch(n, tv, bv, var)
-
if (chisq<best_root["chisq"]):
tmpQ = self.propagate_averages(n, tv, bv*x, var*x) \
+ self.propagate_averages(n, tv, bv*(1-x), var*(1-x), outgroup=True)
@@ -341,8 +340,16 @@ class TreeRegression(object):
best_root = {"node":n, "split":x}
best_root.update(reg)
+ if 'node' not in best_root:
+ print("TreeRegression.find_best_root: Not valid root found!", force_positive)
+ return None
+
# calculate differentials with respect to x
deriv = []
+ n = best_root["node"]
+ tv = self.tip_value(n)
+ bv = self.branch_value(n)
+ var = self.branch_variance(n)
for dx in [-0.001, 0.001]:
y = min(1.0, max(0.0, best_root["split"]+dx))
tmpQ = self.propagate_averages(n, tv, bv*y, var*y) \
=====================================
treetime/treetime.py
=====================================
@@ -33,7 +33,8 @@ class TreeTime(ClockTree):
def run(self, root=None, infer_gtr=True, relaxed_clock=None, n_iqd = None,
resolve_polytomies=True, max_iter=0, Tc=None, fixed_clock_rate=None,
- time_marginal=False, sequence_marginal=False, branch_length_mode='auto', **kwargs):
+ time_marginal=False, sequence_marginal=False, branch_length_mode='auto',
+ vary_rate=False, use_covariation=True, **kwargs):
"""
Run TreeTime reconstruction. Based on the input parameters, it divides
@@ -43,62 +44,82 @@ class TreeTime(ClockTree):
Parameters
----------
+ root : str
+ Try to find better root position on a given tree. If string is passed,
+ the root will be searched according to the specified method. If none,
+ use tree as-is.
- root : str
- Try to find better root position on a given tree. If string is passed,
- the root will be searched according to the specified method. If none,
- use tree as-is.
+ See :py:meth:`treetime.TreeTime.reroot` for available rooting methods.
- See :py:meth:`treetime.TreeTime.reroot` for available rooting methods.
+ infer_gtr : bool
+ If True, infer GTR model
- infer_gtr : bool
- If True, infer GTR model
+ relaxed_clock : dic
+ If not None, use autocorrelated molecular clock model. Specify the
+ clock parameters as :code:`{slack:<slack>, coupling:<coupling>}` dictionary.
- relaxed_clock : dic
- If not None, use autocorrelated molecular clock model. Specify the
- clock parameters as :code:`{slack:<slack>, coupling:<coupling>}` dictionary.
+ n_iqd : int
+ If not None, filter tree nodes which do not obey the molecular clock
+ for the particular tree. The nodes, which deviate more than
+ :code:`n_iqd` interquantile intervals from the molecular clock
+ regression will be marked as 'BAD' and not used in the TreeTime
+ analysis
- n_iqd : int
- If not None, filter tree nodes which do not obey the molecular clock
- for the particular tree. The nodes, which deviate more than
- :code:`n_iqd` interquantile intervals from the molecular clock
- regression will be marked as 'BAD' and not used in the TreeTime
- analysis
+ resolve_polytomies : bool
+ If True, attempt to resolve multiple mergers
- resolve_polytomies : bool
- If True, attempt to resolve multiple mergers
+ max_iter : int
+ Maximum number of iterations to optimize the tree
- max_iter : int
- Maximum number of iterations to optimize the tree
+ Tc : float, str
+ If not None, use coalescent model to correct the branch lengths by
+ introducing merger costs.
- Tc : float, str
- If not None, use coalescent model to correct the branch lengths by
- introducing merger costs.
+ If Tc is float, it is interpreted as the coalescence time scale
- If Tc is float, it is interpreted as the coalescence time scale
+ If Tc is str, it should be one of (:code:`opt`, :code:`const`, :code:`skyline`)
- If Tc is str, it should be one of (:code:`opt`, :code:`skyline`)
+ fixed_clock_rate : float
+ Fixed clock rate to be used. If None, infer clock rate from the molecular clock.
- fixed_clock_rate : float
- Fixed clock rate to be used. If None, infer clock rate from the molecular clock.
+ time_marginal : bool
+ If True, perform a final round of marginal reconstruction of the node's positions.
- time_marginal : bool
- If True, perform a final round of marginal reconstruction of the node's positions.
+ sequence_marginal : bool, optional
+ use marginal reconstruction for ancestral sequences
- branch_length_mode : str
- Should be one of: :code:`joint`, :code:`marginal`, :code:`input`.
+ branch_length_mode : str
+ Should be one of: :code:`joint`, :code:`marginal`, :code:`input`.
- If 'input', rely on the branch lengths in the input tree and skip directly
- to the maximum-likelihood ancestral sequence reconstruction.
- Otherwise, perform preliminary sequence reconstruction using parsimony
- algorithm and do branch length optimization
+ If 'input', rely on the branch lengths in the input tree and skip directly
+ to the maximum-likelihood ancestral sequence reconstruction.
+ Otherwise, perform preliminary sequence reconstruction using parsimony
+ algorithm and do branch length optimization
- **kwargs
- Keyword arguments needed by the dowstream functions
+ vary_rate : bool or float, optional
+ redo the time tree estimation for rates +/- one standard deviation.
+ if a float is passed, it is interpreted as standard deviation,
+ otherwise this standard deviation is estimated from the root-to-tip regression
+
+ use_covariation : bool, optional
+ default True, if False, rate estimates will be performed using simple
+ regression ignoring phylogenetic covaration between nodes.
+
+ **kwargs
+ Keyword arguments needed by the dowstream functions
+
+
+ Returns
+ -------
+ TreeTime error/succces code : str
+ return value depending on success or error
"""
+ # register the specified covaration mode
+ self.use_covariation = use_covariation
+
if (self.tree is None) or (self.aln is None and self.seq_len is None):
self.logger("TreeTime.run: ERROR, alignment or tree are missing", 0)
return ttconf.ERROR
@@ -214,6 +235,18 @@ class TreeTime(ClockTree):
self.logger("###TreeTime.run: CONVERGED",0)
break
+ # if the rate is too be varied and the rate estimate has a valid confidence interval
+ # rerun the estimation for variations of the rate
+ if vary_rate:
+ if type(vary_rate)==float:
+ res = self.calc_rate_susceptibility(rate_std=vary_rate, params=tt_kwargs)
+ elif self.clock_model['valid_confidence']:
+ res = self.calc_rate_susceptibility(params=tt_kwargs)
+ else:
+ res = ttconf.ERROR
+
+ if res==ttconf.ERROR:
+ self.logger("TreeTime.run: rate variation failed and can't be used for confidence estimation", 1, warn=True)
# if marginal reconstruction requested, make one more round with marginal=True
# this will set marginal_pos_LH, which to be used as error bar estimations
@@ -394,7 +427,7 @@ class TreeTime(ClockTree):
#(Without outgroup_branch_length, gives a trifurcating root, but this will mean
#mutations may have to occur multiple times.)
self.tree.root_with_outgroup(new_root, outgroup_branch_length=new_root.branch_length/2)
- self.get_clock_model(covariation=True)
+ self.get_clock_model(covariation=self.use_covariation)
if new_root == ttconf.ERROR:
@@ -419,7 +452,7 @@ class TreeTime(ClockTree):
n.clock_length = n.branch_length
self.prepare_tree()
- self.get_clock_model(covariation='ML' in root)
+ self.get_clock_model(covariation=(('ML' in root) and self.use_covariation))
return ttconf.SUCCESS
@@ -682,6 +715,7 @@ class TreeTime(ClockTree):
act_len = node.clock_length if hasattr(node, 'clock_length') else node.branch_length
# opt_len \approx 1.0*len(node.mutations)/node.profile.shape[0] but calculated via gtr model
+ # stiffness is the expectation of the inverse variance of branch length (one_mutation/opt_len)
# contact term: stiffness*(g*bl - bl_opt)^2 + slack(g-1)^2 =
# (slack+bl^2) g^2 - 2 (bl*bl_opt+1) g + C= k2 g^2 + k1 g + C
node._k2 = slack + c*act_len**2/(opt_len+self.one_mutation)
@@ -763,14 +797,15 @@ def plot_vs_years(tt, step = None, ax=None, confidence=None, ticks=True, **kwarg
'''
import matplotlib.pyplot as plt
tt.branch_length_to_years()
+ nleafs = tt.tree.count_terminals()
+
if ax is None:
- fig = plt.figure()
+ fig = plt.figure(figsize=(12,10))
ax = plt.subplot(111)
else:
fig = None
# draw tree
if "label_func" not in kwargs:
- nleafs = tt.tree.count_terminals()
kwargs["label_func"] = lambda x:x.name if (x.is_terminal() and nleafs<30) else ""
Phylo.draw(tt.tree, axes=ax, **kwargs)
@@ -784,13 +819,13 @@ def plot_vs_years(tt, step = None, ax=None, confidence=None, ticks=True, **kwarg
step/=5
elif date_range/step<5:
step/=2
- step = max(1,step)
+ step = max(1.0/12,step)
# set axis labels
if step:
dtick = step
min_tick = step*(offset//step)
- extra = 0 if dtick<date_range else dtick
+ extra = dtick if dtick<date_range else dtick
tick_vals = np.arange(min_tick, min_tick+date_range+extra, dtick)
xticks = tick_vals - offset
else:
@@ -819,7 +854,8 @@ def plot_vs_years(tt, step = None, ax=None, confidence=None, ticks=True, **kwarg
edgecolor=[1,1,1])
ax.add_patch(r)
if year in tick_vals and pos>=xlim[0] and pos<=xlim[1] and ticks:
- ax.text(pos,ylim[0]-0.04*(ylim[1]-ylim[0]),str(int(year)),
+ label_str = str(step*(year//step)) if step<1 else str(int(year))
+ ax.text(pos,ylim[0]-0.04*(ylim[1]-ylim[0]), label_str,
horizontalalignment='center')
ax.set_axis_off()
=====================================
treetime/utils.py
=====================================
@@ -181,9 +181,10 @@ def parse_dates(date_file):
It will first try to parse the column as float, than via
pandas.to_datetime and finally as ambiguous date such as 2018-05-XX
"""
+ print("\nAttempting to parse dates...")
dates = {}
if not os.path.isfile(date_file):
- print("\n ERROR: file %s does not exist, exiting..."%date_file)
+ print("\n\tERROR: file %s does not exist, exiting..."%date_file)
return dates
# separator for the csv/tsv file. If csv, we'll strip extra whitespace around ','
full_sep = '\t' if date_file.endswith('.tsv') else r'\s*,\s*'
@@ -208,46 +209,60 @@ def parse_dates(date_file):
df.iloc[i,ci] = tmp_d.strip(d[0])
if 'date' in col.lower():
potential_date_columns.append((ci, col))
- if any([x in col.lower() for x in ['name', 'strain', 'accession']]):
+ if any([x==col.lower() for x in ['name', 'strain', 'accession']]):
potential_index_columns.append((ci, col))
+
+ dates = {}
# if a potential numeric date column was found, use it
# (use the first, if there are more than one)
if not len(potential_index_columns):
- print("Cannot read metadata: need at least one column that contains the taxon labels."
+ print("ERROR: Cannot read metadata: need at least one column that contains the taxon labels."
" Looking for the first column that contains 'name', 'strain', or 'accession' in the header.", file=sys.stderr)
- return {}
+ return dates
else:
+ # use the first column that is either 'name', 'strain', 'accession'
index_col = sorted(potential_index_columns)[0][1]
+ print("\tUsing column '%s' as name. This needs match the taxon names in the tree!!"%index_col)
if len(potential_date_columns)>=1:
#try to parse the csv file with dates in the idx column:
idx = potential_date_columns[0][0]
col_name = potential_date_columns[0][1]
- dates = {}
+ print("\tUsing column '%s' as date."%col_name)
for ri, row in df.iterrows():
date_str = row.loc[col_name]
k = row.loc[index_col]
+ # try parsing as a float first
try:
dates[k] = float(date_str)
continue
except ValueError:
+ # try whether the date string can be parsed as [2002.2:2004.3]
+ # to indicate general ambiguous ranges
+ if date_str[0]=='[' and date_str[-1]==']' and len(date_str[1:-1].split(':'))==2:
+ try:
+ dates[k] = [float(x) for x in date_str[1:-1].split(':')]
+ continue
+ except ValueError:
+ pass
+ # try date format parsing 2017-08-12
try:
tmp_date = pd.to_datetime(date_str)
dates[k] = numeric_date(tmp_date)
- except ValueError:
+ except ValueError: # try ambiguous date format parsing 2017-XX-XX
lower, upper = ambiguous_date_to_date_range(date_str, '%Y-%m-%d')
if lower is not None:
dates[k] = [numeric_date(x) for x in [lower, upper]]
else:
- print("Metadata file has no column which looks like a sampling date!", file=sys.stderr)
+ print("ERROR: Metadata file has no column which looks like a sampling date!", file=sys.stderr)
if all(v is None for v in dates.values()):
- print("Cannot parse dates correctly! Check date format.", file=sys.stderr)
+ print("ERROR: Cannot parse dates correctly! Check date format.", file=sys.stderr)
return {}
return dates
except:
- print("Cannot read the metadata file!", file=sys.stderr)
+ print("ERROR: Cannot read the metadata file!", file=sys.stderr)
return {}
=====================================
treetime/wrappers.py
=====================================
@@ -11,6 +11,7 @@ from treetime import TreeAnc, GTR, TreeTime
from treetime import config as ttconf
from treetime import utils
from treetime.vcf_utils import read_vcf, write_vcf
+from treetime.seq_utils import alphabets
def assure_tree(params, tmp_dir='treetime_tmp'):
"""
@@ -39,7 +40,7 @@ def create_gtr(params):
model = params.gtr
gtr_params = params.gtr_params
if model == 'infer':
- gtr = GTR.standard('jc')
+ gtr = GTR.standard('jc', alphabet='aa' if params.aa else 'nuc')
else:
try:
kwargs = {}
@@ -60,7 +61,7 @@ def create_gtr(params):
infer_gtr = False
except:
print ("Could not create GTR model from input arguments. Using default (Jukes-Cantor 1969)")
- gtr = GTR.standard('jc')
+ gtr = GTR.standard('jc', alphabet='aa' if params.aa else 'nuc')
infer_gtr = False
return gtr
@@ -138,7 +139,8 @@ def read_if_vcf(params):
aln = sequences
if not hasattr(params, 'gtr') or params.gtr=="infer": #if not specified, set it:
- fixed_pi = [ref.count(base)/len(ref) for base in ['A','C','G','T','-']]
+ alpha = alphabets['aa'] if params.aa else alphabets['nuc']
+ fixed_pi = [ref.count(base)/len(ref) for base in alpha]
if fixed_pi[-1] == 0:
fixed_pi[-1] = 0.05
fixed_pi = [v-0.01 for v in fixed_pi]
@@ -185,8 +187,7 @@ def export_sequences_and_tree(tt, basename, is_vcf=False, zero_based=False,
fh_dates.write('%s\t%s\t%f\t%f\t%f\n'%(n.name, n.date, n.numdate,conf[0], conf[1]))
else:
fh_dates.write('%s\t%s\t%f\n'%(n.name, n.date, n.numdate))
- if n.up is None:
- continue
+
n.confidence=None
# due to a bug in older versions of biopython that truncated filenames in nexus export
# we truncate them by hand and make them unique.
@@ -204,14 +205,23 @@ def export_sequences_and_tree(tt, basename, is_vcf=False, zero_based=False,
n.comment+=(',' if n.comment else '&') + 'date=%1.2f'%n.numdate
# write tree to file
+ fmt_bl = "%1.6f" if tt.seq_len<1e6 else "%1.8e"
if timetree:
outtree_name = basename + 'timetree.nexus'
print("--- saved divergence times in \n\t %s\n"%dates_fname)
+ Phylo.write(tt.tree, outtree_name, 'nexus')
else:
outtree_name = basename + 'annotated_tree.nexus'
- Phylo.write(tt.tree, outtree_name, 'nexus')
+ Phylo.write(tt.tree, outtree_name, 'nexus', format_branch_length=fmt_bl)
print("--- tree saved in nexus format as \n\t %s\n"%outtree_name)
+ if timetree:
+ for n in tt.tree.find_clades():
+ n.branch_length = n.mutation_length
+ outtree_name = basename + 'divergence_tree.nexus'
+ Phylo.write(tt.tree, outtree_name, 'nexus', format_branch_length=fmt_bl)
+ print("--- divergence tree saved in nexus format as \n\t %s\n"%outtree_name)
+
def print_save_plot_skyline(tt, n_std=2.0, screen=True, save='', plot=''):
if plot:
@@ -452,8 +462,13 @@ def timetree(params):
"""
implementeing treetime tree
"""
- if params.relax==[]:
- params.relax=True
+ if params.relax is None:
+ relaxed_clock_params = None
+ elif params.relax==[]:
+ relaxed_clock_params=True
+ elif len(params.relax)==2:
+ relaxed_clock_params={'slack':params.relax[0], 'coupling':params.relax[1]}
+
dates = utils.parse_dates(params.dates)
if len(dates)==0:
@@ -474,10 +489,13 @@ def timetree(params):
aln, ref, fixed_pi = read_if_vcf(params)
is_vcf = True if ref is not None else False
branch_length_mode = params.branch_length_mode
- if is_vcf: #variable-site-only trees can have big branch lengths, setting this wrong.
+ #variable-site-only trees can have big branch lengths, the auto setting won't work.
+ if is_vcf or (params.aln and params.sequence_length):
if branch_length_mode == 'auto':
branch_length_mode = 'joint'
+
+
###########################################################################
### SET-UP and RUN
###########################################################################
@@ -488,6 +506,10 @@ def timetree(params):
aln=aln, gtr=gtr, seq_len=params.sequence_length,
verbose=params.verbose)
+ if not myTree.one_mutation:
+ print("TreeTime setup failed, exiting")
+ return 1
+
# coalescent model options
try:
coalescent = float(params.coalescent)
@@ -501,14 +523,18 @@ def timetree(params):
"a float, 'opt', 'const' or 'skyline'")
coalescent = None
+ vary_rate = params.confidence
+ if params.clock_std_dev and params.clock_rate:
+ vary_rate = params.clock_std_dev
root = None if params.keep_root else params.reroot
- success = myTree.run(root=root, relaxed_clock=params.relax,
+ success = myTree.run(root=root, relaxed_clock=relaxed_clock_params,
resolve_polytomies=(not params.keep_polytomies),
Tc=coalescent, max_iter=params.max_iter,
fixed_clock_rate=params.clock_rate,
n_iqd=params.clock_filter,
time_marginal="assign" if params.confidence else False,
+ vary_rate = vary_rate,
branch_length_mode = branch_length_mode,
fixed_pi=fixed_pi)
if success==ttconf.ERROR: # if TreeTime.run failed, exit
@@ -524,31 +550,45 @@ def timetree(params):
print(myTree.date2dist)
basename = get_basename(params, outdir)
- if coalescent in ['skyline', 'opt']:
+ if coalescent in ['skyline', 'opt', 'const']:
print("Inferred coalescent model")
- if coalescent=='skyline':
- print_save_plot_skyline(myTree, plot=basename+'skyline.pdf', save=basename+'skyline.tsv', screen=True)
- elif coalescent=='opt':
- Tc = myTree.merger_model.Tc.y[0]
- print(" --T_c: \t %1.4f \toptimized inverse merger rate"%Tc)
- print(" --N_e: \t %1.1f \tcorresponding pop size assument 50 gen/year\n"%(Tc/myTree.date2dist.clock_rate*50))
+ if coalescent=='skyline':
+ print_save_plot_skyline(myTree, plot=basename+'skyline.pdf', save=basename+'skyline.tsv', screen=True)
+ else:
+ Tc = myTree.merger_model.Tc.y[0]
+ print(" --T_c: \t %1.2e \toptimized inverse merger rate in units of substitutions"%Tc)
+ print(" --T_c: \t %1.2e \toptimized inverse merger rate in years"%(Tc/myTree.date2dist.clock_rate))
+ print(" --N_e: \t %1.2e \tcorresponding 'effective population size' assuming 50 gen/year\n"%(Tc/myTree.date2dist.clock_rate*50))
# plot
import matplotlib.pyplot as plt
from .treetime import plot_vs_years
leaf_count = myTree.tree.count_terminals()
- label_func = lambda x: x.name[:20] if (leaf_count<30 & x.is_terminal()) else ''
+ label_func = lambda x: (x.name if x.is_terminal() and ((leaf_count<30
+ and (not params.no_tip_labels))
+ or params.tip_labels) else '')
- plot_vs_years(myTree, show_confidence=False, label_func = label_func,
+ plot_vs_years(myTree, show_confidence=False, label_func=label_func,
confidence=0.9 if params.confidence else None)
tree_fname = (outdir + params.plot_tree)
plt.savefig(tree_fname)
print("--- saved tree as \n\t %s\n"%tree_fname)
+ plot_rtt(myTree, outdir + params.plot_rtt)
+ if params.relax:
+ fname = outdir+'substitution_rates.tsv'
+ print("--- wrote branch specific rates to\n\t %s\n"%fname)
+ with open(fname, 'w') as fh:
+ fh.write("#node\tclock_length\tmutation_length\trate\tfold_change\n")
+ for n in myTree.tree.find_clades(order="preorder"):
+ if n==myTree.tree.root:
+ continue
+ g = n.branch_length_interpolator.gamma
+ fh.write("%s\t%1.3e\t%1.3e\t%1.3e\t%1.2f\n"%(n.name, n.clock_length, n.mutation_length, myTree.date2dist.clock_rate*g, g))
+
export_sequences_and_tree(myTree, basename, is_vcf, params.zero_based,
timetree=True, confidence=params.confidence)
- plot_rtt(myTree, outdir + params.plot_rtt)
return 0
@@ -810,7 +850,13 @@ def estimate_clock_model(params):
ofile.write("#Dates of nodes that didn't have a specified date are inferred from the root-to-tip regression.\n")
for n in myTree.tree.get_terminals():
if hasattr(n, "raw_date_constraint") and (n.raw_date_constraint is not None):
- ofile.write("%s, %f, %f\n"%(n.name, n.raw_date_constraint, n.dist2root))
+ if np.isscalar(n.raw_date_constraint):
+ tmp_str = str(n.raw_date_constraint)
+ elif len(n.raw_date_constraint):
+ tmp_str = str(n.raw_date_constraint[0])+'-'+str(n.raw_date_constraint[1])
+ else:
+ tmp_str = ''
+ ofile.write("%s, %s, %f\n"%(n.name, tmp_str, n.dist2root))
else:
ofile.write("%s, %f, %f\n"%(n.name, d2d.numdate_from_dist2root(n.dist2root), n.dist2root))
for n in myTree.tree.get_nonterminals(order='preorder'):
View it on GitLab: https://salsa.debian.org/med-team/python-treetime/commit/4cf4459f78ea20dd007cd75494f5ce11f6885ba2
--
View it on GitLab: https://salsa.debian.org/med-team/python-treetime/commit/4cf4459f78ea20dd007cd75494f5ce11f6885ba2
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20181205/9ccbcfad/attachment-0001.html>
More information about the debian-med-commit
mailing list