[med-svn] [Git][med-team/python-treetime][master] 4 commits: routine-update: New upstream version
Andreas Tille (@tille)
gitlab at salsa.debian.org
Tue Oct 11 20:14:48 BST 2022
Andreas Tille pushed to branch master at Debian Med / python-treetime
Commits:
53b2e0d5 by Andreas Tille at 2022-10-11T20:53:22+02:00
routine-update: New upstream version
- - - - -
83154f17 by Andreas Tille at 2022-10-11T21:02:52+02:00
New upstream version 0.9.4
- - - - -
e6c92f87 by Andreas Tille at 2022-10-11T21:03:08+02:00
Update upstream source from tag 'upstream/0.9.4'
Update to upstream version '0.9.4'
with Debian dir fe172fafc46f49d86d2fb3ef6ddc956e220f6daa
- - - - -
ecd6e268 by Andreas Tille at 2022-10-11T21:05:23+02:00
routine-update: Ready to upload to unstable
- - - - -
15 changed files:
- README.md
- changelog.md
- debian/changelog
- docs/source/APIdoc.rst
- docs/source/installation.rst
- treetime/__init__.py
- treetime/argument_parser.py
- treetime/clock_tree.py
- treetime/distribution.py
- treetime/gtr.py
- treetime/treeanc.py
- treetime/treeregression.py
- treetime/treetime.py
- treetime/utils.py
- treetime/wrappers.py
Changes:
=====================================
README.md
=====================================
@@ -8,7 +8,7 @@
TreeTime provides routines for ancestral sequence reconstruction and inference of molecular-clock phylogenies, i.e., a tree where all branches are scaled such that the positions of terminal nodes correspond to their sampling times and internal nodes are placed at the most likely time of divergence.
-To optimize the likelihood of time-scaled phylogenies, TreeTime uses an iterative approach that first infers ancestral sequences given the branch length of the tree, then optimizes the positions of unconstrained nodes on the time axis, and then repeats this cycle.
+To optimize the likelihood of time-scaled phylogenies, TreeTime uses an iterative approach that first infers ancestral sequences given the branch length of the tree (assuming the branch length of the input tree is in units of average number of nucleotide or protein substitutions per site in the sequence). After infering ancestral sequences TreeTime optimizes the positions of unconstrained nodes on the time axis, and then repeats this cycle.
The only topology optimization are (optional) resolution of polytomies in a way that is most (approximately) consistent with the sampling time constraints on the tree.
The package is designed to be used as a stand-alone tool on the command-line or as a library used in larger phylogenetic analysis work-flows.
[The documentation of TreeTime is hosted on readthedocs.org](https://treetime.readthedocs.io/en/latest/).
=====================================
changelog.md
=====================================
@@ -1,3 +1,18 @@
+# 0.9.4: bug fix and performance improvments
+
+ * avoid negative variance associated with branch lengths in tree regression. This could happen in rare cases when marginal time tree estimation returned short negative branch length and the variance was estimated as being proportional to branch length. Variances in the `TreeRegression` clock model are now always non-negative.
+ * downsample the grid during multiplication of distribution objects. This turned out to be an issue for trees with very large polytomies. In these cases, a large number of distributions get multiplied which resulted in grid sizes above 100000 points. Grid sizes are now downsampled to the average grid size.
+
+# 0.9.3
+
+ * Add extra error class for "unknown" (==unhandled) errors
+ * Wrap `run` function and have it optionally raise unhandled exceptions as `TreeTimeUnknownError`.
+ This is mainly done to improve interaction with `augur` that uses `TreeTime` internals as a library.
+ (both by @anna-parker with input from @victorlin)
+
+[PR #206](https://github.com/neherlab/treetime/pull/206)
+[PR #208](https://github.com/neherlab/treetime/pull/208)
+
# 0.9.2
bug fix release:
* CLI now works for windows (thanks @corneliusroemer for the fix)
=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+python-treetime (0.9.4-1) unstable; urgency=medium
+
+ * New upstream version
+
+ -- Andreas Tille <tille at debian.org> Tue, 11 Oct 2022 21:03:31 +0200
+
python-treetime (0.9.2-1) unstable; urgency=medium
* New upstream version
=====================================
docs/source/APIdoc.rst
=====================================
@@ -35,7 +35,7 @@ Core classes
~~~~~~~~~~~~~~~~~~~~~
:doc:`Coalescent class<merger_models>`
-~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
=====================================
docs/source/installation.rst
=====================================
@@ -54,7 +54,7 @@ The API documentation for the TreeTime package is generated created with Sphinx.
.. code:: bash
- pip install recommonmark sphinx-argparse
+ pip install recommonmark sphinx-argparse sphinx_rtd_theme
After required packages are installed, navigate to doc directory, and build the docs by typing:
=====================================
treetime/__init__.py
=====================================
@@ -1,7 +1,15 @@
-version="0.9.2"
-
+version="0.9.4"
+## Here we define an error class for TreeTime errors, MissingData, UnknownMethod and NotReady errors
+## are all due to incorrect calling of TreeTime functions or input data that does not fit our base assumptions.
+## Errors marked as TreeTimeUnknownErrors might be due to data not fulfilling base assumptions or due
+## to bugs in TreeTime. Please report them to the developers if they persist.
class TreeTimeError(Exception):
- """TreeTimeError class"""
+ """
+ TreeTimeError class
+ Parent class for more specific errors
+ Raised when treetime is used incorrectly in contrast with `TreeTimeUnknownError`
+ `TreeTimeUnknownError` is raised when the reason of the error is unknown, could indicate bug
+ """
pass
class MissingDataError(TreeTimeError):
@@ -9,13 +17,17 @@ class MissingDataError(TreeTimeError):
pass
class UnknownMethodError(TreeTimeError):
- """MissingDataError class raised when tree or alignment are missing"""
+ """MissingDataError class raised when an unknown method is called"""
pass
class NotReadyError(TreeTimeError):
"""NotReadyError class raised when results are requested before inference"""
pass
+class TreeTimeUnknownError(Exception):
+ """TreeTimeUnknownError class raised when TreeTime fails during inference due to an unknown reason. This might be due to data not fulfilling base assumptions or due to bugs in TreeTime. Please report them to the developers if they persist."""
+ pass
+
from .treeanc import TreeAnc
from .treetime import TreeTime, plot_vs_years
=====================================
treetime/argument_parser.py
=====================================
@@ -79,10 +79,11 @@ reroot_description = "Reroot the tree using root-to-tip regression. Valid choice
"By default, TreeTime will reroot using 'least-squares'. "\
"Use --keep-root to keep the current root."
-tree_description = "Name of file containing the tree in "\
- "newick, nexus, or phylip format. If none is provided, "\
- "treetime will attempt to build a tree from the alignment "\
- "using fasttree, iqtree, or raxml (assuming they are installed)"
+tree_description = "Name of file containing the tree in newick, nexus, or phylip format, "\
+ "the branch length of the tree should be in units of average number of nucleotide or protein "\
+ "substitutions per site. If no file is provided, treetime will attempt "\
+ "to build a tree from the alignment using fasttree, iqtree, or raxml "\
+ "(assuming they are installed). "
aln_description = "alignment file (fasta)"
=====================================
treetime/clock_tree.py
=====================================
@@ -1,6 +1,6 @@
import numpy as np
from . import config as ttconf
-from . import MissingDataError
+from . import MissingDataError, UnknownMethodError
from .treeanc import TreeAnc
from .utils import numeric_date, DateConversion, datestring_from_numeric
from .distribution import Distribution
@@ -77,7 +77,7 @@ class ClockTree(TreeAnc):
"""
super(ClockTree, self).__init__(*args, **kwargs)
if dates is None:
- raise ValueError("ClockTree requires date constraints!")
+ raise MissingDataError("ClockTree requires date constraints!")
self.debug=debug
self.real_dates = real_dates
@@ -203,7 +203,7 @@ class ClockTree(TreeAnc):
" fft_grid_points=%.3e"%(precision_fft), 2)
self.fft_grid_size = precision_fft
elif precision_fft!='auto':
- raise ValueError(f"ClockTree: precision_fft needs to be either 'auto' or an integer, got '{precision_fft}'.")
+ raise UnknownMethodError(f"ClockTree: precision_fft needs to be either 'auto' or an integer, got '{precision_fft}'.")
else:
self.fft_grid_size = ttconf.FFT_FWHM_GRID_SIZE
if type(precision_branch) is int:
@@ -244,7 +244,7 @@ class ClockTree(TreeAnc):
branch_value = lambda x:x.mutation_length
if covariation:
om = self.one_mutation
- branch_variance = lambda x:((x.clock_length if hasattr(x,'clock_length') else x.mutation_length)
+ branch_variance = lambda x:((max(0,x.clock_length) if hasattr(x,'clock_length') else x.mutation_length)
+(self.tip_slack**2*om if x.is_terminal() else 0.0))*om
else:
branch_variance = lambda x:1.0 if x.is_terminal() else 0.0
@@ -256,6 +256,7 @@ class ClockTree(TreeAnc):
def get_clock_model(self, covariation=True, slope=None):
+ self.logger(f'ClockTree.get_clock_model: estimating clock model with covariation={covariation}',3)
Treg = self.setup_TreeRegression(covariation=covariation)
self.clock_model = Treg.regression(slope=slope)
if not np.isfinite(self.clock_model['slope']):
@@ -596,7 +597,7 @@ class ClockTree(TreeAnc):
pass
method = 'FFT' if self.use_fft else 'explicit'
- self.logger("ClockTree - Marginal reconstruction: Propagating leaves -> root...", 2)
+ self.logger(f"ClockTree - Marginal reconstruction using {method} convolution: Propagating leaves -> root...", 2)
# go through the nodes from leaves towards the root:
for node in self.tree.find_clades(order='postorder'): # children first, msg to parents
if node.bad_branch:
@@ -640,7 +641,7 @@ class ClockTree(TreeAnc):
## for k branches, but due to the fact that inner branches overlap at time t one can be removed
## resulting in the exponent (k-1))
if hasattr(self, 'merger_model') and self.merger_model:
- time_points = np.unique(np.concatenate([msg.x for msg in msgs_to_multiply]))
+ time_points = node.product_of_child_messages.x
# set multiplicity of node to number of good child branches
if node.is_terminal():
merger_contribution = Distribution(time_points, -self.merger_model.integral_merger_rate(time_points), is_log=True)
@@ -701,7 +702,7 @@ class ClockTree(TreeAnc):
complementary_msgs.append(parent.msg_from_parent)
if hasattr(self, 'merger_model') and self.merger_model:
- time_points = np.unique(np.concatenate([msg.x for msg in complementary_msgs]))
+ time_points = parent.marginal_pos_LH.x
# As Lx do not include the node contribution this must be added on
complementary_msgs.append(self.merger_model.node_contribution(parent, time_points))
=====================================
treetime/distribution.py
=====================================
@@ -49,6 +49,10 @@ class Distribution(object):
# 0.69... is log(2), there is always one value for which this is true since
# the minimum is subtracted
tmp = np.where(log_prob < 0.693147)[0]
+ if len(tmp)==0:
+ raise ValueError("Error in computing the FWHM for the distribution. This is "
+ "most likely caused by incorrect input data.")
+
x_l, x_u = tmp[0], tmp[-1]
if L < 2:
print ("Not enough points to compute FWHM: returning zero")
@@ -98,6 +102,14 @@ class Distribution(object):
x_vals = np.unique(np.concatenate([k.x for k in dists]))
x_vals = x_vals[(x_vals> new_xmin-TINY_NUMBER)&(x_vals< new_xmax+TINY_NUMBER)]
+ n_dists = len(dists)
+ if len(x_vals)>100*n_dists and n_dists>3:
+ n_bins = len(x_vals)//n_dists - 6
+ lower_cut_off = n_dists*3
+ upper_cut_off = n_dists*(n_bins + 3)
+ x_vals = np.concatenate((x_vals[:lower_cut_off],
+ x_vals[lower_cut_off:upper_cut_off].reshape((-1,n_dists)).mean(axis=1),
+ x_vals[upper_cut_off:]))
y_vals = np.sum([k.__call__(x_vals) for k in dists], axis=0)
try:
peak = y_vals.min()
=====================================
treetime/gtr.py
=====================================
@@ -1,6 +1,6 @@
from collections import defaultdict
import numpy as np
-from . import config as ttconf
+from . import config as ttconf, TreeTimeError, MissingDataError
from .seq_utils import alphabets, profile_maps, alphabet_synonyms
def avg_transition(W,pi, gap_index=None):
@@ -951,7 +951,12 @@ class GTR(object):
Array of values exp(lambda(i) * t),
where (i) - alphabet index (the eigenvalue number).
"""
- return np.exp(self.mu * t * self.eigenvals)
+ log_val = self.mu * t * self.eigenvals
+ if any(i > 10 for i in log_val):
+ raise ValueError("Error in computing exp(Q * t): Q has positive eigenvalues or the branch length t is too large. "
+ "This is most likely caused by incorrect input data.")
+
+ return np.exp(log_val)
def expQt(self, t):
=====================================
treetime/treeanc.py
=====================================
@@ -63,8 +63,10 @@ class TreeAnc(object):
Parameters
----------
tree : str, Bio.Phylo.Tree
- Phylogenetic tree. String passed is interpreted as a filename with
- a tree in a standard format that can be parsed by the Biopython Phylo module.
+ Phylogenetic tree. String passed is interpreted as a filename with
+ a tree in a standard format that can be parsed by the Biopython Phylo module.
+ Branch length should be in units of average number of nucleotide or protein
+ substitutions per site. Use on trees with longer branches (>4) is not recommended.
aln : str, Bio.Align.MultipleSequenceAlignment, dict
Sequence alignment. If a string passed, it is interpreted as the
@@ -145,7 +147,7 @@ class TreeAnc(object):
self._tree = None
self.tree = tree
if tree is None:
- raise ValueError("TreeAnc: tree loading failed! exiting")
+ raise MissingDataError("TreeAnc: tree loading failed! exiting")
# set up GTR model
self._gtr = None
@@ -309,20 +311,28 @@ class TreeAnc(object):
if fmt in ['nexus', 'nex']:
self._tree=Phylo.read(in_tree, 'nexus')
else:
- raise ValueError('TreeAnc: could not load tree, format needs to be nexus or newick! input was '+str(in_tree))
+ raise MissingDataError('TreeAnc: could not load tree, format needs to be nexus or newick! input was '+str(in_tree))
else:
- raise ValueError('TreeAnc: could not load tree! input was '+str(in_tree))
+ raise MissingDataError('TreeAnc: could not load tree! input was '+str(in_tree))
if self._tree.count_terminals()<3:
- raise ValueError('TreeAnc: tree in %s as only %d tips. Please check your tree!'%(str(in_tree), self._tree.count_terminals()))
+ raise MissingDataError('TreeAnc: tree in %s as only %d tips. Please check your tree!'%(str(in_tree), self._tree.count_terminals()))
# remove all existing sequence attributes
+ branch_length_warning = False
for node in self._tree.find_clades():
node.branch_length = node.branch_length if node.branch_length else 0.0
+ if node.branch_length > ttconf.MAX_BRANCH_LENGTH:
+ branch_length_warning = True
if hasattr(node, "_cseq"):
node.__delattr__("_cseq")
node.original_length = node.branch_length
node.mutation_length = node.branch_length
+ if branch_length_warning:
+ self.logger("WARNING: TreeTime has detected branches that are longer than %d. "
+ "TreeTime requires trees where branch length is in units of average number "
+ "of nucleotide or protein substitutions per site. "
+ "Use on trees with longer branches is not recommended for ancestral sequence reconstruction."%(ttconf.MAX_BRANCH_LENGTH), 0, warn=True)
self.prepare_tree()
if self.data:
@@ -518,7 +528,7 @@ class TreeAnc(object):
elif method.lower() in ['fitch', 'parsimony']:
_ml_anc = self._fitch_anc
else:
- raise ValueError("Reconstruction method needs to be in ['ml', 'probabilistic', 'fitch', 'parsimony'], got '{}'".format(method))
+ raise UnknownMethodError("Reconstruction method needs to be in ['ml', 'probabilistic', 'fitch', 'parsimony'], got '{}'".format(method))
if infer_gtr:
self.infer_gtr(marginal=marginal, **kwargs)
=====================================
treetime/treeregression.py
=====================================
@@ -119,7 +119,7 @@ class TreeRegression(object):
def Cov(self):
"""
calculate the covariance matrix of the tips assuming variance
- has accumulated along branches of the tree accoriding to the
+ has accumulated along branches of the tree according to the
the provided
Returns
-------
@@ -347,7 +347,7 @@ class TreeRegression(object):
best_root.update(reg)
if 'node' not in best_root:
- print("TreeRegression.find_best_root: No valid root found!", force_positive)
+ print(f"TreeRegression.find_best_root: No valid root found! force_positive={force_positive}")
return None
if 'hessian' in best_root:
=====================================
treetime/treetime.py
=====================================
@@ -2,7 +2,7 @@ import numpy as np
from scipy import optimize as sciopt
from Bio import Phylo
from . import config as ttconf
-from . import MissingDataError,UnknownMethodError,NotReadyError,TreeTimeError
+from . import MissingDataError,UnknownMethodError,NotReadyError,TreeTimeError, TreeTimeUnknownError
from .utils import tree_layout
from .clock_tree import ClockTree
@@ -25,7 +25,7 @@ def reduce_time_marginal_argument(input_time_marginal):
elif input_time_marginal == 'confidence-only':
return input_time_marginal
else:
- raise ValueError(f"'{input_time_marginal}' is not a known time marginal argument")
+ raise UnknownMethodError(f"'{input_time_marginal}' is not a known time marginal argument")
class TreeTime(ClockTree):
@@ -51,7 +51,29 @@ class TreeTime(ClockTree):
super(TreeTime, self).__init__(*args, **kwargs)
- def run(self, root=None, infer_gtr=True, relaxed_clock=None, n_iqd = None,
+ def run(self, raise_uncaught_exceptions=False, **kwargs):
+ import sys
+ try:
+ return self._run(**kwargs)
+ except TreeTimeError as err:
+ if raise_uncaught_exceptions:
+ raise err
+ else:
+ print(f"ERROR: {err} \n", file=sys.stderr)
+ sys.exit(2)
+ except BaseException as err:
+ import traceback
+ print(traceback.format_exc(), file=sys.stderr)
+ print(f"ERROR: {err} \n ", file=sys.stderr)
+ print("ERROR in TreeTime.run: An error occurred which was not properly handled in TreeTime. If this error persists, please let us know "
+ "by filing a new issue including the original command and the error above at: https://github.com/neherlab/treetime/issues \n", file=sys.stderr)
+ if raise_uncaught_exceptions:
+ raise TreeTimeUnknownError() from err
+ else:
+ sys.exit(2)
+
+
+ def _run(self, root=None, infer_gtr=True, relaxed_clock=None, n_iqd = None,
resolve_polytomies=True, max_iter=0, Tc=None, fixed_clock_rate=None,
time_marginal='never', sequence_marginal=False, branch_length_mode='auto',
vary_rate=False, use_covariation=False, tracelog_file=None,
@@ -177,7 +199,7 @@ class TreeTime(ClockTree):
time_marginal=kwargs["do_marginal"]
if assign_gamma and relaxed_clock:
- raise TreeTimeError("assign_gamma and relaxed clock are incompatible arguments")
+ raise UnknownMethodError("assign_gamma and relaxed clock are incompatible arguments")
# initially, infer ancestral sequences and infer gtr model if desired
if self.branch_length_mode=='input':
=====================================
treetime/utils.py
=====================================
@@ -2,7 +2,7 @@ import os,sys
import datetime
import pandas as pd
import numpy as np
-from . import TreeTimeError
+from . import TreeTimeError, MissingDataError
class DateConversion(object):
"""
@@ -263,11 +263,11 @@ def parse_dates(date_file, name_col=None, date_col=None):
potential_index_columns.append((ci, col))
if date_col and date_col not in df.columns:
- raise TreeTimeError("ERROR: specified column for dates does not exist. \n\tAvailable columns are: "\
+ raise MissingDataError("ERROR: specified column for dates does not exist. \n\tAvailable columns are: "\
+", ".join(df.columns)+"\n\tYou specified '%s'"%date_col)
if name_col and name_col not in df.columns:
- raise TreeTimeError("ERROR: specified column for the taxon name does not exist. \n\tAvailable columns are: "\
+ raise MissingDataError("ERROR: specified column for the taxon name does not exist. \n\tAvailable columns are: "\
+", ".join(df.columns)+"\n\tYou specified '%s'"%name_col)
@@ -275,7 +275,7 @@ def parse_dates(date_file, name_col=None, date_col=None):
# if a potential numeric date column was found, use it
# (use the first, if there are more than one)
if not (len(potential_index_columns) or name_col):
- raise TreeTimeError("ERROR: Cannot read metadata: need at least one column that contains the taxon labels."
+ raise MissingDataError("ERROR: Cannot read metadata: need at least one column that contains the taxon labels."
" Looking for the first column that contains 'name', 'strain', or 'accession' in the header.")
else:
# use the first column that is either 'name', 'strain', 'accession'
@@ -320,10 +320,10 @@ def parse_dates(date_file, name_col=None, date_col=None):
dates[k] = [numeric_date(x) for x in [lower, upper]]
else:
- raise TreeTimeError("ERROR: Metadata file has no column which looks like a sampling date!")
+ raise MissingDataError("ERROR: Metadata file has no column which looks like a sampling date!")
if all(v is None for v in dates.values()):
- raise TreeTimeError("ERROR: Cannot parse dates correctly! Check date format.")
+ raise MissingDataError("ERROR: Cannot parse dates correctly! Check date format.")
return dates
except TreeTimeError as err:
raise err
=====================================
treetime/wrappers.py
=====================================
@@ -8,7 +8,7 @@ from . import TreeAnc, GTR, TreeTime
from . import utils
from .vcf_utils import read_vcf, write_vcf
from .seq_utils import alphabets
-from . import TreeTimeError, MissingDataError
+from . import TreeTimeError, MissingDataError, UnknownMethodError
from .treetime import reduce_time_marginal_argument
def assure_tree(params, tmp_dir='treetime_tmp'):
@@ -794,7 +794,7 @@ def reconstruct_discrete_traits(tree, traits, missing_data='?', pc=1.0, sampling
if len(missing_weights)>0.5*n_observed_states:
print("More than half of discrete states missing from the weights file")
print("Weights read from file are:", weights)
- raise TreeTimeError("More than half of discrete states missing from the weights file")
+ raise MissingDataError("More than half of discrete states missing from the weights file")
unique_states=sorted(unique_states)
# make a map from states (excluding missing data) to characters in the alphabet
@@ -1022,7 +1022,7 @@ def estimate_clock_model(params):
try:
res = myTree.reroot(params.reroot,
force_positive=not params.allow_negative_rate)
- except TreeTimeError as e:
+ except UnknownMethodError as e:
print("ERROR: unknown root or rooting mechanism!")
raise e
@@ -1071,7 +1071,7 @@ def estimate_clock_model(params):
tmp_str = ''
ofile.write("%s, %s, %f, %f\n"%(n.name, tmp_str, n.dist2root, clock_deviation))
else:
- ofile.write("%s, %f, %f, %f\n"%(n.name, d2d.numdate_from_dist2root(n.dist2root), n.dist2root, clock_deviation))
+ ofile.write("%s, %f, %f, 0.0\n"%(n.name, d2d.numdate_from_dist2root(n.dist2root), n.dist2root))
for n in myTree.tree.get_nonterminals(order='preorder'):
ofile.write("%s, %f, %f, 0.0\n"%(n.name, d2d.numdate_from_dist2root(n.dist2root), n.dist2root))
print("--- wrote dates and root-to-tip distances to \n\t%s\n"%table_fname)
View it on GitLab: https://salsa.debian.org/med-team/python-treetime/-/compare/c221b5dd2e36e4e97dc0da45efee16ec3da7c565...ecd6e26872f9bb5bf1b5f657e40f7940d3d6a78f
--
View it on GitLab: https://salsa.debian.org/med-team/python-treetime/-/compare/c221b5dd2e36e4e97dc0da45efee16ec3da7c565...ecd6e26872f9bb5bf1b5f657e40f7940d3d6a78f
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20221011/71123b57/attachment-0001.htm>
More information about the debian-med-commit
mailing list