[med-svn] [Git][med-team/python-treetime][master] 4 commits: routine-update: New upstream version

Andreas Tille (@tille) gitlab at salsa.debian.org
Sun Jul 17 21:04:58 BST 2022



Andreas Tille pushed to branch master at Debian Med / python-treetime


Commits:
d13544a0 by Andreas Tille at 2022-07-17T16:10:56+02:00
routine-update: New upstream version

- - - - -
223bed68 by Andreas Tille at 2022-07-17T22:01:44+02:00
New upstream version 0.9.1
- - - - -
1b44b7bd by Andreas Tille at 2022-07-17T22:02:00+02:00
Update upstream source from tag 'upstream/0.9.1'

Update to upstream version '0.9.1'
with Debian dir c8586f0f13d782b7fdca6d05adc8a278c133e547
- - - - -
ee80d288 by Andreas Tille at 2022-07-17T22:04:22+02:00
routine-update: Ready to upload to unstable

- - - - -


20 changed files:

- − .github/workflows/ci.yml
- − .gitignore
- changelog.md
- − debian-tests-data/.gitignore
- debian/changelog
- docs/source/tutorials/timetree.rst
- test/run_tests.py
- test/test_treetime.py
- treetime/__init__.py
- treetime/arg.py
- treetime/argument_parser.py
- treetime/branch_len_interpolator.py
- treetime/clock_tree.py
- treetime/distribution.py
- treetime/merger_models.py
- treetime/node_interpolator.py
- treetime/seqgen.py
- treetime/sequence_data.py
- treetime/treeanc.py
- treetime/treetime.py


Changes:

=====================================
.github/workflows/ci.yml deleted
=====================================
@@ -1,47 +0,0 @@
-name: CI
-
-on:
-  pull_request:
-  push:
-    branches:
-      - master
-
-concurrency:
-  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref || github.run_id }}
-  cancel-in-progress: true
-
-jobs:
-  test:
-    name: 'test with python ${{ matrix.python-version }}'
-    runs-on: ubuntu-20.04
-    strategy:
-      fail-fast: false
-      matrix:
-        python-version:
-          - '3.6'
-          - '3.7'
-          - '3.8'
-          - '3.9'
-          - '3.10'
-
-    defaults:
-      run:
-        shell: bash -l {0}
-
-    steps:
-      - uses: actions/checkout at v3
-
-      - uses: conda-incubator/setup-miniconda at v2
-        with:
-          python-version: ${{ matrix.python-version }}
-          activate-environment: test
-
-      - run: conda install numpy scipy biopython pandas matplotlib pytest
-
-      - run: python setup.py install
-
-      - run: conda info
-
-      - run: conda list
-
-      - run: bash test.sh


=====================================
.gitignore deleted
=====================================
@@ -1,99 +0,0 @@
-*~
-# KDE directory preferences
- .directory
-# # Linux trash folder which might appear on any partition or disk
- .Trash-*
-
-
-#vscode
-.vscode/
-
-
-#VIM backup and swap
-[a-w][a-z]
-[._]s[a-w][a-z]
-*.un~
-Session.vim
-.netrwhist
-*~
-.ropeproject/
-
-#python compiled files
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-# C extensions
-*.so
-# Distribution / packaging
-.Python
-env/
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-*.egg-info/
-.installed.cfg
-*.egg
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*,cover
-# Translations
-*.mo
-*.pot
-# Django stuff:
-*.log
-# Sphinx documentation
-docs/_build/
-# PyBuilder
-target/
-
-#file for sublime text
-*.tmlanguage.cache
-*.tmPreferences.cache
-*.stTheme.cache
-# workspace files are user-specific
-*.sublime-workspace
-# project files should be checked into the repository, unless a significant
-# proportion of contributors will probably not be using SublimeText
-*.sublime-project
-# sftp configuration file
-sftp-config.json
-
-#data files other resources
-*.svg
-
-!data/
-
-# OS generated files #
-######################
-.DS_Store
-.DS_Store?
-._*
-.Spotlight-V100
-.Trashes
-Icon?
-ehthumbs.db
-Thumbs.db
-
-test/treetime_examples
\ No newline at end of file


=====================================
changelog.md
=====================================
@@ -1,3 +1,10 @@
+# 0.9.1
+This release is mostly a bug-fix release and contains some additional safeguards against unwanted side-effects of greedy polytomy resolution.
+
+ * resolve polytomies only when significant LH gain can be achieved
+ * performance enhancement in pre-order iteration during marginal time tree estimate when hitting large polytomies.
+ * allow users to set branch specific rates (only when used as a library)
+
 # 0.9.0
 
 This release contains several major changes to how TreeTime calculates time scaled phylogenies.
@@ -39,7 +46,7 @@ This release fixes a few bugs and adds a few features
  * output statistics of different iterations of the treetime optimization loop (trace-log, thanks to @ktmeaton)
  * speed ups by @akislyuk
  * fix errors with dates in the distant future
- * better precision of tablular skyline output
+ * better precision of tabular skyline output
  * adds clock-deviation to the root-to-tip output of the `clock` command
 
 


=====================================
debian-tests-data/.gitignore deleted
=====================================
@@ -1,2 +0,0 @@
-201*
-*results


=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+python-treetime (0.9.1-1) unstable; urgency=medium
+
+  * New upstream version
+
+ -- Andreas Tille <tille at debian.org>  Sun, 17 Jul 2022 22:02:29 +0200
+
 python-treetime (0.9.0-1) unstable; urgency=medium
 
   * Team upload.


=====================================
docs/source/tutorials/timetree.rst
=====================================
@@ -10,7 +10,7 @@ This tutorial uses data provided in the github repository `github.com/neherlab/t
    treetime --tree data/h3n2_na/h3n2_na_20.nwk --dates data/h3n2_na/h3n2_na_20.metadata.csv --aln data/h3n2_na/h3n2_na_20.fasta --outdir h3n2_timetree
 
 The tree can be in newick or nexus format, the alignment in fasta or phylip, and the dates should be given as a tsv or csv file.
-TreeTime will attempt to parse dates, preferred formats are are "%Y-%m-%d" or numerical as in 2016.45.
+TreeTime will attempt to parse dates, preferred formats are "%Y-%m-%d" or numerical as in YYYY.F (where 'F' is the fraction of the year that has passed, in days. For example 2016.04918 means that 18/366 days of 2016 have passed). 
 This command will estimate an GTR model, a molecular clock model, and a time-stamped phylogeny.
 The results are saved to several files in the directory specified as `outdir` and printed to standard out:
 
@@ -63,7 +63,7 @@ The results are saved to several files in the directory specified as `outdir` an
    --- root-to-tip plot saved to
        h3n2_timetree/root_to_tip_regression.pdf
 
-Other output files include an alignment with reconstructed ancestral sequences, an annotated tree in nexus format in which branch length correspond to years and mutations and node dates are added as comments to each node.
+Other output files include an alignment with reconstructed ancestral sequences, an annotated tree in nexus format in which branch length correspond to years and mutations and node dates (in the numeric format detailed above) are added as comments to each node.
 In addition, the root-to-tip vs time regression and the tree are drawn and saved to file.
 
 


=====================================
test/run_tests.py
=====================================
@@ -2,6 +2,8 @@ from test_treetime import *
 
 test_import_short()
 
+test_assign_gamma()
+
 test_GTR()
 
 test_ancestral()


=====================================
test/test_treetime.py
=====================================
@@ -10,6 +10,27 @@ def test_import_short():
     from treetime import TreeAnc
     from treetime import seq_utils
 
+def test_assign_gamma(root_dir=None):
+    print("testing assign gamma")
+    import os
+    from treetime import TreeTime
+    from treetime.utils import parse_dates
+    if root_dir is None:
+        root_dir = os.path.dirname(os.path.realpath(__file__))
+    fasta = root_dir + "/treetime_examples/data/h3n2_na/h3n2_na_20.fasta"
+    nwk = root_dir + "/treetime_examples/data/h3n2_na/h3n2_na_20.nwk"
+    dates = parse_dates(root_dir + "/treetime_examples/data/h3n2_na/h3n2_na_20.metadata.csv")
+    seq_kwargs = {"marginal_sequences":True,
+                    "branch_length_mode": 'input',
+                    "reconstruct_tip_states":False}
+    tt_kwargs = {'clock_rate': 0.0001,
+                    'time_marginal':'assign'}
+    myTree = TreeTime(gtr='Jukes-Cantor', tree = nwk, use_fft=False,
+                    aln = fasta, verbose = 1, dates = dates, precision=3, debug=True)
+    def assign_gamma(tree):
+        return tree
+    success = myTree.run(infer_gtr=False, assign_gamma=assign_gamma, max_iter=1, verbose=3, **seq_kwargs, **tt_kwargs)
+    assert success
 
 def test_GTR():
     from treetime import GTR


=====================================
treetime/__init__.py
=====================================
@@ -1,4 +1,4 @@
-version="0.9.0"
+version="0.9.1"
 
 class TreeTimeError(Exception):
     """TreeTimeError class"""


=====================================
treetime/arg.py
=====================================
@@ -59,9 +59,8 @@ def parse_arg(tree1, tree2, aln1, aln2, MCC_file, fill_overhangs=True):
     return {"MCCs": MCCs, "trees":[t1,t2], "alignment":MultipleSeqAlignment(aln_combined),
             "masks":[mask1,mask2], "combined_mask":combined_mask}
 
-
 def setup_arg(T, aln, total_mask, segment_mask, dates, MCCs, gtr='JC69',
-              verbose=0, fill_overhangs=True, reroot=True, fixed_clock_rate=None, **kwargs):
+            verbose=0, fill_overhangs=True, reroot=True, fixed_clock_rate=None, alphabet='nuc', **kwargs):
     """construct a TreeTime object with the appropriate masks on each node
     for branch length optimization with full or segment only alignment.
 
@@ -83,9 +82,9 @@ def setup_arg(T, aln, total_mask, segment_mask, dates, MCCs, gtr='JC69',
     from treetime import TreeTime
 
     tt = TreeTime(dates=dates, tree=T,
-                  aln=aln, gtr=gtr, alphabet='nuc', verbose=verbose,
-                  fill_overhangs=fill_overhangs, keep_node_order=True,
-                  compress=False, **kwargs)
+            aln=aln, gtr=gtr, alphabet=alphabet, verbose=verbose,
+            fill_overhangs=fill_overhangs, keep_node_order=True,
+            compress=False, **kwargs)
 
 
     if reroot:


=====================================
treetime/argument_parser.py
=====================================
@@ -195,7 +195,7 @@ def add_timetree_args(parser):
     parser.add_argument('--n-branches-posterior', default=False, action='store_true',
                           help= "add posterior LH to coalescent model: use the posterior probability distributions of "
                                 "divergence times for estimating the number of branches when calculating the coalescent merger"
-                                "rate or use infered time before present (default)." )
+                                "rate or use inferred time before present (default)." )
     parser.add_argument('--plot-tree', default="timetree.pdf",
                             help = "filename to save the plot to. Suffix will determine format"
                                    " (choices pdf, png, svg, default=pdf)")


=====================================
treetime/branch_len_interpolator.py
=====================================
@@ -109,24 +109,10 @@ class BranchLenInterpolator (Distribution):
 
     @gamma.setter
     def gamma(self, value):
-        self._gamma = max(ttconf.TINY_NUMBER, value)
-
-
-    @property
-    def peak_pos(self):
-        return super(BranchLenInterpolator,self).peak_pos/self.gamma
-
-    @property
-    def support(self):
-        return self._support/self.gamma
-
-    @property
-    def fwhm(self):
-        return super(BranchLenInterpolator,self).fwhm/self.gamma
-
-    @property
-    def effective_support(self):
-        return tuple((x/self.gamma for x in super(BranchLenInterpolator,self).effective_support))
+        new_gamma = max(ttconf.TINY_NUMBER, value)
+        ratio = self._gamma/new_gamma
+        self.x_rescale(ratio)
+        self._gamma = new_gamma
 
     def __mul__(self, other):
         res = BranchLenInterpolator(super(BranchLenInterpolator, self).__mul__(other), gtr=self.gtr)


=====================================
treetime/clock_tree.py
=====================================
@@ -354,7 +354,7 @@ class ClockTree(TreeAnc):
                 node.date_constraint = None
 
 
-    def make_time_tree(self, time_marginal=False, clock_rate=None, assign_dates=True, **kwargs):
+    def make_time_tree(self, time_marginal=False, clock_rate=None, **kwargs):
         '''
         Use the date constraints to calculate the most likely positions of
         unconstrained nodes.
@@ -374,14 +374,12 @@ class ClockTree(TreeAnc):
         self.init_date_constraints(clock_rate=clock_rate, **kwargs)
 
         if time_marginal:
-            self._ml_t_marginal(assign_dates = assign_dates)
+            self._ml_t_marginal()
         else:
             self._ml_t_joint()
 
         self.tree.positional_LH = self.timetree_likelihood(time_marginal)
-
-        if assign_dates or not time_marginal:
-            self.convert_dates()
+        self.convert_dates()
 
 
     def _ml_t_joint(self):
@@ -420,10 +418,10 @@ class ClockTree(TreeAnc):
                 node.joint_pos_Lx = None
                 node.joint_pos_Cx = None
             else: # all other nodes
-                if node.date_constraint is not None and node.date_constraint.is_delta: # there is a time constraint
+                if node.date_constraint is not None and node.date_constraint.is_delta: # there is a strict time constraint
                     # subtree probability given the position of the parent node
                     # Lx.x is the position of the parent node
-                    # Lx.y is the probablity of the subtree (consisting of one terminal node in this case)
+                    # Lx.y is the probability of the subtree (consisting of one terminal node in this case)
                     # Cx.y is the branch length corresponding the optimal subtree
                     bl = node.branch_length_interpolator.x
                     x = bl + node.date_constraint.peak_pos
@@ -446,7 +444,10 @@ class ClockTree(TreeAnc):
                     ## resulting in the exponent (k-1))
                     if hasattr(self, 'merger_model') and self.merger_model:
                         time_points = np.unique(np.concatenate([msg.x for msg in msgs_to_multiply]))
-                        msgs_to_multiply.append(self.merger_model.node_contribution(node, time_points))
+                        if node.is_terminal():
+                            msgs_to_multiply.append(Distribution(time_points, -self.merger_model.integral_merger_rate(time_points), is_log=True))
+                        else:
+                            msgs_to_multiply.append(self.merger_model.node_contribution(node, time_points))
 
                     # msgs_to_multiply combined returns the subtree likelihood given the node's constraint and child messages
                     if len(msgs_to_multiply) == 0: # there are no constraints
@@ -543,7 +544,7 @@ class ClockTree(TreeAnc):
         Return the likelihood of the data given the current branch length in the tree
         '''
         if time_marginal:
-            LH =  -self.tree.root.marginal_pos_LH.peak_val
+            LH = self.tree.root.marginal_pos_LH.integrate(return_log=True, a=self.tree.root.marginal_pos_LH.xmin, b=self.tree.root.marginal_pos_LH.xmax, n=1000)
         else:
             LH = 0
             for node in self.tree.find_clades(order='preorder'):  # sum the likelihood contributions of all branches
@@ -557,7 +558,7 @@ class ClockTree(TreeAnc):
         return LH
 
 
-    def _ml_t_marginal(self, assign_dates=False):
+    def _ml_t_marginal(self):
         """
         Compute the marginal probability distribution of the internal nodes positions by
         propagating from the tree leaves towards the root. The result of
@@ -620,6 +621,17 @@ class ClockTree(TreeAnc):
                     msgs_to_multiply = [node.date_constraint] if node.date_constraint is not None else []
                     msgs_to_multiply.extend([child.marginal_pos_Lx for child in node.clades
                                              if child.marginal_pos_Lx is not None])
+
+                    # combine the different msgs and constraints
+                    if len(msgs_to_multiply)==0:
+                        # no information
+                        node.marginal_pos_Lx = None
+                        continue
+                    elif len(msgs_to_multiply)==1:
+                        node.product_of_child_messages = msgs_to_multiply[0]
+                    else: # combine the different msgs and constraints
+                        node.product_of_child_messages = Distribution.multiply(msgs_to_multiply)
+
                     ## When a coalescent model is being used, the cost of having no merger events along the branch
                     ## and one at the node at time t is also factored in: -np.log((gamma(t) * np.exp**-I(t))**(k-1)),
                     ## where k is the number of branches that merge at node t, gamma(t) is the total_merger_rate
@@ -630,17 +642,13 @@ class ClockTree(TreeAnc):
                     if hasattr(self, 'merger_model') and self.merger_model:
                         time_points = np.unique(np.concatenate([msg.x for msg in msgs_to_multiply]))
                         # set multiplicity of node to number of good child branches
-                        msgs_to_multiply.append(self.merger_model.node_contribution(node, time_points))
-
-                    # combine the different msgs and constraints
-                    if len(msgs_to_multiply)==0:
-                        # no information
-                        node.marginal_pos_Lx = None
-                        continue
-                    elif len(msgs_to_multiply)==1:
-                        node.subtree_distribution = msgs_to_multiply[0]
-                    else: # combine the different msgs and constraints
-                        node.subtree_distribution = Distribution.multiply(msgs_to_multiply)
+                        if node.is_terminal():
+                            merger_contribution = Distribution(time_points, -self.merger_model.integral_merger_rate(time_points), is_log=True)
+                        else:
+                            merger_contribution = self.merger_model.node_contribution(node, time_points)
+                        node.subtree_distribution = Distribution.multiply([merger_contribution, node.product_of_child_messages])
+                    else:
+                        node.subtree_distribution = node.product_of_child_messages
 
                     if node.up is None: # this is the root, set dates
                         node.subtree_distribution._adjust_grid(rel_tol=self.rel_tol_prune)
@@ -672,7 +680,7 @@ class ClockTree(TreeAnc):
             ## If a delta constraint in known no further work required
             if (node.date_constraint is not None) and (not node.bad_branch) and node.date_constraint.is_delta:
                 node.marginal_pos_LH = node.date_constraint
-                node.msg_from_parent = None #if internal node has a delta constraint noprevious information passed on
+                node.msg_from_parent = None #if internal node has a delta constraint no previous information is passed on
             elif node.up is None:
                 node.msg_from_parent = None # nothing beyond the root
             # all other cases (All internal nodes + unconstrained terminals)
@@ -681,11 +689,13 @@ class ClockTree(TreeAnc):
 
                 msg_parent_to_node =None
                 if node.marginal_pos_Lx is not None:
-                    # messages from the complementary subtree (iterate over all sister nodes)
-                    complementary_msgs = [parent.date_constraint] if parent.date_constraint is not None else []
-                    complementary_msgs.extend([sister.marginal_pos_Lx for sister in parent.clades
-                                                if (sister != node) and (sister.marginal_pos_Lx is not None)])
-
+                    if len(parent.clades)<5:
+                        # messages from the complementary subtree (iterate over all sister nodes)
+                        complementary_msgs = [parent.date_constraint] if parent.date_constraint is not None else []
+                        complementary_msgs.extend([sister.marginal_pos_Lx for sister in parent.clades
+                                                    if (sister != node) and (sister.marginal_pos_Lx is not None)])
+                    else:
+                        complementary_msgs = [Distribution.divide(parent.product_of_child_messages, node.marginal_pos_Lx)]
                     # if parent itself got smth from the root node, include it
                     if parent.msg_from_parent is not None:
                         complementary_msgs.append(parent.msg_from_parent)
@@ -730,7 +740,7 @@ class ClockTree(TreeAnc):
                     node.marginal_pos_LH = NodeInterpolator.multiply((node.msg_from_parent, node.subtree_distribution))
 
                 self.logger('ClockTree._ml_t_root_to_leaves: computed convolution'
-                                ' with %d points at node %s'%(len(res.x),node.name),4)
+                                ' with %d points at node %s'%(len(res.x),node.name), 4)
 
                 if self.debug:
                     tmp = np.diff(res.y-res.peak_val)
@@ -746,15 +756,14 @@ class ClockTree(TreeAnc):
                         plt.xlim(-0.05, 0.05)
                         #import ipdb; ipdb.set_trace()
 
-            # assign positions of nodes and branch length only when desired
-            # since marginal reconstruction can result in negative branch length
-            if assign_dates:
-                node.time_before_present = node.marginal_pos_LH.peak_pos
-                if node.up:
-                    node.clock_length = node.up.time_before_present - node.time_before_present
-                    node.branch_length = node.clock_length
+            # assign positions of nodes and branch length
+            # note that marginal reconstruction can result in negative branch lengths
+            node.time_before_present = node.marginal_pos_LH.peak_pos
+            if node.up:
+                node.clock_length = node.up.time_before_present - node.time_before_present
+                node.branch_length = node.clock_length
 
-            # construct the inverse cumulant distribution to evaluate confidence intervals
+            # construct the inverse cumulative distribution to evaluate confidence intervals
             if node.marginal_pos_LH.is_delta:
                 node.marginal_inverse_cdf=interp1d([0,1], node.marginal_pos_LH.peak_pos*np.ones(2), kind="linear")
             else:
@@ -789,10 +798,10 @@ class ClockTree(TreeAnc):
                 if not hasattr(node, "bad_branch") or node.bad_branch is False:
                     self.logger("ClockTree.convert_dates -- WARNING: The node is later than today, but it is not "
                         "marked as \"BAD\", which indicates the error in the "
-                        "likelihood optimization.",4 , warn=True)
+                        "likelihood optimization.", 4, warn=True)
                 else:
                     self.logger("ClockTree.convert_dates -- WARNING: node which is marked as \"BAD\" optimized "
-                        "later than present day",4 , warn=True)
+                        "later than present day", 4, warn=True)
 
             node.numdate = now - years_bp
             node.date = datestring_from_numeric(node.numdate)


=====================================
treetime/distribution.py
=====================================
@@ -102,7 +102,7 @@ class Distribution(object):
             try:
                 peak = y_vals.min()
             except:
-                print("WARNING: Unexpected behavior detected in multipy function,"
+                print("WARNING: Unexpected behavior detected in multiply function,"
                         "if you see this error \n please let us know by filling an issue at: https://github.com/neherlab/treetime/issues")
                 x_vals = [0,1]
                 y_vals = [BIG_NUMBER,BIG_NUMBER]
@@ -126,6 +126,39 @@ class Distribution(object):
 
         return res
 
+    @staticmethod
+    def divide(numerator, denominator):
+        '''
+        divides one distribution object by another. Note that this is in general an ill-defined procedure.
+        this is implemented here for the special case where the numerator is a product that contains the denominator
+        to produce a reduced product without the denominator.
+        '''
+        if numerator.is_delta or denominator.is_delta:
+            raise(ArithmeticError("Can not divide delta functions"))
+        dists = [numerator, denominator]
+        min_width = np.max([k.min_width for k in dists])
+        new_xmin = np.max([k.xmin for k in dists])
+        new_xmax = np.min([k.xmax for k in dists])
+        x_vals = np.unique(np.concatenate([k.x for k in dists]))
+        x_vals = x_vals[(x_vals> new_xmin-TINY_NUMBER)&(x_vals< new_xmax+TINY_NUMBER)]
+        y_vals = numerator.__call__(x_vals) - denominator.__call__(x_vals)
+
+        peak = y_vals.min()
+        ind = (y_vals-peak)<BIG_NUMBER/1000
+        n_points = ind.sum()
+        if n_points == 0:
+            print("WARNING: Unexpected behavior detected in multipy function,"
+                    "if you see this error \n please let us know by filling an issue at: https://github.com/neherlab/treetime/issues")
+            x_vals = [0,1]
+            y_vals = [BIG_NUMBER,BIG_NUMBER]
+            res = Distribution(x_vals, y_vals, is_log=True,
+                                min_width=min_width, kind='linear')
+        elif n_points == 1:
+            res = Distribution.delta_function(x_vals[0])
+        else:
+            res = Distribution(x_vals[ind], y_vals[ind], is_log=True,
+                                min_width=min_width, kind='linear', assume_sorted=True)
+        return res
 
     def __init__(self, x, y, is_log=True, min_width = MIN_INTEGRATION_PEAK,
                  kind='linear', assume_sorted=False):
@@ -293,7 +326,7 @@ class Distribution(object):
                 d = y1-y2
                 right = -x1*y2/d + x2*y1/d
         except:
-            raise ArithmeticError("Region of support of the distribution couldn'n be determined!")
+            raise ArithmeticError("Region of support of the distribution could not be determined!")
 
         return (left,right)
 
@@ -331,12 +364,16 @@ class Distribution(object):
         if factor>=0:
             self._xmin*=factor
             self._xmax*=factor
+            self._fwhm*=factor
+            self._effective_support = [x*factor for x in self._effective_support]
         else:
             tmp = self.xmin
             self._xmin = factor*self.xmax
             self._xmax = factor*tmp
             self._func.x = self._func.x[::-1]
             self._func.y = self._func.y[::-1]
+            self._fwhm *= -factor
+            self._effective_support = [x*factor for x in self._effective_support[::-1]]
 
 
     def integrate(self, return_log=False ,**kwargs):


=====================================
treetime/merger_models.py
=====================================
@@ -186,7 +186,7 @@ class Coalescent(object):
     def branch_merger_rate(self, t):
         '''
         rate at which one particular branch merges with any other branch at time t,
-        in the Kingsman model this is: :math:`\kappa(t) = (k(t)-1)/(2Tc(t))`
+        in the Kingman model this is: :math:`\kappa(t) = (k(t)-1)/(2Tc(t))`
         '''
         # note that we always have a positive merger rate by capping the
         # number of branches at 0.5 from below. in these regions, the
@@ -196,7 +196,7 @@ class Coalescent(object):
     def total_merger_rate(self, t):
         '''
         rate at which any branch merges with any other branch at time t,
-        in the Kingsman model this is: :math:`\lambda(t) = k(t)(k(t)-1)/(2Tc(t))`
+        in the Kingman model this is: :math:`\lambda(t) = k(t)(k(t)-1)/(2Tc(t))`
         '''
         # note that we always have a positive merger rate by capping the
         # number of branches at 0.5 from below. in these regions, the
@@ -234,6 +234,7 @@ class Coalescent(object):
         from treetime.node_interpolator import NodeInterpolator
         if multiplicity is None:
             multiplicity = len(node.clades)
+        # the number of mergers is 'number of children' - 1
         multiplicity -= 1.0
         y = (self.integral_merger_rate(t) - np.log(self.total_merger_rate(t)))*multiplicity
         return NodeInterpolator(t, y, is_log=True)


=====================================
treetime/node_interpolator.py
=====================================
@@ -160,7 +160,7 @@ class NodeInterpolator (Distribution):
 
     @classmethod
     def convolve_fft(cls, node_interp, branch_interp, fft_grid_size=FFT_FWHM_GRID_SIZE, inverse_time=True):
-        fwhm = node_interp.fwhm + branch_interp.fwhm
+
         dt = max(branch_interp.one_mutation*0.005, min(node_interp.fwhm, branch_interp.fwhm)/fft_grid_size)
         b_effsupport = branch_interp.effective_support
         n_effsupport = node_interp.effective_support


=====================================
treetime/seqgen.py
=====================================
@@ -86,7 +86,7 @@ class SeqGen(TreeAnc):
         from Bio.Align import MultipleSeqAlignment
 
         tmp = []
-        for n in self.tree.get_terminals():
+        for n in self.tree.find_clades():
             if n.is_terminal() or internal:
                 tmp.append(SeqRecord.SeqRecord(id=n.name, name=n.name, description='', seq=Seq.Seq(''.join(n.ancestral_sequence.astype('U')))))
 


=====================================
treetime/sequence_data.py
=====================================
@@ -158,7 +158,7 @@ class SequenceData(object):
                     except:
                         continue
 
-        if type(in_aln) is MultipleSeqAlignment:
+        if isinstance(in_aln, MultipleSeqAlignment):
             # check whether the alignment is consistent with a nucleotide alignment.
             self._aln = {}
             for s in in_aln:


=====================================
treetime/treeanc.py
=====================================
@@ -1221,14 +1221,14 @@ class TreeAnc(object):
 
     def optimize_tree_marginal(self, max_iter=10, infer_gtr=False, pc=1.0, damping=0.75,
                                LHtol=0.1, site_specific_gtr=False, **kwargs):
-        self.infer_ancestral_sequences(marginal=True)
+        self.infer_ancestral_sequences(marginal=True, **kwargs)
         oldLH = self.sequence_LH()
         self.logger("TreeAnc.optimize_tree_marginal: initial, LH=%1.2f, total branch_length %1.4f"%
                     (oldLH, self.tree.total_branch_length()), 2)
         for i in range(max_iter):
             if infer_gtr:
                 self.infer_gtr(site_specific=site_specific_gtr, marginal=True, normalized_rate=True, pc=pc)
-                self.infer_ancestral_sequences(marginal=True)
+                self.infer_ancestral_sequences(marginal=True, **kwargs)
 
             old_bl = self.tree.total_branch_length()
             tol = 1e-8 + 0.01**(i+1)
@@ -1252,12 +1252,15 @@ class TreeAnc(object):
 
                     n1.branch_length = update_val*bl_ratio
                     n2.branch_length = update_val*(1-bl_ratio)
+                    n1.mutation_length = n1.branch_length
+                    n2.mutation_length = n2.branch_length
                 else:
                     new_val = self.optimal_marginal_branch_length(n, tol=tol)
                     update_val = new_val*(1-damping**(i+1)) + n.branch_length*damping**(i+1)
                     n.branch_length = update_val
+                    n.mutation_length = n.branch_length
 
-            self.infer_ancestral_sequences(marginal=True)
+            self.infer_ancestral_sequences(marginal=True, **kwargs)
 
             LH = self.sequence_LH()
             deltaLH = LH - oldLH
@@ -1342,8 +1345,7 @@ class TreeAnc(object):
         self.logger("TreeAnc.optimize_tree: sequences...", 1)
         N_diff = self.reconstruct_anc(method=method_anc, infer_gtr=infer_gtr, pc=pc,
                                       marginal=marginal_sequences, **kwargs)
-        self.optimize_branch_len(verbose=0, store_old=False, mode=branch_length_mode)
-
+        self.optimize_branch_lengths_joint(verbose=0, store_old=False, mode=branch_length_mode)
         n = 0
         while n<max_iter:
             n += 1


=====================================
treetime/treetime.py
=====================================
@@ -2,7 +2,7 @@ import numpy as np
 from scipy import optimize as sciopt
 from Bio import Phylo
 from . import config as ttconf
-from . import MissingDataError,UnknownMethodError,NotReadyError
+from . import MissingDataError,UnknownMethodError,NotReadyError,TreeTimeError
 from .utils import tree_layout
 from .clock_tree import ClockTree
 
@@ -54,8 +54,8 @@ class TreeTime(ClockTree):
     def run(self, root=None, infer_gtr=True, relaxed_clock=None, n_iqd = None,
             resolve_polytomies=True, max_iter=0, Tc=None, fixed_clock_rate=None,
             time_marginal='never', sequence_marginal=False, branch_length_mode='auto',
-            vary_rate=False, use_covariation=False, tracelog_file=None, 
-            method_anc = 'probabilistic', **kwargs):
+            vary_rate=False, use_covariation=False, tracelog_file=None,
+            method_anc = 'probabilistic', assign_gamma=None, **kwargs):
 
         """
         Run TreeTime reconstruction. Based on the input parameters, it divides
@@ -75,7 +75,7 @@ class TreeTime(ClockTree):
         infer_gtr : bool
            If True, infer GTR model
 
-        relaxed_clock : dic
+        relaxed_clock : dict
            If not None, use autocorrelated molecular clock model. Specify the
            clock parameters as :code:`{slack:<slack>, coupling:<coupling>}` dictionary.
 
@@ -131,10 +131,14 @@ class TreeTime(ClockTree):
             use_covariation is true by default
 
         method_anc: str, optional
-            Which method should be used to reconstruct ancestral sequences. 
-            Supported values are "parsimony", "fitch", "probabilistic" and "ml". 
+            Which method should be used to reconstruct ancestral sequences.
+            Supported values are "parsimony", "fitch", "probabilistic" and "ml".
             Default is "probabilistic"
 
+        assign_gamma: callable, optional
+            function to specify gamma (branch length scaling, local clock rate modifier)
+            for each branch in tree, not compatible with a relaxed clock model
+
         **kwargs
            Keyword arguments needed by the downstream functions
 
@@ -158,7 +162,7 @@ class TreeTime(ClockTree):
         # determine how to reconstruct and sample sequences
         seq_kwargs = {"marginal_sequences":sequence_marginal or (self.branch_length_mode=='marginal'),
                       "branch_length_mode": self.branch_length_mode,
-                      "sample_from_profile":"root",
+                      "sample_from_profile": "root",
                       "prune_short":kwargs.get("prune_short", True),
                       "reconstruct_tip_states":kwargs.get("reconstruct_tip_states", False)}
         time_marginal_method = reduce_time_marginal_argument(time_marginal) ## for backward compatibility
@@ -172,6 +176,9 @@ class TreeTime(ClockTree):
         if "do_marginal" in kwargs:
             time_marginal=kwargs["do_marginal"]
 
+        if assign_gamma and relaxed_clock:
+            raise TreeTimeError("assign_gamma and relaxed clock are incompatible arguments")
+
         # initially, infer ancestral sequences and infer gtr model if desired
         if self.branch_length_mode=='input':
             if self.aln:
@@ -181,7 +188,6 @@ class TreeTime(ClockTree):
         else:
             self.optimize_tree(infer_gtr=infer_gtr,
                                max_iter=1, method_anc = method_anc, **seq_kwargs)
-        avg_root_to_tip = np.mean([x.dist2root for x in self.tree.get_terminals()])
 
         # optionally reroot the tree either by oldest, best regression or with a specific leaf
         if n_iqd or root=='clock_filter':
@@ -208,9 +214,15 @@ class TreeTime(ClockTree):
             seq_LH = self.tree.sequence_marginal_LH if seq_kwargs['marginal_sequences'] else self.tree.sequence_joint_LH
         self.LH =[[seq_LH, self.tree.positional_LH, 0]]
 
+        # if we reroot, repeat rerooting after initial clock-filter/time tree
+        # re-optimize branch length, and update time tree
         if root is not None and max_iter:
             self.reroot(root='least-squares' if root=='clock_filter' else root, clock_rate=fixed_clock_rate)
             self.logger("###TreeTime.run: rerunning timetree after rerooting",0)
+
+            if self.branch_length_mode!='input':
+                self.optimize_tree(max_iter=0, method_anc = method_anc,**seq_kwargs)
+
             self.make_time_tree(**tt_kwargs)
 
         # iteratively reconstruct ancestral sequences and re-infer
@@ -247,11 +259,15 @@ class TreeTime(ClockTree):
                 # if polytomies are found, rerun the entire procedure
                 n_resolved = self.resolve_polytomies()
                 if n_resolved:
+                    seq_kwargs['prune_short']=False
                     self.prepare_tree()
                     if self.branch_length_mode!='input': # otherwise reoptimize branch length while preserving branches without mutations
                         self.optimize_tree(max_iter=0, method_anc = method_anc,**seq_kwargs)
-
                     need_new_time_tree = True
+            if assign_gamma and callable(assign_gamma):
+                self.logger("### assigning gamma",1)
+                assign_gamma(self.tree)
+                need_new_time_tree = True
 
             if need_new_time_tree:
                 self.make_time_tree(**tt_kwargs)
@@ -279,6 +295,7 @@ class TreeTime(ClockTree):
                 self.logger("###TreeTime.run: CONVERGED",0)
                 break
 
+
         # if the rate is too be varied and the rate estimate has a valid confidence interval
         # rerun the estimation for variations of the rate
         if vary_rate:
@@ -294,8 +311,7 @@ class TreeTime(ClockTree):
         if time_marginal_method in ['only-final', 'confidence-only']:
             self.logger("###TreeTime.run: FINAL ROUND - confidence estimation via marginal reconstruction", 0)
             tt_kwargs['time_marginal'] = True
-            assign_dates = time_marginal_method!='confidence-only'
-            self.make_time_tree(**tt_kwargs, assign_dates=assign_dates)
+            self.make_time_tree(**tt_kwargs)
 
             self.trace_run.append(self.tracelog_run(niter=niter+1, ndiff=0, n_resolved=0,
                                       time_marginal=tt_kwargs['time_marginal'],
@@ -529,7 +545,7 @@ class TreeTime(ClockTree):
         return new_root
 
 
-    def resolve_polytomies(self, merge_compressed=False):
+    def resolve_polytomies(self, merge_compressed=False, resolution_threshold=0.05):
         """
         Resolve the polytomies on the tree.
 
@@ -558,7 +574,7 @@ class TreeTime(ClockTree):
         for n in self.tree.find_clades():
             if len(n.clades) > 2:
                 prior_n_clades = len(n.clades)
-                self._poly(n, merge_compressed)
+                self._poly(n, merge_compressed, resolution_threshold=resolution_threshold)
                 poly_found+=prior_n_clades - len(n.clades)
 
         obsolete_nodes = [n for n in self.tree.find_clades() if len(n.clades)==1 and n.up is not None]
@@ -574,7 +590,7 @@ class TreeTime(ClockTree):
         return poly_found
 
 
-    def _poly(self, clade, merge_compressed):
+    def _poly(self, clade, merge_compressed, resolution_threshold):
 
         """
         Function to resolve polytomies for a given parent node. If the
@@ -628,13 +644,12 @@ class TreeTime(ClockTree):
                 # set zero to large negative value and find optimal pair
                 np.fill_diagonal(cost_gains, -1e11)
                 idxs = np.unravel_index(cost_gains.argmax(),cost_gains.shape)
-                if (idxs[0] == idxs[1]) or cost_gains.max()<0:
+                if (idxs[0] == idxs[1]) or cost_gains.max()<resolution_threshold:
                     self.logger("TreeTime._poly.merge_nodes: node is not fully resolved "+clade.name,4)
                     return LH
 
                 n1, n2 = source_arr[idxs[0]], source_arr[idxs[1]]
                 LH += cost_gains[idxs]
-
                 new_node = Phylo.BaseTree.Clade()
 
                 # fix positions and branch lengths



View it on GitLab: https://salsa.debian.org/med-team/python-treetime/-/compare/584a3e70dc4c373fafdcb78daf7d1e3af2d3bd82...ee80d288f5081dc6f3cd4f76494f7103f0e3b7bf

-- 
View it on GitLab: https://salsa.debian.org/med-team/python-treetime/-/compare/584a3e70dc4c373fafdcb78daf7d1e3af2d3bd82...ee80d288f5081dc6f3cd4f76494f7103f0e3b7bf
You're receiving this email because of your account on salsa.debian.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20220717/7e8755a6/attachment-0001.htm>


More information about the debian-med-commit mailing list