[med-svn] [Git][med-team/sepp][master] 4 commits: New upstream version 4.5.4+dfsg

Fri Aug 16 13:52:17 BST 2024


Pierre Gruet pushed to branch master at Debian Med / sepp


Commits:
1b915fe2 by Pierre Gruet at 2024-08-16T12:00:28+02:00
New upstream version 4.5.4+dfsg
- - - - -
f28f46d9 by Pierre Gruet at 2024-08-16T12:00:53+02:00
Update upstream source from tag 'upstream/4.5.4+dfsg'

Update to upstream version '4.5.4+dfsg'
with Debian dir 36ac54ff96482a3703d30fd18e685ac96552dec9
- - - - -
cdb88264 by Pierre Gruet at 2024-08-16T12:09:39+02:00
Refreshing patches

- - - - -
90e9581b by Pierre Gruet at 2024-08-16T14:23:41+02:00
Upload to unstable

- - - - -


30 changed files:

- + .github/workflows/sepp_tests.yml
- − .travis.yml
- CHANGELOG.md
- README.HIPPI.md
- README.SEPP.md
- README.md
- − ci/conda_requirements.txt
- + ci/env_lint.yml
- + ci/environment.yml
- − ci/pip_requirements.txt
- debian/changelog
- debian/patches/configuration_files_in_etc_and_per_user.patch
- debian/patches/deactivating_log_test.patch
- − debian/patches/open-U-obsolete.patch
- − debian/patches/py310_collections_import.patch
- debian/patches/series
- − debian/patches/wrong_import_from_dendropy.patch
- sepp/__init__.py
- sepp/algorithm.py
- sepp/alignment.py
- sepp/config.py
- sepp/ensemble.py
- sepp/exhaustive.py
- sepp/exhaustive_upp.py
- sepp/tree.py
- setup.py
- test/unittest/TestFork.py
- test/unittest/testAlignment.py
- test/unittest/testConfig.py
- test/unittest/testUPP.py


Changes:

=====================================
.github/workflows/sepp_tests.yml
=====================================
@@ -0,0 +1,79 @@
+name: SEPP_github_tests
+
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+  python_tests:
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.10"]
+    runs-on: ubuntu-latest
+    steps:
+    - name: Checkout Repo
+      uses: actions/checkout@v3
+    - name: setup conda
+      uses: conda-incubator/setup-miniconda@v3
+      with:
+        # This uses *miniforge*, rather than *minicond*. The primary difference
+        # is that the defaults channel is not enabled at all
+        miniforge-version: latest
+        # These properties enable the use of mamba, which is much faster and far
+        # less error prone than conda while being completely compatible with the
+        # conda CLI
+        use-mamba: true
+        mamba-version: "*"
+        python-version: ${{ matrix.python-version }}
+        environment-file: ci/environment.yml
+        auto-activate-base: true
+        activate-environment: sepp_ci
+    - name: install sepp
+      shell: bash -el {0}
+      run: |
+        python setup.py config -c
+        python setup.py install
+    - name: run tests
+      shell: bash -el {0}
+      run: |
+        conda list
+        nosetests -w test/unittest --with-doctest --with-coverage
+    - name: convert coverage
+      shell: bash -el {0}
+      run: |
+        coverage lcov
+    - name: send coverage report.
+      uses: coverallsapp/github-action@master
+      with:
+        github-token: ${{ secrets.GITHUB_TOKEN }}
+        path-to-lcov: "coverage.lcov"
+
+  linting:
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.10"]
+    runs-on: ubuntu-latest
+    steps:
+    - name: Checkout Repo
+      uses: actions/checkout@v3
+    - name: setup conda
+      uses: conda-incubator/setup-miniconda@v3
+      with:
+        # This uses *miniforge*, rather than *minicond*. The primary difference
+        # is that the defaults channel is not enabled at all
+        miniforge-version: latest
+        # These properties enable the use of mamba, which is much faster and far
+        # less error prone than conda while being completely compatible with the
+        # conda CLI
+        use-mamba: true
+        mamba-version: "*"
+        python-version: ${{ matrix.python-version }}
+        environment-file: ci/env_lint.yml
+        auto-activate-base: true
+        activate-environment: sepp_lint
+    - name: linting
+      shell: bash -el {0}
+      run: |
+        flake8  setup.py  split_sequences.py distribute_setup.py run_ensemble.py run_sepp.py run_upp.py merge_script.py test/unittest/ sepp/


=====================================
.travis.yml deleted
=====================================
@@ -1,37 +0,0 @@
-# Check on http://lint.travis-ci.org/ after modifying it!
-sudo: false
-language: c
-os:
-  - linux
-  - osx
-env:
-  - PYVERSION="3.7"
-  - PYVERSION="3.8"
-before_install:
-  - echo "$TRAVIS_OS_NAME"
-  - if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh; fi
-  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then wget https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O miniconda.sh; fi
-  - bash miniconda.sh -b -p $HOME/miniconda
-  - export PATH="$HOME/miniconda/bin:$PATH"
-  - hash -r
-  - conda config --set always_yes yes --set changeps1 no
-  - conda config --add channels conda-forge
-  - conda config --add channels https://conda.anaconda.org/bioconda
-  - conda config --add channels https://conda.anaconda.org/biocore
-  # Update conda itself
-  - conda update -q conda
-  # Useful for debugging any issues with conda
-  - conda info -a
-install:
-  - conda create -n test_env python=$PYVERSION --file ci/conda_requirements.txt
-  - source activate test_env
-  - pip install -r ci/pip_requirements.txt
-  # install SEPP
-  - python setup.py config -c
-  - python setup.py install
-script:
-  - COVERAGE_FILE=.coverage coverage run -p --concurrency=multiprocessing --rcfile .coveragerc setup.py nosetests -w test/unittest/
-  - coverage combine
-  - flake8  setup.py  split_sequences.py distribute_setup.py run_ensemble.py run_sepp.py run_upp.py merge_script.py test/unittest/ sepp/
-after_success:
-  - coveralls


=====================================
CHANGELOG.md
=====================================
@@ -1,3 +1,10 @@
+* Version 4.5.4:
+  * Slight improvements in logging key information
+* Version 4.5.3:
+    * Make sure SEPP and HMMER version is outputted to log
+    * Update documents to ask users to cite HMMER
+* Version 4.5.2:
+    * New -M functionality
 * Version 4.5.1:
     * fix small issues (broken links, etc.) in 4.5.0
 * Version 4.5.0:


=====================================
README.HIPPI.md
=====================================
@@ -16,7 +16,7 @@ Developers: Nam Nguyen, Michael Nute, Siavash Mirarab, and Tandy Warnow.
 Nguyen, Nam-phuong (2016): `HIPPI Dataset`. University of Illinois at Urbana-Champaign. https://doi.org/10.13012/B2IDB-6795126_V1
 
 ###Publication:
-Nam Nguyen, Michael Nute, Siavash Mirarab, Keerthana Kumar, and Tandy Warnow. `HIPPI: Highly Accurate Protein Family Classification with Ensembles of profile Hidden Markov Models`. Accepted to RECOMB CG 2016 ().
+Nam Nguyen, Michael Nute, Siavash Mirarab, and Tandy Warnow. `HIPPI: Highly Accurate Protein Family Classification with Ensembles of profile Hidden Markov Models`. BMC Genomics 17, 765 (2016). https://doi.org/10.1186/s12864-016-3097-0
 
 
 ### Note and Acknowledgment: 
@@ -39,7 +39,7 @@ Before installing the software you need to make sure the following programs are
 
 Installation Steps:
 -------------------
-HIPPI is a part of the SEPP distribution package.  By installing SEPP, HIPPI is automatically installed. (see [SEPP readme] (https://github.com/smirarab/sepp/blob/master/README.SEPP.md)).  
+HIPPI is a part of the SEPP distribution package.  By installing SEPP, HIPPI is automatically installed (see [SEPP readme] (https://github.com/smirarab/sepp/blob/master/README.SEPP.md)).  
 
 Common Problems:
 -------------------
@@ -65,5 +65,5 @@ We are currently building a pipeline script to streamline this process.
 ---------------------------------------------
 Bugs and Errors
 ---------------------------------------------
-HIPPI is under active research development at UIUC by the Warnow Lab (and especially with her former PhD students Siavash Mirarab and Nam Nguyen). Please report any errors or requests to Michael Nute (nute2 at illinois.edu), Siavash Mirarab (smirarab at gmail.com) and Nam Nguyen (ndn006 at eng.ucsd.edu).
+HIPPI is under active research development at UIUC by the Warnow Lab (and especially with her former PhD students Siavash Mirarab and Nam Nguyen). Please report any errors or requests to Michael Nute (mike.nute at gmail.com), Siavash Mirarab (smirarab at gmail.com), or  Nam Nguyen (nnguyen at boundlessbio.com).
 


=====================================
README.SEPP.md
=====================================
@@ -9,19 +9,31 @@ SEPP stands for `SATe-enabled phylogenetic placement`, and so is a method for th
 - Output: placement of each fragment in `X` into the tree T, and alignment of each fragment in `X` to the alignment `A`.
 
 SEPP operates by using a divide-and-conquer strategy adopted from SATe-II ([Liu et al., Systematic Biology 2012](http://sysbio.oxfordjournals.org/content/61/1/90.full.pdf+html?sid=dd32838d-89dc-4bda-8008-6f948146341f) and [Liu et. al., Science, 2009](http://www.sciencemag.org/content/324/5934/1561.abstract)) to construct an Ensemble of Hidden Markov Models (HMMs) to represent the input multiple sequence alignment `A`.
-It then computes the fit of each query sequence in `X` to each HMM in the ensemble, and uses the highest scoring HMM to add the sequence to the input tree `T`. This technique improves the accuracy of the placements of the query sequences compared to using a single HMM to represent the input alignment. SEPP uses tools in HMMER to construct HMMs, compute the fit of sequences to HMMs, and add sequences to the alignment `A` (code by Sean Eddy). SEPP uses pplacer (code by Erick Matsen) to add query sequences to the input tree `T`, after they are added to the alignment `A`. SEPP is also used in other software, including TIPP (taxonomic identical using phylogenetic placement) and UPP (ultra-large alignments using phylogeny-aware profiles).
+It then computes the fit of each query sequence in `X` to each HMM in the ensemble, and uses the highest scoring HMM to add the sequence to the input tree `T`. This technique improves the accuracy of the placements of the query sequences compared to using a single HMM to represent the input alignment. 
+
+SEPP uses external software and users are encouraged to cite these tools as well (including the version number used):
+
+* SEPP uses tools from [HMMER](http://hmmer.org/) suite to construct HMMs, compute the fit of sequences to HMMs, and add sequences to the alignment `A` (code by Sean Eddy). 
+* SEPP uses pplacer (code by Erick Matsen) to add query sequences to the input tree `T`, after they are added to the alignment `A`. 
+* SEPP is also used in other software, including TIPP (taxonomic identical using phylogenetic placement) and UPP (ultra-large alignments using phylogeny-aware profiles).
 
 Developers: Siavash Mirarab, Tandy Warnow, and Nam Nguyen, with valuable contributions from Uyen Mai, Daniel McDonald and Stefan Janssen.
 
 ### Publication:
-S. Mirarab, N. Nguyen, and T. Warnow, SEPP: SATe-enabled phylogenetic placement, Proceedings of the Pacific Symposium of Biocomputing 2012, pages 247-58 [http://www.ncbi.nlm.nih.gov/pubmed/22174280#](http://www.ncbi.nlm.nih.gov/pubmed/22174280#).
+
+* **SEPP**:
+	* S. Mirarab, N. Nguyen, and T. Warnow, “SEPP: SATe-enabled phylogenetic placement”, Proceedings of the Pacific Symposium of Biocomputing 2012, pages 247-58 [http://www.ncbi.nlm.nih.gov/pubmed/22174280#](http://www.ncbi.nlm.nih.gov/pubmed/22174280#).
+* **HMMER**:
+	* S. R. Eddy, “A new generation of homology search tools based on probabilistic inference.,” International Conference on Genome Informatics, vol. 23, no. 1, pp. 205–11, Oct. 2009. <http://www.ncbi.nlm.nih.gov/pubmed/20180275>
+
+* **PPLACER**:
+	* F. A. Matsen, R. B. Kodner, and E. V. Armbrust, “pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.,” BMC bioinformatics, vol. 11, no. 1, p. 538, Oct. 2010, doi: <http://doi.org/10.1186/1471-2105-11-538>.
 
 
 ### Documentations and related pages
 
 - SEPP on green genes: to run SEPP on green genes, it would be easier to use: [wiki](https://github.com/smirarab/sepp/wiki/SEPP-on-Greengenes)
 - SEPP [tutorial](tutorial/sepp-tutorial.md).
-- 
 
 ### Note and Acknowledgment: 
 - SEPP bundles the following two programs into its distribution:
@@ -50,17 +62,28 @@ Installation Steps:
 
 ### Bioconda
 
-If you use Bioconda, you can try installing SEPP using https://anaconda.org/bioconda/sepp
+If you use Bioconda, you can try installing SEPP using <https://anaconda.org/bioconda/sepp>
 
-### From source code
+~~~bash
+conda install bioconda::sepp
+~~~
 
-SEPP is distributed as Python source code. Once you have the above required software installed, do the following. 
+Note that results from bioconda version can differ from the version installed using source code. 
+This is because the bioconda recipe always installs the latest version of HMMER, 
+and HMMER has changed algorithms since the older version we bundle here. 
+However, in our tests, we have not detected a meaningful difference between accuracy of results from older HMMER
+and the new HMMER. 
 
-**Note:** these installation steps have recently changed
+### From source code
+
+SEPP is distributed as Python source code. Once you have the above required software installed, do the following.
 
 1. Obtain the latest SEPP distribution from git repository (using `git clone` or by simply downloading the Zip file). If you downloaded the zip file, uncompress the distribution file.
 2. Go to the distribution directory
-3. Configure: run `python setup.py config` (or `python setup.py config -c` to avoid using the home directory). 
+3. Configure: run
+    ```python setup.py config``` 
+    or to avoid using the home directory:
+    ```python setup.py config -c``` 
 4. Install: run `python setup.py install`. 
 
 The third step creates a `~/.sepp/` directory, puts the default config file under `~/.sepp/main.config`, and puts all the binary executables under it as well. 
@@ -113,11 +136,17 @@ be used to rename your internal nodes back to the original value. To update the
 cat [the name of .json/.tre/.xml file with mapped names]| python output__rename-json.py > [name of the relabelled file]
 ```
 
-By setting SEPP_DEBUG environmental variable to `True`, you can instruct SEPP to output more information that can be helpful for debugging.  
 
 ---------------------------------------------
 Bugs and Errors
 ---------------------------------------------
-SEPP is under active research development at UIUC by the Warnow Lab (and especially with her former PhD students Siavash Mirarab and Nam Nguyen). Please report any errors to Siavash Mirarab (smirarab at gmail.com) and Nam Nguyen (ndn006 at eng.ucsd.edu).
+By setting 
+
+```bash
+export SEPP_DEBUG=True
+```
+
+you can instruct SEPP to output more information that can be helpful for debugging.  
 
+Please resport bugs under issues. 
 


=====================================
README.md
=====================================
@@ -16,9 +16,11 @@ Each of these related tools has its own README file.
 * **HIPPI** stands for "Highly Accurate Protein Family Classification with Ensembles of HMMs", and addresses the problem of classifying query sequences to protein families.
 
 [README.TIPP.md](https://github.com/TeraTrees/TIPP/)
-* **TIPP** stands for "Taxonomic Identification and Phylogenetic Profiling", and addresses the problem of taxonomic identification and abundance profiling of metagenomic data. We have moved TIPP as a separate package from SEPP. TIPP package can be accessed [here](https://github.com/TeraTrees/TIPP/).
+* **TIPP** stands for "Taxonomic Identification and Phylogenetic Profiling", and addresses the problem of taxonomic identification and abundance profiling of metagenomic data. We have moved TIPP to be a separate package from SEPP. TIPP package can be accessed [here](https://github.com/TeraTrees/TIPP/).
+
+**NOTE:** All these programs heavily rely on [HMMER](http://hmmer.org/). Please cite HMMER when citing these tools as well and mention the version of the HMMER used. 
 
 ---------------------------------------------
 Bugs and Errors
 ---------------------------------------------
-SEPP, TIPP, UPP, HIPPI are under active research development at UIUC by the Warnow Lab and former student Siavash Mirarab (now at UCSD). Please report any errors to Siavash Mirarab (smirarab at ucsd.edu).
+SEPP, TIPP, UPP, HIPPI are under active research development at UIUC by the Warnow Lab and former student Siavash Mirarab (now at UCSD). Please report any errors on the GitHub issues page. 


=====================================
ci/conda_requirements.txt deleted
=====================================
@@ -1,4 +0,0 @@
-nose
-pep8
-flake8
-java-jdk


=====================================
ci/env_lint.yml
=====================================
@@ -0,0 +1,6 @@
+channels:
+  - bioconda
+  - conda-forge
+  - defaults
+dependencies:
+  - flake8


=====================================
ci/environment.yml
=====================================
@@ -0,0 +1,12 @@
+name: sepp
+channels:
+  - bioconda
+  - conda-forge
+  - defaults
+dependencies:
+  - flake8
+  - nose
+  - coverage >= 6  # to ensure lcov option is available
+  - java-jdk
+  - pep8
+  - setuptools


=====================================
ci/pip_requirements.txt deleted
=====================================
@@ -1 +0,0 @@
-coveralls


=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+sepp (4.5.4+dfsg-1) unstable; urgency=medium
+
+  * New upstream version 4.5.4+dfsg
+  * Refreshing patches
+
+ -- Pierre Gruet <pgt at debian.org>  Fri, 16 Aug 2024 14:23:33 +0200
+
 sepp (4.5.1+really4.5.1+dfsg-6) unstable; urgency=medium
 
   * Fixing wrong import from dendropy (Closes: #1071780)


=====================================
debian/patches/configuration_files_in_etc_and_per_user.patch
=====================================
@@ -5,12 +5,13 @@ Last-Update: 2020-10-08
 
 --- a/sepp/config.py
 +++ b/sepp/config.py
-@@ -48,10 +48,11 @@
+@@ -49,11 +49,11 @@
  
  _LOG = get_logger(__name__)
  
 -root_p = open(os.path.join(os.path.split(
 -    os.path.split(__file__)[0])[0], "home.path")).readlines()[0].strip()
+-print("root_p='%s'" % root_p)
 -main_config_path = os.path.join(root_p, "main.config")
 -
 +home = os.path.expanduser("~")
@@ -23,7 +24,7 @@ Last-Update: 2020-10-08
      global main_config_path
 --- a/sepp/ensemble.py
 +++ b/sepp/ensemble.py
-@@ -162,7 +162,11 @@
+@@ -163,7 +163,11 @@
  
  
  def augment_parser():
@@ -38,7 +39,7 @@ Last-Update: 2020-10-08
          "This script runs the UPP algorithm on set of sequences.  A backbone "
 --- a/sepp/exhaustive_upp.py
 +++ b/sepp/exhaustive_upp.py
-@@ -372,9 +372,11 @@
+@@ -382,9 +382,11 @@
  
  
  def augment_parser():


=====================================
debian/patches/deactivating_log_test.patch
=====================================
@@ -9,14 +9,14 @@ Last-Update: 2020-10-05
          sepp._DEBUG = True
          sepp.reset_loggers()
          sepp.jobs._LOG.debug("test debugging works")
--        assert(sepp.jobs._LOG.getEffectiveLevel() == logging.DEBUG)
-+        #assert(sepp.jobs._LOG.getEffectiveLevel() == logging.DEBUG)
+-        assert (sepp.jobs._LOG.getEffectiveLevel() == logging.DEBUG)
++        #assert (sepp.jobs._LOG.getEffectiveLevel() == logging.DEBUG)
  
          sepp._DEBUG = False
          sepp.reset_loggers()
          sepp.jobs._LOG.debug("test debugging is disabled")
--        assert(sepp.jobs._LOG.getEffectiveLevel() == logging.INFO)
-+        #assert(sepp.jobs._LOG.getEffectiveLevel() == logging.INFO)
+-        assert (sepp.jobs._LOG.getEffectiveLevel() == logging.INFO)
++        #assert (sepp.jobs._LOG.getEffectiveLevel() == logging.INFO)
  
          sepp._DEBUG = sdb
          sepp.reset_loggers()


=====================================
debian/patches/open-U-obsolete.patch deleted
=====================================
@@ -1,49 +0,0 @@
-Description: Drop obsolete "U" mode passed to open()
- The "U" mode has been deprecated in python since python3 and has had no
- effect.  In python 3.11, it is now disallowed.  Drop this mode flag that
- causes runtime failures.
-Author: Steve Langasek <steve.langasek at ubuntu.com>
-Forwarded: https://github.com/smirarab/sepp/issues/124
-Reviewed-by: Pierre Gruet <pgt at debian.org>
-Last-Update: 2023-01-24
-
---- a/sepp/alignment.py
-+++ b/sepp/alignment.py
-@@ -108,7 +108,7 @@
-     file_obj = None
-     if isinstance(src, str):
-         try:
--            file_obj = open(src, "rU")
-+            file_obj = open(src, "r")
-         except IOError:
-             print(("The file `%s` does not exist, exiting gracefully" % src))
-     elif isinstance(src, filetypes):
-@@ -291,7 +291,7 @@
-         If duplicate sequence names are encountered then the old name will
-         be replaced.
-         """
--        file_obj = open(filename, 'rU')
-+        file_obj = open(filename, 'r')
-         return self.read_file_object(file_obj, file_format=file_format)
- 
-     def read_file_object(self, file_obj, file_format='FASTA'):
-@@ -582,7 +582,7 @@
-         columns. Labels insertion columns with special labels and labels the
-         rest of columns (i.e. original columns) sequentially.
-         """
--        handle = open(path, 'rU')
-+        handle = open(path, 'r')
-         insertions = None
-         if aformat.lower() == "stockholm":
-             insertions = self._read_sto(handle)
---- a/sepp/tree.py
-+++ b/sepp/tree.py
-@@ -314,7 +314,7 @@
- 
-     def read_tree_from_file(self, treefile, file_format):
-         dataset = Dataset()
--        dataset.read(open(treefile, 'rU'), schema=file_format)
-+        dataset.read(open(treefile, 'r'), schema=file_format)
-         dendropy_tree = dataset.trees_blocks[0][0]
-         self._tree = dendropy_tree
-         self.n_leaves = self.count_leaves()


=====================================
debian/patches/py310_collections_import.patch deleted
=====================================
@@ -1,25 +0,0 @@
-Description: fix collections import
- In python 3.10[1] deprecated aliases to Collections Abstract Base Classes from
- the collections module have been removed. These imports must be done from
- collections.abc.
- .
- 1. https://docs.python.org/3/whatsnew/3.10.html
-Author: Andreas Hasenack <andreas at canonical.com>
-Bug: https://github.com/smirarab/sepp/issues/117
-Bug-Debian: https://bugs.debian.org/1004968
-Bub-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/sepp/+bug/1959938
-Last-Update: 2022-02-03
-diff --git a/sepp/alignment.py b/sepp/alignment.py
-index ada5de8..61e9adb 100644
---- a/sepp/alignment.py
-+++ b/sepp/alignment.py
-@@ -26,7 +26,8 @@ import re
- 
- from sepp.filemgr import open_with_intermediates
- 
--from collections import Mapping
-+
-+from collections.abc import Mapping
- import copy
- from sepp import get_logger
- import io


=====================================
debian/patches/series
=====================================
@@ -6,6 +6,3 @@ configuration_files_in_etc_and_per_user.patch
 deactivating_log_test.patch
 using_python3_interpreter.patch
 hmmbuild_path_for_testUPP.patch
-py310_collections_import.patch
-open-U-obsolete.patch
-wrong_import_from_dendropy.patch


=====================================
debian/patches/wrong_import_from_dendropy.patch deleted
=====================================
@@ -1,26 +0,0 @@
-Description: reworking syntax of call to convert_node_to_root_polytomy
-Author: Pierre Gruet <pgt at debian.org>
-Bug-Debian: https://bugs.debian.org/1071780
-Forwarded: no
-Last-Update: 2024-05-25
-
---- a/sepp/tree.py
-+++ b/sepp/tree.py
-@@ -22,8 +22,6 @@
- 
- from dendropy import Tree, Taxon, treecalc
- from dendropy import DataSet as Dataset
--from dendropy.datamodel.treemodel import _convert_node_to_root_polytomy as \
--    convert_node_to_root_polytomy
- from sepp import get_logger, sort_by_value
- from sepp.alignment import get_pdistance
- from sepp.decompose_tree import decompose_by_diameter
-@@ -258,7 +256,7 @@
- 
-         nr.edge.length = None
-         nr.parent_node = None
--        convert_node_to_root_polytomy(nr)
-+        nr._convert_node_to_root_polytomy()
-         t1 = PhylogeneticTree(Tree(seed_node=nr))
-         # temp we could speed this up,
-         # by telling the Phylogenetic tree how many leaves it has


=====================================
sepp/__init__.py
=====================================
@@ -27,7 +27,7 @@ __all__ = ['algorithm', 'alignment', 'backtranslate',
            'jobs', 'math_utils', 'problem', 'scheduler',
            'scratch', 'tree', 'get_logger', 'is_temp_kept', 'version']
 
-version = "4.5.1"
+version = "4.5.4"
 _DEBUG = ("SEPP_DEBUG" in os.environ) and \
     (os.environ["SEPP_DEBUG"].lower() == "true")
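
Note on the `_DEBUG` context lines in this hunk: the flag is on only when `SEPP_DEBUG` is set and equals `true` case-insensitively, which is why the README change suggests `export SEPP_DEBUG=True`. A minimal sketch of the same check follows; the helper name `debug_enabled` and its `environ` parameter are hypothetical and exist only to make the logic testable.

```python
import os


def debug_enabled(environ=None):
    """Mirror the _DEBUG expression: set AND equal to "true" (any case)."""
    # `environ` is a hypothetical parameter for testing; default is os.environ.
    env = os.environ if environ is None else environ
    return ("SEPP_DEBUG" in env) and (env["SEPP_DEBUG"].lower() == "true")


print(debug_enabled({"SEPP_DEBUG": "True"}))
print(debug_enabled({"SEPP_DEBUG": "1"}))
```

Under this check, values such as `1` or `yes` would leave debugging off.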
 


=====================================
sepp/algorithm.py
=====================================
@@ -3,7 +3,7 @@ Created on Oct 2, 2012
 
 @author: smirarab
 """
-from sepp.config import options
+from sepp.config import options, version
 from abc import abstractmethod
 from sepp.scheduler import JobPool
 from sepp.filemgr import directory_has_files_with_prefix, get_temp_file
@@ -17,6 +17,7 @@ import os
 from sepp.problem import RootProblem
 import time
 from sepp.checkpointing import CheckPointManager
+from subprocess import check_output
 
 _LOG = get_logger(__name__)
 
@@ -139,6 +140,22 @@ class AbstractAlgorithm(object):
         raise NotImplementedError()
 
     def run(self):
+        def get_hmmer_version():
+            try:
+                for line in check_output([self.options.hmmbuild.path, "-h"],
+                                         text=True).split('\n'):
+                    if "http://hmmer.org/" in line:
+                        return line.split(' ')[2]
+            except Exception as n:
+                _LOG.info(str(n))
+                return "(Unclear)"
+
+        _LOG.info("%s version %s used with HMMER version %s"
+                  % (self.name, version, get_hmmer_version()))
+        _LOG.info("Will user HMMER located at %s" % self.options.hmmbuild.path)
+
+        _LOG.info("All options: %s" % str(options()))
+
         checkpoint_manager = options().checkpoint
         assert isinstance(checkpoint_manager, CheckPointManager)
 
@@ -268,6 +285,8 @@ class AbstractAlgorithm(object):
             self.options.alignment_file))
         alignment = MutableAlignment()
         alignment.read_file_object(self.options.alignment_file)
+        _LOG.info("Alignment has %d sequences and %d sites" % (
+            alignment.get_num_taxa(), alignment.get_length()))
 
         # fragments = MutableAlignment()
         # fragments.read_file_object(self.options.fragment_file);
@@ -276,6 +295,8 @@ class AbstractAlgorithm(object):
             dendropy.Tree.get_from_stream(self.options.tree_file,
                                           schema="newick",
                                           preserve_underscores=True))
+        _LOG.info("Tree has %d leaves" % (
+            tree.count_leaves()))
 
         return alignment, tree
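
Note on the `get_hmmer_version` helper added in this file: it runs `hmmbuild -h`, looks for the line containing `http://hmmer.org/`, and takes the third space-separated token as the version. The sketch below restates that idea as standalone functions. The banner layout in `sample` is an assumption about HMMER's help output, and the names `parse_hmmer_version` and `UNKNOWN` are hypothetical. Unlike the committed code, which implicitly returns `None` when no line matches, this variant returns the placeholder string in that case too.

```python
from subprocess import check_output

UNKNOWN = "(Unclear)"


def parse_hmmer_version(help_text):
    """Return the version token from a HMMER help banner, or UNKNOWN."""
    for line in help_text.split("\n"):
        if "http://hmmer.org/" in line:
            fields = line.split(" ")
            # Assumed layout: "# HMMER <version> (<date>); http://hmmer.org/"
            if len(fields) > 2:
                return fields[2]
    return UNKNOWN


def get_hmmer_version(hmmbuild_path):
    """Run `<hmmbuild_path> -h` and extract the version; never raises."""
    try:
        return parse_hmmer_version(
            check_output([hmmbuild_path, "-h"], text=True))
    except Exception:
        # Missing binary, non-zero exit status, decoding problems, etc.
        return UNKNOWN


# Assumed example banner, not captured from a real run.
sample = ("# hmmbuild :: profile HMM construction\n"
          "# HMMER 3.3.2 (Nov 2020); http://hmmer.org/\n")
print(parse_hmmer_version(sample))
```

Separating the parsing from the subprocess call keeps the string handling testable without HMMER installed.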
 


=====================================
sepp/alignment.py
=====================================
@@ -26,7 +26,11 @@ import re
 
 from sepp.filemgr import open_with_intermediates
 
-from collections import Mapping
+try:
+    from collections.abc import Mapping  # noqa
+except ImportError:
+    from collections import Mapping
+
 import copy
 from sepp import get_logger
 import io
@@ -107,7 +111,7 @@ def _read_fasta(src):
     file_obj = None
     if isinstance(src, str):
         try:
-            file_obj = open(src, "rU")
+            file_obj = open(src, "r")
         except IOError:
             print(("The file `%s` does not exist, exiting gracefully" % src))
     elif isinstance(src, filetypes):
@@ -290,7 +294,7 @@ class MutableAlignment(dict, ReadOnlyAlignment, object):
         If duplicate sequence names are encountered then the old name will
         be replaced.
         """
-        file_obj = open(filename, 'rU')
+        file_obj = open(filename, 'r')
         return self.read_file_object(file_obj, file_format=file_format)
 
     def read_file_object(self, file_obj, file_format='FASTA'):
@@ -581,7 +585,7 @@ class ExtendedAlignment(MutableAlignment):
         columns. Labels insertion columns with special labels and labels the
         rest of columns (i.e. original columns) sequentially.
         """
-        handle = open(path, 'rU')
+        handle = open(path, 'r')
         insertions = None
         if aformat.lower() == "stockholm":
             insertions = self._read_sto(handle)
@@ -767,8 +771,8 @@ class ExtendedAlignment(MutableAlignment):
                 if me != me_len and self.is_insertion_column(me):
                     ''' We both have a series of insertion columns'''
                     start = me
-                    while(me != me_len and self.is_insertion_column(me) and
-                          she != she_len and other.is_insertion_column(she)):
+                    while (me != me_len and self.is_insertion_column(me) and
+                           she != she_len and other.is_insertion_column(she)):
                         me += 1
                         she += 1
                         merged_insertion_columns += 1
@@ -801,24 +805,24 @@ class ExtendedAlignment(MutableAlignment):
                 self.col_labels[start:me] = list(
                     range(insertion, insertion-run, -1))
                 insertion -= run
-            elif(she == she_len or (me != me_len and
-                 self.col_labels[me] < other.col_labels[she])):
+            elif (she == she_len or (me != me_len and
+                  self.col_labels[me] < other.col_labels[she])):
                 ''' My column is not present (i.e. was allgap) in the
                     "other"'''
                 start = me
-                while(me < me_len and (she == she_len or me != me_len and
-                      self.col_labels[me] < other.col_labels[she])):
+                while (me < me_len and (she == she_len or me != me_len and
+                       self.col_labels[me] < other.col_labels[she])):
                     me += 1
                 run = me - start
                 ins = bytearray(b"-") * run
                 for v in selfother.values():
                     v[start:start] = ins
-            elif(me == me_len or (she != she_len and
-                 self.col_labels[me] > other.col_labels[she])):
+            elif (me == me_len or (she != she_len and
+                  self.col_labels[me] > other.col_labels[she])):
                 ''' Her column is not present (i.e. was allgap) in "me"'''
                 start = she
-                while(she < she_len and (me == me_len or she != she_len and
-                      self.col_labels[me] > other.col_labels[she])):
+                while (she < she_len and (me == me_len or she != she_len and
+                       self.col_labels[me] > other.col_labels[she])):
                     she += 1
                 run = she - start
                 ins = bytearray(b"-") * run
@@ -829,8 +833,8 @@ class ExtendedAlignment(MutableAlignment):
                 me_len += run
             elif self.col_labels[me] == other.col_labels[she]:
                 ''' A shared column'''
-                while(me < me_len and she < she_len and
-                      self.col_labels[me] == other.col_labels[she]):
+                while (me < me_len and she < she_len and
+                       self.col_labels[me] == other.col_labels[she]):
                     she += 1
                     me += 1
             else:
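
Note on the import change at the top of this file's diff: upstream now tries `collections.abc` first and falls back to `collections`, which makes the Debian patch `py310_collections_import.patch` unnecessary. A minimal sketch of the pattern is below; the class `ReadOnlyView` is hypothetical and only demonstrates that the imported `Mapping` behaves the same whichever branch supplies it.

```python
try:
    from collections.abc import Mapping
except ImportError:
    # Only reached on interpreters that predate collections.abc.
    from collections import Mapping


class ReadOnlyView(Mapping):
    """Hypothetical read-only wrapper around a dict of sequences."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


view = ReadOnlyView({"seq1": "ACGT"})
print(view["seq1"], len(view), "seq1" in view)
```

On Python 3.10 and later only the first branch can succeed, because the aliases in `collections` were removed.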


=====================================
sepp/config.py
=====================================
@@ -34,6 +34,7 @@ module. For an example see exhaustive_upp.
 from argparse import ArgumentParser, Namespace
 from sepp.filemgr import get_default_temp_dir, check_or_make_dir_path
 import sys
+
 try:
     import configparser
 except ImportError:
@@ -50,6 +51,7 @@ _LOG = get_logger(__name__)
 
 root_p = open(os.path.join(os.path.split(
     os.path.split(__file__)[0])[0], "home.path")).readlines()[0].strip()
+print("root_p='%s'" % root_p)
 main_config_path = os.path.join(root_p, "main.config")
 
 
@@ -178,7 +180,7 @@ def _init_parser():
         default=1,
         help=("minimum p-distance before stopping the decomposition"
               "[default: 1]"))
-# uym2 added #
+    # uym2 added #
     decompGroup.add_argument(
         "-M", "--diameter", type=float,
         dest="maxDiam", metavar="DIAMETER",
@@ -192,7 +194,7 @@ def _init_parser():
         default="normal",
         # default = "midpoint",
         help="decomposition strategy "
-        "[default: using tree branch length]")
+             "[default: using tree branch length]")
     # "[default: only include smallest subsets]")
 
     outputGroup = _parser.add_argument_group(
@@ -349,7 +351,6 @@ def _parse_options():
 
     ''' If there is a user-specified config file, read that '''
     if opts.config_file is not None:
-
         config_cmd_defaults = _read_config_file(opts.config_file, opts)
 
         input_args = main_cmd_defaults + config_cmd_defaults + (sys.argv[1:])
@@ -358,7 +359,7 @@ def _parse_options():
             newmessage = message.replace(
                 "arguments:",
                 "arguments (potentially from the config file):").replace(
-                    "--", "")
+                "--", "")
             ArgumentParser.error(parser, newmessage)
 
         parser.error = error_callback


=====================================
sepp/ensemble.py
=====================================
@@ -50,6 +50,7 @@ class EnsembleExhaustiveAlgorithm(ExhaustiveAlgorithm):
     """
     def __init__(self):
         ExhaustiveAlgorithm.__init__(self)
+        self.name = "HIPPI"
         self.symfrac = False
         self.elim = None
         self.filters = True


=====================================
sepp/exhaustive.py
=====================================
@@ -6,12 +6,12 @@ Created on Oct 10, 2012
 from sepp.algorithm import AbstractAlgorithm
 from sepp.config import options
 from sepp.tree import PhylogeneticTree
-from sepp.alignment import MutableAlignment, ExtendedAlignment,\
-    hamming_distance
+from sepp.alignment import (MutableAlignment, ExtendedAlignment,
+                            hamming_distance)
 from sepp.problem import SeppProblem, RootProblem
 from dendropy.datamodel.treemodel import Tree
-from sepp.jobs import HMMBuildJob, HMMSearchJob, HMMAlignJob, PplacerJob,\
-    MergeJsonJob
+from sepp.jobs import (HMMBuildJob, HMMSearchJob, HMMAlignJob, PplacerJob,
+                       MergeJsonJob)
 from sepp.scheduler import JobPool, Join
 from sepp import get_logger
 from sepp.math_utils import lcm
@@ -229,6 +229,7 @@ class ExhaustiveAlgorithm(AbstractAlgorithm):
     """
     def __init__(self):
         AbstractAlgorithm.__init__(self)
+        self.name = "SEPP"
         self.place_nomatch_fragments = False
         ''' Hardcoded E-Lim for hmmsearch '''  # TODO: what to do with this
         self.elim = 99999999


=====================================
sepp/exhaustive_upp.py
=====================================
@@ -8,6 +8,7 @@ import random
 import argparse
 import os
 import shutil
+from math import floor
 from sepp import get_logger
 from sepp.alignment import MutableAlignment, ExtendedAlignment, _write_fasta
 from sepp.exhaustive import JoinAlignJobs, ExhaustiveAlgorithm
@@ -61,6 +62,7 @@ class UPPExhaustiveAlgorithm(ExhaustiveAlgorithm):
 
     def __init__(self):
         ExhaustiveAlgorithm.__init__(self)
+        self.name = "UPP"
         self.pasta_only = False
         self.filtered_taxa = []
 
@@ -72,16 +74,24 @@ class UPPExhaustiveAlgorithm(ExhaustiveAlgorithm):
         fragments = MutableAlignment()
         if options().median_full_length is not None \
                 or options().full_length_range is not None:
-            if options().median_full_length == -1:
+            if options().median_full_length == -1 \
+                    or 0 < options().median_full_length < 1:
+                # for backward compatibility, -1 is mapped to 0.5 quantile.
+                if options().median_full_length == -1:
+                    quantile_value = 0.5
+                else:
+                    quantile_value = options().median_full_length
                 seq_lengths = sorted(
                     [len(seq) for seq in list(sequences.values())])
                 lengths = len(seq_lengths)
-                l2 = int(lengths / 2)
-                if lengths % 2:
-                    options().median_full_length = \
-                        (seq_lengths[l2] + seq_lengths[l2 + 1]) / 2.0
-                else:
+                l2 = int(floor(lengths * quantile_value))
+                #  second condition is to prevent index out of bounds error
+                if lengths % 2 == 1 or l2 == lengths - 1:
                     options().median_full_length = seq_lengths[l2]
+                else:  # lengths % 2 == 0
+                    options().median_full_length = (
+                        seq_lengths[l2] + seq_lengths[l2 + 1]) / 2.0
+
             if options().full_length_range is not None:
                 L = sorted(int(x) for x in options().full_length_range.split())
                 min_length = L[0]
@@ -400,12 +410,14 @@ def augment_parser():
         default=None,
         help="Only consider sequences with lengths within Nmin and Nmax")
     decompGroup.add_argument(
-        "-M", "--median_full_length", type=int,
+        "-M", "--median_full_length", type=float,
         dest="median_full_length", metavar="N",
         default=None,
         help="Consider all fragments that are 25%% longer or shorter than N "
              "to be excluded from the backbone.  If value is -1, then UPP will"
-             " use the median of the sequences as the median full length "
+             " use the median of the sequences as the median full length. "
+             "Use 0 < N < 1 for UPP to use quartiles. e.g.  0.25 for the "
+             "first quartile and 0.75 for the third quartile. "
              "[default: None]")
     decompGroup.add_argument(
         "-T", "--backbone_threshold", type=float,
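
Reviewer note: the new `--median_full_length` handling in the hunk above can be restated as a standalone function. This is only a sketch that mirrors the added lines; the helper name `full_length_from_quantile` and its arguments are hypothetical and are not part of sepp. It assumes the argument is either -1 or a value strictly between 0 and 1, which is the only case in which the diff enters this branch.

```python
from math import floor


def full_length_from_quantile(seq_lengths, median_full_length):
    """Restate the quantile selection added to exhaustive_upp.py.

    Hypothetical helper for illustration; sepp itself stores the result
    back into options().median_full_length instead of returning it.
    """
    # -1 is kept for backward compatibility and is mapped to the 0.5 quantile.
    if median_full_length == -1:
        quantile_value = 0.5
    else:
        quantile_value = median_full_length

    ordered = sorted(seq_lengths)
    count = len(ordered)
    idx = int(floor(count * quantile_value))

    # Odd count, or idx already on the last element (which would make
    # idx + 1 out of bounds): take the element at idx directly.
    if count % 2 == 1 or idx == count - 1:
        return ordered[idx]
    # Even count: average the element at idx with its right-hand neighbour,
    # exactly as the diff does.
    return (ordered[idx] + ordered[idx + 1]) / 2.0


if __name__ == "__main__":
    print(full_length_from_quantile([10, 20, 30, 40, 50], -1))
    print(full_length_from_quantile([10, 20, 30, 40], 0.25))
    print(full_length_from_quantile([10, 20, 30, 40], 0.75))
```

Note that for an even number of sequences the averaged pair is `idx` and `idx + 1`, following the diff as written, rather than the `idx - 1` and `idx` pair a textbook median would use.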


=====================================
sepp/tree.py
=====================================
@@ -22,8 +22,6 @@
 
 from dendropy import Tree, Taxon, treecalc
 from dendropy import DataSet as Dataset
-from dendropy.datamodel.treemodel import _convert_node_to_root_polytomy as \
-    convert_node_to_root_polytomy
 from sepp import get_logger, sort_by_value
 from sepp.alignment import get_pdistance
 from sepp.decompose_tree import decompose_by_diameter
@@ -258,7 +256,7 @@ for l2 in sys.stdin.readlines():
 
         nr.edge.length = None
         nr.parent_node = None
-        convert_node_to_root_polytomy(nr)
+        nr._convert_node_to_root_polytomy()
         t1 = PhylogeneticTree(Tree(seed_node=nr))
         # temp we could speed this up,
         # by telling the Phylogenetic tree how many leaves it has
@@ -314,7 +312,7 @@ for l2 in sys.stdin.readlines():
 
     def read_tree_from_file(self, treefile, file_format):
         dataset = Dataset()
-        dataset.read(open(treefile, 'rU'), schema=file_format)
+        dataset.read(open(treefile, 'r'), schema=file_format)
         dendropy_tree = dataset.trees_blocks[0][0]
         self._tree = dendropy_tree
         self.n_leaves = self.count_leaves()
@@ -449,5 +447,5 @@ def is_valid_tree(t):
     if num_children == 2:
         # What is with this code?  Why do we check the same variable twice?
         # Bug?  NN
-        assert((not rc[0].child_nodes()) and (not rc[0].child_nodes()))
+        assert ((not rc[0].child_nodes()) and (not rc[0].child_nodes()))
     return True


=====================================
setup.py
=====================================
@@ -29,7 +29,7 @@ from setuptools import find_packages
 from distutils.core import setup, Command
 
 use_setuptools(version="0.6.24")
-version = "4.5.1"
+version = "4.5.4"
 
 
 def get_tools_dir(where):
@@ -162,7 +162,7 @@ setup(name="sepp",
       author_email="smirarab@gmail.com, namphuon@cs.utah.edu",
 
       license="General Public License (GPL)",
-      install_requires=["dendropy >= 4.0.0"],
+      install_requires=["dendropy >= 4.6.0"],
       provides=["sepp"],
       scripts=["run_sepp.py", 'run_upp.py', "split_sequences.py"],
       cmdclass={"config": ConfigSepp, "upp": ConfigUPP},


=====================================
test/unittest/TestFork.py
=====================================
@@ -100,9 +100,9 @@ def run():
 
     # Test one of the jobs, to see if it is successful
     if sample_job.ready() and sample_job.successful():
-        assert(jobs[3].result_set is True)
+        assert (jobs[3].result_set is True)
     else:
-        assert(jobs[3].result_set is False)
+        assert (jobs[3].result_set is False)
 
     errors = pool.get_all_job_errors()
     # print("Following job errors were raised:", errors)


=====================================
test/unittest/testAlignment.py
=====================================
@@ -4,8 +4,8 @@ Created on Sep 19, 2012
 @author: smirarab
 '''
 import unittest
-from sepp.alignment import MutableAlignment, ReadonlySubalignment,\
-    ExtendedAlignment
+from sepp.alignment import (MutableAlignment, ReadonlySubalignment,
+                            ExtendedAlignment)
 from sepp.problem import SeppProblem
 from sepp.filemgr import get_data_path
 from tempfile import mkstemp


=====================================
test/unittest/testConfig.py
=====================================
@@ -145,12 +145,12 @@ class Test(unittest.TestCase):
         sepp._DEBUG = True
         sepp.reset_loggers()
         sepp.jobs._LOG.debug("test debugging works")
-        assert(sepp.jobs._LOG.getEffectiveLevel() == logging.DEBUG)
+        assert (sepp.jobs._LOG.getEffectiveLevel() == logging.DEBUG)
 
         sepp._DEBUG = False
         sepp.reset_loggers()
         sepp.jobs._LOG.debug("test debugging is disabled")
-        assert(sepp.jobs._LOG.getEffectiveLevel() == logging.INFO)
+        assert (sepp.jobs._LOG.getEffectiveLevel() == logging.INFO)
 
         sepp._DEBUG = sdb
         sepp.reset_loggers()


=====================================
test/unittest/testUPP.py
=====================================
@@ -60,12 +60,11 @@ class Test(unittest.TestCase):
         shutil.rmtree(self.x.options.outdir, ignore_errors=True)
 
     def test_id_collision_working(self):
-
         self.x.run()
         self.assertTrue(self.x.results is not None)
-        assert(len(self.x.results) == 490)
-        assert(300 < len(self.x.results['SEQ396']) < 600)
-        assert(len(self.x.results['SEQ554'].replace('-', '')) == 57)
+        assert (len(self.x.results) == 490)
+        assert (300 < len(self.x.results['SEQ396']) < 600)
+        assert (len(self.x.results['SEQ554'].replace('-', '')) == 57)
 
 
 if __name__ == "__main__":



View it on GitLab: https://salsa.debian.org/med-team/sepp/-/compare/33299d5522c2f83598cb20e5ab65f282917efa4e...90e9581b16e22d578d2d9cb9a4c5b754007d8b45


