[med-svn] [Git][med-team/gubbins][upstream] New upstream version 3.3.3

Étienne Mollier (@emollier) gitlab at salsa.debian.org
Sun Feb 4 17:23:18 GMT 2024



Étienne Mollier pushed to branch upstream at Debian Med / gubbins


Commits:
1ef3ddb3 by Étienne Mollier at 2024-02-04T17:35:44+01:00
New upstream version 3.3.3
- - - - -


9 changed files:

- README.md
- + docs/gpt_gubbins_logo.png
- docs/gubbins_plotting.md
- python/gubbins/common.py
- python/gubbins/run_gubbins.py
- python/gubbins/tests/test_dependencies.py
- python/gubbins/tests/test_utils.py
- python/gubbins/treebuilders.py
- python/gubbins/utils.py


Changes:

=====================================
README.md
=====================================
@@ -1,11 +1,13 @@
-# Gubbins
+# Gubbins <img src='docs/gpt_gubbins_logo.png' align="right" height="250" />
 **G**enealogies **U**nbiased **B**y recom**B**inations **I**n **N**ucleotide **S**equences
 
+<!-- badges: start -->
 ![build](https://github.com/nickjcroucher/gubbins/workflows/build/badge.svg)  
 [![License: GPL v2](https://img.shields.io/badge/License-GPL%20v2-brightgreen.svg)](https://github.com/nickjcroucher/gubbins/blob/master/LICENSE)   
 [![status](https://img.shields.io/badge/NAR-10.1093-brightgreen.svg)](https://academic.oup.com/nar/article/43/3/e15/2410982)   
 [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](http://bioconda.github.io/recipes/gubbins/README.html)  
 [![codecov](https://codecov.io/gh/nickjcroucher/gubbins/branch/master/graph/badge.svg)](https://codecov.io/gh/nickjcroucher/gubbins)
+<!-- badges: end -->
 
 ## Contents
   * [Introduction](#introduction)
@@ -74,7 +76,7 @@ autoreconf -i
 make
 [sudo] make install
 cd python
-[sudo] python3 -m pip install .
+[sudo] python3 -m pip install [--prefix=$PREFIX] .
 ```
 Use `sudo` to install Gubbins system-wide. If you don't have the permissions, run `configure` with a prefix to install Gubbins in your home directory.
 
@@ -132,6 +134,17 @@ Gubbins is free software, licensed under [GPLv2](https://github.com/nickjcrouche
 ## Feedback/Issues
 There is no specific support for development or maintenance of Gubbins. However, we will try to help you out if you report any issues about usage of the software to the [issues page](https://github.com/nickjcroucher/gubbins/issues).
 
+## Development plan
+Version 3 incorporates a number of features that were explicitly requested by users (e.g. plotting functions), improved the algorithm's accuracy (e.g. using joint ancestral reconstruction) and were commonly used in published analyses (e.g. using IQTREE2 for phylogeny construction).
+
+Future development will prioritise:
+- More efficient phylogenetic processing with modern python libraries
+- Parallelisation of recombination searches
+- Faster sequence reconstruction through hardware acceleration
+- Extension of existing analyses using phylogenetic placement
+
+If you believe there are other improvements that could be added, please describe them on the [issues page](https://github.com/nickjcroucher/gubbins/issues) and tag the suggestion as an "enhancement".
+
 ## Citation
 If you use this software please cite:
 [Croucher N. J., Page A. J., Connor T. R., Delaney A. J., Keane J. A., Bentley S. D., Parkhill J., Harris S.R.


=====================================
docs/gpt_gubbins_logo.png
=====================================
Binary files /dev/null and b/docs/gpt_gubbins_logo.png differ


=====================================
docs/gubbins_plotting.md
=====================================
@@ -97,7 +97,7 @@ These options provide flexibility with regard to the labelling of different comp
 
 ### Example
 
-Using files from the associated [FigShare repository](https://figshare.com/account/projects/130637/articles/24117117), we can replicate the analysis shown in [Kwun, Ion, Cheng *et al*](https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-022-01147-2) with the command:
+Using files from the associated [FigShare repository](https://dx.doi.org/10.6084/m9.figshare.24117117), we can replicate the analysis shown in [Kwun, Ion, Cheng *et al*](https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-022-01147-2) with the command:
 
 ```
 plot_gubbins.R --tree serotype_3.tre --rec serotype_3_recombination.gff --annotation serotype_3_annotation.gff  --meta serotype_3_metadata.csv --max-branch-length 500 --clades serotype_3_clades.csv --markup serotype_3_markup.csv --legend-height 0.35  --tree-axis-expansion 30 --markup-height 0.1 --heatmap-x-nudge 0.05 --heatmap-y-nudge -0.05 --output serotype_3.png


=====================================
python/gubbins/common.py
=====================================
@@ -744,13 +744,13 @@ def return_algorithm_choices(args,i):
 def return_algorithm(algorithm_choice, model, input_args, node_labels = None, extra = None):
     initialised_algorithm = None
     if algorithm_choice == "fasttree":
-        initialised_algorithm = FastTree(threads = input_args.threads, model = model, bootstrap = input_args.bootstrap, verbose = input_args.verbose, additional_args = extra)
+        initialised_algorithm = FastTree(threads = input_args.threads, model = model, seed = input_args.seed, bootstrap = input_args.bootstrap, verbose = input_args.verbose, additional_args = extra)
     elif algorithm_choice == "raxml":
-        initialised_algorithm = RAxML(threads = input_args.threads, model = model, bootstrap = input_args.bootstrap, internal_node_prefix = node_labels, verbose = input_args.verbose, additional_args = extra)
+        initialised_algorithm = RAxML(threads = input_args.threads, model = model, seed = input_args.seed, bootstrap = input_args.bootstrap, internal_node_prefix = node_labels, verbose = input_args.verbose, additional_args = extra)
     elif algorithm_choice == "raxmlng":
-        initialised_algorithm = RAxMLNG(threads = input_args.threads, model = model, bootstrap = input_args.bootstrap, internal_node_prefix = node_labels, verbose = input_args.verbose, additional_args = extra)
+        initialised_algorithm = RAxMLNG(threads = input_args.threads, model = model, seed = input_args.seed, bootstrap = input_args.bootstrap, internal_node_prefix = node_labels, verbose = input_args.verbose, additional_args = extra)
     elif algorithm_choice == "iqtree":
-        initialised_algorithm = IQTree(threads = input_args.threads, model = model, bootstrap = input_args.bootstrap, internal_node_prefix = node_labels, verbose = input_args.verbose, use_best = (model is None and input_args.best_model), additional_args = extra)
+        initialised_algorithm = IQTree(threads = input_args.threads, model = model, seed = input_args.seed, bootstrap = input_args.bootstrap, internal_node_prefix = node_labels, verbose = input_args.verbose, use_best = (model is None and input_args.best_model), additional_args = extra)
     elif algorithm_choice == "rapidnj":
         initialised_algorithm = RapidNJ(threads = input_args.threads, model = model, bootstrap = input_args.bootstrap, verbose = input_args.verbose, additional_args = extra)
     elif algorithm_choice == "star":


=====================================
python/gubbins/run_gubbins.py
=====================================
@@ -76,6 +76,8 @@ def parse_input_args():
                                                       default = False, action = 'store_true')
     treeGroup.add_argument('--sh-test',               help='Perform an SH test of node likelihoods', default = False,
                                                       action = 'store_true')
+    treeGroup.add_argument('--seed',                  help='Set seed for reproducibility of analysis',
+                                                      default = None, type = int)
                                                           
     modelGroup = parser.add_argument_group('Nucleotide substitution model options')
     modelGroup.add_argument('--model',          '-M', help='Nucleotide substitution model (not all available for all '
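
Note on the hunk above: the new `--seed` option defaults to `None` and is converted to `int` by argparse. The following is a minimal standalone sketch of that behaviour, not the Gubbins parser itself; `build_parser` and the group title are hypothetical names used only for illustration.

```python
import argparse


def build_parser():
    # Hypothetical, trimmed-down parser: only the --seed option mirrors the diff.
    parser = argparse.ArgumentParser(description="Sketch of the --seed option")
    tree_group = parser.add_argument_group("Tree building options")
    tree_group.add_argument("--seed",
                            help="Set seed for reproducibility of analysis",
                            default=None, type=int)
    return parser


# With the flag, argparse converts the string "42" to the integer 42.
args_with_seed = build_parser().parse_args(["--seed", "42"])
# Without the flag, the value stays None, leaving the choice of seed to later code.
args_without_seed = build_parser().parse_args([])

print(args_with_seed.seed, args_without_seed.seed)
```

Keeping the default as `None` rather than a fixed number lets downstream code distinguish "user asked for this seed" from "no seed requested".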


=====================================
python/gubbins/tests/test_dependencies.py
=====================================
@@ -79,6 +79,18 @@ class TestExternalDependencies(unittest.TestCase):
         self.cleanup('multiple_recombinations')
         assert exit_code == 0
 
+    def test_fasttree_seed(self):
+        exit_code = 1
+        parser = run_gubbins.parse_input_args()
+        common.parse_and_run(parser.parse_args(["--tree-builder", "fasttree",
+                                                    "--verbose", "--iterations", "3",
+                                                    "--seed","42",
+                                                    "--threads", "1",
+                                                    os.path.join(data_dir, 'multiple_recombinations.aln')]))
+        exit_code = self.check_for_output_files('multiple_recombinations')
+        self.cleanup('multiple_recombinations')
+        assert exit_code == 0
+
     # Test resuming a default analysis
     def test_fasttree_resume(self):
         exit_code = 1
@@ -122,6 +134,18 @@ class TestExternalDependencies(unittest.TestCase):
         self.cleanup('multiple_recombinations')
         assert exit_code == 0
 
+    def test_iqtree_seed(self):
+        exit_code = 1
+        parser = run_gubbins.parse_input_args()
+        common.parse_and_run(parser.parse_args(["--tree-builder", "iqtree",
+                                                "--verbose", "--iterations", "3",
+                                                "--seed","42",
+                                                "--threads", "1",
+                                                os.path.join(data_dir, 'multiple_recombinations.aln')]))
+        exit_code = self.check_for_output_files('multiple_recombinations')
+        self.cleanup('multiple_recombinations')
+        assert exit_code == 0
+
     def test_iqtree_custom_model(self):
         exit_code = 1
         parser = run_gubbins.parse_input_args()
@@ -246,6 +270,18 @@ class TestExternalDependencies(unittest.TestCase):
         self.cleanup('multiple_recombinations')
         assert exit_code == 0
 
+    def test_raxml_seed(self):
+        exit_code = 1
+        parser = run_gubbins.parse_input_args()
+        common.parse_and_run(parser.parse_args(["--tree-builder", "raxml",
+                                                    "--verbose", "--iterations", "3",
+                                                    "--seed","42",
+                                                    "--threads", "1",
+                                                    os.path.join(data_dir, 'multiple_recombinations.aln')]))
+        exit_code = self.check_for_output_files('multiple_recombinations')
+        self.cleanup('multiple_recombinations')
+        assert exit_code == 0
+
     def test_raxml_custom_model(self):
         exit_code = 1
         parser = run_gubbins.parse_input_args()
@@ -301,6 +337,19 @@ class TestExternalDependencies(unittest.TestCase):
         self.cleanup('multiple_recombinations')
         assert exit_code == 0
 
+    def test_raxmlng_seed(self):
+        exit_code = 1
+        parser = run_gubbins.parse_input_args()
+        common.parse_and_run(parser.parse_args(["--tree-builder", "raxmlng",
+                                                    "--model","GTR",
+                                                    "--verbose", "--iterations", "3",
+                                                    "--seed","42",
+                                                    "--threads", "1",
+                                                    os.path.join(data_dir, 'multiple_recombinations.aln')]))
+        exit_code = self.check_for_output_files('multiple_recombinations')
+        self.cleanup('multiple_recombinations')
+        assert exit_code == 0
+
     def test_raxmlng_custom_model(self):
         exit_code = 1
         parser = run_gubbins.parse_input_args()


=====================================
python/gubbins/tests/test_utils.py
=====================================
@@ -107,3 +107,9 @@ class TestUtilities(unittest.TestCase):
             printer.print(["AAA", "BBB"])
             printed = f.getvalue()
         assert printed == "AAA-BBB\n"
+
+    def test_seed(self):
+        set_seed_val = utils.set_seed(42)
+        assert set_seed_val == "42"
+        random_seed_val = utils.set_seed(None)
+        assert(int(random_seed_val) < 10001)


=====================================
python/gubbins/treebuilders.py
=====================================
@@ -20,7 +20,6 @@
 import sys
 import os
 import subprocess
-from random import randint
 
 from Bio import SeqIO
 
@@ -129,7 +128,7 @@ class RapidNJ:
 class FastTree:
     """Class for operations with the FastTree executable"""
 
-    def __init__(self, threads: int, bootstrap = 0, model='GTRCAT', verbose=False, additional_args = None):
+    def __init__(self, threads: int, bootstrap = 0, model='GTRCAT', seed = None, verbose=False, additional_args = None):
         """Initialises the object"""
         self.verbose = verbose
         self.threads = threads
@@ -139,9 +138,10 @@ class FastTree:
         self.alignment_suffix = ".snp_sites.aln"
         self.bootstrap = bootstrap
         self.additional_args = additional_args
+        self.seed = utils.set_seed(seed)
 
         # Identify executable
-        self.potential_executables = ["FastTree", "fasttree"]
+        self.potential_executables = ["FastTreeMP","fasttreeMP","FastTree", "fasttree"]
         self.executable = utils.choose_executable(self.potential_executables)
         if self.executable is None:
             sys.exit("No usable version of FastTree could be found.")
@@ -164,6 +164,7 @@ class FastTree:
             command.extend(["-gtr"])
         else:
             command.extend([self.model])
+        command.extend(["-seed",self.seed])
         # Additional arguments
         if self.additional_args is not None:
             command.extend([self.additional_args])
@@ -256,7 +257,7 @@ class FastTree:
 class IQTree:
     """Class for operations with the IQTree executable"""
 
-    def __init__(self, threads: 1, model: str, bootstrap = 0, internal_node_prefix="", verbose=False, use_best=False, additional_args = None):
+    def __init__(self, threads: 1, model: str, bootstrap = 0, seed = None, internal_node_prefix="", verbose=False, use_best=False, additional_args = None):
         """Initialises the object"""
         self.verbose = verbose
         self.threads = threads
@@ -271,6 +272,7 @@ class IQTree:
         self.internal_node_prefix = internal_node_prefix
         self.bootstrap = bootstrap
         self.use_best = use_best
+        self.seed = utils.set_seed(seed)
         self.additional_args = additional_args
     
         # Construct base command
@@ -303,6 +305,7 @@ class IQTree:
             command.extend(["-m","GTR+G4"])
         else:
             command.extend(["-m",self.model])
+        command.extend(["-seed",self.seed])
         # Additional arguments
         if self.additional_args is not None:
             command.extend([self.additional_args])
@@ -432,7 +435,7 @@ class IQTree:
 class RAxML:
     """Class for operations with the RAxML executable"""
 
-    def __init__(self, threads: 1, model='GTRCAT', bootstrap = 0, internal_node_prefix="", verbose=False, additional_args = None):
+    def __init__(self, threads: 1, model='GTRCAT', bootstrap = 0, seed = None, internal_node_prefix="", verbose=False, additional_args = None):
         """Initialises the object"""
         self.verbose = verbose
         self.threads = threads
@@ -446,6 +449,7 @@ class RAxML:
         self.alignment_suffix = ".phylip"
         self.internal_node_prefix = internal_node_prefix
         self.bootstrap = bootstrap
+        self.seed = utils.set_seed(seed)
         self.additional_args = additional_args
 
         self.single_threaded_executables = ['raxmlHPC-AVX2', 'raxmlHPC-AVX', 'raxmlHPC-SSE3', 'raxmlHPC']
@@ -465,9 +469,6 @@ class RAxML:
         if self.threads > 1:
             command.extend(["-T", str(self.threads)])
 
-        # Set a seed
-        command.extend(["-p",str(randint(0, 10000))])
-
         # Add flags
         command.extend(["-safe"])
         if self.model == 'JC':
@@ -482,6 +483,7 @@ class RAxML:
             command.extend(["-m","GTRGAMMA"])
         else:
             command.extend(["-m", self.model])
+        command.extend(["-p",self.seed])
         # Additional arguments
         if self.additional_args is not None:
             command.extend([self.additional_args])
@@ -579,9 +581,7 @@ class RAxML:
         command = self.base_command.copy()
         command.extend(["-s", alignment_filename, "-n", basename + ".bootstrapped_trees"])
         command.extend(["-w",tmp])
-        p_seed = str(randint(0, 10000))
-        command.extend(["-p",p_seed])
-        command.extend(["-x",p_seed])
+        command.extend(["-x",self.seed])
         command.extend(["-#",str(self.bootstrap)])
         # Output
         if not self.verbose:
@@ -592,8 +592,6 @@ class RAxML:
     def sh_test(self, alignment_filename: str, input_tree: str, basename: str, tmp: str) -> str:
         """Runs a single branch support test"""
         command = self.base_command.copy()
-        p_seed = str(randint(0, 10000))
-        command.extend(["-p",p_seed])
         command.extend(["-f", "J"])
         command.extend(["-s", alignment_filename, "-n", input_tree + ".sh_support"])
         command.extend(["-t", input_tree])
@@ -610,7 +608,7 @@ class RAxML:
 class RAxMLNG:
     """Class for operations with the RAxML executable"""
 
-    def __init__(self, threads: 1, model: str, bootstrap = 0, internal_node_prefix = "", verbose = False, additional_args = None):
+    def __init__(self, threads: 1, model: str, bootstrap = 0, seed = None, internal_node_prefix = "", verbose = False, additional_args = None):
         """Initialises the object"""
         self.verbose = verbose
         self.threads = threads
@@ -624,6 +622,7 @@ class RAxMLNG:
         self.alignment_suffix = ".phylip"
         self.internal_node_prefix = internal_node_prefix
         self.bootstrap = bootstrap
+        self.seed = utils.set_seed(seed)
         self.additional_args = additional_args
 
         self.single_threaded_executables = ['raxml-ng']
@@ -655,6 +654,7 @@ class RAxMLNG:
             command.extend(["GTR+G"])
         else:
             command.extend([self.model])
+        command.extend(["--seed",self.seed])
         # Additional arguments
         if self.additional_args is not None:
             command.extend([self.additional_args])


=====================================
python/gubbins/utils.py
=====================================
@@ -23,6 +23,7 @@ import subprocess
 import re
 import numpy as np
 import collections
+from random import randint
 try:
     from multiprocessing.managers import SharedMemoryManager
     NumpyShared = collections.namedtuple('NumpyShared', ('name', 'shape', 'dtype'))
@@ -195,3 +196,11 @@ def extend_args(var,add):
         var.extend([add])
         var = " ".join(var)
     return var
+
+def set_seed(seed):
+    """Set seed when specified"""
+    if seed is None:
+        seed = str(randint(0, 10000))
+    else:
+        seed = str(seed)
+    return seed
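
Note on the seed handling in this commit: `utils.set_seed` always returns a string, drawing a random value between 0 and 10000 when no seed is given, and each tree builder passes it with its own flag. The sketch below reproduces that logic in standalone form. The `SEED_FLAGS` table and the `seed_arguments` helper are hypothetical and exist only for illustration; the flag names themselves are taken from the treebuilders.py hunks above.

```python
from random import randint


def set_seed(seed):
    """Same logic as utils.set_seed in the diff: always return the seed as a string."""
    if seed is None:
        # randint is inclusive at both ends, so the result lies in 0..10000
        return str(randint(0, 10000))
    return str(seed)


# Seed flags as they appear in the diff (hypothetical lookup table).
# RAxML additionally reuses the same seed with "-x" for bootstrapping.
SEED_FLAGS = {
    "fasttree": "-seed",
    "iqtree": "-seed",
    "raxml": "-p",
    "raxmlng": "--seed",
}


def seed_arguments(builder, seed=None):
    """Hypothetical helper: the two command-line tokens added for a given builder."""
    return [SEED_FLAGS[builder], set_seed(seed)]


print(seed_arguments("raxmlng", 42))
print(seed_arguments("raxml", 7))
```

Returning a string matters because the value is appended directly to the argument list handed to the external program; this is also why `test_seed` compares against `"42"` rather than `42`.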



View it on GitLab: https://salsa.debian.org/med-team/gubbins/-/commit/1ef3ddb3c4df977af600223e5113b0f4ed3b980d


