[med-svn] [Git][med-team/busco][master] 6 commits: New upstream version 4.1.2

Steffen Möller gitlab at salsa.debian.org
Mon Jul 27 15:50:07 BST 2020



Steffen Möller pushed to branch master at Debian Med / busco


Commits:
796d64b1 by Steffen Moeller at 2020-07-27T16:47:05+02:00
New upstream version 4.1.2

- - - - -
efa8c80f by Steffen Moeller at 2020-07-27T16:47:05+02:00
routine-update: New upstream version

- - - - -
83624ec5 by Steffen Moeller at 2020-07-27T16:47:08+02:00
Update upstream source from tag 'upstream/4.1.2'

Update to upstream version '4.1.2'
with Debian dir 0e4645142a4444220eb2b00d9371fcc9c19d8b37
- - - - -
c97cf68b by Steffen Moeller at 2020-07-27T16:47:09+02:00
routine-update: debhelper-compat 13

- - - - -
e8352179 by Steffen Moeller at 2020-07-27T16:47:16+02:00
Set upstream metadata fields: Repository, Repository-Browse.

Changes-By: lintian-brush
Fixes: lintian: upstream-metadata-missing-repository
See-also: https://lintian.debian.org/tags/upstream-metadata-missing-repository.html

- - - - -
1c4781f6 by Steffen Moeller at 2020-07-27T16:47:34+02:00
routine-update: Ready to upload to unstable

- - - - -


25 changed files:

- CHANGELOG
- README.md
- bin/busco
- config/config.ini
- debian/changelog
- debian/control
- debian/upstream/metadata
- src/busco/Analysis.py
- src/busco/AutoLineage.py
- src/busco/BuscoAnalysis.py
- src/busco/BuscoConfig.py
- src/busco/BuscoDownloadManager.py
- src/busco/BuscoLogger.py
- src/busco/BuscoPlacer.py
- src/busco/BuscoRunner.py
- src/busco/BuscoTools.py
- src/busco/GeneSetAnalysis.py
- src/busco/GenomeAnalysis.py
- src/busco/Toolset.py
- src/busco/TranscriptomeAnalysis.py
- − src/busco/ViralAnalysis.py
- src/busco/_version.py
- src/busco/run_BUSCO.py
- test_data/bacteria/expected_log.txt
- test_data/eukaryota/expected_log.txt


Changes:

=====================================
CHANGELOG
=====================================
@@ -1,3 +1,17 @@
+4.1.2
+- Issue #295 fixed
+
+4.1.1
+- Issue #287 fixed
+
+4.1.0
+- Reintroduce restart mode (Issues #203, #229, #251)
+- Fix augustus hanging problem (Issues #224, #232, #266)
+- Allow multiple cores for BLAST 2.10.1+
+- Issue #271 fixed
+- Issue #247 fixed
+- Issue #234 fixed
+
 4.0.6
 - Fix Augustus GFF parsing bug. Remove constraint on Augustus version.
 


=====================================
README.md
=====================================
@@ -1,5 +1,7 @@
 **BUSCOv4 - Benchmarking sets of Universal Single-Copy Orthologs.**
 
+For full documentation, please consult the user guide: https://busco.ezlab.org/busco_userguide.html
+
 Main changes in v4:
 
 - Automated selection of lineages issued from https://www.orthodb.org/ release 10
@@ -14,30 +16,22 @@ To install, clone the repository and enter ``sudo python3 setup.py install`` or
 
 More details in the user guide: https://busco.ezlab.org/busco_userguide.html#manual-installation
 
-Do not forget to edit the ``config/config.ini`` file to match your environment. The script `scripts/busco_configurator.py` can help you filling it. You also have to set the ``BUSCO_CONFIG_FILE`` 
-environment variable to define the path (including the filename) to that ``config.ini`` file. It can be located anywhere.
+Do not forget to edit the ``config/config.ini`` file to match your environment. The script `scripts/busco_configurator.py` can help with this. 
+You can set the ``BUSCO_CONFIG_FILE`` environment variable to define the path (including the filename) to that ``config.ini`` file. 
 
 ```
 export BUSCO_CONFIG_FILE="/path/to/myconfig.ini"
 ```
+Alternatively, you can pass the config file path as a command line argument using ``--config /path/to/config.ini``.
+
 
 If you have trouble installing one of the many third-party tools, try the official Docker container: https://hub.docker.com/r/ezlabgva/busco/tags
 
 Report problems on the BUSCO issue board at https://gitlab.com/ezlab/busco/issues
 
-To get help on BUSCO use: ``busco -h`` and ``python3 scripts/generate_plot.py -h``
-
-**!!!** Don't use "odb9" datasets with BUSCOv4. If you need to reproduce previous analyses, use BUSCOv3 (https://gitlab.com/ezlab/busco/-/tags/3.0.2)
+To get help with BUSCO use: ``busco -h`` and ``python3 scripts/generate_plot.py -h``
 
-Note: For v4.0.2 and before, when running auto-lineage, the initial results for eukaryotes were incomplete. This was 
-deliberate, as these initial results are used merely to determine whether the genome scores highest against the 
-bacteria, archaea or eukaryota datasets. If the eukaryota dataset was selected, BUSCO then attempts to place the input 
-assembly on the eukaryote phylogenetic tree before running a complete BUSCO assessment using the selected child dataset. 
-Unless the top-level eukaryota dataset was selected as the best match for the input file, the eukaryota dataset run 
-would not complete. So while the specific dataset run returned accurate results, the generic eukaryota dataset run 
-should be considered unreliable. 
-This has been changed in v4.0.3. The eukaryota run now always completes so the final generic eukaryota results can be 
-considered reliable.
+**!!!** Do not use "odb9" datasets with BUSCOv4. If you need to reproduce previous analyses, use BUSCOv3 (https://gitlab.com/ezlab/busco/-/tags/3.0.2)
 
 **How to cite BUSCO**
 

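For readers following the README changes above: BUSCO reads its settings from a ConfigParser-style config.ini, located either through the BUSCO_CONFIG_FILE environment variable or the --config command line argument. The following is a minimal, hypothetical sketch of one possible lookup order (command line first, then environment); it is not BUSCO's actual resolution code, and only the option name --config and the section name busco_run are taken from the files in this diff.

```python
# Illustrative sketch only -- not taken from the BUSCO sources.
import argparse
import os
from configparser import ConfigParser

def resolve_config_path():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", help="path to config.ini; overrides BUSCO_CONFIG_FILE")
    args, _ = parser.parse_known_args()
    # A value passed on the command line wins; otherwise fall back to the
    # environment variable described in the README.
    return args.config or os.environ.get("BUSCO_CONFIG_FILE")

config_path = resolve_config_path()
if config_path:
    config = ConfigParser()
    config.read(config_path)
    print("Loaded sections:", config.sections())
else:
    print("No config file specified; using built-in defaults.")
```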

=====================================
bin/busco
=====================================
@@ -8,14 +8,16 @@ except ImportError as err:
         pattern_search = re.search("cannot import name '(?P<module_name>[\w]+)", err.msg)
         missing_module = pattern_search.group("module_name")
         if missing_module == "run_BUSCO":
-            print("BUSCO must be installed before it is run. Please enter 'python setup.py install (--user)'. See the user guide for more information.")
+            print("BUSCO must be installed before it is run. Please enter 'python setup.py install (--user)'. "
+                  "See the user guide for more information.")
         elif missing_module == "Bio":
             print("Please install BioPython (https://biopython.org/) before running BUSCO.")
         elif missing_module == "numpy":
             print("Please install NumPy before running BUSCO.")
         else:
             print("Unable to find module {}. Please make sure it is installed. See the user guide and the GitLab issue "
-                  "board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.".format(missing_module))
+                  "board (https://gitlab.com/ezlab/busco/issues) if you need further assistance."
+                  "".format(missing_module))
 
     except:
         print(err.msg)
@@ -23,4 +25,4 @@ except ImportError as err:
               "GitLab issue board (https://gitlab.com/ezlab/busco/issues) if you need further assistance.")
     raise SystemExit(0)
 
-run_BUSCO.main()
\ No newline at end of file
+run_BUSCO.main()

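The bin/busco hunk above only rewraps long message strings, but the surrounding handler is worth spelling out: it parses ImportError.msg with a regex to name the module that failed to import and prints a targeted installation hint. Below is a self-contained sketch of that pattern; the failing import and the hint table are illustrative, and only the regex mirrors the one in bin/busco.

```python
import re

def describe_missing_import(err: ImportError) -> str:
    """Map an ImportError to an installation hint, in the style of bin/busco."""
    hints = {  # illustrative subset of the hints printed by bin/busco
        "Bio": "Please install BioPython (https://biopython.org/) before running BUSCO.",
        "numpy": "Please install NumPy before running BUSCO.",
    }
    match = re.search(r"cannot import name '(?P<module_name>[\w]+)", err.msg or "")
    missing = match.group("module_name") if match else (err.name or "unknown")
    return hints.get(missing,
                     "Unable to find module {}. Please make sure it is installed.".format(missing))

try:
    from hypothetical_missing_package import something  # deliberately fails
except ImportError as err:
    print(describe_missing_import(err))
```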

=====================================
config/config.ini
=====================================
@@ -5,7 +5,7 @@
 # Many of the options in the busco_run section can alternatively be set using command line arguments. See the help prompt (busco -h) for details.
 # WARNING: passing a parameter through the command line overrides the value specified in this file.
 #
-# You need to set the path to this file in the environment variable BUSCO_CONFIG_PATH
+# You need to set the path to this file in the environment variable BUSCO_CONFIG_FILE
 # as follows:
 # export BUSCO_CONFIG_FILE="/path/to/myconfig.ini"
 #
@@ -32,6 +32,8 @@
 ;cpu = 16
 # Force rewrite if files already exist (True/False)
 ;force = False
+# Restart a previous BUSCO run (True/False)
+;restart = False
 # Blast e-value
 ;evalue = 1e-3
 # How many candidate regions (contigs, scaffolds) to consider for each BUSCO
@@ -56,11 +58,11 @@
 ;update-data = True
 
 [tblastn]
-path = /ncbi-blast-2.2.31+/bin/
+path = /ncbi-blast-2.10.1+/bin/
 command = tblastn
 
 [makeblastdb]
-path = /ncbi-blast-2.2.31+/bin/
+path = /ncbi-blast-2.10.1+/bin/
 command = makeblastdb
 
 [augustus]

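The config.ini hunk adds a commented-out restart option and bumps the bundled BLAST paths to 2.10.1+. For reference, here is a tiny sketch of how a boolean option like restart is read through ConfigParser, which is what the Python changes further down do via config.getboolean("busco_run", "restart"); the inline INI text is made up for the example.

```python
from configparser import ConfigParser
from io import StringIO

# Made-up snippet standing in for config/config.ini; options left commented
# out with ";" are simply absent, so the fallback value applies.
sample = """
[busco_run]
restart = True
cpu = 4
"""

config = ConfigParser()
config.read_file(StringIO(sample))
print(config.getboolean("busco_run", "restart", fallback=False))  # True
print(config.getint("busco_run", "cpu", fallback=1))              # 4
```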

=====================================
debian/changelog
=====================================
@@ -1,3 +1,12 @@
+busco (4.1.2-1) unstable; urgency=medium
+
+  * Team upload.
+  * New upstream version
+  * debhelper-compat 13 (routine-update)
+  * Set upstream metadata fields: Repository, Repository-Browse.
+
+ -- Steffen Moeller <moeller at debian.org>  Mon, 27 Jul 2020 16:47:16 +0200
+
 busco (4.0.6-2) unstable; urgency=medium
 
   * Rules-Requires-Root: no (routine-update)


=====================================
debian/control
=====================================
@@ -3,7 +3,7 @@ Section: science
 Priority: optional
 Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
 Uploaders: Andreas Tille <tille at debian.org>
-Build-Depends: debhelper-compat (= 12),
+Build-Depends: debhelper-compat (= 13),
                dh-python,
                python3,
                python3-setuptools


=====================================
debian/upstream/metadata
=====================================
@@ -15,3 +15,5 @@ Reference:
 Registry:
  - Name: conda:bioconda
    Entry: busco
+Repository: https://gitlab.com/ezlab/busco.git
+Repository-Browse: https://gitlab.com/ezlab/busco


=====================================
src/busco/Analysis.py
=====================================
@@ -1,11 +1,8 @@
 from Bio import SeqIO
 from busco.BuscoTools import TBLASTNRunner, MKBLASTRunner
-from busco.Toolset import Tool
 from busco.BuscoLogger import BuscoLogger
-from busco.BuscoLogger import LogDecorator as log
-import subprocess
 import os
-from abc import ABCMeta, abstractmethod
+from abc import ABCMeta
 
 logger = BuscoLogger.get_logger(__name__)
 
@@ -17,26 +14,13 @@ class NucleotideAnalysis(metaclass=ABCMeta):
     # explanation of ambiguous codes found here: https://www.dnabaser.com/articles/IUPAC%20ambiguity%20codes.html
     AMBIGUOUS_CODES = ["Y", "R", "W", "S", "K", "M", "D", "V", "H", "B"]
 
-    MAX_FLANK = 20000
-
-    def __init__(self, config):
-        # Variables inherited from BuscoAnalysis
-        self._config = None
-        self._cpus = None
-        self._input_file = None
-
-        super().__init__(config)  # Initialize BuscoAnalysis
-        self._long = self._config.getboolean("busco_run", "long")
-        self._flank = self._define_flank()
-        self._ev_cutoff = self._config.getfloat("busco_run", "evalue")
-        self._region_limit = self._config.getint("busco_run", "limit")
-        self.blast_cpus = self._cpus
+    def __init__(self):
 
+        super().__init__()  # Initialize BuscoAnalysis
         if not self.check_nucleotide_file(self._input_file):
             raise SystemExit("Please provide a nucleotide file as input")
 
     def check_nucleotide_file(self, filename):
-
         i = 0
         for record in SeqIO.parse(filename, "fasta"):
             for letter in record.seq.upper():
@@ -51,66 +35,43 @@ class NucleotideAnalysis(metaclass=ABCMeta):
 
         return True
 
-    def _define_flank(self):
-        """
-        TODO: Add docstring
-        :return:
-        """
-        try:
-            size = os.path.getsize(self._input_file) / 1000  # size in mb
-            flank = int(size / 50)  # proportional flank size
-            # Ensure value is between 5000 and MAX_FLANK
-            flank = min(max(flank, 5000), type(self).MAX_FLANK)
-        except IOError:  # Input data is only validated during run_analysis. This will catch any IO issues before that.
-            raise SystemExit("Impossible to read the fasta file {}".format(self._input_file))
-
-        return flank
-
-    @abstractmethod
-    def init_tools(self):  # todo: This should be an abstract method
-        """
-        Initialize all required tools for Genome Eukaryote Analysis:
-        MKBlast, TBlastn, Augustus and Augustus scripts: GFF2GBSmallDNA, new_species, etraining
-        :return:
-        """
+    def init_tools(self):
         super().init_tools()
+        self.mkblast_runner = MKBLASTRunner()
+        self.tblastn_runner = TBLASTNRunner()
 
-
-    def check_tool_dependencies(self):
-        super().check_tool_dependencies()
-
-    def _get_blast_version(self):
-        mkblastdb_version_call = subprocess.check_output([self._mkblast_tool.cmd, "-version"], shell=False)
-        mkblastdb_version = ".".join(mkblastdb_version_call.decode("utf-8").split("\n")[0].split()[1].rsplit(".")[:-1])
-
-        tblastn_version_call = subprocess.check_output([self._tblastn_tool.cmd, "-version"], shell=False)
-        tblastn_version = ".".join(tblastn_version_call.decode("utf-8").split("\n")[0].split()[1].rsplit(".")[:-1])
-
-        if mkblastdb_version != tblastn_version:
-            logger.warning("You are using version {} of mkblastdb and version {} of tblastn.".format(mkblastdb_version, tblastn_version))
-
-        return tblastn_version
+        if self.mkblast_runner.version != self.tblastn_runner.version:
+            logger.warning("You are using version {} of makeblastdb and version {} of tblastn.".format(
+                self.mkblast_runner.version, self.tblastn_runner.version))
 
     def _run_mkblast(self):
-        self.mkblast_runner = MKBLASTRunner(self._mkblast_tool, self._input_file, self.main_out, self._cpus)
-        self.mkblast_runner.run()
+        if self.restart and self.mkblast_runner.check_previous_completed_run():
+            logger.info("Skipping makeblastdb as BLAST DB already exists at {}".format(self.mkblast_runner.output_db))
+        else:
+            self.restart = False  # Turn off restart mode if this is the entry point
+            self.config.set("busco_run", "restart", str(self.restart))
+            self.mkblast_runner.run()
+        if len(os.listdir(os.path.split(self.mkblast_runner.output_db)[0])) == 0:
+            raise SystemExit("makeblastdb failed to create a BLAST DB at {}".format(self.mkblast_runner.output_db))
 
     def _run_tblastn(self, missing_and_frag_only=False, ancestral_variants=False):
 
         incomplete_buscos = (self.hmmer_runner.missing_buscos + list(self.hmmer_runner.fragmented_buscos.keys())
                              if missing_and_frag_only else None)  # This parameter is only used on the re-run
 
-        self.tblastn_runner = TBLASTNRunner(self._tblastn_tool, self._input_file, self.run_folder, self._lineage_dataset,
-                                            self.mkblast_runner.output_db, self._ev_cutoff, self.blast_cpus,
-                                            self._region_limit, self._flank, missing_and_frag_only, ancestral_variants,
-                                            incomplete_buscos)
-
-        self.tblastn_runner.run()
-        coords = self.tblastn_runner._get_coordinates()
-        coords = self.tblastn_runner._filter_best_matches(coords)  # Todo: remove underscores from non-hidden methods
-        self.tblastn_runner._write_coordinates_to_file(coords)  # writes to "coordinates.tsv"
-        self.tblastn_runner._write_contigs(coords)
-        return coords
+        self.tblastn_runner.configure_runner(self.mkblast_runner.output_db, missing_and_frag_only,
+                                             ancestral_variants, incomplete_buscos)
+        if self.restart and self.tblastn_runner.check_previous_completed_run():
+            logger.info("Skipping tblastn as results already exist at {}".format(self.tblastn_runner.blast_filename))
+        else:
+            self.restart = False
+            self.config.set("busco_run", "restart", str(self.restart))
+            self.tblastn_runner.run()
+        self.tblastn_runner.get_coordinates()
+        self.tblastn_runner.filter_best_matches()
+        self.tblastn_runner.write_coordinates_to_file()  # writes to "coordinates.tsv"
+        self.tblastn_runner.write_contigs()
+        return
 
 
 class ProteinAnalysis:
@@ -118,8 +79,8 @@ class ProteinAnalysis:
     LETTERS = ["F", "L", "I", "M", "V", "S", "P", "T", "A", "Y", "X", "H", "Q", "N", "K", "D", "E", "C", "W", "R", "G"]
     NUCL_LETTERS = ["A", "C", "T", "G", "N"]
 
-    def __init__(self, config):
-        super().__init__(config)
+    def __init__(self):
+        super().__init__()
         if not self.check_protein_file(self._input_file):
             raise SystemExit('Please provide a protein file as input')
 

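The rewritten _run_mkblast and _run_tblastn above both follow the restart pattern reintroduced in 4.1.0: while restart mode is on, a step whose previous output is already complete is skipped, and the first step that actually has to run switches restart off so every later step runs normally. A generic sketch of that control flow, using hypothetical runner stand-ins rather than the real MKBLASTRunner/TBLASTNRunner API:

```python
class StepRunner:
    """Hypothetical stand-in for runners such as MKBLASTRunner or TBLASTNRunner."""
    def __init__(self, name, already_done=False):
        self.name = name
        self.already_done = already_done

    def check_previous_completed_run(self):
        return self.already_done

    def run(self):
        print("running", self.name)


def run_step(runner, state):
    # Skip completed steps while restarting, then disable restart mode at the
    # first step that has no reusable output -- mirroring Analysis.py above.
    if state["restart"] and runner.check_previous_completed_run():
        print("skipping", runner.name, "(previous results reused)")
    else:
        state["restart"] = False
        runner.run()


state = {"restart": True}
for step in (StepRunner("makeblastdb", already_done=True),
             StepRunner("tblastn"),
             StepRunner("hmmsearch")):
    run_step(step, state)
# skipping makeblastdb (previous results reused)
# running tblastn
# running hmmsearch
```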

=====================================
src/busco/AutoLineage.py
=====================================
@@ -71,7 +71,7 @@ class AutoSelectLineage:
         root_runners = self.run_lineages_list(self.all_lineages)
         self.get_best_match_lineage(root_runners)
         self.config.set("busco_run", "domain_run_name", os.path.basename(self.best_match_lineage_dataset))
-        BuscoRunner.final_results.append(self.selected_runner.analysis.hmmer_results_lines)
+        BuscoRunner.final_results.append(self.selected_runner.analysis.hmmer_runner.hmmer_results_lines)
         BuscoRunner.results_datasets.append(os.path.basename(self.best_match_lineage_dataset))
         return
 
@@ -80,9 +80,6 @@ class AutoSelectLineage:
         for l in lineages_list:
             self.current_lineage = "{}_{}".format(l, self.dataset_version)
             autoconfig = BuscoConfigAuto(self.config, self.current_lineage)
-            # The following line creates a direct reference, so whenever one analysis run adds a tool to this list it
-            # is automatically updated here too.
-            autoconfig.persistent_tools = self.config.persistent_tools
             busco_run = BuscoRunner(autoconfig)
             busco_run.run_analysis(callback=self.callback)
             root_runners.append(busco_run)  # Save all root runs so they can be recalled if chosen
@@ -136,7 +133,7 @@ class AutoSelectLineage:
 
     def cleanup_disused_runs(self, disused_runners):
         for runner in disused_runners:
-            runner.analysis._cleanup()
+            runner.analysis.cleanup()
 
 
     def get_lineage_dataset(self):  # todo: rethink structure after BuscoPlacer is finalized and protein mode with mollicutes is fixed.
@@ -159,14 +156,16 @@ class AutoSelectLineage:
             else:
                 logger.info("Mollicutes dataset is a better match for your data. Testing subclades...")
                 self._run_3_datasets(self.selected_runner)
-                BuscoRunner.final_results.append(self.selected_runner.analysis.hmmer_results_lines)
+                BuscoRunner.final_results.append(self.selected_runner.analysis.hmmer_runner.hmmer_results_lines)
                 BuscoRunner.results_datasets.append(os.path.basename(self.best_match_lineage_dataset))
-        elif ("geno" in self.selected_runner.mode and self.selected_runner.analysis.code_4_selected and
-              os.path.basename(self.selected_runner.config.get("busco_run", "lineage_dataset")).startswith("bacteria")):
+        elif ("geno" in self.selected_runner.mode
+              and self.selected_runner.analysis.prodigal_runner.current_gc == "4"
+              and os.path.basename(
+                    self.selected_runner.config.get("busco_run", "lineage_dataset")).startswith("bacteria")):
             logger.info("The results from the Prodigal gene predictor indicate that your data belongs to the "
                         "mollicutes clade. Testing subclades...")
             self._run_3_datasets()
-            BuscoRunner.final_results.append(self.selected_runner.analysis.hmmer_results_lines)
+            BuscoRunner.final_results.append(self.selected_runner.analysis.hmmer_runner.hmmer_results_lines)
             BuscoRunner.results_datasets.append(os.path.basename(self.best_match_lineage_dataset))
         else:
             self.run_busco_placer()
@@ -181,12 +180,12 @@ class AutoSelectLineage:
         self.f_percents = []
         runners = self.run_lineages_list(["mollicutes"])
         runners.append(self.selected_runner)
-        self.s_buscos.append(self.selected_runner.analysis.single_copy)
-        self.d_buscos.append(self.selected_runner.analysis.multi_copy)
-        self.f_buscos.append(self.selected_runner.analysis.only_fragments)
-        self.s_percents.append(self.selected_runner.analysis.s_percent)
-        self.d_percents.append(self.selected_runner.analysis.d_percent)
-        self.f_percents.append(self.selected_runner.analysis.f_percent)
+        self.s_buscos.append(self.selected_runner.analysis.hmmer_runner.single_copy)
+        self.d_buscos.append(self.selected_runner.analysis.hmmer_runner.multi_copy)
+        self.f_buscos.append(self.selected_runner.analysis.hmmer_runner.only_fragments)
+        self.s_percents.append(self.selected_runner.analysis.hmmer_runner.s_percent)
+        self.d_percents.append(self.selected_runner.analysis.hmmer_runner.d_percent)
+        self.f_percents.append(self.selected_runner.analysis.hmmer_runner.f_percent)
         self.get_best_match_lineage(runners)
         return
 
@@ -215,12 +214,12 @@ class AutoSelectLineage:
     def _run_3_datasets(self, mollicutes_runner=None):
         if mollicutes_runner:
             datasets = ["mycoplasmatales", "entomoplasmatales"]
-            self.s_buscos = [mollicutes_runner.analysis.single_copy]
-            self.d_buscos = [mollicutes_runner.analysis.multi_copy]
-            self.f_buscos = [mollicutes_runner.analysis.only_fragments]
-            self.s_percents = [mollicutes_runner.analysis.s_percent]
-            self.d_percents = [mollicutes_runner.analysis.d_percent]
-            self.f_percents = [mollicutes_runner.analysis.f_percent]
+            self.s_buscos = [mollicutes_runner.analysis.hmmer_runner.single_copy]
+            self.d_buscos = [mollicutes_runner.analysis.hmmer_runner.multi_copy]
+            self.f_buscos = [mollicutes_runner.analysis.hmmer_runner.only_fragments]
+            self.s_percents = [mollicutes_runner.analysis.hmmer_runner.s_percent]
+            self.d_percents = [mollicutes_runner.analysis.hmmer_runner.d_percent]
+            self.f_percents = [mollicutes_runner.analysis.hmmer_runner.f_percent]
             dataset_runners = [mollicutes_runner]
         else:
             datasets = ["mollicutes", "mycoplasmatales", "entomoplasmatales"]


=====================================
src/busco/BuscoAnalysis.py
=====================================
@@ -4,24 +4,18 @@
 .. module:: BuscoAnalysis
    :synopsis: BuscoAnalysis implements general BUSCO analysis specifics
 .. versionadded:: 3.0.0
-.. versionchanged:: 3.0.1
+.. versionchanged:: 4.0.7
 
 Copyright (c) 2016-2020, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
-
 """
 
 from abc import ABCMeta, abstractmethod
-import busco
-from busco.BuscoConfig import BuscoConfig, BuscoConfigMain, BuscoConfigAuto
+from busco.BuscoConfig import BuscoConfig, BuscoConfigAuto
 from busco.BuscoTools import HMMERRunner
-import inspect
 import os
 from busco.BuscoLogger import BuscoLogger
 from busco.BuscoLogger import LogDecorator as log
-from busco.Toolset import Tool
-import subprocess
-from Bio import SeqIO
 
 logger = BuscoLogger.get_logger(__name__)
 
@@ -31,115 +25,93 @@ class BuscoAnalysis(metaclass=ABCMeta):
     This abstract base class (ABC) defines methods required for most of BUSCO analyses and has to be extended
     by each specific analysis class
     """
+    config = None
 
-
-    def __init__(self, config):
+    def __init__(self):
         """
         1) load parameters
         2) load and validate tools
         3) check data and dataset integrity
         4) Ready for analysis
-
-        :param config: Values of all parameters to be used during the analysis
-        :type config: BuscoConfig
         """
-        self._config = config
+        super().__init__()
 
         # Get paths
-        self.lineage_results_dir = self._config.get("busco_run", "lineage_results_dir")
-        self.main_out = self._config.get("busco_run", "main_out")  # todo: decide which are hidden attributes
-        self.working_dir = (os.path.join(self.main_out, "auto_lineage")
-                            if isinstance(self._config, BuscoConfigAuto)
-                            else self.main_out)
-        self.run_folder = os.path.join(self.working_dir, self.lineage_results_dir)
-        self.log_folder = os.path.join(self.main_out, "logs")
-        self._input_file = self._config.get("busco_run", "in")
-        self._lineage_dataset = self._config.get("busco_run", "lineage_dataset")
-        self._lineage_name = os.path.basename(self._lineage_dataset)
-        self._datasets_version = self._config.get("busco_run", "datasets_version")
-        super().__init__()
+        self._lineage_results_dir = self.config.get("busco_run", "lineage_results_dir")
+        self.main_out = self.config.get("busco_run", "main_out")
+        self._working_dir = (os.path.join(self.main_out, "auto_lineage")
+                             if isinstance(self.config, BuscoConfigAuto)
+                             else self.main_out)
+        self._run_folder = os.path.join(self._working_dir, self._lineage_results_dir)
+        self._log_folder = os.path.join(self.main_out, "logs")
 
         # Get other useful variables
-        self._cpus = self._config.getint("busco_run", "cpu")
-        self._domain = self._config.get("busco_run", "domain")
+        self._input_file = self.config.get("busco_run", "in")
+        self._lineage_dataset = self.config.get("busco_run", "lineage_dataset")
+        self._lineage_name = os.path.basename(self._lineage_dataset)
+        self._domain = self.config.get("busco_run", "domain")
         self._has_variants_file = os.path.exists(os.path.join(self._lineage_dataset, "ancestral_variants"))
-        self._dataset_creation_date = self._config.get("busco_run", "creation_date")
-        self._dataset_nb_species = self._config.get("busco_run", "number_of_species")
-        self._dataset_nb_buscos = self._config.get("busco_run", "number_of_BUSCOs")
+        self._dataset_creation_date = self.config.get("busco_run", "creation_date")
+        self.restart = self.config.getboolean("busco_run", "restart")
+
+        self.gene_details = None  # Dictionary containing coordinate information for predicted genes.
 
-        # Get Busco downloader
-        self.downloader = self._config.downloader
+        self._lineages_download_path = os.path.join(self.config.get("busco_run", "download_path"), "lineages")
+
+        self.hmmer_runner = None
 
         # Create optimized command line call for the given input
-        self.busco_type = "main" if isinstance(self._config, BuscoConfigMain) else "auto"
+        # self.busco_type = "main" if isinstance(self._config, BuscoConfigMain) else "auto"
         # if self.busco_type == "main":
         #     self.set_rerun_busco_command(self._config.clargs)  # todo: rework rerun command
 
-        # Variables storing BUSCO results
-        self._missing_busco_list = []
-        self._fragmented_busco_list = []
-        self._gene_details = None  # Dictionary containing coordinate information for predicted genes.
-        self.s_percent = None
-        self.d_percent = None
-        self.f_percent = None
-        self.all_single_copy_buscos = {}
-        self._log_count = 0  # Dummy variable used to skip logging for intermediate eukaryote pipeline results.
-
-    # TODO: catch unicode encoding exception and report invalid character line instead of doing content validation
-    # todo: check config file exists before parsing
-
     @abstractmethod
-    def _cleanup(self):
+    def cleanup(self):
         # Delete any non-decompressed files in busco_downloads
         try:
-            for dataset_name in os.listdir(os.path.join(self._config.get("busco_run", "download_path"), "lineages")):
+            for dataset_name in os.listdir(self._lineages_download_path):
                 if dataset_name.endswith((".gz", ".tar")):
                     os.remove(dataset_name)
         except OSError:
             pass
 
-    def _check_data_integrity(self):
-        self._check_dataset_integrity()
-        if not os.stat(self._input_file).st_size > 0:
-            raise SystemExit("Input file is empty.")
-        with open(self._input_file) as f:
-            for line in f:
-                if line.startswith(">"):
-                    self._check_fasta_header(line)
-        return
-
-    def get_checkpoint(self): # TODO: rework checkpoint system
-        """
-        This function return the checkpoint if the checkpoint.tmp file exits or None if absent
-        :return: the checkpoint name
-        :rtype: int
-        """
-        checkpt_name = None
-        checkpoint_file = os.path.join(self.run_folder, "checkpoint.tmp")
-        if os.path.exists(checkpoint_file):
-            with open(checkpoint_file, "r") as check_file:
-                line = check_file.readline()
-                self._random = line.split(".")[-1] # Reset random suffix
-            checkpt_name = int(line.split(".")[0])
-        return checkpt_name
-
     @abstractmethod
-    @log("Running BUSCO using lineage dataset {0} ({1}, {2})", logger, attr_name=["_lineage_name", "_domain", "_dataset_creation_date"], on_func_exit=True)
+    @log("Running BUSCO using lineage dataset {0} ({1}, {2})", logger,
+         attr_name=["_lineage_name", "_domain", "_dataset_creation_date"], on_func_exit=True)
     def run_analysis(self):
         """
         Abstract method, override to call all needed steps for running the child analysis.
         """
-        self.create_dirs()
+        self._create_dirs()
         self.init_tools()
-        self.check_tool_dependencies()
         self._check_data_integrity()
 
+    @log("***** Run HMMER on gene sequences *****", logger)
+    def run_hmmer(self, input_sequences):
+        """
+        This function runs hmmsearch.
+        """
+        files = sorted(os.listdir(os.path.join(self._lineage_dataset, "hmms")))
+        busco_ids = [os.path.splitext(f)[0] for f in files]  # Each Busco ID has a HMM file of the form "<busco_id>.hmm"
+        self.hmmer_runner.configure_runner(input_sequences, busco_ids, self._mode, self.gene_details)
+        if self.restart and self.hmmer_runner.check_previous_completed_run():
+            logger.info("Skipping HMMER run as output already processed")
+        else:
+            self.restart = False
+            self.config.set("busco_run", "restart", str(self.restart))
+            self.hmmer_runner.run()
+        self.hmmer_runner.process_output()
+        self.hmmer_runner.write_hmmer_results()
+        self.hmmer_runner.produce_hmmer_summary()
+        return
+
     @log("Checking dataset for HMM profiles", logger, debug=True)
     def _check_dataset_integrity(self):
         """
         Check the input dataset for hmm profiles, both files and folder are available
-        Note: score and length cutoffs are checked when read,
-        See _load_scores and _load_lengths
+        Note: score and length cutoffs are checked when read by hmmer_runner: see _load_scores and _load_lengths
+        Note: dataset.cfg file is not mandatory for offline mode
+        # todo: implement a check for dataset.cfg file if not using offline mode
 
         :raises SystemExit: if the dataset is missing files or folders
         """
@@ -157,18 +129,24 @@ class BuscoAnalysis(metaclass=ABCMeta):
                     raise SystemExit("The dataset you provided lacks elements in {}".format(
                         os.path.join(self._lineage_dataset, "prfl")))
 
-        # note: score and length cutoffs are checked when read,
-        # see _load_scores and _load_lengths
-        # ancestral would cause blast to fail, and be detected, see _blast() # TODO: clarify comment
-        # dataset.cfg is not mandatory
-
         if not self._has_variants_file:
             logger.warning("The dataset you provided does not contain the file ancestral_variants, likely because it "
                            "is an old version. All blast steps will use the file \"ancestral\" instead")
 
         return
 
-    def _check_fasta_header(self, header):
+    def _check_data_integrity(self):
+        self._check_dataset_integrity()
+        if not os.stat(self._input_file).st_size > 0:
+            raise SystemExit("Input file is empty.")
+        with open(self._input_file) as f:
+            for line in f:
+                if line.startswith(">"):
+                    self._check_fasta_header(line)
+        return
+
+    @staticmethod
+    def _check_fasta_header(header):
         """
         This function checks problematic characters in fasta headers,
         and warns the user and stops the execution
@@ -183,7 +161,6 @@ class BuscoAnalysis(metaclass=ABCMeta):
                     "which will crash BUSCO. Please clean the header of your "
                     "input file." % (char, header.strip()))
 
-
         for char in BuscoConfig.FORBIDDEN_HEADER_CHARS_BEFORE_SPLIT:
             if char in header.split()[0]:
                 raise SystemExit(
@@ -197,19 +174,7 @@ class BuscoAnalysis(metaclass=ABCMeta):
                 "\">\" which will crash Reader. Please clean the header of "
                 "your input file." % (header.strip()))
 
-    def check_tool_dependencies(self):
-        """
-        check dependencies on tools
-        :raises SystemExit: if a Tool version is not supported
-        """
-        # check hmm version
-        if not self._get_hmmer_version() >= BuscoConfig.HMMER_VERSION:
-            raise SystemExit(
-                "HMMer version detected is not supported, please use HMMer v.{} +".format(BuscoConfig.HMMER_VERSION))
-        return
-
-    @abstractmethod
-    def create_dirs(self):
+    def _create_dirs(self):
         """
         Create the run (main) directory, log directory and the temporary directories
         :return:
@@ -223,8 +188,8 @@ class BuscoAnalysis(metaclass=ABCMeta):
         Create a subfolder of the main output folder that contains all log files from BUSCO and the external tools used.
         :return:
         """
-        if not os.path.exists(self.log_folder):
-            os.mkdir(self.log_folder)
+        if not os.path.exists(self._log_folder):
+            os.mkdir(self._log_folder)
         return
 
     def _create_main_dir(self):
@@ -233,76 +198,23 @@ class BuscoAnalysis(metaclass=ABCMeta):
         :raises SystemExit: if write permissions are not available to the specified location
         """
         try:
-            os.makedirs(self.run_folder)
+            os.makedirs(self._run_folder)
         except FileExistsError:
-            raise SystemExit("Something went wrong. BUSCO stopped before overwriting run folder {}".format(self.run_folder))
+            if not self.restart:
+                raise SystemExit("Something went wrong. BUSCO stopped before overwriting run folder "
+                                 "{}".format(self._run_folder))
         except PermissionError:
             raise SystemExit(
                 "Cannot write to the output directory, please make sure "
-                "you have write permissions to {}".format(self.run_folder))
+                "you have write permissions to {}".format(self._run_folder))
         return
 
-    # @log("Temp directory is {}", logger, attr_name="_tmp", on_func_exit=True)
-    # def _create_tmp_dir(self):
-    #     """
-    #     This function creates the tmp directory
-    #     :raises
-    #     SystemExit: if the user cannot write in the tmp directory
-    #     """
-    #     try:
-    #         if not os.path.exists(self._tmp):
-    #             os.makedirs(self._tmp)
-    #
-    #     except OSError:
-    #         raise SystemExit(
-    #             "Cannot write to the temp directory, please make sure "
-    #             "you have write permissions to {}".format(self._tmp))
-    #     return
-
-
-
-    def _get_busco_percentages(self):
-        self.single_copy = len(self.hmmer_runner.single_copy_buscos)  # int
-        self.multi_copy = len(self.hmmer_runner.multi_copy_buscos)  # int
-        self.only_fragments = len(self.hmmer_runner.fragmented_buscos)  # int
-        self.total_buscos = len(self.hmmer_runner.cutoff_dict)
-
-        # Get percentage of each kind of BUSCO match
-        self.s_percent = abs(round((self.single_copy / self.total_buscos) * 100, 1))
-        self.d_percent = abs(round((self.multi_copy / self.total_buscos) * 100, 1))
-        self.f_percent = abs(round((self.only_fragments / self.total_buscos) * 100, 1))
-
-        return self.single_copy, self.multi_copy, self.only_fragments, self.total_buscos
-
-    def _get_hmmer_version(self):
-        """
-        check the Tool has the correct version
-        :raises SystemExit: if the version is not correct
-        """
-        hmmer_version = subprocess.check_output([self._hmmer_tool.cmd, "-h"], shell=False)
-        hmmer_version = hmmer_version.decode("utf-8")
-        try:
-            hmmer_version = hmmer_version.split("\n")[1].split()[2]
-            hmmer_version = float(hmmer_version[:3])
-        except ValueError:
-            # to avoid a crash with a super old version
-            hmmer_version = hmmer_version.split("\n")[1].split()[1]
-            hmmer_version = float(hmmer_version[:3])
-        finally:
-            return hmmer_version
-
     @log("Check all required tools are accessible...", logger, debug=True)
     def init_tools(self):
         """
         Init the tools needed for the analysis. HMMER is needed for all BUSCO analysis types.
         """
-        try:
-            assert(isinstance(self._hmmer_tool, Tool))
-        except AttributeError:
-            self._hmmer_tool = Tool("hmmsearch", self._config)
-        except AssertionError:
-            raise SystemExit("HMMer should be a tool")
-
+        self.hmmer_runner = HMMERRunner()
         return
 
     @property
@@ -310,242 +222,59 @@ class BuscoAnalysis(metaclass=ABCMeta):
     def _mode(self):
         pass
 
-    # @log("This is not an incomplete run that can be restarted", logger, iswarn=True)
-    # # Todo: decide if mini functions are necessary to facilitate decorator logging
-    # def _not_incomplete_run(self):
-    #     self._restart = False
-
-    def _produce_hmmer_summary(self):
-        single_copy, multi_copy, only_fragments, total_buscos = self._get_busco_percentages()
-
-        self.hmmer_results_lines = []
-        self.hmmer_results_lines.append("***** Results: *****\n\n")
-        self.one_line_summary = "C:{}%[S:{}%,D:{}%],F:{}%,M:{}%,n:{}\t{}\n".format(
-            round(self.s_percent + self.d_percent, 1), self.s_percent, self.d_percent,
-            self.f_percent, abs(round(100 - self.s_percent - self.d_percent - self.f_percent, 1)), total_buscos, "   ")
-        self.hmmer_results_lines.append(self.one_line_summary)
-        self.hmmer_results_lines.append("{}\tComplete BUSCOs (C)\t\t\t{}\n".format(single_copy + multi_copy, "   "))
-        self.hmmer_results_lines.append("{}\tComplete and single-copy BUSCOs (S)\t{}\n".format(single_copy, "   "))
-        self.hmmer_results_lines.append("{}\tComplete and duplicated BUSCOs (D)\t{}\n".format(multi_copy, "   "))
-        self.hmmer_results_lines.append("{}\tFragmented BUSCOs (F)\t\t\t{}\n".format(only_fragments, "   "))
-        self.hmmer_results_lines.append("{}\tMissing BUSCOs (M)\t\t\t{}\n".format(
-            total_buscos - single_copy - multi_copy - only_fragments, "   "))
-        self.hmmer_results_lines.append("{}\tTotal BUSCO groups searched\t\t{}\n".format(total_buscos, "   "))
-
-        with open(os.path.join(self.run_folder, "short_summary.txt"), "w") as summary_file:
-
-            self._write_output_header(summary_file, no_table_header=True)
-            summary_file.write("# Summarized benchmarking in BUSCO notation for file {}\n"
-                               "# BUSCO was run in mode: {}\n\n".format(self._input_file, self._mode))
-
-            for line in self.hmmer_results_lines:
-                summary_file.write("\t{}".format(line))
-
-            if self._config.getboolean("busco_run", "auto-lineage") and isinstance(self._config, BuscoConfigMain) \
-                    and hasattr(self._config, "placement_files"):
-                summary_file.write("\nPlacement file versions:\n")
-                for placement_file in self._config.placement_files:
-                    summary_file.write("{}\n".format(placement_file))
-
-
-        if isinstance(self._config, BuscoConfigAuto):  # todo: rework this if/else block
-            self._one_line_hmmer_summary()
-        elif self._domain == "eukaryota" and self._log_count == 0:
-            self._log_count += 1
-            self._produce_full_hmmer_summary_debug()
-        else:
-            self._one_line_hmmer_summary()
-        return
-
-    @log("{}", logger, attr_name="hmmer_results_lines", apply="join", on_func_exit=True)
-    def _produce_full_hmmer_summary(self):
-        return
-
-    @log("{}", logger, attr_name="hmmer_results_lines", apply="join", on_func_exit=True, debug=True)
-    def _produce_full_hmmer_summary_debug(self):
-        return
-
-    @log("{}", logger, attr_name="one_line_summary", on_func_exit=True)
-    def _one_line_hmmer_summary(self):
-        self.one_line_summary = "Results:\t{}".format(self.one_line_summary)
-        return
-
-    @log("***** Run HMMER on gene sequences *****", logger)
-    def run_hmmer(self, input_sequences):
-        """
-        This function runs hmmsearch.
-        """
-        self._hmmer_tool.total = 0
-        self._hmmer_tool.nb_done = 0
-        hmmer_output_dir = os.path.join(self.run_folder, "hmmer_output")
-        if not os.path.exists(hmmer_output_dir):
-            os.makedirs(hmmer_output_dir)
-
-        files = sorted(os.listdir(os.path.join(self._lineage_dataset, "hmms")))
-        busco_ids = [os.path.splitext(f)[0] for f in files]  # Each Busco ID has a HMM file of the form "<busco_id>.hmm"
-
-        self.hmmer_runner = HMMERRunner(self._hmmer_tool, input_sequences, busco_ids, hmmer_output_dir,
-                            self._lineage_dataset, self._mode, self._cpus, self._gene_details, self._datasets_version)
-        self.hmmer_runner.load_buscos()
-        self.hmmer_runner.run()
-        self.hmmer_runner.process_output()
-        # self.all_single_copy_buscos.update(self.hmmer_runner.single_copy_buscos)
-        self._write_hmmer_results()
-        self._produce_hmmer_summary()
-        return
-
-    def _write_buscos_to_file(self, sequences_aa, sequences_nt=None):
-        """
-        Write BUSCO matching sequences to output fasta files. Each sequence is printed in a separate file and both
-        nucleotide and amino acid versions are created.
-        :param busco_type: one of ["single_copy", "multi_copy", "fragmented"]
-        :return:
-        """
-        for busco_type in ["single_copy", "multi_copy", "fragmented"]:
-            if busco_type == "single_copy":
-                output_dir = os.path.join(self.run_folder, "busco_sequences", "single_copy_busco_sequences")
-                busco_matches = self.hmmer_runner.single_copy_buscos
-            elif busco_type == "multi_copy":
-                output_dir = os.path.join(self.run_folder, "busco_sequences", "multi_copy_busco_sequences")
-                busco_matches = self.hmmer_runner.multi_copy_buscos
-            elif busco_type == "fragmented":
-                output_dir = os.path.join(self.run_folder, "busco_sequences", "fragmented_busco_sequences")
-                busco_matches = self.hmmer_runner.fragmented_buscos
-
-            if not os.path.exists(output_dir):  # todo: move all create_dir commands to one place
-                os.makedirs(output_dir)
-
-            for busco, gene_matches in busco_matches.items():
-                try:
-                    aa_seqs, nt_seqs = zip(*[(sequences_aa[gene_id], sequences_nt[gene_id]) for gene_id in gene_matches])
-                    with open(os.path.join(output_dir, "{}.fna".format(busco)), "w") as f2:
-                        SeqIO.write(nt_seqs, f2, "fasta")
-                except TypeError:
-                    aa_seqs = [sequences_aa[gene_id] for gene_id in gene_matches]
-                with open(os.path.join(output_dir, "{}.faa".format(busco)), "w") as f1:
-                    SeqIO.write(aa_seqs, f1, "fasta")
-
-        return
-
     # def _run_tarzip_hmmer_output(self):  # todo: rewrite using tarfile
     #     """
     #     This function tarzips "hmmer_output" results folder
     #     """
     #     self._p_open(["tar", "-C", "%s" % self.run_folder, "-zcf", "%shmmer_output.tar.gz" % self.run_folder,
     #                   "hmmer_output", "--remove-files"], "bash", shell=False)
+    #
+    # @log("To reproduce this run: {}", logger, attr_name="_rerun_cmd", on_func_exit=True)
+    # def set_rerun_busco_command(self, clargs):  # todo: reconfigure
+    #     """
+    #     This function sets the command line to call to reproduce this run
+    #     """
+    #
+    #     # Find python script path
+    #     entry_point = ""
+    #     frame_ind = -1
+    #     while "run_BUSCO.py" not in entry_point:
+    #         entry_point = inspect.stack()[frame_ind].filename
+    #         frame_ind -= 1
+    #
+    #     # Add required parameters and other options
+    #     self._rerun_cmd = "python %s -i %s -o %s -l %s -m %s -c %s" % (entry_point, self._input_file, os.path.basename(self.main_out),
+    #                                                                    self._lineage_dataset, self._mode, self._cpus)
+    #
+    #     try:
+    #         if self._long:
+    #             self._rerun_cmd += " --long"
+    #         if self._region_limit != BuscoConfig.DEFAULT_ARGS_VALUES["limit"]:
+    #             self._rerun_cmd += " --limit %s" % self._region_limit
+    #         # if self._tmp != BuscoConfig.DEFAULT_ARGS_VALUES["tmp_path"]:
+    #         #     self._rerun_cmd += " -t %s" % self._tmp
+    #         if self._ev_cutoff != BuscoConfig.DEFAULT_ARGS_VALUES["evalue"]:
+    #             self._rerun_cmd += " -e %s" % self._ev_cutoff
+    #         # if self._tarzip:
+    #         #     self._rerun_cmd += " -z"
+    #     except AttributeError:
+    #         pass
+    #
+    #     # Include any command line arguments issued by the user
+    #     # arg_aliases = {"-i": "--in", "-o": "--out", "-l": "--lineage_dataset", "-m": "--mode", "-c": "--cpu",
+    #     #                "-e": "--evalue", "-f": "--force", "-sp": "--species", "-z": "--tarzip",
+    #     #                "-r": "--restart", "-q": "--quiet", "-v": "--version", "-h": "--help"}
+    #     arg_aliases.update(dict(zip(arg_aliases.values(), arg_aliases.keys())))
+    #     for a, arg in enumerate(clargs):
+    #         if arg.startswith("-") and not arg in self._rerun_cmd:
+    #             if arg in arg_aliases:
+    #                 if arg_aliases[arg] in self._rerun_cmd:
+    #                     continue
+    #             if a + 1 < len(clargs) and not clargs[a + 1].startswith("-"):
+    #                 self._rerun_cmd += " %s %s" % (arg, clargs[a + 1])
+    #             else:
+    #                 self._rerun_cmd += " %s" % arg
+    #     return
 
-
-
-    @log("To reproduce this run: {}", logger, attr_name="_rerun_cmd", on_func_exit=True)
-    def set_rerun_busco_command(self, clargs):  # todo: reconfigure
-        """
-        This function sets the command line to call to reproduce this run
-        """
-
-        # Find python script path
-        entry_point = ""
-        frame_ind = -1
-        while "run_BUSCO.py" not in entry_point:
-            entry_point = inspect.stack()[frame_ind].filename
-            frame_ind -= 1
-
-        # Add required parameters and other options
-        self._rerun_cmd = "python %s -i %s -o %s -l %s -m %s -c %s" % (entry_point, self._input_file, os.path.basename(self.main_out),
-                                                                       self._lineage_dataset, self._mode, self._cpus)
-
-        try:
-            if self._long:
-                self._rerun_cmd += " --long"
-            if self._region_limit != BuscoConfig.DEFAULT_ARGS_VALUES["limit"]:
-                self._rerun_cmd += " --limit %s" % self._region_limit
-            # if self._tmp != BuscoConfig.DEFAULT_ARGS_VALUES["tmp_path"]:
-            #     self._rerun_cmd += " -t %s" % self._tmp
-            if self._ev_cutoff != BuscoConfig.DEFAULT_ARGS_VALUES["evalue"]:
-                self._rerun_cmd += " -e %s" % self._ev_cutoff
-            # if self._tarzip:
-            #     self._rerun_cmd += " -z"
-        except AttributeError:
-            pass
-
-        # Include any command line arguments issued by the user
-        # arg_aliases = {"-i": "--in", "-o": "--out", "-l": "--lineage_dataset", "-m": "--mode", "-c": "--cpu",
-        #                "-e": "--evalue", "-f": "--force", "-sp": "--species", "-z": "--tarzip",
-        #                "-r": "--restart", "-q": "--quiet", "-v": "--version", "-h": "--help"}
-        arg_aliases.update(dict(zip(arg_aliases.values(), arg_aliases.keys())))
-        for a, arg in enumerate(clargs):
-            if arg.startswith("-") and not arg in self._rerun_cmd:
-                if arg in arg_aliases:
-                    if arg_aliases[arg] in self._rerun_cmd:
-                        continue
-                if a + 1 < len(clargs) and not clargs[a + 1].startswith("-"):
-                    self._rerun_cmd += " %s %s" % (arg, clargs[a + 1])
-                else:
-                    self._rerun_cmd += " %s" % arg
-        return
-
-    def _write_hmmer_results(self):
-        """
-        Create two output files: one with information on all BUSCOs for the given dataset and the other with a list of
-        all BUSCOs that were not found.
-        :return:
-        """
-
-        with open(os.path.join(self.run_folder, "full_table.tsv"), "w") as f_out:
-
-            output_lines = self.hmmer_runner._create_output_content()
-            self._write_output_header(f_out)
-
-            with open(os.path.join(self.run_folder, "missing_busco_list.tsv"), "w") as miss_out:
-
-                self._write_output_header(miss_out, missing_list=True)
-
-                missing_buscos_lines, missing_buscos = self.hmmer_runner._list_missing_buscos()
-                output_lines += missing_buscos_lines
-
-                for missing_busco in sorted(missing_buscos):
-                    miss_out.write("{}\n".format(missing_busco))
-
-            sorted_output_lines = self._sort_lines(output_lines)
-            for busco in sorted_output_lines:
-                f_out.write(busco)
-        return
-
-    @staticmethod
-    def _sort_lines(lines):
-        sorted_lines = sorted(lines, key=lambda x: int(x.split("\t")[0].split("at")[0]))
-        return sorted_lines
-
-
-
-
-    def _write_output_header(self, file_object, missing_list=False, no_table_header=False):
-        """
-        Write a standardized file header containing information on the BUSCO run.
-        :param file_object: Opened file object ready for writing
-        :type file_object: file
-        :return:
-        """
-        file_object.write("# BUSCO version is: {} \n"
-                          "# The lineage dataset is: {} (Creation date: {}, number of species: {}, number of BUSCOs: {}"
-                          ")\n".format(busco.__version__, self._lineage_name, self._dataset_creation_date,
-                                     self._dataset_nb_species, self._dataset_nb_buscos))
-        # if isinstance(self._config, BuscoConfigMain):  # todo: wait until rerun command properly implemented again
-        #     file_object.write("# To reproduce this run: {}\n#\n".format(self._rerun_cmd))
-
-        if no_table_header:
-            pass
-        elif missing_list:
-            file_object.write("# Busco id\n")
-        elif self._mode == "proteins" or self._mode == "transcriptome":
-            if self.hmmer_runner.extra_columns:
-                file_object.write("# Busco id\tStatus\tSequence\tScore\tLength\tOrthoDB url\tDescription\n")
-            else:
-                file_object.write("# Busco id\tStatus\tSequence\tScore\tLength\n")
-        elif self._mode == "genome":
-            if self.hmmer_runner.extra_columns:
-                file_object.write("# Busco id\tStatus\tSequence\tGene Start\tGene End\tScore\tLength\tOrthoDB url\tDescription\n")
-            else:
-                file_object.write("# Busco id\tStatus\tSequence\tGene Start\tGene End\tScore\tLength\n")
-
-        return
-
+    # TODO: catch unicode encoding exception and report invalid character line instead of doing content validation
+    # todo: check config file exists before parsing

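Much of the removed BuscoAnalysis code (percentage bookkeeping, _produce_hmmer_summary, _write_hmmer_results) now lives in the HMMER runner, called through run_hmmer above. As a reminder of what that summary encodes, here is a small re-derivation of the one-line C/S/D/F/M notation from raw counts; the numbers are invented and the format string follows the one seen in the removed code.

```python
def one_line_summary(single_copy, multi_copy, fragmented, total):
    """Format BUSCO counts in C/S/D/F/M notation (illustrative re-derivation)."""
    s = round(100 * single_copy / total, 1)   # complete and single-copy
    d = round(100 * multi_copy / total, 1)    # complete and duplicated
    f = round(100 * fragmented / total, 1)    # fragmented
    c = round(s + d, 1)                       # complete = single + duplicated
    m = round(100 - s - d - f, 1)             # missing is the remainder
    return "C:{}%[S:{}%,D:{}%],F:{}%,M:{}%,n:{}".format(c, s, d, f, m, total)

print(one_line_summary(single_copy=90, multi_copy=5, fragmented=3, total=100))
# C:95.0%[S:90.0%,D:5.0%],F:3.0%,M:2.0%,n:100
```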

=====================================
src/busco/BuscoConfig.py
=====================================
@@ -15,7 +15,7 @@ logger = BuscoLogger.get_logger(__name__)
 
 class BaseConfig(ConfigParser):
 
-    DEFAULT_ARGS_VALUES = {"out_path": os.getcwd(), "cpu": 1, "force": False, "evalue": 1e-3,
+    DEFAULT_ARGS_VALUES = {"out_path": os.getcwd(), "cpu": 1, "force": False, "restart": False, "evalue": 1e-3,
                            "limit": 3, "long": False, "quiet": False,
                            "download_path": os.path.join(os.getcwd(), "busco_downloads"), "datasets_version": "odb10",
                            "offline": False, "download_base_url": "https://busco-data.ezlab.org/v4/data/",
@@ -49,6 +49,10 @@ class BaseConfig(ConfigParser):
         self.downloader = BuscoDownloadManager(self)
         return
 
+    # @log("Setting value in config")
+    # def set(self, *args, **kwargs):
+    #     super().set(*args, **kwargs)
+
 
 class PseudoConfig(BaseConfig):
 
@@ -200,9 +204,10 @@ class BuscoConfigMain(BuscoConfig, BaseConfig):
     MANDATORY_USER_PROVIDED_PARAMS = ["in", "out", "mode"]
 
     CONFIG_STRUCTURE = {"busco_run": ["in", "out", "out_path", "mode", "auto-lineage", "auto-lineage-prok",
-                                  "auto-lineage-euk", "cpu", "force", "download_path", "datasets_version", "evalue",
-                                  "limit", "long", "quiet", "offline", "download_base_url", "lineage_dataset",
-                                  "update-data", "augustus_parameters", "augustus_species", "main_out"],
+                                      "auto-lineage-euk", "cpu", "force", "restart", "download_path",
+                                      "datasets_version", "evalue", "limit", "long", "quiet", "offline",
+                                      "download_base_url", "lineage_dataset", "update-data", "augustus_parameters",
+                                      "augustus_species", "main_out"],
                         "tblastn": ["path", "command"],
                         "makeblastdb": ["path", "command"],
                         "prodigal": ["path", "command"],
@@ -241,7 +246,6 @@ class BuscoConfigMain(BuscoConfig, BaseConfig):
         self._check_required_input_exists()
 
         self._init_downloader()
-        self.persistent_tools = []
 
         self.log_config()
 
@@ -259,7 +263,7 @@ class BuscoConfigMain(BuscoConfig, BaseConfig):
             lineage_dataset = self.get("busco_run", "lineage_dataset")
             datasets_version = self.get("busco_run", "datasets_version")
             if "_odb" in lineage_dataset:
-                dataset_version = lineage_dataset.rsplit("_")[-1]
+                dataset_version = lineage_dataset.rsplit("_")[-1].rstrip("/")
                 if datasets_version != dataset_version:
                     logger.warning("There is a conflict in your config. You specified a dataset from {0} while "
                                    "simultaneously requesting the datasets_version parameter be {1}. Proceeding with "
@@ -368,11 +372,15 @@ class BuscoConfigMain(BuscoConfig, BaseConfig):
         if os.path.exists(self.main_out):
             if self.getboolean("busco_run", "force"):
                 self._force_remove_existing_output_dir(self.main_out)
+            elif self.getboolean("busco_run", "restart"):
+                logger.info("Attempting to restart the run using the following directory: {}".format(self.main_out))
             else:
                 raise SystemExit("A run with the name {} already exists...\n"
                                  "\tIf you are sure you wish to overwrite existing files, "
                                  "please use the -f (force) option".format(self.main_out))
-
+        elif self.getboolean("busco_run", "restart"):
+            logger.warning("Restart mode not available as directory {} does not exist.".format(self.main_out))
+            self.set("busco_run", "restart", "False")
 
         return
 

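One of the BuscoConfigMain fixes above adds .rstrip("/") when extracting the dataset version, so a lineage dataset path given with a trailing slash no longer yields a bogus version string such as "odb10/". A tiny sketch of that parsing with made-up dataset names:

```python
def dataset_version(lineage_dataset):
    """Extract the odb version suffix from a lineage dataset name,
    tolerating a trailing path separator (the fix shown above)."""
    if "_odb" not in lineage_dataset:
        return None
    return lineage_dataset.rsplit("_")[-1].rstrip("/")

print(dataset_version("bacteria_odb10"))   # odb10
print(dataset_version("bacteria_odb10/"))  # odb10 (was "odb10/" before the fix)
print(dataset_version("custom_dataset"))   # None
```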

=====================================
src/busco/BuscoDownloadManager.py
=====================================
@@ -50,7 +50,8 @@ class BuscoDownloadManager:
 
     def _create_main_download_dir(self):
         if not os.path.exists(self.local_download_path):
-            os.makedirs(self.local_download_path)
+            # exist_ok=True to allow for multiple parallel BUSCO runs each trying to create this folder simultaneously
+            os.makedirs(self.local_download_path, exist_ok=True)
 
     def _load_versions(self):
         try:
@@ -83,7 +84,8 @@ class BuscoDownloadManager:
         # if the category folder does not exist, create it
         category_folder = os.path.join(self.local_download_path, category)
         if not os.path.exists(category_folder):
-            os.mkdir(category_folder)
+            # exist_ok=True to allow for multiple parallel BUSCO runs, each trying to create this folder
+            os.makedirs(category_folder, exist_ok=True)
         return
 
     @staticmethod
@@ -100,7 +102,10 @@ class BuscoDownloadManager:
         return dataset_date
 
     def _check_existing_version(self, local_filepath, category, data_basename):
-        latest_update = type(self).version_files[data_basename][0]
+        try:
+            latest_update = type(self).version_files[data_basename][0]
+        except KeyError:
+            raise SystemExit("{} is not a valid option for '{}'".format(data_basename, category))
         path_basename, extension = os.path.splitext(data_basename)
 
         if category == "lineages":
@@ -136,19 +141,22 @@ class BuscoDownloadManager:
                 if os.path.exists(local_dataset):
                     return local_dataset
                 else:
-                    raise SystemExit("Unable to run BUSCO in offline mode. Dataset {} does not exist.".format(local_dataset))
+                    raise SystemExit("Unable to run BUSCO in offline mode. Dataset {} does not "
+                                     "exist.".format(local_dataset))
             else:
                 basename, extension = os.path.splitext(data_name)
                 placement_files = sorted(glob.glob(os.path.join(
                     self.local_download_path, category, "{}.*{}".format(basename, extension))))
                 if len(placement_files) > 0:
-                    return placement_files[-1]  # todo: for offline mode, log which files are being used (in case of more than one glob match)
+                    return placement_files[-1]
+                    # todo: for offline mode, log which files are being used (in case of more than one glob match)
                 else:
-                    raise SystemExit("Unable to run BUSCO placer in offline mode. Cannot find necessary placement files in {}".format(self.local_download_path))
+                    raise SystemExit("Unable to run BUSCO placer in offline mode. Cannot find necessary placement "
+                                     "files in {}".format(self.local_download_path))
         data_basename = os.path.basename(data_name)
         local_filepath = os.path.join(self.local_download_path, category, data_basename)
-        present, up_to_date, latest_version, local_filepath, hash = self._check_existing_version(local_filepath, category,
-                                                                                  data_basename)
+        present, up_to_date, latest_version, local_filepath, hash = self._check_existing_version(
+            local_filepath, category, data_basename)
 
         if (not up_to_date and self.update_data) or not present:
             # download
@@ -169,7 +177,8 @@ class BuscoDownloadManager:
 
         return local_filepath
 
-    def _rename_old_version(self, local_filepath):
+    @staticmethod
+    def _rename_old_version(local_filepath):
         if os.path.exists(local_filepath):
             try:
                 os.rename(local_filepath, "{}.old".format(local_filepath))
@@ -179,7 +188,7 @@ class BuscoDownloadManager:
                     timestamp = time.time()
                     os.rename(local_filepath, "{}.old.{}".format(local_filepath, timestamp))
                     logger.info("Renaming {} into {}.old.{}".format(local_filepath, local_filepath, timestamp))
-                except OSError as e:
+                except OSError:
                     pass
         return
 
@@ -189,7 +198,8 @@ class BuscoDownloadManager:
             urllib.request.urlretrieve(remote_filepath, local_filepath)
             observed_hash = type(self)._md5(local_filepath)
             if observed_hash != expected_hash:
-                logger.error("md5 hash is incorrect: {} while {} expected".format(str(observed_hash), str(expected_hash)))
+                logger.error("md5 hash is incorrect: {} while {} expected".format(str(observed_hash),
+                                                                                  str(expected_hash)))
                 logger.info("deleting corrupted file {}".format(local_filepath))
                 os.remove(local_filepath)
                 raise SystemExit("BUSCO was unable to download or update all necessary files")
@@ -208,7 +218,6 @@ class BuscoDownloadManager:
                 hash_md5.update(chunk)
         return hash_md5.hexdigest()
 
-
     @log("Decompressing file {}", logger, func_arg=1)
     def _decompress_file(self, local_filepath):
         unzipped_filename = local_filepath.replace(".gz", "")
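
Two small patterns recur in this file: ``os.makedirs(..., exist_ok=True)`` so that parallel BUSCO runs racing to create the same download folder do not crash each other, and an md5 check after each download with removal of corrupted files. A self-contained sketch of both, where the URL and function names are placeholders, not BUSCO's API:

```
import hashlib
import os
import urllib.request


def ensure_dir(path):
    # exist_ok=True tolerates another process creating the folder first,
    # which a plain os.mkdir() would report as FileExistsError.
    os.makedirs(path, exist_ok=True)


def md5sum(filepath, chunk_size=4096):
    digest = hashlib.md5()
    with open(filepath, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_and_verify(url, local_path, expected_hash):
    ensure_dir(os.path.dirname(local_path))
    urllib.request.urlretrieve(url, local_path)
    observed = md5sum(local_path)
    if observed != expected_hash:
        os.remove(local_path)  # delete the corrupted download
        raise SystemExit("md5 mismatch: got {}, expected {}".format(observed, expected_hash))
    return local_path
```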


=====================================
src/busco/BuscoLogger.py
=====================================
@@ -83,11 +83,12 @@ class LogDecorator:
 
                 try:
                     string_arg = getattr(obj_inst, self.attr_name)
-                    # string_arg = attr
                     if self.apply == 'join' and isinstance(string_arg, list):
+                        string_arg = [str(arg) for arg in string_arg]  # Ensure all parameters are joinable strings
                         string_arg = ' '.join(string_arg)
                     elif self.apply == "basename" and isinstance(string_arg, str):
                         string_arg = os.path.basename(string_arg)
+
                     log_msg = self.msg.format(string_arg)
                 except TypeError:  # if there are multiple attributes specified
                     string_args = (getattr(obj_inst, attr) for attr in self.attr_name)
@@ -247,10 +248,14 @@ class BuscoLogger(logging.getLoggerClass()):
         self._err_hdlr.setFormatter(self._normal_formatter)
         self.addHandler(self._err_hdlr)
 
-        if not os.access(os.getcwd(), os.W_OK):
-            raise SystemExit("No permission to write in the current directory.")
-        # Random id used in filename to avoid complications for parallel BUSCO runs.
-        self._file_hdlr = logging.FileHandler("busco_{}.log".format(type(self).random_id), mode="a")
+        try:
+            # Random id used in filename to avoid complications for parallel BUSCO runs.
+            self._file_hdlr = logging.FileHandler("busco_{}.log".format(type(self).random_id), mode="a")
+        except IOError as e:
+            errStr = "No permission to write in the current directory: {}".format(os.getcwd()) if e.errno == 13 \
+                else "IO error({0}): {1}".format(e.errno, e.strerror)
+            raise SystemExit(errStr)
+
         self._file_hdlr.setLevel(logging.DEBUG)
         self._file_hdlr.setFormatter(self._verbose_formatter)
         self.addHandler(self._file_hdlr)
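
The change above replaces the up-front ``os.access()`` check with an attempt to open the log file, translating a permission error (errno 13) into a clear message. A minimal sketch of that pattern outside the BUSCO logger class, with an illustrative function name:

```
import errno
import logging
import os


def make_file_handler(log_path):
    """Create a FileHandler, turning a permission error into a readable message."""
    try:
        handler = logging.FileHandler(log_path, mode="a")
    except IOError as e:
        if e.errno == errno.EACCES:  # errno 13
            raise SystemExit("No permission to write in the current directory: {}".format(os.getcwd()))
        raise SystemExit("IO error({0}): {1}".format(e.errno, e.strerror))
    handler.setLevel(logging.DEBUG)
    return handler
```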


=====================================
src/busco/BuscoPlacer.py
=====================================
@@ -35,9 +35,13 @@ class BuscoPlacer:
         self._params = config
         self.mode = self._config.get("busco_run", "mode")
         self.cpus = self._config.get("busco_run", "cpu")
+        self.restart = self._config.getboolean("busco_run", "restart")
         self.run_folder = run_folder
         self.placement_folder = os.path.join(run_folder, "placement_files")
-        os.mkdir(self.placement_folder)
+        if self.restart:
+            os.makedirs(self.placement_folder, exist_ok=True)
+        else:
+            os.mkdir(self.placement_folder)
         self.downloader = self._config.downloader
         self.datasets_version = self._config.get("busco_run", "datasets_version")
         self.protein_seqs = protein_seqs
@@ -92,12 +96,14 @@ class BuscoPlacer:
         return dataset, placement_file_versions
 
     def _init_tools(self):
-        try:
-            assert isinstance(self._sepp, Tool)
-        except AttributeError:
-            self._sepp = Tool("sepp", self._config)
-        except AssertionError:
-            raise SystemExit("SEPP should be a tool")
+        setattr(SEPPRunner, "config", self._config)
+        self.sepp_runner = SEPPRunner()
+        # try:
+        #     assert isinstance(self._sepp, Tool)
+        # except AttributeError:
+        #     self._sepp = Tool("sepp", self._config)
+        # except AssertionError:
+        #     raise SystemExit("SEPP should be a tool")
 
     def _pick_dataset(self):
 
@@ -273,10 +279,16 @@ class BuscoPlacer:
 
     @log("Place the markers on the reference tree...", logger)
     def _run_sepp(self):
-        self.sepp_runner = SEPPRunner(self._sepp, self.run_folder, self.placement_folder, self.tree_nwk_file,
-                                      self.tree_metadata_file, self.supermatrix_file, self.downloader,
-                                      self.datasets_version, self.cpus)
-        self.sepp_runner.run()
+        # self.sepp_runner = SEPPRunner(self._sepp, self.run_folder, self.placement_folder, self.tree_nwk_file,
+        #                               self.tree_metadata_file, self.supermatrix_file, self.downloader,
+        #                               self.datasets_version, self.cpus)
+        self.sepp_runner.configure_runner(self.tree_nwk_file, self.tree_metadata_file, self.supermatrix_file, self.downloader)
+        if self.restart and self.sepp_runner.check_previous_completed_run():
+            logger.info("Skipping SEPP run as it has already been completed")
+        else:
+            self.restart = False
+            self._config.set("busco_run", "restart", str(self.restart))
+            self.sepp_runner.run()
 
     def _extract_marker_sequences(self):
         """

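The ``_run_sepp`` change above is the first appearance of the restart pattern used throughout 4.1.x: if a restart was requested and the step's output is already complete, skip it; otherwise clear the restart flag (so every later step runs normally) and execute the step. A generic sketch of that pattern, assuming a runner object with the ``check_previous_completed_run()`` interface shown in these diffs:

```
def run_step(runner, config, logger):
    """Run one pipeline step unless a restart can reuse its previous output (sketch)."""
    restart = config.getboolean("busco_run", "restart")
    if restart and runner.check_previous_completed_run():
        logger.info("Skipping {} run as it has already been completed".format(runner.name))
    else:
        # Once any step has to be redone, everything downstream must run too.
        config.set("busco_run", "restart", "False")
        runner.run()
```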

=====================================
src/busco/BuscoRunner.py
=====================================
@@ -1,16 +1,18 @@
+from busco.BuscoAnalysis import BuscoAnalysis
 from busco.GenomeAnalysis import GenomeAnalysisEukaryotes
 from busco.TranscriptomeAnalysis import TranscriptomeAnalysis
 from busco.GeneSetAnalysis import GeneSetAnalysis
 from busco.GenomeAnalysis import GenomeAnalysisProkaryotes
 from busco.BuscoLogger import BuscoLogger
 from busco.BuscoConfig import BuscoConfigMain
-from busco.BuscoTools import NoGenesError
+from busco.BuscoTools import NoGenesError, BaseRunner
 from configparser import NoOptionError
 import os
 import shutil
 
 logger = BuscoLogger.get_logger(__name__)
 
+
 class BuscoRunner:
 
     mode_dict = {"euk_genome": GenomeAnalysisEukaryotes, "prok_genome": GenomeAnalysisProkaryotes,
@@ -23,6 +25,8 @@ class BuscoRunner:
     def __init__(self, config):
 
         self.config = config
+        setattr(BaseRunner, "config", config)
+        setattr(BuscoAnalysis, "config", config)
 
         self.mode = self.config.get("busco_run", "mode")
         self.domain = self.config.get("busco_run", "domain")
@@ -33,25 +37,24 @@ class BuscoRunner:
             elif self.domain == "eukaryota":
                 self.mode = "euk_genome"
         analysis_type = type(self).mode_dict[self.mode]
-        self.analysis = analysis_type(self.config)
+        self.analysis = analysis_type()
         self.prok_fail_count = 0  # Needed to check if both bacteria and archaea return no genes.
 
     def run_analysis(self, callback=(lambda *args: None)):
         try:
             self.analysis.run_analysis()
-            s_buscos = self.analysis.single_copy
-            d_buscos = self.analysis.multi_copy
-            f_buscos = self.analysis.only_fragments
-            s_percent = self.analysis.s_percent
-            d_percent = self.analysis.d_percent
-            f_percent = self.analysis.f_percent
-            if isinstance(self.config, BuscoConfigMain):
-                self.analysis._cleanup()
+            s_buscos = self.analysis.hmmer_runner.single_copy
+            d_buscos = self.analysis.hmmer_runner.multi_copy
+            f_buscos = self.analysis.hmmer_runner.only_fragments
+            s_percent = self.analysis.hmmer_runner.s_percent
+            d_percent = self.analysis.hmmer_runner.d_percent
+            f_percent = self.analysis.hmmer_runner.f_percent
+            self.analysis.cleanup()
 
         except NoGenesError as nge:
             no_genes_msg = "{0} did not recognize any genes matching the dataset {1} in the input file. " \
-                           "If this is unexpected, check your input file and your installation of {0}\n".format(
-                nge.gene_predictor, self.analysis._lineage_name)
+                           "If this is unexpected, check your input file and your " \
+                           "installation of {0}\n".format(nge.gene_predictor, self.analysis._lineage_name)
             fatal = (isinstance(self.config, BuscoConfigMain)
                      or (self.config.getboolean("busco_run", "auto-lineage-euk") and self.mode == "euk_genome")
                      or (self.config.getboolean("busco_run", "auto-lineage-prok") and self.mode == "prok_genome")
@@ -62,11 +65,10 @@ class BuscoRunner:
                 logger.warning(no_genes_msg)
                 s_buscos = d_buscos = f_buscos = s_percent = d_percent = f_percent = 0.0
                 if self.mode == "prok_genome":
-                    self.config.persistent_tools.append(self.analysis.prodigal_runner)
                     self.prok_fail_count += 1
 
         except SystemExit as se:
-            self.analysis._cleanup()
+            self.analysis.cleanup()
             raise se
         return callback(s_buscos, d_buscos, f_buscos, s_percent, d_percent, f_percent)
 
@@ -96,14 +98,16 @@ class BuscoRunner:
                 missing_in_parasitic_buscos = [entry.strip() for entry in parasitic_file.readlines()]
             if len(self.analysis.hmmer_runner.missing_buscos) >= 0.8*len(missing_in_parasitic_buscos) \
                     and len(missing_in_parasitic_buscos) > 0:
-                intersection = [mb for mb in self.analysis.hmmer_runner.missing_buscos if mb in missing_in_parasitic_buscos]
-                percent_missing_in_parasites = round(100*len(intersection)/len(self.analysis.hmmer_runner.missing_buscos), 1)
+                intersection = [mb for mb in self.analysis.hmmer_runner.missing_buscos
+                                if mb in missing_in_parasitic_buscos]
+                percent_missing_in_parasites = round(
+                    100*len(intersection)/len(self.analysis.hmmer_runner.missing_buscos), 1)
                 if percent_missing_in_parasites >= 80.0:
                     corrected_summary = self._recalculate_parasitic_scores(len(missing_in_parasitic_buscos))
                     positive_parasitic_line = "\n!!! The missing BUSCOs match the pattern of a parasitic-reduced " \
                                               "genome. {}% of your missing BUSCOs are typically missing in these. " \
                                               "A corrected score would be: \n{}\n".format(percent_missing_in_parasites,
-                                                                                       corrected_summary)
+                                                                                          corrected_summary)
                     final_output_results.append(positive_parasitic_line)
                     if not self.config.getboolean("busco_run", "auto-lineage"):
                         auto_lineage_line = "\nConsider using the auto-lineage mode to select a more specific lineage."
@@ -115,28 +119,29 @@ class BuscoRunner:
         return final_output_results
 
     def _recalculate_parasitic_scores(self, num_missing_in_parasitic):
-        total_buscos = self.analysis.total_buscos - num_missing_in_parasitic
-        single_copy = self.analysis.single_copy
-        multi_copy = self.analysis.multi_copy
-        fragmented_copy = self.analysis.only_fragments
+        total_buscos = self.analysis.hmmer_runner.total_buscos - num_missing_in_parasitic
+        single_copy = self.analysis.hmmer_runner.single_copy
+        multi_copy = self.analysis.hmmer_runner.multi_copy
+        fragmented_copy = self.analysis.hmmer_runner.only_fragments
         s_percent = abs(round(100*single_copy/total_buscos, 1))
         d_percent = abs(round(100*multi_copy/total_buscos, 1))
         f_percent = abs(round(100*fragmented_copy/total_buscos, 1))
 
         one_line_summary = "C:{}%[S:{}%,D:{}%],F:{}%,M:{}%,n:{}\t\n".format(
-            round(s_percent + d_percent, 1), s_percent, d_percent, f_percent, round(100-s_percent-d_percent-f_percent, 1), total_buscos)
+            round(s_percent + d_percent, 1), s_percent, d_percent, f_percent,
+            round(100-s_percent-d_percent-f_percent, 1), total_buscos)
         return one_line_summary
 
-
-
     def organize_final_output(self):
         main_out_folder = self.config.get("busco_run", "main_out")
 
         try:
             domain_results_folder = self.config.get("busco_run", "domain_run_name")
-            root_domain_output_folder = os.path.join(main_out_folder, "auto_lineage", "run_{}".format(domain_results_folder))
+            root_domain_output_folder = os.path.join(main_out_folder, "auto_lineage",
+                                                     "run_{}".format(domain_results_folder))
             root_domain_output_folder_final = os.path.join(main_out_folder, "run_{}".format(domain_results_folder))
             os.rename(root_domain_output_folder, root_domain_output_folder_final)
+            os.symlink(root_domain_output_folder_final, root_domain_output_folder)
             shutil.copyfile(os.path.join(root_domain_output_folder_final, "short_summary.txt"),
                             os.path.join(main_out_folder, "short_summary.generic.{}.{}.txt".format(
                                 domain_results_folder.replace("run_", ""), os.path.basename(main_out_folder))))
@@ -155,8 +160,10 @@ class BuscoRunner:
                                 lineage_results_folder.replace("run_", ""), os.path.basename(main_out_folder))))
         return
 
-    @staticmethod  # This is deliberately a staticmethod so it can be called from run_BUSCO() even if BuscoRunner has not yet been initialized.
+    @staticmethod
     def move_log_file(config):
+        # This is deliberately a staticmethod so it can be called from run_BUSCO() even if BuscoRunner has not yet
+        # been initialized.
         try:
             log_folder = os.path.join(config.get("busco_run", "main_out"), "logs")
             if not os.path.exists(log_folder):
@@ -166,11 +173,7 @@ class BuscoRunner:
             logger.warning("Unable to move 'busco_{}.log' to the 'logs' folder.".format(BuscoLogger.random_id))
         return
 
-
-    def finish(self, elapsed_time, root_lineage=False):
-        # if root_lineage:
-        #     logger.info("Generic lineage selected. Results reproduced here.\n"
-        #                 "{}".format(" ".join(self.analysis.hmmer_results_lines)))
+    def finish(self, elapsed_time):
 
         final_output_results = self.format_results()
         logger.info("".join(final_output_results))
@@ -265,5 +268,3 @@ class SmartBox:
             box_lines.append("\t{}".format(line))
         box_lines.append("\t{}".format(self.add_horizontal()))
         return "\n".join(box_lines)
-
-
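
The parasite correction above simply recomputes the BUSCO percentages over a reduced denominator (total BUSCOs minus those typically missing in parasitic-reduced genomes). A standalone version of that arithmetic with made-up example numbers:

```
def parasitic_corrected_summary(total, single, multi, fragmented, num_missing_in_parasitic):
    """Recompute C/S/D/F/M percentages after excluding parasite-typical missing BUSCOs (sketch)."""
    adjusted_total = total - num_missing_in_parasitic
    s = round(100 * single / adjusted_total, 1)
    d = round(100 * multi / adjusted_total, 1)
    f = round(100 * fragmented / adjusted_total, 1)
    m = round(100 - s - d - f, 1)
    return "C:{}%[S:{}%,D:{}%],F:{}%,M:{}%,n:{}".format(round(s + d, 1), s, d, f, m, adjusted_total)


# e.g. 255 BUSCOs, 40 of which are typically missing in parasites:
print(parasitic_corrected_summary(255, 180, 5, 10, 40))
# C:86.0%[S:83.7%,D:2.3%],F:4.7%,M:9.3%,n:215
```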


=====================================
src/busco/BuscoTools.py
=====================================
The diff for this file was not included because it is too large.

=====================================
src/busco/GeneSetAnalysis.py
=====================================
@@ -24,17 +24,17 @@ class GeneSetAnalysis(ProteinAnalysis, BuscoAnalysis):
     """
     _mode = 'proteins'
 
-    def __init__(self, config):
+    def __init__(self):
         """
         Initialize an instance.
         :param params: Values of all parameters that have to be defined
         :type params: PipeConfig
         """
-        super().__init__(config)
+        super().__init__()
         self.sequences_aa = {record.id: record for record in list(SeqIO.parse(self._input_file, "fasta"))}
 
-    def _cleanup(self):
-        super()._cleanup()
+    def cleanup(self):
+        super().cleanup()
 
     def run_analysis(self):
         """
@@ -42,11 +42,7 @@ class GeneSetAnalysis(ProteinAnalysis, BuscoAnalysis):
         """
         super().run_analysis()
         self.run_hmmer(self._input_file)
-        self._write_buscos_to_file(self.sequences_aa)
-        self._cleanup()
+        self.hmmer_runner.write_buscos_to_file(self.sequences_aa)
         # if self._tarzip:
         #     self._run_tarzip_hmmer_output()
         return
-
-    def create_dirs(self):
-        super().create_dirs()
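
For reference, the protein-mode input loading kept above just indexes the FASTA records by ID with Biopython so HMMER hits can later be looked up by name; a minimal equivalent, where the file path is a placeholder:

```
from Bio import SeqIO

# Map each record ID to its SeqRecord for later lookup by BUSCO/HMMER hit name.
sequences_aa = {record.id: record for record in SeqIO.parse("proteins.faa", "fasta")}
```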


=====================================
src/busco/GenomeAnalysis.py
=====================================
@@ -12,19 +12,16 @@ Licensed under the MIT license. See LICENSE.md file.
 """
 from busco.BuscoAnalysis import BuscoAnalysis
 from busco.Analysis import NucleotideAnalysis
-from busco.BuscoTools import ProdigalRunner, AugustusRunner, GFF2GBRunner, NewSpeciesRunner, ETrainingRunner, OptimizeAugustusRunner, NoGenesError
-from busco.BuscoConfig import BuscoConfigAuto
+from busco.BuscoTools import ProdigalRunner, AugustusRunner, GFF2GBRunner, NewSpeciesRunner, ETrainingRunner, \
+    OptimizeAugustusRunner, NoGenesError
 import os
 import shutil
 from busco.BuscoLogger import BuscoLogger
 from busco.BuscoLogger import LogDecorator as log
-from busco.Toolset import Tool
 import time
 from abc import ABCMeta, abstractmethod
 from configparser import NoOptionError
 
-
-
 logger = BuscoLogger.get_logger(__name__)
 
 
@@ -32,25 +29,13 @@ class GenomeAnalysis(NucleotideAnalysis, BuscoAnalysis, metaclass=ABCMeta):
 
     _mode = "genome"
 
-    def __init__(self, config):
-        super().__init__(config)
+    def __init__(self):
+        super().__init__()
 
     @abstractmethod
     def run_analysis(self):
         super().run_analysis()
 
-
-    @abstractmethod
-    def create_dirs(self):
-        super().create_dirs()
-
-    def check_tool_dependencies(self):
-        """
-        check dependencies on tools
-        :raises SystemExit: if a Tool is not available
-        """
-        super().check_tool_dependencies()
-
     def init_tools(self):
         """
         Initialize tools needed for Genome Analysis.
@@ -58,7 +43,6 @@ class GenomeAnalysis(NucleotideAnalysis, BuscoAnalysis, metaclass=ABCMeta):
         """
         super().init_tools()
 
-
     # def _run_tarzip_augustus_output(self): # Todo: rewrite using tarfile
     #     """
     #     This function tarzips results folder
@@ -87,20 +71,12 @@ class GenomeAnalysis(NucleotideAnalysis, BuscoAnalysis, metaclass=ABCMeta):
     #                   "%ssingle_copy_busco_sequences.tar.gz" % self.main_out,
     #                   "single_copy_busco_sequences", "--remove-files"], "bash", shell=False)
 
-    def set_rerun_busco_command(self, clargs):
-        """
-        This function sets the command line to call to reproduce this run
-        """
-        clargs.extend(["-sp", self._target_species])
-        super().set_rerun_busco_command(clargs)
-
-    def _write_full_table_header(self, out):
-        """
-        This function adds a header line to the full table file
-        :param out: a full table file
-        :type out: file
-        """
-        out.write("# Busco id\tStatus\tContig\tStart\tEnd\tScore\tLength\n")
+    # def set_rerun_busco_command(self, clargs):
+    #     """
+    #     This function sets the command line to call to reproduce this run
+    #     """
+    #     clargs.extend(["-sp", self._target_species])
+    #     super().set_rerun_busco_command(clargs)
 
 
 class GenomeAnalysisProkaryotes(GenomeAnalysis):
@@ -108,107 +84,32 @@ class GenomeAnalysisProkaryotes(GenomeAnalysis):
     This class runs a BUSCO analysis on a genome.
     """
 
-    def __init__(self, config):
+    def __init__(self):
         """
         Initialize an instance.
-        :param config: Values of all parameters that have to be defined
-        :type config: PipeConfig
         """
-        super().__init__(config)
-        self.load_persistent_tools()
-
-        # Get genetic_code from dataset.cfg file
-        # bacteria/archaea=11; Entomoplasmatales,Mycoplasmatales=4
-        try:
-            self._genetic_code = self._config.get("prodigal", "prodigal_genetic_code").split(",")
-        except NoOptionError:
-            self._genetic_code = ["11"]
-
-        if len(self._genetic_code) > 1:
-            try:
-                self.ambiguous_cd_range = [float(self._config.get("prodigal", "ambiguous_cd_range_lower")),
-                                           float(self._config.get("prodigal", "ambiguous_cd_range_upper"))]
-            except NoOptionError:
-                raise SystemExit("Dataset config file does not contain required information. Please upgrade datasets.")
-
-        else:
-            self.ambiguous_cd_range = [None, 0]
-
-        self.code_4_selected = False
-        self.prodigal_output_dir = os.path.join(self.main_out, "prodigal_output")
+        super().__init__()
+        self.prodigal_runner = None
 
-    def _cleanup(self):
-        # tmp_path = os.path.join(self.prodigal_output_dir, "tmp")
-        # if os.path.exists(tmp_path):
-        #     shutil.rmtree(tmp_path)
-        super()._cleanup()
+    def cleanup(self):
+        super().cleanup()
 
     def run_analysis(self):
         """
         This function calls all needed steps for running the analysis.
         """
-        # Initialize tools and check dependencies
         super().run_analysis()
-
-        if not os.path.exists(self.prodigal_output_dir):  # If prodigal has already been run on the input, don't run it again
-            os.makedirs(self.prodigal_output_dir)
-            self._run_prodigal()
-            self._config.persistent_tools.append(self.prodigal_runner)
-
-        elif any(g not in self.prodigal_runner.genetic_code for g in self._genetic_code):
-            self.prodigal_runner.genetic_code = self._genetic_code
-            self.prodigal_runner.cd_lower, self.prodigal_runner.cd_upper = self.ambiguous_cd_range
-            self._run_prodigal()
-
-        else:
-            # Prodigal has already been run on input. Don't run again, just load necessary params.
-            # First determine which GC to use
-            self.prodigal_runner.select_optimal_results(self._genetic_code, self.ambiguous_cd_range)
-            tmp_file = self.prodigal_runner.gc_run_results[self.prodigal_runner.gc]["tmp_name"]
-            log_file = self.prodigal_runner.gc_run_results[self.prodigal_runner.gc]["log_file"]
-            self.prodigal_runner._organize_prodigal_files(tmp_file, log_file)
-
-        self.code_4_selected = self.prodigal_runner.gc == "4"
-        self.sequences_nt = self.prodigal_runner.gc_run_results[self.prodigal_runner.gc]["seqs_nt"]
-        self.sequences_aa = self.prodigal_runner.gc_run_results[self.prodigal_runner.gc]["seqs_aa"]
-        self._gene_details = self.prodigal_runner.gc_run_results[self.prodigal_runner.gc]["gene_details"]
+        self._run_prodigal()
         self.run_hmmer(self.prodigal_runner.output_faa)
-        self._write_buscos_to_file(self.sequences_aa, self.sequences_nt)
+        self.hmmer_runner.write_buscos_to_file(self.sequences_aa, self.sequences_nt)
         return
 
-    def load_persistent_tools(self):
-        """
-        For multiple runs, load Prodigal Runner in the same state as the previous run, to avoid having to run Prodigal
-        on the input again.
-        :return:
-        """
-        for tool in self._config.persistent_tools:
-            if isinstance(tool, ProdigalRunner):
-                self.prodigal_runner = tool
-            else:
-                raise SystemExit("Unrecognized persistent tool.")
-
-    def create_dirs(self):
-        super().create_dirs()
-
-    def check_tool_dependencies(self):
-        """
-        check dependencies on tools
-        :raises SystemExit: if a Tool is not available
-        """
-        super().check_tool_dependencies()
-
     def init_tools(self):
         """
         Init the tools needed for the analysis
         """
         super().init_tools()
-        try:
-            assert(isinstance(self._prodigal_tool, Tool))
-        except AttributeError:
-            self._prodigal_tool = Tool("prodigal", self._config)
-        except AssertionError:
-            raise SystemExit("Prodigal should be a tool")
+        self.prodigal_runner = ProdigalRunner()
 
     @log("***** Run Prodigal on input to predict and extract genes *****", logger)
     def _run_prodigal(self):
@@ -216,69 +117,64 @@ class GenomeAnalysisProkaryotes(GenomeAnalysis):
         Run Prodigal on input file to detect genes.
         :return:
         """
-        if not hasattr(self, "prodigal_runner"):
-            self.prodigal_runner = ProdigalRunner(self._prodigal_tool, self._input_file, self.prodigal_output_dir,
-                                                  self._genetic_code, self.ambiguous_cd_range, self.log_folder)
-        self.prodigal_runner.run()
-        self.code_4_selected = self.prodigal_runner.code_4_selected
-        return
+        if self.restart and self.prodigal_runner.check_previous_completed_run():
+            logger.info("Skipping Prodigal run as it has already completed")
+            self.prodigal_runner.get_gene_details()
+        else:
+            self.restart = False
+            self.config.set("busco_run", "restart", str(self.restart))
+            self.prodigal_runner.run()
+        self.gene_details = self.prodigal_runner.gene_details
+        self.sequences_nt = self.prodigal_runner.sequences_nt
+        self.sequences_aa = self.prodigal_runner.sequences_aa
 
-    def _write_full_table_header(self, out):
-        """
-        This function adds a header line to the full table file
-        :param out: a full table file
-        :type out: file
-        """
-        out.write("# Busco id\tStatus\tContig\tStart\tEnd\tScore\tLength\n")
+        return
 
 
 class GenomeAnalysisEukaryotes(GenomeAnalysis):
     """
-    This class runs a BUSCO analysis on a euk_genome.
-    Todo: reintroduce restart mode with checkpoints
+    This class runs a BUSCO analysis on a eukaryote genome.
     """
-    def __init__(self, config):
-        """
-        Retrieve the augustus config path, mandatory for genome
-        Cannot be specified through config because some augustus perl scripts use it as well
-        BUSCO could export it if absent, but do not want to mess up with the user env,
-        let's just tell the user to do it for now.
+    def __init__(self):
+        super().__init__()
 
-        :param config: Values of all parameters that have to be defined
-        :type config: PipeConfig
-        """
-        self._augustus_config_path = os.environ.get("AUGUSTUS_CONFIG_PATH")
+        self._long = self.config.getboolean("busco_run", "long")
         try:
-            self._target_species = config.get("busco_run", "augustus_species")
+            self._target_species = self.config.get("busco_run", "augustus_species")
         except KeyError:
             raise SystemExit("Something went wrong. Eukaryota datasets should specify an augustus species.")
         try:
-            self._augustus_parameters = config.get("busco_run", "augustus_parameters").replace(',', ' ')
+            self._augustus_parameters = self.config.get("busco_run", "augustus_parameters").replace(',', ' ')
         except NoOptionError:
             self._augustus_parameters = ""
-        super().__init__(config)
-        self._check_file_dependencies()
         self.mkblast_runner = None
         self.tblastn_runner = None
         self.augustus_runner = None
+        self.gff2gb_runner = None
+        self.new_species_runner = None
+        self.etraining_runner = None
+        self.optimize_augustus_runner = None
+
         self.sequences_nt = {}
         self.sequences_aa = {}
+        self.gene_details = {}
 
-    def create_dirs(self):
-        super().create_dirs()
-
-    def check_tool_dependencies(self):
-        blast_version = self._get_blast_version()
-        if blast_version not in ["2.2", "2.3"]:  # Known problems with multithreading on BLAST 2.4-2.9.
-            if blast_version == "2.9" and self._tblastn_tool.cmd.endswith(
-                    "tblastn_June13"):  # NCBI sent a binary with this name that avoids the multithreading problems.
-                pass
-            else:
-                logger.warning("You are using BLAST version {}. This is known to yield inconsistent results when "
-                               "multithreading. BLAST will run on a single core as a result. For performance improvement, "
-                               "please revert to BLAST 2.2 or 2.3.".format(blast_version))
-                self.blast_cpus = 1
-        super().check_tool_dependencies()
+    def cleanup(self):
+        """
+        This function cleans temporary files
+        """
+        try:
+            augustus_tmp = self.augustus_runner.tmp_dir  # Should be already done if AugustusRunner ran correctly
+            if os.path.exists(augustus_tmp):
+                shutil.rmtree(augustus_tmp)
+        except OSError:
+            pass
+        try:
+            if self._target_species.startswith("BUSCO"):
+                self.augustus_runner.move_retraining_parameters()
+        except OSError:
+            pass
+        super().cleanup()
 
     def init_tools(self):
         """
@@ -287,289 +183,131 @@ class GenomeAnalysisEukaryotes(GenomeAnalysis):
         :return:
         """
         super().init_tools()
-        try:
-            assert(isinstance(self._mkblast_tool, Tool))
-        except AttributeError:
-            self._mkblast_tool = Tool("makeblastdb", self._config)
-        except AssertionError:
-            raise SystemExit("mkblast should be a tool")
 
-        try:
-            assert(isinstance(self._tblastn_tool, Tool))
-        except AttributeError:
-            self._tblastn_tool = Tool("tblastn", self._config)
-        except AssertionError:
-            raise SystemExit("tblastn should be a tool")
-        try:
-            assert(isinstance(self._augustus_tool, Tool))
-        except AttributeError:
-            self._augustus_tool = Tool("augustus", self._config, augustus_out=True)
-            # For some reason Augustus appears to send a return code before it writes to stdout, so we have to
-            # sleep briefly to allow the output to be written to the file. Otherwise we have a truncated output which
-            # will cause an error.
-            # self._augustus_tool.sleep = 0.4
-        except AssertionError:
-            raise SystemExit("Augustus should be a tool")
-
-        try:
-            assert(isinstance(self._gff2gbSmallDNA_tool, Tool))
-        except AttributeError:
-            self._gff2gbSmallDNA_tool = Tool("gff2gbSmallDNA.pl", self._config)
-        except AssertionError:
-            raise SystemExit("gff2gbSmallDNA.pl should be a tool")
-
-        try:
-            assert(isinstance(self._new_species_tool, Tool))
-        except AttributeError:
-            self._new_species_tool = Tool("new_species.pl", self._config)
-        except AssertionError:
-            raise SystemExit("new_species.pl should be a tool")
-
-        try:
-            assert(isinstance(self._etraining_tool, Tool))
-        except AttributeError:
-            self._etraining_tool = Tool("etraining", self._config)
-        except AssertionError:
-            raise SystemExit("etraining should be a tool")
+        self.augustus_runner = AugustusRunner()
+        self.gff2gb_runner = GFF2GBRunner()
+        self.new_species_runner = NewSpeciesRunner()
+        self.etraining_runner = ETrainingRunner()
 
         if self._long:
-            try:
-                assert (isinstance(self._optimize_augustus_tool, Tool))
-            except AttributeError:
-                self._optimize_augustus_tool = Tool("optimize_augustus.pl", self._config)
-            except AssertionError:
-                raise SystemExit("optimize_augustus should be a tool")
+            self.optimize_augustus_runner = OptimizeAugustusRunner()
 
         return
 
-    @log("Running Augustus gene predictor on BLAST search results.", logger)
-    def _run_augustus(self, coords, rerun=False):
-        output_dir = os.path.join(self.run_folder, "augustus_output")
-        if not os.path.exists(output_dir):  # TODO: consider grouping all create_dir calls into one function for all tools
-            os.mkdir(output_dir)
-        # if self.augustus_runner:
-        #     self.augustus_runner.coords = coords
-        #     self.augustus_runner.target_species = self._target_species
-        # else:
-        self.augustus_runner = AugustusRunner(self._augustus_tool, output_dir, self.tblastn_runner.output_seqs, self._target_species,
-                                              self._lineage_dataset, self._augustus_parameters, coords,
-                                              self._cpus, self.log_folder, self.sequences_aa, self.sequences_nt, rerun)
-        self.augustus_runner.run()
-        self.sequences_nt = self.augustus_runner.sequences_nt
-        self.sequences_aa = self.augustus_runner.sequences_aa
+    def run_analysis(self):
+        """This function calls all needed steps for running the analysis."""
+        super().run_analysis()
+        self._run_mkblast()
+        self._run_tblastn()
+        self._run_augustus(self.tblastn_runner.coords)
+        self.gene_details = self.augustus_runner.gene_details
+        self.run_hmmer(self.augustus_runner.output_sequences)
+        self._rerun_analysis()
 
     def _rerun_augustus(self, coords):
-        self._augustus_tool.total = 0  # Reset job count
-        self._augustus_tool.nb_done = 0
         missing_and_fragmented_buscos = self.hmmer_runner.missing_buscos + list(
             self.hmmer_runner.fragmented_buscos.keys())
         logger.info("Re-running Augustus with the new metaparameters, number of target BUSCOs: {}".format(
             len(missing_and_fragmented_buscos)))
-        missing_and_fragmented_coords = {busco: coords[busco] for busco in coords if busco in missing_and_fragmented_buscos}
+        missing_and_fragmented_coords = {busco: coords[busco] for busco in coords if busco in
+                                         missing_and_fragmented_buscos}
         logger.debug('Trained species folder is {}'.format(self._target_species))
         self._run_augustus(missing_and_fragmented_coords, rerun=True)
         return
 
-    def _set_checkpoint(self, id=None):
-        """
-        This function update the checkpoint file with the provided id or delete
-        it if none is provided
-        :param id: the id of the checkpoint
-        :type id: int
-        """
-        checkpoint_filename = os.path.join(self.run_folder, "checkpoint.tmp")
-        if id:
-            with open(checkpoint_filename, "w") as checkpt_file:
-                checkpt_file.write("{}.{}".format(id, self._mode))
-        else:
-            if os.path.exists(checkpoint_filename):
-                os.remove(checkpoint_filename)
-        return
-
-    def _run_gff2gb(self):
-        self.gff2gb = GFF2GBRunner(self._gff2gbSmallDNA_tool, self.run_folder, self._input_file,
-                                   self.hmmer_runner.single_copy_buscos, self._cpus)
-        self.gff2gb.run()
-        return
-
-    def _run_new_species(self):
-        new_species_name = "BUSCO_{}".format(os.path.basename(self.main_out))
-        self.new_species_runner = NewSpeciesRunner(self._new_species_tool, self._domain, new_species_name, self._cpus)
-        # create new species config file from template
-        self.new_species_runner.run()
-        return new_species_name
-
-    def _run_etraining(self):
-        # train on new training set (complete single copy buscos)
-        self.etraining_runner = ETrainingRunner(self._etraining_tool, self.main_out, self.run_folder, self._cpus, self._augustus_config_path)
-        self.etraining_runner.run()
-        return
-
-    def run_analysis(self):
-        """
-                This function calls all needed steps for running the analysis.
-                Todo: reintroduce checkpoints and restart option.
-        """
-
-        super().run_analysis()
-        self._run_mkblast()
-        coords = self._run_tblastn()
-        self._run_augustus(coords)
-        self._gene_details = self.augustus_runner.gene_details
-        self.run_hmmer(self.augustus_runner.output_sequences)
-        self.rerun_analysis()
-
     @log("Starting second step of analysis. The gene predictor Augustus is retrained using the results from the "
          "initial run to yield more accurate results.", logger)
-    def rerun_analysis(self):
-
-        # self._fix_restart_augustus_folder()  # todo: reintegrate this when checkpoints are restored
-        coords = self._run_tblastn(missing_and_frag_only=True, ancestral_variants=self._has_variants_file)
-
-        logger.info("Training Augustus using Single-Copy Complete BUSCOs:")
-        logger.info("Converting predicted genes to short genbank files")
+    def _rerun_analysis(self):
 
+        self.augustus_runner.make_gff_files(self.hmmer_runner.single_copy_buscos)
+        self._run_tblastn(missing_and_frag_only=True, ancestral_variants=self._has_variants_file)
         self._run_gff2gb()
-
-        logger.info("All files converted to short genbank files, now running the training scripts")
-        new_species_name = self._run_new_species()
-        self._target_species = new_species_name  # todo: check new species folder can be read/written - detect any silent Augustus issues
-
-        self._merge_gb_files()
-
+        self._run_new_species()
+        self.config.set("busco_run", "augustus_species", self.new_species_runner.new_species_name)
+        self._target_species = self.new_species_runner.new_species_name
         self._run_etraining()
 
         if self._long:
-            self._run_optimize_augustus(new_species_name)
+            self._run_optimize_augustus(self.new_species_runner.new_species_name)
             self._run_etraining()
 
         try:
-            self._rerun_augustus(coords)
-            self._gene_details.update(self.augustus_runner.gene_details)
+            self._rerun_augustus(self.tblastn_runner.coords)
+            self.gene_details.update(self.augustus_runner.gene_details)
             self.run_hmmer(self.augustus_runner.output_sequences)
-            self._write_buscos_to_file(self.sequences_aa, self.sequences_nt)
+            self.hmmer_runner.write_buscos_to_file(self.sequences_aa, self.sequences_nt)
         except NoGenesError:
             logger.warning("No genes found on Augustus rerun.")
 
-        # self._move_retraining_parameters()
-        # if self._tarzip:
+        # if self._tarzip:  # todo: zip folders with a lot of output
         #     self._run_tarzip_augustus_output()
         #     self._run_tarzip_hmmer_output()
         # remove the checkpoint, run is done
         # self._set_checkpoint()
         return
 
-    def _check_file_dependencies(self):  # todo: currently only implemented for GenomeAnalysisEukaryotes, checking Augustus dirs. Does it need to be rolled out for all analyses?
-        """
-        check dependencies on files and folders
-        properly configured.
-        :raises SystemExit: if Augustus config path is not writable or
-        not set at all
-        :raises SystemExit: if Augustus config path does not contain
-        the needed species
-        present
-        """
-        try:
-            augustus_species_dir = os.path.join(self._augustus_config_path, "species")
-            if not os.access(augustus_species_dir, os.W_OK):
-                raise SystemExit("Cannot write to Augustus species folder, please make sure you have write "
-                                 "permissions to {}".format(augustus_species_dir))
-
-        except TypeError:
-            raise SystemExit(
-                "The environment variable AUGUSTUS_CONFIG_PATH is not set")
-
-
-        if not os.path.exists(os.path.join(augustus_species_dir, self._target_species)):
-            raise SystemExit(
-                "Impossible to locate the species \"{0}\" in Augustus species folder"
-                " ({1}), check that AUGUSTUS_CONFIG_PATH is properly set"
-                " and contains this species. \n\t\tSee the help if you want "
-                "to provide an alternative species".format(self._target_species, augustus_species_dir))
-
-    def set_rerun_busco_command(self, clargs):
-        """
-        This function sets the command line to call to reproduce this run
-        """
-        clargs.extend(["-sp", self._target_species])
-        if self._augustus_parameters:
-            clargs.extend(["--augustus_parameters", "\"%s\"" % self._augustus_parameters])
-        super().set_rerun_busco_command(clargs)
-
-    def _cleanup(self):
-        """
-        This function cleans temporary files
-        """
-        try:
-            augustus_tmp = self.augustus_runner.tmp_dir  # Should be already done if AugustusRunner ran correctly
-            if os.path.exists(augustus_tmp):
-                shutil.rmtree(augustus_tmp)
-        except:
-            pass
-        try:
-            if self._target_species.startswith("BUSCO"):
-                self._move_retraining_parameters()
-        except:
-            pass
-        super()._cleanup()
-
-
-    def _fix_restart_augustus_folder(self):
-        """
-        This function resets and checks the augustus folder to make a restart
-        possible in phase 2
-        :raises SystemExit: if it is not possible to fix the folders
-        # Todo: reintegrate this when restart option is added back
-        """
-        if os.path.exists(os.path.join(self.augustus_runner.output_folder, "predicted_genes_run1")) \
-                and os.path.exists(os.path.join(self.main_out, "hmmer_output_run1")):
-            os.remove(os.path.join(self.main_out, "augustus_output", "predicted_genes", "*"))
-            os.rmdir(os.path.join(self.main_out, "augustus_output", "predicted_genes"))
-
-            os.rename(os.path.join(self.main_out, "augustus_output", "predicted_genes_run1"),
-                      os.path.join(self.main_out, "augustus_output", "predicted_genes"))
-
-            os.remove(os.path.join(self.main_out, "hmmer_output", "*"))
-            os.rmdir(os.path.join(self.main_out, "hmmer_output"))
-
-            os.rename(os.path.join(self.main_out, "hmmer_output_run1"), os.path.join(self.main_out, "hmmer_output"))
+    @log("Running Augustus gene predictor on BLAST search results.", logger)
+    def _run_augustus(self, coords, rerun=False):
+        self.augustus_runner.configure_runner(self.tblastn_runner.output_seqs, coords, self.sequences_aa,
+                                              self.sequences_nt, rerun)
 
+        if self.restart and self.augustus_runner.check_previous_completed_run():
+            run = "2nd" if rerun else "1st"
+            logger.info("Skipping {} augustus run as output already processed".format(run))
+        else:
+            self.restart = False
+            self.config.set("busco_run", "restart", str(self.restart))
+            self.augustus_runner.run()
+        self.augustus_runner.process_output()
+        self.sequences_nt = self.augustus_runner.sequences_nt
+        self.sequences_aa = self.augustus_runner.sequences_aa
 
-        elif (os.path.exists(os.path.join(self.main_out, "augustus_output", "predicted_genes"))
-              and os.path.exists(os.path.join(self.main_out, "hmmer_output"))):
-            pass
+    def _run_etraining(self):
+        """Train on new training set (complete single copy buscos)"""
+        self.etraining_runner.configure_runner(self.new_species_runner.new_species_name)
+        if self.restart and self.etraining_runner.check_previous_completed_run():
+            logger.info("Skipping etraining as it has already been done")
         else:
-            raise SystemExit("Impossible to restart the run, necessary folders are missing. Use the -f option instead of -r")
+            self.restart = False
+            self.config.set("busco_run", "restart", str(self.restart))
+            self.etraining_runner.run()
         return
 
-    def _move_retraining_parameters(self):
-        """
-        This function moves retraining parameters from augustus species folder
-        to the run folder
-        """
-        augustus_species_path = os.path.join(self._augustus_config_path, "species", self._target_species)
-        if os.path.exists(augustus_species_path):
-            new_path = os.path.join(self.augustus_runner.output_folder, "retraining_parameters", self._target_species)
-            shutil.move(augustus_species_path, new_path)
+    @log("Converting predicted genes to short genbank files", logger)
+    def _run_gff2gb(self):
+        self.gff2gb_runner.configure_runner(self.hmmer_runner.single_copy_buscos)
+        if self.restart and self.gff2gb_runner.check_previous_completed_run():
+            logger.info("Skipping gff2gb conversion as it has already been done")
         else:
-            logger.warning("Augustus did not produce a retrained species folder.")
+            self.restart = False
+            self.config.set("busco_run", "restart", str(self.restart))
+            self.gff2gb_runner.run()
         return
 
-    def _merge_gb_files(self):
-        logger.debug("concat all gb files...")
-        # Concatenate all GB files into one large file
-        with open(os.path.join(self.augustus_runner.output_folder, "training_set.db"), "w") as outfile:
-            gb_dir_path = os.path.join(self.augustus_runner.output_folder, "gb")
-            for fname in os.listdir(gb_dir_path):
-                with open(os.path.join(gb_dir_path, fname), "r") as infile:
-                    outfile.writelines(infile.readlines())
+    @log("All files converted to short genbank files, now training Augustus using Single-Copy Complete BUSCOs", logger)
+    def _run_new_species(self):
+        """Create new species config file from template"""
+        if self.restart and self.new_species_runner.check_previous_completed_run():
+            logger.info("Skipping new species creation as it has already been done")
+        else:
+            self.restart = False
+            self.config.set("busco_run", "restart", str(self.restart))
+            self.new_species_runner.run()
         return
 
     def _run_optimize_augustus(self, new_species_name):
-        # long mode (--long) option - runs all the Augustus optimization
-        # scripts (adds ~1 day of runtime)
+        """ long mode (--long) option - runs all the Augustus optimization scripts (adds ~1 day of runtime)"""
         logger.warning("Optimizing augustus metaparameters, this may take a very long time, started at {}".format(
             time.strftime("%m/%d/%Y %H:%M:%S")))
-        self.optimize_augustus_runner = OptimizeAugustusRunner(self._optimize_augustus_tool, self.augustus_runner.output_folder, new_species_name, self._cpus)
+        self.optimize_augustus_runner.configure_runner(self.augustus_runner.output_folder, new_species_name)
         self.optimize_augustus_runner.run()
         return
+
+    # def set_rerun_busco_command(self, clargs):
+    #     """
+    #     This function sets the command line to call to reproduce this run
+    #     """
+    #     clargs.extend(["-sp", self._target_species])
+    #     if self._augustus_parameters:
+    #         clargs.extend(["--augustus_parameters", "\"%s\"" % self._augustus_parameters])
+    #     super().set_rerun_busco_command(clargs)
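
A structural theme of this refactor is that runner classes no longer receive the config and all their inputs through ``__init__``; the shared config is attached once as a class attribute (see the ``setattr(BaseRunner, "config", config)`` call in BuscoRunner) and per-step state arrives later through ``configure_runner()``, so the same runner instance can be reconfigured for the Augustus rerun phase. A toy sketch of that shape, with hypothetical class names rather than BUSCO's real runners:

```
class BaseRunner:
    config = None  # injected once, e.g. setattr(BaseRunner, "config", config)

    def __init__(self):
        # Shared settings come from the class-level config, if present.
        self.cpus = self.config.get("busco_run", "cpu") if self.config else None

    def configure_runner(self, *args):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError


class ToyTblastnRunner(BaseRunner):
    def configure_runner(self, query_file, missing_and_frag_only=False):
        # Step-specific inputs arrive here, not in __init__, so the runner
        # can be reused with new inputs on the second (rerun) pass.
        self.query_file = query_file
        self.missing_and_frag_only = missing_and_frag_only

    def run(self):
        print("tblastn on {} (rerun={})".format(self.query_file, self.missing_and_frag_only))
```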


=====================================
src/busco/Toolset.py
=====================================
@@ -13,9 +13,12 @@ Licensed under the MIT license. See LICENSE.md file.
 """
 import os
 import subprocess
-import threading
+from subprocess import TimeoutExpired
+# import threading
+from multiprocessing import Process, Pool, Value, Lock
 import time
 from shutil import which
+from abc import ABCMeta, abstractmethod
 from busco.BuscoLogger import BuscoLogger, ToolLogger
 from busco.BuscoLogger import LogDecorator as log
 from busco.BuscoLogger import StreamLogger
@@ -23,12 +26,12 @@ import logging
 
 logger = BuscoLogger.get_logger(__name__)
 
-class Job(threading.Thread):
+class Job(Process):#threading.Thread):
     """
     Build and executes one work item in an external process
     """
 
-    def __init__(self, tool_name, cmd, job_outlogger, job_errlogger, **kwargs):
+    def __init__(self, tool_name, cmd, job_outlogger, job_errlogger, timeout, **kwargs):
         """
         :param name: a name of an executable / script ("a tool") to be run
         :type cmd: list
@@ -42,6 +45,7 @@ class Job(threading.Thread):
         self.cmd_line = [cmd]
         self.job_outlogger = job_outlogger
         self.job_errlogger = job_errlogger
+        self.timeout = timeout
         self.kwargs = kwargs
 
     def add_parameter(self, parameter):
@@ -60,14 +64,22 @@ class Job(threading.Thread):
         """
         with StreamLogger(logging.DEBUG, self.job_outlogger, **self.kwargs) as out:  # kwargs only provided to out to capture augustus stdout
             with StreamLogger(logging.ERROR, self.job_errlogger) as err:
-                # Stick with Popen(), communicate() and wait() instead of just run() to ensure compatibility with
-                # Python versions < 3.5.
-                p = subprocess.Popen(self.cmd_line, shell=False, stdout=out, stderr=err)
-                p.wait()
+                try:
+                    # Stick with Popen(), communicate() and wait() instead of just run() to ensure compatibility with
+                    # Python versions < 3.5.
+                    p = subprocess.Popen(self.cmd_line, shell=False, stdout=out, stderr=err)
+                    p.wait(self.timeout)
+                except TimeoutExpired:
+                    p.kill()
+                    logger.warning("The following job was killed as it was taking too long (>1hr) to "
+                                   "complete.\n{}".format(" ".join(self.cmd_line)))
+
         self.job_outlogger._file_hdlr.close()
         self.job_outlogger.removeHandler(self.job_outlogger._file_hdlr)
         self.job_errlogger._file_hdlr.close()
         self.job_errlogger.removeHandler(self.job_errlogger._file_hdlr)
+        with cnt.get_lock():
+            cnt.value += 1
 
 class ToolException(Exception):
     """
@@ -80,12 +92,12 @@ class ToolException(Exception):
         return self.value
 
 
-class Tool:
+class Tool(metaclass=ABCMeta):
     """
     Collection of utility methods used by all tools
     """
 
-    def __init__(self, name, config, **kwargs):
+    def __init__(self):
         """
         Initialize job list for a tool
         :param name: the name of the tool to execute
@@ -94,51 +106,45 @@ class Tool:
         :type config: configparser.ConfigParser
         """
 
-        self.name = name
-        self.config = config
         self.cmd = None
-        if not self.check_tool_available():
-            raise ToolException("{} tool cannot be found. Please check the 'path' and 'command' parameters "
-                                "provided in the config file. Do not include the command in the path!".format(self.name))
-
-        self.logfile_path_out = self._set_logfile_path()
-        self.logfile_path_err = self.logfile_path_out.replace('_out.log', '_err.log')
-        self.kwargs = kwargs
+        # self.name = name
+        # if not self.check_tool_available():
+        #     raise ToolException("{} tool cannot be found. Please check the 'path' and 'command' parameters "
+        #                         "provided in the config file. Do not include the command in the path!".format(self.name))
+        if self.name == "augustus":
+            self.kwargs = {"augustus_out": True}
+            self.timeout = 3600
+        else:
+            self.kwargs = {}
+            self.timeout = None
         self.jobs_to_run = []
         self.jobs_running = []
         self.nb_done = 0
         self.total = 0
-        self.count_jobs_created = True
-        self.logged_header = False
+        self.cpus = None
+        self.chunksize = None
+        # self.count_jobs_created = True
+        # self.logged_header = False
 
-    def check_tool_available(self):
-        """
-        Check tool's availability.
-        1. The section ['name'] is available in the config
-        2. This section contains keys 'path' and 'command'
-        3. The string resulting from concatenation of the values of these two keys
-        represents the full path to the command
-        :param name: the name of the tool to execute
-        :type name: str
-        :param config: initialized instance of ConfigParser
-        :type config: configparser.ConfigParser
-        :return: True if the tool can be run, False if it is not the case
-        :rtype: bool
-        """
-        if not self.config.has_section(self.name):
-            raise ToolException("Section for the tool [{}] is not present in the config file".format(self.name))
+        # self.logfile_path_out = os.path.join(self.config.get("busco_run", "main_out"), "logs", "{}_out.log".format(self.name))
+        # self.logfile_path_err = self.logfile_path_out.replace('_out.log', '_err.log')
 
-        if not self.config.has_option(self.name, 'path'):
-            raise ToolException("Key \'path\' in the section [{}] is not present in the config file".format(self.name))
+    @abstractmethod
+    def configure_job(self):
+        pass
 
-        if self.config.has_option(self.name, 'command'):
-            executable = self.config.get(self.name, 'command')
-        else:
-            executable = self.name
+    @abstractmethod
+    def generate_job_args(self):
+        pass
 
-        self.cmd = os.path.join(self.config.get(self.name, 'path'), executable)
+    @property
+    @abstractmethod
+    def name(self):
+        raise NotImplementedError
 
-        return which(self.cmd) is not None # True if tool available
+    @abstractmethod
+    def write_checkpoint_file(self):
+        pass
 
     def create_job(self):
         """
@@ -146,10 +152,10 @@ class Tool:
         """
         self.tool_outlogger = ToolLogger(self.logfile_path_out)
         self.tool_errlogger = ToolLogger(self.logfile_path_err)
-        job = Job(self.name, self.cmd[:], self.tool_outlogger, self.tool_errlogger, **self.kwargs)
+        job = Job(self.name, self.cmd[:], self.tool_outlogger, self.tool_errlogger, self.timeout, **self.kwargs)
         self.jobs_to_run.append(job)
-        if self.count_jobs_created:
-            self.total += 1
+        # if self.count_jobs_created:
+        #     self.total += 1
         return job
 
     def remove_job(self, job):
@@ -160,45 +166,44 @@ class Tool:
         """
         self.jobs_to_run.remove(job)
 
-    def _set_logfile_path(self):
-        return os.path.join(self.config.get("busco_run", "main_out"), "logs", "{}_out.log".format(self.name))
-
-    @log("Running {} job(s) on {}", logger, attr_name=['total', 'name'])
     def log_jobs_to_run(self):
-        self.logged_header = True
+        logger.info("Running {} job(s) on {}, starting at {}".format(self.total, self.name,
+                                                                     time.strftime('%m/%d/%Y %H:%M:%S')))
+        return
 
-    def run_jobs(self, max_threads):
-        """
-        This method runs all jobs created for the Tool and redirects
-        the standard output and error to the current logger
-        :param max_threads: the number of threads to run simultaneously
-        :type max_threads: int
-        :param log_it: whether to log the progress for the tasks. Default True
-        :type log_it: boolean
-        """
-        if not self.logged_header:
+    @log("No jobs to run on {}", logger, attr_name="name", iswarn=True)
+    def log_no_jobs(self):
+        return
+
+    def run_jobs(self):
+        if self.total > 0:
             self.log_jobs_to_run()
+        else:
+            self.log_no_jobs()
+            return
 
-        # Wait for all threads to finish and log progress
-        already_logged = 0
-        while len(self.jobs_to_run) > 0 or len(self.jobs_running) > 0:
-            time.sleep(0.001)
-            for j in self.jobs_to_run:
-                if len(self.jobs_running) < max_threads:
-                    self.jobs_running.append(j)
-                    self.jobs_to_run.remove(j)
-                    j.start()
-            for j in self.jobs_running:
-                # j.join()
-                if not j.is_alive():
-                    self.jobs_running.remove(j)
-                    self.nb_done += 1
-
-            if (self.nb_done == self.total or int(self.nb_done % float(self.total/10)) == 0) and self.nb_done != already_logged:
-                already_logged = self._track_progress()
-        # self.total = 0  # Reset for tools that are run twice (tblastn, augustus)
+        if self.cpus is None:  # todo: need a different way to ensure self.cpus is a nonzero number.
+            raise SystemExit("Number of CPUs not specified.")
 
+        with Pool(self.cpus, initializer=type(self).init_globals, initargs=(Value('i', 0),)) as job_pool:
+            job_pool.map(self.run_job, self.generate_job_args(), chunksize=self.chunksize)
+        self.write_checkpoint_file()
+
+    def run_job(self, args):
+        args = (args,) if isinstance(args, str) else tuple(args or (args,))  # Ensure args are tuples that can be unpacked. If no args, args=None, which is falsy, and this evaluates to (None,)
+        job = self.configure_job(*args)
+        job.run()
+        self.nb_done = cnt.value
+        if (self.nb_done == self.total or int(
+                self.nb_done % float(self.total / 10)) == 0):
+             self._track_progress()
 
     @log('[{0}]\t{1} of {2} task(s) completed', logger, attr_name=['name', 'nb_done', 'total'], on_func_exit=True)
     def _track_progress(self):
-        return self.nb_done
\ No newline at end of file
+        return
+
+    @classmethod
+    def init_globals(cls, counter):
+        """Counter code adapted from the answer here: https://stackoverflow.com/a/53621343/4844311"""
+        global cnt
+        cnt = counter
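For context on the refactor above: Tool is now an abstract base class whose run_jobs() dispatches work through a multiprocessing.Pool, with progress tracked through a shared Value counter that init_globals() installs in every worker. A self-contained sketch of that counter pattern, using a toy worker in place of the real configure_job()/Job machinery:

    from multiprocessing import Pool, Value

    def init_globals(counter):
        # Runs once in each worker process; exposes the shared counter as a global.
        global cnt
        cnt = counter

    def run_job(args):
        # ... real work would happen here ...
        with cnt.get_lock():         # Value('i', 0) is an int with an attached lock
            cnt.value += 1
        return cnt.value

    if __name__ == "__main__":
        with Pool(4, initializer=init_globals, initargs=(Value('i', 0),)) as pool:
            pool.map(run_job, range(20), chunksize=1)
        # All 20 jobs have completed here; every worker incremented the same counter.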


=====================================
src/busco/TranscriptomeAnalysis.py
=====================================
@@ -11,8 +11,6 @@ Licensed under the MIT license. See LICENSE.md file.
 
 """
 import os
-import time
-
 from busco.BuscoAnalysis import BuscoAnalysis
 from busco.BuscoLogger import BuscoLogger
 from busco.BuscoLogger import LogDecorator as log
@@ -20,13 +18,13 @@ from Bio.Seq import reverse_complement, translate
 from Bio import SeqIO
 from Bio.SeqRecord import SeqRecord
 from busco.Analysis import NucleotideAnalysis
-from busco.Toolset import Tool
 
 
 logger = BuscoLogger.get_logger(__name__)
 
 # todo: catch multiple buscos on one transcript
 
+
 class TranscriptomeAnalysis(NucleotideAnalysis, BuscoAnalysis):
     """
     Analysis on a transcriptome.
@@ -34,14 +32,11 @@ class TranscriptomeAnalysis(NucleotideAnalysis, BuscoAnalysis):
 
     _mode = "transcriptome"
 
-    def __init__(self, config):
+    def __init__(self):
         """
         Initialize an instance.
-        :param config: Values of all parameters that have to be defined
-        :type config: BuscoConfig
         """
-        super().__init__(config)
-
+        super().__init__()
 
     def run_analysis(self):
         """
@@ -58,57 +53,29 @@ class TranscriptomeAnalysis(NucleotideAnalysis, BuscoAnalysis):
         # if checkpoint < 1:
 
         self._run_mkblast()
-        coords = self._run_tblastn(ancestral_variants=self._has_variants_file)
+        self._run_tblastn(ancestral_variants=self._has_variants_file)
 
-        protein_seq_files = self._translate_seqs(coords)
+        protein_seq_files = self._translate_seqs(self.tblastn_runner.coords)
 
         self.run_hmmer(protein_seq_files)
-        # Note BUSCO matches are not written to file, as we have not yet developed a suitable protocol for Transcriptomes
-        self._cleanup()
+        # Note BUSCO matches are not written to file, as we have not yet developed a suitable protocol for
+        # Transcriptomes
         # if self._tarzip:
         #     self._run_tarzip_hmmer_output()
         #     self._run_tarzip_translated_proteins()
         return
 
-    def create_dirs(self): # todo: remove this as abstract method, review all abstract methods
-        super().create_dirs()
-
-    def init_tools(self): # todo: This should be an abstract method
-
+    def init_tools(self):
         super().init_tools()
-        try:
-            assert(isinstance(self._mkblast_tool, Tool))
-        except AttributeError:
-            self._mkblast_tool = Tool("makeblastdb", self._config)
-        except AssertionError:
-            raise SystemExit("mkblast should be a tool")
-
-        try:
-            assert(isinstance(self._tblastn_tool, Tool))
-        except AttributeError:
-            self._tblastn_tool = Tool("tblastn", self._config)
-        except AssertionError:
-            raise SystemExit("tblastn should be a tool")
-
-    def check_tool_dependencies(self):
-        blast_version = self._get_blast_version()
-        if blast_version not in ["2.2", "2.3"]:  # Known problems with multithreading on BLAST 2.4-2.9.
-            if blast_version == "2.9" and self._tblastn_tool.cmd.endswith(
-                    "tblastn_June13"):  # NCBI sent a binary with this name that avoids the multithreading problems.
-                pass
-            else:
-                logger.warning("You are using BLAST version {}. This is known to yield inconsistent results when "
-                               "multithreading. BLAST will run on a single core as a result. For performance improvement, "
-                               "please revert to BLAST 2.2 or 2.3.".format(blast_version))
-                self.blast_cpus = 1
-
-    def _cleanup(self):
+
+    def cleanup(self):
         """
         This function cleans temporary files.
         """
-        super()._cleanup()
+        super().cleanup()
 
-    def six_frame_translation(self, seq):
+    @staticmethod
+    def six_frame_translation(seq):
         """
         Gets the sixframe translation for the provided sequence
         :param seq: the sequence to be translated
@@ -132,7 +99,8 @@ class TranscriptomeAnalysis(NucleotideAnalysis, BuscoAnalysis):
             translated_seqs[descriptions[-(i+1)]] = (translate(anti[i:i + fragment_length], stop_symbol="X"))
         return translated_seqs
 
-    def _reformats_seq_id(self, seq_id):
+    @staticmethod
+    def _reformats_seq_id(seq_id):
         """
         This function reformats the sequence id to its original values
         :param seq_id: the seq id to reformat
@@ -160,8 +128,10 @@ class TranscriptomeAnalysis(NucleotideAnalysis, BuscoAnalysis):
             protein_seq_files.append(output_filename)
             translated_records = []
             for contig_name in contig_info:
-                tmp_filename = os.path.join(self.tblastn_runner.output_seqs, "{}.temp".format(contig_name[:100]))  # Avoid very long filenames
-                for record in SeqIO.parse(tmp_filename, "fasta"):  # These files will only ever have one sequence, but BioPython examples always parse them in an iterator.
+                tmp_filename = os.path.join(self.tblastn_runner.output_seqs, "{}.temp".format(
+                    contig_name[:100]))  # Avoid very long filenames
+                for record in SeqIO.parse(tmp_filename, "fasta"):  # These files will only ever have one sequence,
+                    # but BioPython examples always parse them in an iterator.
                     translated_seqs = self.six_frame_translation(record.seq)
                     for desc_id in translated_seqs:  # There are six possible translated sequences
                         prot_seq = translated_seqs[desc_id]
@@ -172,7 +142,6 @@ class TranscriptomeAnalysis(NucleotideAnalysis, BuscoAnalysis):
 
         return protein_seq_files
 
-
     # def _run_tarzip_translated_proteins(self):
     #     """
     #     This function tarzips results folder
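The six_frame_translation() staticmethod above builds the three forward and three reverse-complement reading frames with Biopython's reverse_complement() and translate(). A rough standalone equivalent, assuming Biopython is installed; the frame labels are illustrative, not the descriptions BUSCO writes out:

    from Bio.Seq import reverse_complement, translate

    def six_frame_translation(seq):
        """Translate a nucleotide sequence in all six reading frames."""
        seq = str(seq)
        anti = reverse_complement(seq)
        frames = {}
        for i in range(3):
            # Trim to a multiple of three so translate() only sees whole codons.
            length = 3 * ((len(seq) - i) // 3)
            frames["+{}".format(i + 1)] = translate(seq[i:i + length], stop_symbol="X")
            frames["-{}".format(i + 1)] = translate(anti[i:i + length], stop_symbol="X")
        return frames

    # e.g. six_frame_translation("ATGGCCATTGTAATGGGCCGCTGA")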


=====================================
src/busco/ViralAnalysis.py deleted
=====================================
@@ -1,146 +0,0 @@
-#!/usr/bin/env python3
-# coding: utf-8
-"""
-.. module:: ViralAnalysis
-   :synopsis: ViralAnalysis implements genome analysis specifics
-.. versionadded:: 3.0.0
-.. versionchanged:: 3.0.0
-
-Copyright (c) 2016-2020, Evgeny Zdobnov (ez at ezlab.org)
-Licensed under the MIT license. See LICENSE.md file.
-
-"""
-from busco.BuscoAnalysis import BuscoAnalysis
-from busco.BuscoLogger import BuscoLogger
-
-logger = BuscoLogger.get_logger(__name__)
-
-class ViralAnalysis(BuscoAnalysis):
-    """
-    This class runs a BUSCO analysis on a gene set.
-    """
-
-    _mode = "proteins"
-
-
-    def __init__(self, params):
-        """
-        Initialize an instance.
-        :param params: Values of all parameters that have to be defined
-        :type params: PipeConfig
-        """
-        super().__init__(params)
-        # data integrity checks not done by the parent class
-        if self.check_protein_file():
-            ViralAnalysis._logger.error("Please provide a genome file as input or run BUSCO in protein mode (--mode proteins)")
-            raise SystemExit
-
-    def run_analysis(self):
-        """
-        This function calls all needed steps for running the analysis.
-        """
-        super().run_analysis()
-        self._translate_virus()
-        self._sequences = self.translated_proteins
-        self._run_hmmer()
-        # if self._tarzip:
-        #     self._run_tarzip_hmmer_output()
-
-    def _init_tools(self):
-        """
-        Init the tools needed for the analysis
-        """
-        super()._init_tools()
-
-    def _run_hmmer(self):
-        """
-        This function runs hmmsearch.
-        """
-        super()._run_hmmer()
-
-    def _sixpack(self, seq):
-        """
-        Gets the sixframe translation for the provided sequence
-        :param seq: the sequence to be translated
-        :type seq: str
-        :return: the six translated sequences
-        :rtype: list
-        """
-        s1 = seq
-        s2 = seq[1:]
-        s3 = seq[2:]
-        rev = ""
-        for letter in seq[::-1]:
-            try:
-                rev += BuscoAnalysis.COMP[letter]
-            except KeyError:
-                rev += BuscoAnalysis.COMP["N"]
-        r1 = rev
-        r2 = rev[1:]
-        r3 = rev[2:]
-        transc = []
-        frames = [s1, s2, s3, r1, r3, r2]
-        for sequence in frames:
-            part = ""
-            new = ""
-            for letter in sequence:
-                if len(part) == 3:
-                    try:
-                        new += BuscoAnalysis.CODONS[part]
-                    except KeyError:
-                        new += "X"
-                    part = ""
-                    part += letter
-                else:
-                    part += letter
-            if len(part) == 3:
-                try:
-                    new += BuscoAnalysis.CODONS[part]
-                except KeyError:
-                    new += "X"
-            transc.append(new)
-        return transc
-    
-    def _translate_virus(self):
-        """
-        Prepares viral genomes for a BUSCO
-        protein analysis.
-        1) Translate any sequences in 6 frames
-        2) Split the sequences on the stops
-        3) Remove sequences shorter than 50aa
-        :return: file name 
-        :rtype: string
-        """
-        with open(self._sequences, "r") as f1:
-            with open(self.mainout + "translated_proteins.faa", "w") as o1:
-                seqs = {}
-                # parse file, retrieve all sequences
-                for line in f1:
-                    if line.startswith(">"):
-                        header = line.strip()
-                        seqs[header] = ""
-                    else:
-                        seqs[header] += line.strip()
-
-                # feed sequences to 6 frame translator
-                # then split on STOP codons
-                ctg_ct = 1
-                for seqid in seqs:
-                    seq_6f = self._sixpack(seqs[seqid])
-                    nb_frame = 1
-                    for frame in seq_6f:
-                        valid_ts_ct = 1
-                        # chop at the stop codons
-                        chopped_seqs = frame.split("X")
-                        for short_seq in chopped_seqs:
-                            # must have at least 50 A.A. to be considered further
-                            if len(short_seq) >= 50:
-                                # ctg nb, frame nb, transcript nb
-                                o1.write(">seq_n%s_f%s_t%s\n" % (ctg_ct, nb_frame, valid_ts_ct))
-                                o1.write("%s\n" % short_seq)
-                                valid_ts_ct += 1
-                            else:
-                                pass
-                        nb_frame += 1
-                    ctg_ct += 1
-        self.translated_proteins = self.main_out + "translated_proteins.faa"


=====================================
src/busco/_version.py
=====================================
@@ -6,4 +6,4 @@ Copyright (c) 2016-2020, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """
-__version__ = "4.0.6"
+__version__ = "4.1.2"


=====================================
src/busco/run_BUSCO.py
=====================================
@@ -66,7 +66,8 @@ def _parse_args():
 
     optional.add_argument(
         '--out_path', dest='out_path', required=False, metavar='OUTPUT_PATH',
-        help='Optional location for results folder, excluding results folder name. Default is current working directory.')
+        help='Optional location for results folder, excluding results folder name. '
+             'Default is current working directory.')
 
     optional.add_argument(
         '-e', '--evalue', dest='evalue', required=False, metavar='N', type=float,
@@ -89,6 +90,10 @@ def _parse_args():
         help='Force rewriting of existing files. '
              'Must be used when output files with the provided name already exist.')
 
+    optional.add_argument(
+        '-r', '--restart', action='store_true', required=False, dest='restart',
+        help='Continue a run that had already partially completed.')
+
     optional.add_argument(
         '--limit', dest='limit', metavar='REGION_LIMIT', required=False,
         type=int, help='How many candidate regions (contig or transcript) to consider per BUSCO (default: %s)'
@@ -117,7 +122,8 @@ def _parse_args():
     #     action="store_true")
 
     optional.add_argument(
-        '--auto-lineage', dest='auto-lineage', action="store_true", required=False, help='Run auto-lineage to find optimum lineage path')
+        '--auto-lineage', dest='auto-lineage', action="store_true", required=False,
+        help='Run auto-lineage to find optimum lineage path')
 
     optional.add_argument(
         '--auto-lineage-prok', dest='auto-lineage-prok', action="store_true", required=False,
@@ -144,7 +150,8 @@ def _parse_args():
 
     optional.add_argument('-h', '--help', action=CleanHelpAction, help="Show this help message and exit")
 
-    optional.add_argument('--list-datasets', action=ListLineagesAction, help="Print the list of available BUSCO datasets")
+    optional.add_argument('--list-datasets', action=ListLineagesAction,
+                          help="Print the list of available BUSCO datasets")
 
     return vars(parser.parse_args())
 
@@ -160,7 +167,9 @@ def main():
     params = _parse_args()
     run_BUSCO(params)
 
-@log('***** Start a BUSCO v{} analysis, current time: {} *****'.format(busco.__version__, time.strftime('%m/%d/%Y %H:%M:%S')), logger)
+
+@log('***** Start a BUSCO v{} analysis, current time: {} *****'.format(busco.__version__,
+                                                                       time.strftime('%m/%d/%Y %H:%M:%S')), logger)
 def run_BUSCO(params):
     start_time = time.time()
 
@@ -173,14 +182,16 @@ def run_BUSCO(params):
 
         lineage_basename = os.path.basename(config.get("busco_run", "lineage_dataset"))
         main_out_folder = config.get("busco_run", "main_out")
-        lineage_results_folder = os.path.join(main_out_folder, "auto_lineage", config.get("busco_run", "lineage_results_dir"))
+        lineage_results_folder = os.path.join(main_out_folder, "auto_lineage",
+                                              config.get("busco_run", "lineage_results_dir"))
 
         if config.getboolean("busco_run", "auto-lineage"):
             if lineage_basename.startswith(("bacteria", "archaea", "eukaryota")):
                 busco_run = config_manager.runner
 
-            # It is possible that the following lineages were arrived at either by the Prodigal genetic code shortcut or by
-            # BuscoPlacer. If the former, the run will have already been completed. If the latter it still needs to be done.
+            # It is possible that the following lineages were arrived at either by the Prodigal genetic code shortcut
+            # or by BuscoPlacer. If the former, the run will have already been completed. If the latter it still needs
+            # to be done.
             elif lineage_basename.startswith(("mollicutes", "mycoplasmatales", "entomoplasmatales")) and \
                     os.path.exists(lineage_results_folder):
                 busco_run = config_manager.runner
@@ -190,13 +201,13 @@ def run_BUSCO(params):
             busco_run = BuscoRunner(config)
 
         if os.path.exists(lineage_results_folder):
-            os.rename(lineage_results_folder, os.path.join(main_out_folder, config.get("busco_run", "lineage_results_dir")))
-            busco_run.finish(time.time()-start_time, root_lineage=True)
+            os.rename(lineage_results_folder, os.path.join(main_out_folder,
+                                                           config.get("busco_run", "lineage_results_dir")))
         else:
             busco_run.run_analysis()
-            BuscoRunner.final_results.append(busco_run.analysis.hmmer_results_lines)
+            BuscoRunner.final_results.append(busco_run.analysis.hmmer_runner.hmmer_results_lines)
             BuscoRunner.results_datasets.append(lineage_basename)
-            busco_run.finish(time.time()-start_time)
+        busco_run.finish(time.time()-start_time)
 
     except ToolException as e:
         logger.error(e)
@@ -226,7 +237,8 @@ def run_BUSCO(params):
 
     except BaseException:
         exc_type, exc_value, exc_traceback = sys.exc_info()
-        logger.critical("Unhandled exception occurred:\n{}\n".format("".join(traceback.format_exception(exc_type, exc_value, exc_traceback))))
+        logger.critical("Unhandled exception occurred:\n{}\n".format(
+            "".join(traceback.format_exception(exc_type, exc_value, exc_traceback))))
         raise SystemExit
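The run_BUSCO.py changes above add the -r/--restart flag (the restart mode reintroduced in 4.1.0) alongside the existing --force option. A minimal argparse sketch showing the flag as a boolean toggle; the parser here is illustrative, not BUSCO's full CLI:

    import argparse

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('-f', '--force', dest='force', action='store_true', required=False,
                        help='Force rewriting of existing files.')
    parser.add_argument('-r', '--restart', dest='restart', action='store_true', required=False,
                        help='Continue a run that had already partially completed.')

    print(vars(parser.parse_args(['--restart'])))   # {'force': False, 'restart': True}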
 
 


=====================================
test_data/bacteria/expected_log.txt
=====================================
@@ -1,4 +1,4 @@
-INFO:	***** Start a BUSCO v4.0.6 analysis, current time: 02/12/2020 16:38:00 *****
+INFO:	***** Start a BUSCO v4.1.2 analysis, current time: 07/01/2020 18:43:08 *****
 INFO:	Configuring BUSCO with /busco/config/config.ini
 INFO:	Mode is genome
 INFO:	Input file is genome.fna
@@ -14,12 +14,12 @@ INFO:	Downloading file 'https://busco-data.ezlab.org/v4/data/lineages/archaea_od
 INFO:	Decompressing file '/busco_wd/busco_downloads/lineages/archaea_odb10.tar.gz'
 INFO:	Running BUSCO using lineage dataset archaea_odb10 (prokaryota, 2019-01-04)
 INFO:	***** Run Prodigal on input to predict and extract genes *****
-INFO:	Running prodigal with genetic code 11 in single mode
-INFO:	Running 1 job(s) on prodigal
+INFO:	Running Prodigal with genetic code 11 in single mode
+INFO:	Running 1 job(s) on prodigal, starting at 07/01/2020 18:43:09
 INFO:	[prodigal]	1 of 1 task(s) completed
 INFO:	Genetic code 11 selected as optimal
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	Running 194 job(s) on hmmsearch
+INFO:	Running 194 job(s) on hmmsearch, starting at 07/01/2020 18:43:10
 INFO:	[hmmsearch]	20 of 194 task(s) completed
 INFO:	[hmmsearch]	39 of 194 task(s) completed
 INFO:	[hmmsearch]	59 of 194 task(s) completed
@@ -38,13 +38,14 @@ INFO:	Running BUSCO using lineage dataset bacteria_odb10 (prokaryota, 2019-06-26
 INFO:	***** Run Prodigal on input to predict and extract genes *****
 INFO:	Genetic code 11 selected as optimal
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	Running 124 job(s) on hmmsearch
+INFO:	Running 124 job(s) on hmmsearch, starting at 07/01/2020 18:43:13
 INFO:	[hmmsearch]	13 of 124 task(s) completed
 INFO:	[hmmsearch]	25 of 124 task(s) completed
 INFO:	[hmmsearch]	38 of 124 task(s) completed
 INFO:	[hmmsearch]	50 of 124 task(s) completed
 INFO:	[hmmsearch]	63 of 124 task(s) completed
 INFO:	[hmmsearch]	75 of 124 task(s) completed
+INFO:	[hmmsearch]	87 of 124 task(s) completed
 INFO:	[hmmsearch]	100 of 124 task(s) completed
 INFO:	[hmmsearch]	112 of 124 task(s) completed
 INFO:	[hmmsearch]	124 of 124 task(s) completed
@@ -53,15 +54,15 @@ INFO:	Results:	C:21.0%[S:21.0%,D:0.0%],F:0.8%,M:78.2%,n:124
 INFO:	Downloading file 'https://busco-data.ezlab.org/v4/data/lineages/eukaryota_odb10.2019-11-20.tar.gz'
 INFO:	Decompressing file '/busco_wd/busco_downloads/lineages/eukaryota_odb10.tar.gz'
 INFO:	Running BUSCO using lineage dataset eukaryota_odb10 (eukaryota, 2019-11-20)
+INFO:	Running 1 job(s) on makeblastdb, starting at 07/01/2020 18:43:16
 INFO:	Creating BLAST database with input file
-INFO:	Running 1 job(s) on makeblastdb
 INFO:	[makeblastdb]	1 of 1 task(s) completed
 INFO:	Running a BLAST search for BUSCOs against created database
-INFO:	Running 1 job(s) on tblastn
+INFO:	Running 1 job(s) on tblastn, starting at 07/01/2020 18:43:16
 INFO:	[tblastn]	1 of 1 task(s) completed
 INFO:	Running Augustus gene predictor on BLAST search results.
 INFO:	Running Augustus prediction using fly as species:
-INFO:	Running 10 job(s) on augustus
+INFO:	Running 10 job(s) on augustus, starting at 07/01/2020 18:43:18
 INFO:	[augustus]	1 of 10 task(s) completed
 INFO:	[augustus]	2 of 10 task(s) completed
 INFO:	[augustus]	3 of 10 task(s) completed
@@ -74,7 +75,7 @@ INFO:	[augustus]	9 of 10 task(s) completed
 INFO:	[augustus]	10 of 10 task(s) completed
 INFO:	Extracting predicted proteins...
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	Running 4 job(s) on hmmsearch
+INFO:	Running 4 job(s) on hmmsearch, starting at 07/01/2020 18:43:51
 INFO:	[hmmsearch]	1 of 4 task(s) completed
 INFO:	[hmmsearch]	2 of 4 task(s) completed
 INFO:	[hmmsearch]	3 of 4 task(s) completed
@@ -85,17 +86,19 @@ INFO:	Results:	C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:255
 INFO:	Starting second step of analysis. The gene predictor Augustus is retrained using the results from the initial run to yield more accurate results.
 INFO:	Extracting missing and fragmented buscos from the file ancestral_variants...
 INFO:	Running a BLAST search for BUSCOs against created database
+INFO:	Running 1 job(s) on tblastn, starting at 07/01/2020 18:43:52
 INFO:	[tblastn]	1 of 1 task(s) completed
-INFO:	Training Augustus using Single-Copy Complete BUSCOs:
 INFO:	Converting predicted genes to short genbank files
-INFO:	All files converted to short genbank files, now running the training scripts
-INFO:	Running 1 job(s) on new_species.pl
+WARNING:	No jobs to run on gff2gbSmallDNA.pl
+INFO:	All files converted to short genbank files, now training Augustus using Single-Copy Complete BUSCOs
+INFO:	Running 1 job(s) on new_species.pl, starting at 07/01/2020 18:44:07
 INFO:	[new_species.pl]	1 of 1 task(s) completed
-INFO:	Running 1 job(s) on etraining
+INFO:	Running 1 job(s) on etraining, starting at 07/01/2020 18:44:08
 INFO:	[etraining]	1 of 1 task(s) completed
 INFO:	Re-running Augustus with the new metaparameters, number of target BUSCOs: 255
 INFO:	Running Augustus gene predictor on BLAST search results.
 INFO:	Running Augustus prediction using BUSCO_test_bacteria as species:
+INFO:	Running 14 job(s) on augustus, starting at 07/01/2020 18:44:08
 INFO:	[augustus]	2 of 14 task(s) completed
 INFO:	[augustus]	3 of 14 task(s) completed
 INFO:	[augustus]	5 of 14 task(s) completed
@@ -108,12 +111,11 @@ INFO:	[augustus]	13 of 14 task(s) completed
 INFO:	[augustus]	14 of 14 task(s) completed
 INFO:	Extracting predicted proteins...
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	[hmmsearch]	1 of 3 task(s) completed
-INFO:	[hmmsearch]	2 of 3 task(s) completed
-INFO:	[hmmsearch]	3 of 3 task(s) completed
+WARNING:	No jobs to run on hmmsearch
 WARNING:	BUSCO did not find any match. Make sure to check the log files if this is unexpected.
 INFO:	Results:	C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:255	   
 
+WARNING:	Augustus did not produce a retrained species folder.
 INFO:	bacteria_odb10 selected
 
 INFO:	***** Searching tree for chosen lineage to find best taxonomic match *****
@@ -132,7 +134,7 @@ INFO:	Decompressing file '/busco_wd/busco_downloads/placement_files/mapping_taxi
 INFO:	Downloading file 'https://busco-data.ezlab.org/v4/data/placement_files/mapping_taxid-lineage.bacteria_odb10.2019-12-16.txt.tar.gz'
 INFO:	Decompressing file '/busco_wd/busco_downloads/placement_files/mapping_taxid-lineage.bacteria_odb10.2019-12-16.txt.tar.gz'
 INFO:	Place the markers on the reference tree...
-INFO:	Running 1 job(s) on sepp
+INFO:	Running 1 job(s) on sepp, starting at 07/01/2020 18:44:10
 INFO:	[sepp]	1 of 1 task(s) completed
 INFO:	Not enough markers were placed on the tree (11). Root lineage bacteria is kept
 INFO:	
@@ -148,12 +150,15 @@ INFO:
 	|97	Missing BUSCOs (M)                        |
 	|124	Total BUSCO groups searched               |
 	--------------------------------------------------
-INFO:	BUSCO analysis done with WARNING(s). Total running time: 81 seconds
+INFO:	BUSCO analysis done with WARNING(s). Total running time: 127 seconds
 
 ***** Summary of warnings: *****
 WARNING:busco.ConfigManager	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
 WARNING:busco.BuscoTools	BUSCO did not find any match. Make sure to check the log files if this is unexpected.
+WARNING:busco.Toolset	No jobs to run on gff2gbSmallDNA.pl
+WARNING:busco.Toolset	No jobs to run on hmmsearch
 WARNING:busco.BuscoTools	BUSCO did not find any match. Make sure to check the log files if this is unexpected.
+WARNING:busco.BuscoTools	Augustus did not produce a retrained species folder.
 
 INFO:	Results written in /busco_wd/test_bacteria
 


=====================================
test_data/eukaryota/expected_log.txt
=====================================
@@ -1,4 +1,4 @@
-INFO:	***** Start a BUSCO v4.0.6 analysis, current time: 02/12/2020 16:40:52 *****
+INFO:	***** Start a BUSCO v4.1.2 analysis, current time: 07/01/2020 17:20:41 *****
 INFO:	Configuring BUSCO with /busco/config/config.ini
 INFO:	Mode is genome
 INFO:	Input file is genome.fna
@@ -14,12 +14,12 @@ INFO:	Downloading file 'https://busco-data.ezlab.org/v4/data/lineages/archaea_od
 INFO:	Decompressing file '/busco_wd/busco_downloads/lineages/archaea_odb10.tar.gz'
 INFO:	Running BUSCO using lineage dataset archaea_odb10 (prokaryota, 2019-01-04)
 INFO:	***** Run Prodigal on input to predict and extract genes *****
-INFO:	Running prodigal with genetic code 11 in single mode
-INFO:	Running 1 job(s) on prodigal
+INFO:	Running Prodigal with genetic code 11 in single mode
+INFO:	Running 1 job(s) on prodigal, starting at 07/01/2020 17:20:42
 INFO:	[prodigal]	1 of 1 task(s) completed
 INFO:	Genetic code 11 selected as optimal
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	Running 194 job(s) on hmmsearch
+INFO:	Running 194 job(s) on hmmsearch, starting at 07/01/2020 17:20:42
 INFO:	[hmmsearch]	20 of 194 task(s) completed
 INFO:	[hmmsearch]	39 of 194 task(s) completed
 INFO:	[hmmsearch]	59 of 194 task(s) completed
@@ -27,6 +27,8 @@ INFO:	[hmmsearch]	78 of 194 task(s) completed
 INFO:	[hmmsearch]	97 of 194 task(s) completed
 INFO:	[hmmsearch]	117 of 194 task(s) completed
 INFO:	[hmmsearch]	136 of 194 task(s) completed
+INFO:	[hmmsearch]	156 of 194 task(s) completed
+INFO:	[hmmsearch]	156 of 194 task(s) completed
 INFO:	[hmmsearch]	175 of 194 task(s) completed
 INFO:	[hmmsearch]	194 of 194 task(s) completed
 INFO:	Results:	C:1.0%[S:1.0%,D:0.0%],F:0.5%,M:98.5%,n:194	   
@@ -37,7 +39,7 @@ INFO:	Running BUSCO using lineage dataset bacteria_odb10 (prokaryota, 2019-06-26
 INFO:	***** Run Prodigal on input to predict and extract genes *****
 INFO:	Genetic code 11 selected as optimal
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	Running 124 job(s) on hmmsearch
+INFO:	Running 124 job(s) on hmmsearch, starting at 07/01/2020 17:20:45
 INFO:	[hmmsearch]	13 of 124 task(s) completed
 INFO:	[hmmsearch]	25 of 124 task(s) completed
 INFO:	[hmmsearch]	38 of 124 task(s) completed
@@ -54,15 +56,15 @@ INFO:	Results:	C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:124
 INFO:	Downloading file 'https://busco-data.ezlab.org/v4/data/lineages/eukaryota_odb10.2019-11-20.tar.gz'
 INFO:	Decompressing file '/busco_wd/busco_downloads/lineages/eukaryota_odb10.tar.gz'
 INFO:	Running BUSCO using lineage dataset eukaryota_odb10 (eukaryota, 2019-11-20)
+INFO:	Running 1 job(s) on makeblastdb, starting at 07/01/2020 17:20:48
 INFO:	Creating BLAST database with input file
-INFO:	Running 1 job(s) on makeblastdb
 INFO:	[makeblastdb]	1 of 1 task(s) completed
 INFO:	Running a BLAST search for BUSCOs against created database
-INFO:	Running 1 job(s) on tblastn
+INFO:	Running 1 job(s) on tblastn, starting at 07/01/2020 17:20:48
 INFO:	[tblastn]	1 of 1 task(s) completed
 INFO:	Running Augustus gene predictor on BLAST search results.
 INFO:	Running Augustus prediction using fly as species:
-INFO:	Running 52 job(s) on augustus
+INFO:	Running 52 job(s) on augustus, starting at 07/01/2020 17:20:48
 INFO:	[augustus]	6 of 52 task(s) completed
 INFO:	[augustus]	11 of 52 task(s) completed
 INFO:	[augustus]	16 of 52 task(s) completed
@@ -75,7 +77,7 @@ INFO:	[augustus]	47 of 52 task(s) completed
 INFO:	[augustus]	52 of 52 task(s) completed
 INFO:	Extracting predicted proteins...
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	Running 50 job(s) on hmmsearch
+INFO:	Running 50 job(s) on hmmsearch, starting at 07/01/2020 17:21:44
 INFO:	[hmmsearch]	5 of 50 task(s) completed
 INFO:	[hmmsearch]	10 of 50 task(s) completed
 INFO:	[hmmsearch]	15 of 50 task(s) completed
@@ -91,10 +93,10 @@ INFO:	Results:	C:15.3%[S:15.3%,D:0.0%],F:1.2%,M:83.5%,n:255
 INFO:	Starting second step of analysis. The gene predictor Augustus is retrained using the results from the initial run to yield more accurate results.
 INFO:	Extracting missing and fragmented buscos from the file ancestral_variants...
 INFO:	Running a BLAST search for BUSCOs against created database
+INFO:	Running 1 job(s) on tblastn, starting at 07/01/2020 17:21:45
 INFO:	[tblastn]	1 of 1 task(s) completed
-INFO:	Training Augustus using Single-Copy Complete BUSCOs:
 INFO:	Converting predicted genes to short genbank files
-INFO:	Running 39 job(s) on gff2gbSmallDNA.pl
+INFO:	Running 39 job(s) on gff2gbSmallDNA.pl, starting at 07/01/2020 17:21:49
 INFO:	[gff2gbSmallDNA.pl]	4 of 39 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	8 of 39 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	12 of 39 task(s) completed
@@ -105,14 +107,15 @@ INFO:	[gff2gbSmallDNA.pl]	28 of 39 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	32 of 39 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	36 of 39 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	39 of 39 task(s) completed
-INFO:	All files converted to short genbank files, now running the training scripts
-INFO:	Running 1 job(s) on new_species.pl
+INFO:	All files converted to short genbank files, now training Augustus using Single-Copy Complete BUSCOs
+INFO:	Running 1 job(s) on new_species.pl, starting at 07/01/2020 17:21:49
 INFO:	[new_species.pl]	1 of 1 task(s) completed
-INFO:	Running 1 job(s) on etraining
+INFO:	Running 1 job(s) on etraining, starting at 07/01/2020 17:21:50
 INFO:	[etraining]	1 of 1 task(s) completed
 INFO:	Re-running Augustus with the new metaparameters, number of target BUSCOs: 216
 INFO:	Running Augustus gene predictor on BLAST search results.
 INFO:	Running Augustus prediction using BUSCO_test_eukaryota as species:
+INFO:	Running 39 job(s) on augustus, starting at 07/01/2020 17:21:50
 INFO:	[augustus]	4 of 39 task(s) completed
 INFO:	[augustus]	8 of 39 task(s) completed
 INFO:	[augustus]	12 of 39 task(s) completed
@@ -125,17 +128,18 @@ INFO:	[augustus]	36 of 39 task(s) completed
 INFO:	[augustus]	39 of 39 task(s) completed
 INFO:	Extracting predicted proteins...
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	[hmmsearch]	4 of 37 task(s) completed
-INFO:	[hmmsearch]	8 of 37 task(s) completed
-INFO:	[hmmsearch]	12 of 37 task(s) completed
-INFO:	[hmmsearch]	15 of 37 task(s) completed
-INFO:	[hmmsearch]	19 of 37 task(s) completed
-INFO:	[hmmsearch]	23 of 37 task(s) completed
-INFO:	[hmmsearch]	26 of 37 task(s) completed
-INFO:	[hmmsearch]	30 of 37 task(s) completed
-INFO:	[hmmsearch]	34 of 37 task(s) completed
-INFO:	[hmmsearch]	37 of 37 task(s) completed
-INFO:	Results:	C:18.8%[S:18.8%,D:0.0%],F:0.4%,M:80.8%,n:255	   
+INFO:	Running 34 job(s) on hmmsearch, starting at 07/01/2020 17:22:01
+INFO:	[hmmsearch]	4 of 34 task(s) completed
+INFO:	[hmmsearch]	7 of 34 task(s) completed
+INFO:	[hmmsearch]	11 of 34 task(s) completed
+INFO:	[hmmsearch]	14 of 34 task(s) completed
+INFO:	[hmmsearch]	17 of 34 task(s) completed
+INFO:	[hmmsearch]	21 of 34 task(s) completed
+INFO:	[hmmsearch]	24 of 34 task(s) completed
+INFO:	[hmmsearch]	28 of 34 task(s) completed
+INFO:	[hmmsearch]	31 of 34 task(s) completed
+INFO:	[hmmsearch]	34 of 34 task(s) completed
+INFO:	Results:	C:18.8%[S:18.8%,D:0.0%],F:1.2%,M:80.0%,n:255	   
 
 INFO:	eukaryota_odb10 selected
 
@@ -155,18 +159,18 @@ INFO:	Decompressing file '/busco_wd/busco_downloads/placement_files/mapping_taxi
 INFO:	Downloading file 'https://busco-data.ezlab.org/v4/data/placement_files/mapping_taxid-lineage.eukaryota_odb10.2019-12-16.txt.tar.gz'
 INFO:	Decompressing file '/busco_wd/busco_downloads/placement_files/mapping_taxid-lineage.eukaryota_odb10.2019-12-16.txt.tar.gz'
 INFO:	Place the markers on the reference tree...
-INFO:	Running 1 job(s) on sepp
+INFO:	Running 1 job(s) on sepp, starting at 07/01/2020 17:22:02
 INFO:	[sepp]	1 of 1 task(s) completed
 INFO:	Lineage saccharomycetes is selected, supported by 16 markers out of 17
 INFO:	Downloading file 'https://busco-data.ezlab.org/v4/data/lineages/saccharomycetes_odb10.2019-11-20.tar.gz'
 INFO:	Decompressing file '/busco_wd/busco_downloads/lineages/saccharomycetes_odb10.tar.gz'
 INFO:	Running BUSCO using lineage dataset saccharomycetes_odb10 (eukaryota, 2019-11-20)
 INFO:	Running a BLAST search for BUSCOs against created database
-INFO:	Running 1 job(s) on tblastn
+INFO:	Running 1 job(s) on tblastn, starting at 07/01/2020 17:25:10
 INFO:	[tblastn]	1 of 1 task(s) completed
 INFO:	Running Augustus gene predictor on BLAST search results.
 INFO:	Running Augustus prediction using aspergillus_nidulans as species:
-INFO:	Running 98 job(s) on augustus
+INFO:	Running 98 job(s) on augustus, starting at 07/01/2020 17:25:14
 INFO:	[augustus]	10 of 98 task(s) completed
 INFO:	[augustus]	20 of 98 task(s) completed
 INFO:	[augustus]	30 of 98 task(s) completed
@@ -179,7 +183,7 @@ INFO:	[augustus]	89 of 98 task(s) completed
 INFO:	[augustus]	98 of 98 task(s) completed
 INFO:	Extracting predicted proteins...
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	Running 63 job(s) on hmmsearch
+INFO:	Running 63 job(s) on hmmsearch, starting at 07/01/2020 17:25:54
 INFO:	[hmmsearch]	7 of 63 task(s) completed
 INFO:	[hmmsearch]	13 of 63 task(s) completed
 INFO:	[hmmsearch]	19 of 63 task(s) completed
@@ -193,10 +197,10 @@ INFO:	[hmmsearch]	63 of 63 task(s) completed
 INFO:	Starting second step of analysis. The gene predictor Augustus is retrained using the results from the initial run to yield more accurate results.
 INFO:	Extracting missing and fragmented buscos from the file ancestral_variants...
 INFO:	Running a BLAST search for BUSCOs against created database
+INFO:	Running 1 job(s) on tblastn, starting at 07/01/2020 17:26:02
 INFO:	[tblastn]	1 of 1 task(s) completed
-INFO:	Training Augustus using Single-Copy Complete BUSCOs:
 INFO:	Converting predicted genes to short genbank files
-INFO:	Running 29 job(s) on gff2gbSmallDNA.pl
+INFO:	Running 29 job(s) on gff2gbSmallDNA.pl, starting at 07/01/2020 17:27:08
 INFO:	[gff2gbSmallDNA.pl]	3 of 29 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	6 of 29 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	9 of 29 task(s) completed
@@ -207,14 +211,16 @@ INFO:	[gff2gbSmallDNA.pl]	21 of 29 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	24 of 29 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	27 of 29 task(s) completed
 INFO:	[gff2gbSmallDNA.pl]	29 of 29 task(s) completed
-INFO:	All files converted to short genbank files, now running the training scripts
-INFO:	Running 1 job(s) on new_species.pl
+INFO:	[gff2gbSmallDNA.pl]	29 of 29 task(s) completed
+INFO:	All files converted to short genbank files, now training Augustus using Single-Copy Complete BUSCOs
+INFO:	Running 1 job(s) on new_species.pl, starting at 07/01/2020 17:27:09
 INFO:	[new_species.pl]	1 of 1 task(s) completed
-INFO:	Running 1 job(s) on etraining
+INFO:	Running 1 job(s) on etraining, starting at 07/01/2020 17:27:09
 INFO:	[etraining]	1 of 1 task(s) completed
 INFO:	Re-running Augustus with the new metaparameters, number of target BUSCOs: 2108
 INFO:	Running Augustus gene predictor on BLAST search results.
 INFO:	Running Augustus prediction using BUSCO_test_eukaryota as species:
+INFO:	Running 147 job(s) on augustus, starting at 07/01/2020 17:27:10
 INFO:	[augustus]	15 of 147 task(s) completed
 INFO:	[augustus]	30 of 147 task(s) completed
 INFO:	[augustus]	45 of 147 task(s) completed
@@ -227,42 +233,45 @@ INFO:	[augustus]	133 of 147 task(s) completed
 INFO:	[augustus]	147 of 147 task(s) completed
 INFO:	Extracting predicted proteins...
 INFO:	***** Run HMMER on gene sequences *****
-INFO:	[hmmsearch]	15 of 144 task(s) completed
-INFO:	[hmmsearch]	29 of 144 task(s) completed
-INFO:	[hmmsearch]	44 of 144 task(s) completed
-INFO:	[hmmsearch]	58 of 144 task(s) completed
-INFO:	[hmmsearch]	73 of 144 task(s) completed
-INFO:	[hmmsearch]	116 of 144 task(s) completed
-INFO:	[hmmsearch]	130 of 144 task(s) completed
-INFO:	[hmmsearch]	144 of 144 task(s) completed
-INFO:	Results:	C:2.0%[S:2.0%,D:0.0%],F:0.1%,M:97.9%,n:2137	   
+INFO:	Running 140 job(s) on hmmsearch, starting at 07/01/2020 17:27:58
+INFO:	[hmmsearch]	14 of 140 task(s) completed
+INFO:	[hmmsearch]	28 of 140 task(s) completed
+INFO:	[hmmsearch]	42 of 140 task(s) completed
+INFO:	[hmmsearch]	56 of 140 task(s) completed
+INFO:	[hmmsearch]	70 of 140 task(s) completed
+INFO:	[hmmsearch]	84 of 140 task(s) completed
+INFO:	[hmmsearch]	98 of 140 task(s) completed
+INFO:	[hmmsearch]	112 of 140 task(s) completed
+INFO:	[hmmsearch]	126 of 140 task(s) completed
+INFO:	[hmmsearch]	140 of 140 task(s) completed
+INFO:	Results:	C:2.0%[S:2.0%,D:0.0%],F:0.3%,M:97.7%,n:2137	   
 
 INFO:	
 
 	--------------------------------------------------
 	|Results from generic domain eukaryota_odb10      |
 	--------------------------------------------------
-	|C:18.8%[S:18.8%,D:0.0%],F:0.4%,M:80.8%,n:255     |
+	|C:18.8%[S:18.8%,D:0.0%],F:1.2%,M:80.0%,n:255     |
 	|48	Complete BUSCOs (C)                       |
 	|48	Complete and single-copy BUSCOs (S)       |
 	|0	Complete and duplicated BUSCOs (D)        |
-	|1	Fragmented BUSCOs (F)                     |
-	|206	Missing BUSCOs (M)                        |
+	|3	Fragmented BUSCOs (F)                     |
+	|204	Missing BUSCOs (M)                        |
 	|255	Total BUSCO groups searched               |
 	--------------------------------------------------
 
 	--------------------------------------------------
 	|Results from dataset saccharomycetes_odb10       |
 	--------------------------------------------------
-	|C:2.0%[S:2.0%,D:0.0%],F:0.1%,M:97.9%,n:2137      |
+	|C:2.0%[S:2.0%,D:0.0%],F:0.3%,M:97.7%,n:2137      |
 	|42	Complete BUSCOs (C)                       |
 	|42	Complete and single-copy BUSCOs (S)       |
 	|0	Complete and duplicated BUSCOs (D)        |
-	|3	Fragmented BUSCOs (F)                     |
-	|2092	Missing BUSCOs (M)                        |
+	|6	Fragmented BUSCOs (F)                     |
+	|2089	Missing BUSCOs (M)                        |
 	|2137	Total BUSCO groups searched               |
 	--------------------------------------------------
-INFO:	BUSCO analysis done with WARNING(s). Total running time: 212 seconds
+INFO:	BUSCO analysis done with WARNING(s). Total running time: 440 seconds
 
 ***** Summary of warnings: *****
 WARNING:busco.ConfigManager	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.



View it on GitLab: https://salsa.debian.org/med-team/busco/-/compare/7e9907ffb042987f6086f0e34d62b6dc3ae2cf72...1c4781f6b08bd870d527196d9dae09387c727d0b
