[med-svn] [Git][med-team/busco][master] 5 commits: routine-update: New upstream version

Andreas Tille (@tille) gitlab at salsa.debian.org
Tue Jul 4 10:55:25 BST 2023



Andreas Tille pushed to branch master at Debian Med / busco


Commits:
6fbb71b1 by Andreas Tille at 2023-07-04T11:36:43+02:00
routine-update: New upstream version

- - - - -
d23c473b by Andreas Tille at 2023-07-04T11:36:44+02:00
New upstream version 5.4.7
- - - - -
94e194a9 by Andreas Tille at 2023-07-04T11:36:46+02:00
Update upstream source from tag 'upstream/5.4.7'

Update to upstream version '5.4.7'
with Debian dir d59932b909f81a561e87df518ac14f4897742a1b
- - - - -
d697244b by Andreas Tille at 2023-07-04T11:36:56+02:00
Set upstream metadata fields: Bug-Database, Bug-Submit.

Changes-By: lintian-brush

- - - - -
cfad236d by Andreas Tille at 2023-07-04T11:38:12+02:00
routine-update: Ready to upload to unstable

- - - - -


29 changed files:

- CHANGELOG
- LICENSE
- README.md
- bin/busco
- debian/changelog
- debian/upstream/metadata
- scripts/generate_plot.py
- setup.py
- src/busco/BuscoConfig.py
- src/busco/BuscoDownloadManager.py
- src/busco/BuscoLogger.py
- src/busco/BuscoPlacer.py
- src/busco/BuscoRunner.py
- src/busco/ConfigManager.py
- src/busco/__init__.py
- src/busco/_version.py
- src/busco/analysis/BuscoAnalysis.py
- src/busco/analysis/GeneSetAnalysis.py
- src/busco/analysis/GenomeAnalysis.py
- src/busco/analysis/TranscriptomeAnalysis.py
- src/busco/busco_tools/Toolset.py
- src/busco/busco_tools/base.py
- src/busco/busco_tools/hmmer.py
- src/busco/busco_tools/metaeuk.py
- src/busco/run_BUSCO.py
- test_data/bacteria/expected_log.txt
- test_data/eukaryota/expected_log.txt
- tests/unittests/AutoLineage_unittests.py
- tests/unittests/GenomeAnalysis_unittests.py


Changes:

=====================================
CHANGELOG
=====================================
@@ -1,8 +1,24 @@
+5.4.7
+- Fix bug in overlap handling (Issue #653): this fix also updated the way negative strand coordinates are reported,
+i.e. <gene_id>:<start>-<stop> instead of <gene_id>:<low>-<high>
+- Fix bug in HMMER result filtering (Issue #661)
+- Fix bug in sequence trimming when exons are removed
+
+5.4.6
+- Fix mode setting bug in batch mode autolineage (Issue #636)
+- Fix tab discrepancy bug in batch_summary.txt (Issue #643)
+- Fix coordinate bug in overlap handling (Issue #644)
+
+5.4.5
+- Fix bug in overlap handling (Issues #627, #633)
+- Fix bug in parasitic check (Issue #594)
+
 5.4.4
 - Fix bug in tar option (Issue #591)
 - Fix edge case bug in overlap handling (Issue #592)
 - Fix overlap adjustment algorithm and trim reported sequences
 - Fix file open mode (Issue #622)
+- Efficiency improvements with more parallel processing
 
 5.4.3
 - Fix bug in augustus --long pipeline (Issue #586)
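
Note on the 5.4.7 entry above: the new identifier convention puts the larger coordinate first for negative-strand genes. A minimal sketch of that convention follows, assuming a gene is described by a sequence name, a low/high coordinate pair and a strand symbol. The helper names `format_gene_id` and `parse_gene_id` are hypothetical and are not part of BUSCO.

```python
def format_gene_id(sequence, low, high, strand):
    """Build '<sequence>:<start>-<stop>'; on the negative strand start > stop."""
    start, stop = (low, high) if strand == "+" else (high, low)
    return "{}:{}-{}".format(sequence, start, stop)


def parse_gene_id(gene_id):
    """Recover (sequence, low, high) from an identifier in either orientation."""
    # rsplit keeps any ':' inside the sequence name intact
    sequence, coords = gene_id.rsplit(":", 1)
    first, second = (int(c) for c in coords.split("-"))
    return sequence, min(first, second), max(first, second)


print(format_gene_id("contig_1", 100, 250, "+"))  # contig_1:100-250
print(format_gene_id("contig_1", 100, 250, "-"))  # contig_1:250-100
print(parse_gene_id("contig_1:250-100"))          # ('contig_1', 100, 250)
```

Converting to `int` before taking `min`/`max` matters here, because comparing the coordinates as strings would order "99" after "100".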


=====================================
LICENSE
=====================================
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal


=====================================
README.md
=====================================
@@ -43,5 +43,5 @@ BUSCO applications from quality assessments to gene prediction and phylogenomics
 
 BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, and Evgeny M. Zdobnov Bioinformatics, published online June 9, 2015 doi: 10.1093/bioinformatics/btv351
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.


=====================================
bin/busco
=====================================
@@ -2,6 +2,7 @@
 
 
 if __name__ == "__main__":
+    __spec__ = None
     try:
         from busco import run_BUSCO
     except ImportError as err:


=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+busco (5.4.7-1) unstable; urgency=medium
+
+  * New upstream version
+
+ -- Andreas Tille <tille at debian.org>  Tue, 04 Jul 2023 11:37:13 +0200
+
 busco (5.4.4-1) unstable; urgency=medium
 
   * Team Upload.


=====================================
debian/upstream/metadata
=====================================
@@ -1,5 +1,5 @@
-Bug-Database: https://gitlab.com/ezlab/busco/issues
-Bug-Submit: https://gitlab.com/ezlab/busco/issues/new
+Bug-Database: https://gitlab.com/ezlab/busco/-/issues
+Bug-Submit: https://gitlab.com/ezlab/busco/-/issues/new
 Reference:
  - Author: >
         Mathieu Seppey and Mosè Manni and Evgeny M. Zdobnov


=====================================
scripts/generate_plot.py
=====================================
@@ -19,7 +19,7 @@ automatically runs it.
 
 You can find both the resulting R script for customisation and the figure in the working directory.
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """
@@ -54,7 +54,7 @@ RCODE = (
     "# @version 4.0.0\n"
     "# @since BUSCO 2.0.0\n"
     "# \n"
-    "# Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)\n"
+    "# Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)\n"
     "# Licensed under the MIT license. See LICENSE.md file.\n"
     "#\n"
     "######################################\n"


=====================================
setup.py
=====================================
@@ -4,7 +4,7 @@
 .. versionadded:: 3.0.0
 .. versionchanged:: 5.0.0
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 This script proceeds to the BUSCO packages installation


=====================================
src/busco/BuscoConfig.py
=====================================
@@ -206,31 +206,31 @@ class BaseConfig(ConfigParser, metaclass=ABCMeta):
     def update_mode(self):
         mode = self.get("busco_run", "mode")
         domain = self.get("busco_run", "domain")
-        if mode == "genome":
+        if "genome" in mode:
             if domain in ["prokaryota", "viruses"]:
-                self.set("busco_run", "mode", "prok_genome")
+                mode = "prok_genome"
             elif domain == "eukaryota":
                 if self.getboolean("busco_run", "use_augustus"):
-                    self.set("busco_run", "mode", "euk_genome_aug")
+                    mode = "euk_genome_aug"
                 else:
-                    self.set("busco_run", "mode", "euk_genome_met")
+                    mode = "euk_genome_met"
             else:
-                raise BatchFatalError(
-                    "Unrecognized mode {}".format(self.get("busco_run", "mode"))
-                )
+                raise BatchFatalError("Unrecognized mode {}".format(mode))
 
         elif mode == "transcriptome":
             if domain == "prokaryota":
-                self.set("busco_run", "mode", "prok_tran")
+                mode = "prok_tran"
             elif domain == "eukaryota":
-                self.set("busco_run", "mode", "euk_tran")
+                mode = "euk_tran"
             elif domain == "viruses":
-                self.set(
-                    "busco_run", "mode", "prok_genome"
-                )  # Suggested by Mose - Prodigal may perform better on viruses than BLAST + HMMER.
+                mode = "prok_genome"  # Suggested by Mose - Prodigal may perform better on viruses
+                # than BLAST + HMMER.
+
             else:
                 raise BatchFatalError("Unrecognized mode {}".format(mode))
 
+        return mode, domain
+
     def add_mode_specific_parameters(self):
         mode = self.get("busco_run", "mode")
         if mode == "euk_genome_aug":
@@ -283,7 +283,9 @@ class BaseConfig(ConfigParser, metaclass=ABCMeta):
                     "domain"
                 ]  # Necessary to set domain kw first to enable augustus and prodigal arguments to be handled properly
                 self.set("busco_run", "domain", domain)
-                self.update_mode()
+                mode, domain = self.update_mode()
+                self.set("busco_run", "mode", mode)
+                self.set("busco_run", "domain", domain)
                 del dataset_kwargs["domain"]
                 for key, value in dataset_kwargs.items():
                     if key == "species":
@@ -316,7 +318,7 @@ class BaseConfig(ConfigParser, metaclass=ABCMeta):
                     ]:
                         if self.get("busco_run", "mode") in [
                             "euk_genome_met",
-                            "euk_tran",
+                            "euk_tran"
                         ]:
                             self.set("busco_run", key, value)
                     else:
@@ -336,6 +338,18 @@ class BaseConfig(ConfigParser, metaclass=ABCMeta):
         if not os.path.exists(main_out):
             os.makedirs(main_out)
 
+    def reset(self):
+        options_to_reset = ["domain_run_name", "ambiguous_cd_range_lower", "ambiguous_cd_range_upper", "creation_date",
+                            "domain", "in", "lineage_results_dir", "name", "number_of_buscos",
+                            "number_of_species", "main_out", "prodigal_genetic_code"]
+        for option in options_to_reset:
+            try:
+                self.remove_option("busco_run", option)
+            except NoOptionError:
+                continue
+        if self.getboolean("busco_run", "auto-lineage"):
+            self.remove_option("busco_run", "lineage_dataset")
+
     def set_results_dirname(self, lineage):
         self.set(
             "busco_run",
@@ -653,7 +667,9 @@ class BuscoConfigMain(BaseConfig):
                     ):
                         raise BatchFatalError(
                             "Unknown mode {}.\n'Mode' parameter must be one of "
-                            "['genome', 'transcriptome', 'proteins']".format(value)
+                            "['genome', 'transcriptome', 'proteins']".format(
+                                value
+                            )
                         )
                     if value in synonyms["genome"]:
                         self.set("busco_run", "mode", "genome")
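
Note on the `update_mode` change above: the method now returns the resolved mode and leaves it to the caller to store it. The same decision table can be sketched as a pure function, as below. `resolve_mode` is a hypothetical name, and `ValueError` stands in for BUSCO's `BatchFatalError`. This is an illustration under those assumptions, not the upstream implementation.

```python
def resolve_mode(mode, domain, use_augustus=False):
    """Map a user-facing mode plus a domain onto an internal pipeline mode."""
    if "genome" in mode:  # substring test, so an already resolved genome mode is re-resolved
        if domain in ("prokaryota", "viruses"):
            return "prok_genome"
        if domain == "eukaryota":
            return "euk_genome_aug" if use_augustus else "euk_genome_met"
        raise ValueError("Unrecognized mode {}".format(mode))
    if mode == "transcriptome":
        table = {
            "prokaryota": "prok_tran",
            "eukaryota": "euk_tran",
            "viruses": "prok_genome",  # viruses are routed to the prokaryotic genome mode
        }
        if domain not in table:
            raise ValueError("Unrecognized mode {}".format(mode))
        return table[domain]
    return mode  # any other mode is passed through unchanged


print(resolve_mode("genome", "eukaryota"))          # euk_genome_met
print(resolve_mode("euk_genome_met", "eukaryota"))  # euk_genome_met
print(resolve_mode("transcriptome", "viruses"))     # prok_genome
```

Because the function has no side effects, calling it a second time on an already resolved genome mode gives the same answer. That is the property a batch run relies on when the configuration is reused between inputs.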


=====================================
src/busco/BuscoDownloadManager.py
=====================================
@@ -6,7 +6,7 @@
 .. versionadded:: 4.0.0
 .. versionchanged:: 5.4.0
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """


=====================================
src/busco/BuscoLogger.py
=====================================
@@ -8,7 +8,7 @@
 
 This is a logger for the pipeline that extends the default Python logger class
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """


=====================================
src/busco/BuscoPlacer.py
=====================================
@@ -8,7 +8,7 @@
 .. versionadded:: 4.0.0
 .. versionchanged:: 5.4.0
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """


=====================================
src/busco/BuscoRunner.py
=====================================
@@ -107,6 +107,7 @@ class SingleRunner:
     def reset(self):
         for runner in type(self).all_runners:
             runner.reset()
+            runner.config.reset()
             runner.analysis.reset()
         type(self).all_runners = set()
 
@@ -160,9 +161,14 @@ class SingleRunner:
 
             if self.config.getboolean("busco_run", "tar"):
                 self.compress_folders()
-            self.compile_summary()
+            try:
+                self.compile_summary()
+            except AttributeError:
+                raise BatchFatalError(
+                    "BUSCO encountered a problem. This is possibly caused by restarting a previously completed run with different parameters.")
             self.runner.finish(time.time() - self.start_time)
 
+
         except BuscoError:
             if self.runner is not None:
                 type(self).all_runners.add(self.runner)
@@ -268,14 +274,14 @@ class BatchRunner:
             except BuscoError as be:
                 if "did not recognize any genes" in be.value:
                     type(self).batch_results.append(
-                        "{}\tNo genes found\t\t\t\t\t\t\t\t\t\t\t\t\t\n".format(
-                            os.path.basename(input_file)
+                        "{}\tNo genes found\t\t\t\t\t\t\t\t\t\t{}\n".format(
+                            os.path.basename(input_file), "\t\t\t"*int(single_run.config.getboolean("busco_run", "auto-lineage"))
                         )
                     )
                 else:
                     type(self).batch_results.append(
-                        "{}\tRun failed; check logs\t\t\t\t\t\t\t\t\t\t\t\t\t\n".format(
-                            os.path.basename(input_file)
+                        "{}\tRun failed; check logs\t\t\t\t\t\t\t\t\t\t{}\n".format(
+                            os.path.basename(input_file), "\t\t\t"*int(single_run.config.getboolean("busco_run", "auto-lineage"))
                         )
                     )
                 logger.error(be.value)
@@ -345,33 +351,11 @@ class AnalysisRunner:
         setattr(BuscoAnalysis, "config", config)
 
         self.input_file = self.config.get("busco_run", "in")
-        self.mode = self.config.get("busco_run", "mode")
-        self.domain = self.config.get("busco_run", "domain")
+        self.mode, self.domain = self.config.update_mode()
         self.lineage_basename = os.path.basename(
             self.config.get("busco_run", "lineage_dataset")
         )
 
-        if self.mode == "genome":
-            if self.domain in ["prokaryota", "viruses"]:
-                self.mode = "prok_genome"
-            elif self.domain == "eukaryota":
-                if self.config.getboolean("busco_run", "use_augustus"):
-                    self.mode = "euk_genome_aug"
-                else:
-                    self.mode = "euk_genome_met"
-            else:
-                raise BatchFatalError("Unrecognized mode {}".format(self.mode))
-
-        elif self.mode == "transcriptome":
-            if self.domain == "prokaryota":
-                self.mode = "prok_tran"
-            elif self.domain == "eukaryota":
-                self.mode = "euk_tran"
-            elif self.domain == "viruses":
-                self.mode = "prok_genome"  # Suggested by Mose - Prodigal may perform better on viruses
-                # than BLAST + HMMER.
-            else:
-                raise BatchFatalError("Unrecognized mode {}".format(self.mode))
         analysis_type = type(self).mode_dict[self.mode]
         self.analysis = analysis_type()
         self.summary = {
@@ -759,7 +743,7 @@ class AnalysisRunner:
                         auto_lineage_line = "\nConsider using the auto-lineage mode to select a more specific lineage."
                         final_output_results.append(auto_lineage_line)
                     with open(
-                        self.analysis.hmmer_runner.short_summary_file, "a"
+                        self.short_summary_file, "a"
                     ) as short_summary_file:
                         short_summary_file.write(positive_parasitic_line)
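
Note on the `batch_results` change above: rows for failed runs are now padded with three extra tabs only when auto-lineage is enabled, presumably so they line up with the columns of successful rows. A small sketch of that padding rule follows. The function name `failed_row` and the default of ten base tabs are assumptions read off the hunk, not a documented BUSCO interface.

```python
def failed_row(filename, message, auto_lineage, base_tabs=10):
    """Build a tab-separated placeholder row for a run that produced no results."""
    # int(True) == 1 and int(False) == 0, so the extra columns appear only with auto-lineage
    extra = "\t\t\t" * int(auto_lineage)
    return "{}\t{}{}{}\n".format(filename, message, "\t" * base_tabs, extra)


plain = failed_row("sample.fna", "No genes found", auto_lineage=False)
auto = failed_row("sample.fna", "Run failed; check logs", auto_lineage=True)
print(plain.count("\t"))  # 11
print(auto.count("\t"))   # 14
```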
 


=====================================
src/busco/ConfigManager.py
=====================================
@@ -6,7 +6,7 @@
 .. versionadded:: 3.0.0
 .. versionchanged:: 5.4.0
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """


=====================================
src/busco/__init__.py
=====================================
@@ -5,7 +5,7 @@
    :synopsis: BUSCO - Benchmarking Universal Single-Copy Orthologs.
 
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """


=====================================
src/busco/_version.py
=====================================
@@ -2,8 +2,8 @@
 # coding: utf-8
 """
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """
-__version__ = "5.4.4"
+__version__ = "5.4.7"


=====================================
src/busco/analysis/BuscoAnalysis.py
=====================================
@@ -6,7 +6,7 @@
 .. versionadded:: 3.0.0
 .. versionchanged:: 5.4.0
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 """
 
@@ -154,7 +154,6 @@ class BuscoAnalysis(metaclass=ABCMeta):
 
         :raises BuscoError: if the dataset is missing files or folders
         """
-
         # Check hmm files exist
         files = os.listdir(os.path.join(self.lineage_dataset, "hmms"))
         if not files:


=====================================
src/busco/analysis/GeneSetAnalysis.py
=====================================
@@ -6,7 +6,7 @@
 .. versionadded:: 3.0.0
 .. versionchanged:: 5.4.0
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """


=====================================
src/busco/analysis/GenomeAnalysis.py
=====================================
@@ -4,9 +4,9 @@
 .. module:: GenomeAnalysis
    :synopsis: GenomeAnalysis implements genome analysis specifics
 .. versionadded:: 3.0.0
-.. versionchanged:: 5.4.0
+.. versionchanged:: 5.4.7
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """
@@ -372,6 +372,7 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
         super().__init__()
         self.metaeuk_runner = None
         self.gene_details = {}
+        self.gene_update_mapping = defaultdict(dict)
 
     def init_tools(self):
         super().init_tools()
@@ -498,8 +499,13 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
         exon_records = []
         for busco_id, gene_match in busco_dict.items():
             for gene_id, details in gene_match.items():
-                sequence, coords = details[0]["orig gene ID"].rsplit(":", 1)
-                gene_start, gene_end = coords.split("-")
+                gene_start, gene_end = gene_id.split(":")[-1].split("-")
+                low_coord = min([int(gene_start), int(gene_end)])
+                high_coord = max([int(gene_start), int(gene_end)])
+                sequence, orig_gene_coords = details[0]["orig gene ID"].rsplit(":", 1)
+                orig_gene_start, orig_gene_end = orig_gene_coords.split("-")
+                low_coord_orig = min([int(orig_gene_start), int(orig_gene_end)])
+                high_coord_orig = max([int(orig_gene_start), int(orig_gene_end)])
                 strand = self.gene_details[gene_id][0]["strand"]
                 score = details[0]["bitscore"]
 
@@ -524,7 +530,7 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                             [
                                 "grep",
                                 "{}|{}|.*|{}|{}|".format(
-                                    sequence, strand, gene_start, gene_end
+                                    sequence, strand, low_coord_orig, high_coord_orig
                                 ),
                                 rerun_results,
                             ]
@@ -534,7 +540,7 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                             [
                                 "grep",
                                 "{}|{}|.*|{}|{}|".format(
-                                    sequence, strand, gene_start, gene_end
+                                    sequence, strand, low_coord_orig, high_coord_orig
                                 ),
                                 initial_run_results,
                             ]
@@ -552,19 +558,22 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                 # redundantly matches the gene coordinates again.
                 good_match = self.metaeuk_runner.find_match(
                     matches,
-                    ["|{}|".format(gene_start), "|{}|".format(gene_end), sequence],
+                    ["|{}|".format(orig_gene_start), "|{}|".format(orig_gene_end), sequence],
                 )
 
                 if good_match:
                     low_coords, high_coords = self.metaeuk_runner.extract_exon_coords(
                         good_match
                     )
-                    if low_coords[0] > high_coords[0]:  # for negative strand exons the order is reversed
+                    if (
+                        low_coords[0] > high_coords[0]
+                    ):  # for negative strand exons the order is reversed
                         low_coords, high_coords = high_coords, low_coords
-                    trimmed_low = int(gene_id.split(":")[-1].split("-")[0])
-                    trimmed_high = int(gene_id.split(":")[-1].split("-")[1])
+
                     for i, entry in enumerate(low_coords):
-                        if int(entry) < trimmed_low or int(entry) > trimmed_high:  # don't include exons that were previously removed due to overlaps
+                        if (
+                            int(entry) < low_coord or int(entry) > high_coord
+                        ):  # don't include exons that were previously removed due to overlaps
                             continue
                         record = (
                             busco_id,
@@ -574,7 +583,7 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                             strand,
                             score,
                             run_found,
-                            gene_id,
+                            details[0]["orig gene ID"],
                         )
                         exon_records.append(record)
         return exon_records
@@ -592,36 +601,50 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
             busco_gene_groups = busco_group.groupby("Orig gene ID")
             for gene_match, busco_gene_group in busco_gene_groups:
                 if gene_match not in matches:
-                    continue
-                new_gene_start = gene_match.split(":")[-1].split("-")[
-                    0
-                ]  # these two lines are not really used - they just initialize values that will be changed
-                new_gene_stop = gene_match.split(":")[-1].split("-")[1]
+                    if busco_id not in self.gene_update_mapping:
+                        continue
+                    elif gene_match not in self.gene_update_mapping[busco_id]:
+                        continue
+                    else:
+                        gene_match = self.gene_update_mapping[busco_id][gene_match]["new_gene_coords"]
+                new_gene_start, new_gene_stop = gene_match.split(":")[-1].split("-")  # this line just initializes values that will be changed if an exon is removed
                 start_trim = 0
                 end_trim = 0
                 group_indices = set(busco_gene_group.index)
                 intersection = group_indices.intersection(inds_to_remove)
                 if len(intersection) > 0:
+                    new_gene_low = min(new_gene_start, new_gene_stop)  # initialize values
+                    new_gene_high = max(new_gene_start, new_gene_stop)
                     if intersection == group_indices:
                         continue  # remove entire gene - don't add to new dict
-                    ordered_exons = busco_gene_group.sort_values(by="Start").reset_index()
+                    ordered_exons = busco_gene_group.sort_values(
+                        by="Low coord"
+                    ).reset_index()
                     new_indices = ordered_exons.index
                     seq = ordered_exons.loc[0]["Sequence"]
+                    strand = ordered_exons.loc[0]["Strand"]
 
                     for idx in new_indices:
                         old_index = ordered_exons.loc[idx]["index"]
                         if old_index in intersection:
-                            start_trim += discarded_exon_lengths[old_index]
+                            if strand == "+":
+                                start_trim += discarded_exon_lengths[old_index]
+                            else:
+                                end_trim += discarded_exon_lengths[old_index]
                         else:
-                            new_gene_start = df.loc[old_index]["Start"]
+                            new_gene_low = df.loc[old_index]["Low coord"]
                             break
                     for idx in new_indices[::-1]:
                         old_index = ordered_exons.loc[idx]["index"]
                         if old_index in intersection:
-                            end_trim += discarded_exon_lengths[old_index]
+                            if strand == "+":
+                                end_trim += discarded_exon_lengths[old_index]
+                            else:
+                                start_trim += discarded_exon_lengths[old_index]
                         else:
-                            new_gene_stop = df.loc[old_index]["Stop"]
+                            new_gene_high = df.loc[old_index]["High coord"]
                             break
+                    new_gene_start, new_gene_stop = (new_gene_low, new_gene_high) if strand == "+" else (new_gene_high, new_gene_low)
                     new_gene_match = "{}:{}-{}".format(
                         seq, new_gene_start, new_gene_stop
                     )
@@ -634,19 +657,24 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                 matched_genes_new[new_gene_match].append(busco_id)
                 self.gene_details[new_gene_match] = [
                     {
-                        "gene_start": new_gene_start,
+                        "gene_start": new_gene_start, # these are the same as the old gene coordinates if no exons were removed
                         "gene_end": new_gene_stop,
                         "strand": df_strand,
                     }
                 ]
                 if new_gene_match != gene_match:
                     trimmed_sequence_aa, trimmed_sequence_nt = self.trim_sequence(
-                        gene_match, start_trim, end_trim
+                        gene_match, new_gene_match, start_trim, end_trim
                     )
+                    self.gene_update_mapping[busco_id][gene_match] = {"new_gene_coords": new_gene_match, "start_trim": start_trim, "end_trim": end_trim}
                 else:
                     try:
-                        trimmed_sequence_aa = self.metaeuk_runner.sequences_aa[gene_match]
-                        trimmed_sequence_nt = self.metaeuk_runner.sequences_nt[gene_match]
+                        trimmed_sequence_aa = self.metaeuk_runner.sequences_aa[
+                            gene_match
+                        ]
+                        trimmed_sequence_nt = self.metaeuk_runner.sequences_nt[
+                            gene_match
+                        ]
                     except KeyError:  # happens on the second run if the first run trimmed the sequence already
                         trimmed_sequence_aa = self.sequences_aa[gene_match]
                         trimmed_sequence_nt = self.sequences_nt[gene_match]
@@ -654,9 +682,9 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                 self.sequences_nt[new_gene_match] = trimmed_sequence_nt
         return hmmer_result_dict_new, matched_genes_new
 
-    def trim_sequence(self, old_gene_match, start_trim, end_trim):
-        old_sequence_aa = self.metaeuk_runner.sequences_aa[old_gene_match]
-        old_sequence_nt = self.metaeuk_runner.sequences_nt[old_gene_match]
+    def trim_sequence(self, old_gene_match, new_gene_match, start_trim, end_trim):
+        old_sequence_aa = self.sequences_aa[old_gene_match]
+        old_sequence_nt = self.sequences_nt[old_gene_match]
 
         new_sequence_nt = old_sequence_nt[start_trim : len(old_sequence_nt) - end_trim]
         if start_trim % 3 != 0 and end_trim % 3 != 0:
@@ -666,6 +694,8 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
         new_sequence_aa = old_sequence_aa[
             int(start_trim / 3) : len(old_sequence_aa) - int(end_trim / 3)
         ]
+        new_sequence_nt.id = new_sequence_nt.name = new_sequence_nt.description = new_gene_match
+        new_sequence_aa.id = new_sequence_aa.name = new_sequence_aa.description = new_gene_match
         return new_sequence_aa, new_sequence_nt
 
     def exons_to_df(self, records):
@@ -673,10 +703,11 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
             logger.info("Validating exons and removing overlapping matches")
 
         df = self.metaeuk_runner.records_to_df(records)
-        df["Start"] = df["Start"].astype(int)
-        df["Stop"] = df["Stop"].astype(int)
+        df["Low coord"] = df["Low coord"].astype(int)
+        df["High coord"] = df["High coord"].astype(int)
         df["Score"] = df["Score"].astype(float)
         df["Run found"] = df["Run found"].astype(int)
+        df.sort_values(by=["Busco id", "Sequence", "Low coord"], inplace=True)
         return df
 
     def find_overlaps(self, df):
@@ -700,7 +731,7 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                 bad_inds = self.handle_diff_busco_overlap(overlap_inds, df)
                 for idx in bad_inds:
                     discarded_exon_lengths[idx] = (
-                        abs(df.iloc[idx]["Stop"] - df.iloc[idx]["Start"]) + 1
+                        abs(df.loc[idx]["High coord"] - df.loc[idx]["Low coord"]) + 1
                     )
         return discarded_exon_lengths
 
@@ -714,6 +745,8 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
         busco_match2 = match2["Busco id"]
         gene_match2 = match2["Orig gene ID"]
         run_match2 = match2["Run found"]
+        strand1 = match1["Strand"]
+        strand2 = match2["Strand"]
         exons1 = df.loc[
             (df["Busco id"] == busco_match1)
             & (df["Sequence"] == seq)
@@ -744,30 +777,50 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
         )
         gene_id1 = match1["Orig gene ID"]
         gene_id2 = match2["Orig gene ID"]
+
+        if busco_match1 in self.gene_update_mapping and gene_id1 in self.gene_update_mapping[busco_match1]:
+            start_trim1 = self.gene_update_mapping[busco_match1][gene_id1]["start_trim"]
+            end_trim1 = self.gene_update_mapping[busco_match1][gene_id1]["end_trim"]
+        else:
+            start_trim1 = 0
+            end_trim1 = 0
+
+        if busco_match2 in self.gene_update_mapping and gene_id2 in self.gene_update_mapping[busco_match2]:
+            start_trim2 = self.gene_update_mapping[busco_match2][gene_id2]["start_trim"]
+            end_trim2 = self.gene_update_mapping[busco_match2][gene_id2]["end_trim"]
+        else:
+            start_trim2 = 0
+            end_trim2 = 0
+
         if (
             hmmer_match_details1[gene_id1]["score"]
             > hmmer_match_details2[gene_id2]["score"]
         ):
             priority_match = hmmer_match_details1
             secondary_match = hmmer_match_details2
-            priority_exons = exons1
-            secondary_exons = exons2
+            priority_exons = exons1.sort_values(by="Low coord", ascending=strand1 == "+")
+            secondary_exons = exons2.sort_values(by="Low coord", ascending=strand2 == "+")
             priority_gene_id = gene_id1
             secondary_gene_id = gene_id2
+            priority_gene_trim = (start_trim1, end_trim1)
+            secondary_gene_trim = (start_trim2, end_trim2)
         else:
             priority_match = hmmer_match_details2
             secondary_match = hmmer_match_details1
-            priority_exons = exons2
-            secondary_exons = exons1
+            priority_exons = exons2.sort_values(by="Low coord", ascending=strand2 == "+")
+            secondary_exons = exons1.sort_values(by="Low coord", ascending=strand1 == "+")
             priority_gene_id = gene_id2
             secondary_gene_id = gene_id1
+            priority_gene_trim = (start_trim2, end_trim2)
+            secondary_gene_trim = (start_trim1, end_trim1)
+
         priority_env_coords = iter(priority_match[priority_gene_id]["env_coords"])
         secondary_env_coords = iter(secondary_match[secondary_gene_id]["env_coords"])
         priority_used_exons, priority_unused_exons = self.find_unused_exons(
-            priority_env_coords, priority_exons
+            priority_env_coords, priority_exons, priority_gene_trim
         )
         secondary_used_exons, secondary_unused_exons = self.find_unused_exons(
-            secondary_env_coords, secondary_exons
+            secondary_env_coords, secondary_exons, secondary_gene_trim
         )
 
         priority_used_exons = (
@@ -844,20 +897,25 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                         overlap[0] if match1["Score"] < match2["Score"] else overlap[1]
                     )
 
-                exons_to_remove = secondary_exons if index_to_remove in secondary_exons.index else priority_exons
+                exons_to_remove = (
+                    secondary_exons
+                    if index_to_remove in secondary_exons.index
+                    else priority_exons
+                )
                 indices_to_remove.extend(list(exons_to_remove.index))
         return indices_to_remove
 
-    def find_unused_exons(self, env_coords, exons):
+    def find_unused_exons(self, env_coords, exons, gene_trim):
         remaining_hmm_region = 0
         unused_exons = []
         used_exons = []
         hmm_coords = next(env_coords)
+        start_trim, end_trim = gene_trim[0]/3, gene_trim[1]/3
         exon_cumul_len = 0
         for idx, entry in exons.iterrows():
             entry["index"] = idx
             exon_matched = False
-            exon_size_nt = int(entry["Stop"]) - int(entry["Start"]) + 1
+            exon_size_nt = int(entry["High coord"]) - int(entry["Low coord"]) + 1
             if not exon_size_nt % 3 == 0:
                 raise BuscoError(
                     "The exon coordinates contain fractional reading frames and are ambiguous."
@@ -872,10 +930,10 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                 exon_matched = True
 
             elif hmm_coords:
-                while hmm_coords[0] < exon_cumul_len + 1:
+                while hmm_coords[0] - start_trim < exon_cumul_len + 1:
                     # hmm starts within exon
                     exon_matched = True
-                    if hmm_coords[1] <= exon_cumul_len + 1:
+                    if hmm_coords[1] - start_trim <= exon_cumul_len + 1:
                         # hmm ends within exon; cycle to the next hmm region
                         try:
                             hmm_coords = next(env_coords)
@@ -884,13 +942,14 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
                             break
                         continue
                     else:
-                        remaining_hmm_region = hmm_coords[1] - exon_size_aa + 1
+                        remaining_hmm_region = hmm_coords[1] - start_trim - exon_size_aa + 1
                         break
             if exon_matched:
                 used_exons.append(entry)
             else:
                 unused_exons.append(entry)
-        used_exons, unused_exons = self.adjust_exon_categories(used_exons, unused_exons)
+        if len(used_exons) > 0:
+            used_exons, unused_exons = self.adjust_exon_categories(used_exons, unused_exons)
         return used_exons, unused_exons
 
     @staticmethod
@@ -902,16 +961,16 @@ class GenomeAnalysisEukaryotesMetaeuk(GenomeAnalysisEukaryotes):
         :return:
         """
 
-        used_exons_start = [x["Start"] for x in used_exons]
-        used_exons_end = [x["Stop"] for x in used_exons]
+        used_exons_start = [x["Low coord"] for x in used_exons]
+        used_exons_end = [x["High coord"] for x in used_exons]
         start = min(used_exons_start)
         stop = max(used_exons_end)
         exons_to_remove = set()
         unused_indices = [exon["index"] for exon in unused_exons]
         for exon in unused_exons:
             if not exon["index"] in exons_to_remove and (
-                (exon["Start"] >= start and exon["Start"] < stop)
-                or (exon["Stop"] > start and exon["Stop"] < stop)
+                (exon["Low coord"] >= start and exon["Low coord"] < stop)
+                or (exon["High coord"] > start and exon["High coord"] < stop)
             ):
                 # find exons that either start or stop within the "used" range
                 used_exons.append(exon)


=====================================
src/busco/analysis/TranscriptomeAnalysis.py
=====================================
@@ -6,7 +6,7 @@
 .. versionadded:: 3.0.0
 .. versionchanged:: 5.4.0
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """


=====================================
src/busco/busco_tools/Toolset.py
=====================================
@@ -7,7 +7,7 @@
 .. versionadded:: 3.0.0
 .. versionchanged:: 5.4.0
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """


=====================================
src/busco/busco_tools/base.py
=====================================
@@ -237,6 +237,28 @@ class BaseRunner(Tool, metaclass=ABCMeta):
 
         return
 
+    @staticmethod
+    def get_matches(results_grouped, seq):
+        g1 = results_grouped.get_group(seq)
+        g1_sorted = g1.sort_values(
+            "Low coord"
+        )  # sort to facilitate a single-pass coordinate check
+        for idx1, row1 in g1_sorted.iterrows():
+            strand = g1_sorted.loc[idx1]["Strand"]
+            if strand == "-":
+                start_val = high_coord = g1_sorted.loc[idx1]["High coord"]
+                stop_val = low_coord = g1_sorted.loc[idx1]["Low coord"]
+            else:
+                start_val = low_coord = g1_sorted.loc[idx1]["Low coord"]
+                stop_val = high_coord = g1_sorted.loc[idx1]["High coord"]
+            current_seqid = "{}:{}-{}".format(
+                g1_sorted.loc[idx1]["Sequence"], start_val, stop_val
+            )
+            matches = g1_sorted[g1_sorted["Low coord"] >= low_coord].loc[
+                g1_sorted["Low coord"] < high_coord
+                ]  # find entries with a start coordinate between the current exon start and end
+            yield idx1, current_seqid, g1_sorted, matches
+
     @abstractmethod
     def get_version(self):
         return


=====================================
src/busco/busco_tools/hmmer.py
=====================================
@@ -478,19 +478,26 @@ class HMMERRunner(BaseRunner):
             matched_genes_fragment,
         )
 
-    @staticmethod
-    def remove_overlaps(matched_records):
+    def remove_overlaps(self, matched_records):
         seq_ids = []
-        start_coords = []
-        end_coords = []
+        low_coords = []
+        high_coords = []
         scores = []
+        strands = []
         try:
             for record in matched_records:
                 seq_id, coords = record.split(":")
-                start_coord, end_coord = coords.split("-")
+                start_coord, stop_coord = map(int, coords.split("-"))
+                low_coord = min(start_coord, stop_coord)
+                high_coord = max(start_coord, stop_coord)
+                if low_coord == start_coord:
+                    strand = "+"
+                else:
+                    strand = "-"
                 seq_ids.append(seq_id)
-                start_coords.append(start_coord)
-                end_coords.append(end_coord)
+                low_coords.append(low_coord)
+                high_coords.append(high_coord)
+                strands.append(strand)
                 scores.append(matched_records[record]["score"])
         except ValueError:  # for protein sequences there is no ":<coords>" suffix, so skip the overlap filtering
             return matched_records
@@ -498,45 +505,44 @@ class HMMERRunner(BaseRunner):
         records_df = pd.DataFrame(
             {
                 "Sequence": seq_ids,
-                "Start": start_coords,
-                "Stop": end_coords,
+                "Low coord": low_coords,
+                "High coord": high_coords,
                 "Score": scores,
+                "Strand": strands,
             }
         )
         results_grouped = records_df.groupby("Sequence")
         entries_to_remove = []
         seq_ids = results_grouped.groups.keys()
         for seq in seq_ids:
-            g1 = results_grouped.get_group(seq)
-            g1_sorted = g1.sort_values(
-                "Start"
-            )  # sort to facilitate a single-pass coordinate check
-            for idx1, row1 in g1_sorted.iterrows():
-                current_record = g1_sorted.loc[idx1]
-                start_val = current_record["Start"]
-                stop_val = current_record["Stop"]
-                current_seqid = "{}:{}-{}".format(
-                    current_record["Sequence"], start_val, stop_val
-                )
+            match_finder = self.get_matches(results_grouped, seq)
+            for match in match_finder:
+                idx1, current_seqid, g1_sorted, matches = match
                 if (
                     current_seqid in entries_to_remove
                 ):  # overlaps with removed entries don't count
                     continue
 
-                matches = g1_sorted[g1_sorted["Start"] > start_val].loc[
-                    g1_sorted["Start"] < stop_val
-                ]  # find entries with a start coordinate between the current exon start and end
-                for idx2 in matches.index.values:
-                    if g1_sorted.loc[idx1]["Score"] >= g1_sorted.loc[idx2]["Score"]:
+                for idx2 in matches.index.values:
+                    if idx1 == idx2:  # don't consider overlaps with self
+                        continue
+                    elif g1_sorted.loc[idx1]["Score"] >= g1_sorted.loc[idx2]["Score"]:
                         ind_to_remove = idx2
                     else:
                         ind_to_remove = idx1
                     record_to_remove = g1_sorted.loc[ind_to_remove]
+                    record_start_coord, record_stop_coord = (
+                        record_to_remove["Low coord"],
+                        record_to_remove["High coord"],
+                    ) if record_to_remove["Strand"] == "+" else (
+                        record_to_remove["High coord"],
+                        record_to_remove["Low coord"],
+                    )
                     entries_to_remove.append(
                         "{}:{}-{}".format(
                             record_to_remove["Sequence"],
-                            record_to_remove["Start"],
-                            record_to_remove["Stop"],
+                            record_start_coord,
+                            record_stop_coord,
                         )
                     )
 
@@ -632,14 +638,7 @@ class HMMERRunner(BaseRunner):
         """
         # For a given input dictionary {busco_id: gene_ids}, make sure we are using the corresponding dictionary
         # {gene_id: busco_matches}
-        if busco_dict == self.is_complete:
-            matched_genes = self.matched_genes_complete
-        elif busco_dict == self.is_very_large:
-            matched_genes = self.matched_genes_vlarge
-        elif busco_dict == self.is_fragment:
-            matched_genes = self.matched_genes_fragment
-        else:
-            raise BuscoError("Unrecognized dictionary of BUSCOs.")
+        matched_genes = self.get_matched_genes_dict(busco_dict)
 
         busco_matches_to_remove = []
         # Keep the best scoring gene if gene is matched by more than one busco with the same match rank
@@ -679,6 +678,17 @@ class HMMERRunner(BaseRunner):
 
         return
 
+    def get_matched_genes_dict(self, busco_dict):
+        if busco_dict == self.is_complete:
+            matched_genes = self.matched_genes_complete
+        elif busco_dict == self.is_very_large:
+            matched_genes = self.matched_genes_vlarge
+        elif busco_dict == self.is_fragment:
+            matched_genes = self.matched_genes_fragment
+        else:
+            raise BuscoError("Unrecognized dictionary of BUSCOs.")
+        return matched_genes
+
     def _remove_low_scoring_matches(self, busco_dict):
         """
         Go through input dictionary and remove any gene matches that score less than 85% of the top gene match score
@@ -689,6 +699,8 @@ class HMMERRunner(BaseRunner):
         """
         empty_buscos = []
 
+        matched_genes = self.get_matched_genes_dict(busco_dict)
+
         # For each busco, keep only matches within 85% of top bitscore match for that busco
         for busco_id, matches in busco_dict.items():
             if len(matches) > 1:
@@ -714,7 +726,9 @@ class HMMERRunner(BaseRunner):
         for item in empty_buscos:
             busco_id, gene_id = item
             busco_dict[busco_id].pop(gene_id)
-
+            matched_genes[gene_id].remove(busco_id)
+            if len(matched_genes[gene_id]) == 0:
+                matched_genes.pop(gene_id)
         return
 
     @staticmethod
@@ -1001,6 +1015,7 @@ class HMMERRunner(BaseRunner):
                     aa_seqs = [sequences_aa[gene_id] for gene_id in gene_matches]
                 with open(os.path.join(output_dir, "{}.faa".format(busco)), "w") as f1:
                     SeqIO.write(aa_seqs, f1, "fasta")
+        return
 
     def write_hmmer_results(self, output_lines):
         """


=====================================
src/busco/busco_tools/metaeuk.py
=====================================
@@ -290,8 +290,8 @@ class MetaeukRunner(BaseRunner):
             columns=[
                 "Busco id",
                 "Sequence",
-                "Start",
-                "Stop",
+                "Low coord",
+                "High coord",
                 "Strand",
                 "Score",
                 "Run found",
@@ -302,20 +302,12 @@ class MetaeukRunner(BaseRunner):
 
         return results
 
-    @staticmethod
-    def detect_overlap(results_grouped, seq):
+    def detect_overlap(self, results_grouped, seq):
         overlap_inds = []
         handled_inds = set()
-        g1 = results_grouped.get_group(seq)
-        g1_sorted = g1.sort_values(
-            "Start"
-        )  # sort to facilitate a single-pass coordinate check
-        for idx1, row1 in g1_sorted.iterrows():
-            start_val = g1_sorted.loc[idx1]["Start"]
-            stop_val = g1_sorted.loc[idx1]["Stop"]
-            matches = g1_sorted[g1_sorted["Start"] >= start_val].loc[
-                g1_sorted["Start"] < stop_val
-            ]  # find entries with a start coordinate between the current exon start and end
+        match_finder = self.get_matches(results_grouped, seq)
+        for match in match_finder:
+            idx1, current_seqid, g1_sorted, matches = match
             for idx2 in matches.index.values:
                 if idx2 in handled_inds:
                     continue
@@ -340,8 +332,8 @@ class MetaeukRunner(BaseRunner):
                         ]  # check overlaps are on the same strand
                     )
                     and (
-                        match1_details["Start"] % 3
-                        == match2_details["Start"]
+                        match1_details["Low coord"] % 3
+                        == match2_details["Low coord"]
                         % 3  # check overlaps are in the same reading frame
                     )
                 ):
@@ -602,6 +594,11 @@ class MetaeukRunner(BaseRunner):
         high_coord = int(header_parts[7])
         exon_coords = header_parts[8:]
 
+        if strand == "+":
+            gene_id = "{}:{}-{}".format(C_acc, low_coord, high_coord)
+        else:
+            gene_id = "{}:{}-{}".format(C_acc, high_coord, low_coord)
+
         all_low_exon_coords = []
         all_taken_low_exon_coords = []
         all_high_exon_coords = []
@@ -636,8 +633,6 @@ class MetaeukRunner(BaseRunner):
             all_taken_high_exon_coords.append(taken_high_exon_coord)
             all_taken_exon_nucl_len.append(taken_nucl_len)
 
-        gene_id = "{}:{}-{}".format(C_acc, low_coord, high_coord)
-
         details = {
             "T_acc": T_acc,
             "C_acc": C_acc,


=====================================
src/busco/run_BUSCO.py
=====================================
@@ -13,7 +13,7 @@ To get help, ``busco -h``. See also the user guide.
 
 And visit our website `<http://busco.ezlab.org/>`_
 
-Copyright (c) 2016-2022, Evgeny Zdobnov (ez at ezlab.org)
+Copyright (c) 2016-2023, Evgeny Zdobnov (ez at ezlab.org)
 Licensed under the MIT license. See LICENSE.md file.
 
 """


=====================================
test_data/bacteria/expected_log.txt
=====================================
@@ -1,108 +1,110 @@
-2022-12-05 16:29:58 INFO:	***** Start a BUSCO v5.4.4 analysis, current time: 12/05/2022 16:29:58 *****
-2022-12-05 16:29:58 INFO:	Configuring BUSCO with local environment
-2022-12-05 16:29:58 WARNING:	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
-2022-12-05 16:29:58 INFO:	Mode is genome
-2022-12-05 16:29:58 INFO:	Downloading information on latest versions of BUSCO data...
-2022-12-05 16:30:00 INFO:	Input file is /busco_wd/test_data/bacteria/genome.fna
-2022-12-05 16:30:00 INFO:	No lineage specified. Running lineage auto selector.
+2023-04-28 14:24:43 INFO:	***** Start a BUSCO v5.4.7 analysis, current time: 04/28/2023 14:24:43 *****
+2023-04-28 14:24:43 INFO:	Configuring BUSCO with local environment
+2023-04-28 14:24:43 WARNING:	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
+2023-04-28 14:24:43 INFO:	Mode is genome
+2023-04-28 14:24:43 INFO:	'Force' option selected; overwriting previous results directory
+2023-04-28 14:24:44 INFO:	Downloading information on latest versions of BUSCO data...
+2023-04-28 14:24:46 INFO:	Input file is /busco_wd/test_data/bacteria/genome.fna
+2023-04-28 14:24:46 INFO:	No lineage specified. Running lineage auto selector.
 
-2022-12-05 16:30:00 INFO:	***** Starting Auto Select Lineage *****
+2023-04-28 14:24:46 INFO:	***** Starting Auto Select Lineage *****
 	This process runs BUSCO on the generic lineage datasets for the domains archaea, bacteria and eukaryota. Once the optimal domain is selected, BUSCO automatically attempts to find the most appropriate BUSCO dataset to use based on phylogenetic placement.
 	--auto-lineage-euk and --auto-lineage-prok are also available if you know your input assembly is, or is not, an eukaryote. See the user guide for more information.
 	A reminder: Busco evaluations are valid when an appropriate dataset is used, i.e., the dataset belongs to the lineage of the species to test. Because of overlapping markers/spurious matches among domains, busco matches in another domain do not necessarily mean that your genome/proteome contains sequences from this domain. However, a high busco score in multiple domains might help you identify possible contaminations.
-2022-12-05 16:30:00 INFO:	Running BUSCO using lineage dataset archaea_odb10 (prokaryota, 2021-02-23)
-2022-12-05 16:30:00 INFO:	Running 1 job(s) on bbtools, starting at 12/05/2022 16:30:00
-2022-12-05 16:30:02 INFO:	[bbtools]	1 of 1 task(s) completed
-2022-12-05 16:30:02 INFO:	***** Run Prodigal on input to predict and extract genes *****
-2022-12-05 16:30:02 INFO:	Running Prodigal with genetic code 11 in single mode
-2022-12-05 16:30:02 INFO:	Running 1 job(s) on prodigal, starting at 12/05/2022 16:30:02
-2022-12-05 16:30:04 INFO:	[prodigal]	1 of 1 task(s) completed
-2022-12-05 16:30:04 INFO:	Genetic code 11 selected as optimal
-2022-12-05 16:30:04 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:30:04 INFO:	Running 194 job(s) on hmmsearch, starting at 12/05/2022 16:30:04
-2022-12-05 16:30:06 INFO:	[hmmsearch]	20 of 194 task(s) completed
-2022-12-05 16:30:06 INFO:	[hmmsearch]	39 of 194 task(s) completed
-2022-12-05 16:30:06 INFO:	[hmmsearch]	59 of 194 task(s) completed
-2022-12-05 16:30:06 INFO:	[hmmsearch]	78 of 194 task(s) completed
-2022-12-05 16:30:06 INFO:	[hmmsearch]	97 of 194 task(s) completed
-2022-12-05 16:30:07 INFO:	[hmmsearch]	117 of 194 task(s) completed
-2022-12-05 16:30:07 INFO:	[hmmsearch]	136 of 194 task(s) completed
-2022-12-05 16:30:07 INFO:	[hmmsearch]	156 of 194 task(s) completed
-2022-12-05 16:30:07 INFO:	[hmmsearch]	175 of 194 task(s) completed
-2022-12-05 16:30:07 INFO:	[hmmsearch]	194 of 194 task(s) completed
-2022-12-05 16:30:09 INFO:	Results:	C:5.2%[S:5.2%,D:0.0%],F:1.5%,M:93.3%,n:194	   
+2023-04-28 14:24:46 INFO:	Running BUSCO using lineage dataset archaea_odb10 (prokaryota, 2021-02-23)
+2023-04-28 14:24:46 INFO:	Running 1 job(s) on bbtools, starting at 04/28/2023 14:24:46
+2023-04-28 14:24:48 INFO:	[bbtools]	1 of 1 task(s) completed
+2023-04-28 14:24:48 INFO:	***** Run Prodigal on input to predict and extract genes *****
+2023-04-28 14:24:48 INFO:	Running Prodigal with genetic code 11 in single mode
+2023-04-28 14:24:48 INFO:	Running 1 job(s) on prodigal, starting at 04/28/2023 14:24:48
+2023-04-28 14:24:50 INFO:	[prodigal]	1 of 1 task(s) completed
+2023-04-28 14:24:50 INFO:	Genetic code 11 selected as optimal
+2023-04-28 14:24:50 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:24:50 INFO:	Running 194 job(s) on hmmsearch, starting at 04/28/2023 14:24:50
+2023-04-28 14:24:51 INFO:	[hmmsearch]	20 of 194 task(s) completed
+2023-04-28 14:24:52 INFO:	[hmmsearch]	39 of 194 task(s) completed
+2023-04-28 14:24:52 INFO:	[hmmsearch]	59 of 194 task(s) completed
+2023-04-28 14:24:52 INFO:	[hmmsearch]	78 of 194 task(s) completed
+2023-04-28 14:24:52 INFO:	[hmmsearch]	97 of 194 task(s) completed
+2023-04-28 14:24:52 INFO:	[hmmsearch]	117 of 194 task(s) completed
+2023-04-28 14:24:52 INFO:	[hmmsearch]	136 of 194 task(s) completed
+2023-04-28 14:24:52 INFO:	[hmmsearch]	156 of 194 task(s) completed
+2023-04-28 14:24:53 INFO:	[hmmsearch]	175 of 194 task(s) completed
+2023-04-28 14:24:53 INFO:	[hmmsearch]	194 of 194 task(s) completed
+2023-04-28 14:24:54 INFO:	Results:	C:5.2%[S:5.2%,D:0.0%],F:1.5%,M:93.3%,n:194	   
 
-2022-12-05 16:30:09 INFO:	Running BUSCO using lineage dataset bacteria_odb10 (prokaryota, 2020-03-06)
-2022-12-05 16:30:09 INFO:	Running 1 job(s) on bbtools, starting at 12/05/2022 16:30:09
-2022-12-05 16:30:11 INFO:	[bbtools]	1 of 1 task(s) completed
-2022-12-05 16:30:11 INFO:	***** Run Prodigal on input to predict and extract genes *****
-2022-12-05 16:30:11 INFO:	Genetic code 11 selected as optimal
-2022-12-05 16:30:11 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:30:11 INFO:	Running 124 job(s) on hmmsearch, starting at 12/05/2022 16:30:11
-2022-12-05 16:30:12 INFO:	[hmmsearch]	13 of 124 task(s) completed
-2022-12-05 16:30:13 INFO:	[hmmsearch]	25 of 124 task(s) completed
-2022-12-05 16:30:13 INFO:	[hmmsearch]	38 of 124 task(s) completed
-2022-12-05 16:30:13 INFO:	[hmmsearch]	50 of 124 task(s) completed
-2022-12-05 16:30:13 INFO:	[hmmsearch]	63 of 124 task(s) completed
-2022-12-05 16:30:13 INFO:	[hmmsearch]	75 of 124 task(s) completed
-2022-12-05 16:30:13 INFO:	[hmmsearch]	87 of 124 task(s) completed
-2022-12-05 16:30:13 INFO:	[hmmsearch]	100 of 124 task(s) completed
-2022-12-05 16:30:14 INFO:	[hmmsearch]	112 of 124 task(s) completed
-2022-12-05 16:30:14 INFO:	[hmmsearch]	124 of 124 task(s) completed
-2022-12-05 16:30:16 INFO:	Results:	C:21.0%[S:21.0%,D:0.0%],F:0.8%,M:78.2%,n:124	   
+2023-04-28 14:24:55 INFO:	Running BUSCO using lineage dataset bacteria_odb10 (prokaryota, 2020-03-06)
+2023-04-28 14:24:55 INFO:	Running 1 job(s) on bbtools, starting at 04/28/2023 14:24:55
+2023-04-28 14:24:57 INFO:	[bbtools]	1 of 1 task(s) completed
+2023-04-28 14:24:57 INFO:	***** Run Prodigal on input to predict and extract genes *****
+2023-04-28 14:24:58 INFO:	Genetic code 11 selected as optimal
+2023-04-28 14:24:58 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:24:58 INFO:	Running 124 job(s) on hmmsearch, starting at 04/28/2023 14:24:58
+2023-04-28 14:24:59 INFO:	[hmmsearch]	13 of 124 task(s) completed
+2023-04-28 14:25:00 INFO:	[hmmsearch]	25 of 124 task(s) completed
+2023-04-28 14:25:00 INFO:	[hmmsearch]	38 of 124 task(s) completed
+2023-04-28 14:25:00 INFO:	[hmmsearch]	50 of 124 task(s) completed
+2023-04-28 14:25:00 INFO:	[hmmsearch]	63 of 124 task(s) completed
+2023-04-28 14:25:00 INFO:	[hmmsearch]	75 of 124 task(s) completed
+2023-04-28 14:25:00 INFO:	[hmmsearch]	87 of 124 task(s) completed
+2023-04-28 14:25:00 INFO:	[hmmsearch]	100 of 124 task(s) completed
+2023-04-28 14:25:01 INFO:	[hmmsearch]	112 of 124 task(s) completed
+2023-04-28 14:25:01 INFO:	[hmmsearch]	124 of 124 task(s) completed
+2023-04-28 14:25:03 INFO:	Results:	C:21.0%[S:21.0%,D:0.0%],F:0.8%,M:78.2%,n:124	   
 
-2022-12-05 16:30:17 INFO:	Running BUSCO using lineage dataset eukaryota_odb10 (eukaryota, 2020-09-10)
-2022-12-05 16:30:17 INFO:	Running 1 job(s) on bbtools, starting at 12/05/2022 16:30:17
-2022-12-05 16:30:19 INFO:	[bbtools]	1 of 1 task(s) completed
-2022-12-05 16:30:19 INFO:	Running 1 job(s) on metaeuk, starting at 12/05/2022 16:30:19
-2022-12-05 16:31:01 INFO:	[metaeuk]	1 of 1 task(s) completed
-2022-12-05 16:31:02 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:31:02 INFO:	Running 255 job(s) on hmmsearch, starting at 12/05/2022 16:31:02
-2022-12-05 16:31:03 INFO:	[hmmsearch]	26 of 255 task(s) completed
-2022-12-05 16:31:04 INFO:	[hmmsearch]	51 of 255 task(s) completed
-2022-12-05 16:31:04 INFO:	[hmmsearch]	77 of 255 task(s) completed
-2022-12-05 16:31:04 INFO:	[hmmsearch]	102 of 255 task(s) completed
-2022-12-05 16:31:05 INFO:	[hmmsearch]	128 of 255 task(s) completed
-2022-12-05 16:31:05 INFO:	[hmmsearch]	153 of 255 task(s) completed
-2022-12-05 16:31:05 INFO:	[hmmsearch]	179 of 255 task(s) completed
-2022-12-05 16:31:05 INFO:	[hmmsearch]	204 of 255 task(s) completed
-2022-12-05 16:31:06 INFO:	[hmmsearch]	230 of 255 task(s) completed
-2022-12-05 16:31:06 INFO:	[hmmsearch]	255 of 255 task(s) completed
-2022-12-05 16:31:08 INFO:	Validating exons and removing overlapping matches
-2022-12-05 16:31:10 INFO:	0 candidate overlapping regions found
-2022-12-05 16:31:10 INFO:	3 exons in total
-2022-12-05 16:31:10 INFO:	Results:	C:1.2%[S:1.2%,D:0.0%],F:0.0%,M:98.8%,n:255	   
+2023-04-28 14:25:03 INFO:	Running BUSCO using lineage dataset eukaryota_odb10 (eukaryota, 2020-09-10)
+2023-04-28 14:25:03 INFO:	Running 1 job(s) on bbtools, starting at 04/28/2023 14:25:03
+2023-04-28 14:25:06 INFO:	[bbtools]	1 of 1 task(s) completed
+2023-04-28 14:25:06 INFO:	Running 1 job(s) on metaeuk, starting at 04/28/2023 14:25:06
+2023-04-28 14:25:49 INFO:	[metaeuk]	1 of 1 task(s) completed
+2023-04-28 14:25:50 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:25:50 INFO:	Running 255 job(s) on hmmsearch, starting at 04/28/2023 14:25:50
+2023-04-28 14:25:52 INFO:	[hmmsearch]	26 of 255 task(s) completed
+2023-04-28 14:25:52 INFO:	[hmmsearch]	51 of 255 task(s) completed
+2023-04-28 14:25:52 INFO:	[hmmsearch]	77 of 255 task(s) completed
+2023-04-28 14:25:52 INFO:	[hmmsearch]	102 of 255 task(s) completed
+2023-04-28 14:25:52 INFO:	[hmmsearch]	128 of 255 task(s) completed
+2023-04-28 14:25:52 INFO:	[hmmsearch]	153 of 255 task(s) completed
+2023-04-28 14:25:52 INFO:	[hmmsearch]	179 of 255 task(s) completed
+2023-04-28 14:25:53 INFO:	[hmmsearch]	204 of 255 task(s) completed
+2023-04-28 14:25:53 INFO:	[hmmsearch]	230 of 255 task(s) completed
+2023-04-28 14:25:53 INFO:	[hmmsearch]	255 of 255 task(s) completed
+2023-04-28 14:25:55 INFO:	Validating exons and removing overlapping matches
+2023-04-28 14:25:55 INFO:	0 candidate overlapping regions found
+2023-04-28 14:25:55 INFO:	3 exons in total
+2023-04-28 14:25:55 INFO:	Results:	C:1.2%[S:1.2%,D:0.0%],F:0.0%,M:98.8%,n:255	   
 
-2022-12-05 16:31:10 INFO:	Extracting missing and fragmented buscos from the file refseq_db.faa...
-2022-12-05 16:31:26 INFO:	Running 1 job(s) on metaeuk, starting at 12/05/2022 16:31:26
-2022-12-05 16:32:23 INFO:	[metaeuk]	1 of 1 task(s) completed
-2022-12-05 16:32:24 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:32:24 INFO:	Running 252 job(s) on hmmsearch, starting at 12/05/2022 16:32:24
-2022-12-05 16:32:26 INFO:	[hmmsearch]	26 of 252 task(s) completed
-2022-12-05 16:32:26 INFO:	[hmmsearch]	51 of 252 task(s) completed
-2022-12-05 16:32:26 INFO:	[hmmsearch]	76 of 252 task(s) completed
-2022-12-05 16:32:26 INFO:	[hmmsearch]	101 of 252 task(s) completed
-2022-12-05 16:32:27 INFO:	[hmmsearch]	126 of 252 task(s) completed
-2022-12-05 16:32:27 INFO:	[hmmsearch]	152 of 252 task(s) completed
-2022-12-05 16:32:27 INFO:	[hmmsearch]	177 of 252 task(s) completed
-2022-12-05 16:32:27 INFO:	[hmmsearch]	202 of 252 task(s) completed
-2022-12-05 16:32:27 INFO:	[hmmsearch]	227 of 252 task(s) completed
-2022-12-05 16:32:27 INFO:	[hmmsearch]	252 of 252 task(s) completed
-2022-12-05 16:32:29 INFO:	Validating exons and removing overlapping matches
-2022-12-05 16:32:30 INFO:	0 candidate overlapping regions found
-2022-12-05 16:32:30 INFO:	3 exons in total
-2022-12-05 16:32:30 INFO:	Results:	C:1.2%[S:1.2%,D:0.0%],F:0.0%,M:98.8%,n:255	   
+2023-04-28 14:25:55 INFO:	Extracting missing and fragmented buscos from the file refseq_db.faa...
+2023-04-28 14:26:13 INFO:	Running 1 job(s) on metaeuk, starting at 04/28/2023 14:26:13
+2023-04-28 14:27:12 INFO:	[metaeuk]	1 of 1 task(s) completed
+2023-04-28 14:27:13 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:27:13 INFO:	Running 252 job(s) on hmmsearch, starting at 04/28/2023 14:27:13
+2023-04-28 14:27:15 INFO:	[hmmsearch]	26 of 252 task(s) completed
+2023-04-28 14:27:15 INFO:	[hmmsearch]	51 of 252 task(s) completed
+2023-04-28 14:27:15 INFO:	[hmmsearch]	76 of 252 task(s) completed
+2023-04-28 14:27:15 INFO:	[hmmsearch]	101 of 252 task(s) completed
+2023-04-28 14:27:15 INFO:	[hmmsearch]	126 of 252 task(s) completed
+2023-04-28 14:27:15 INFO:	[hmmsearch]	152 of 252 task(s) completed
+2023-04-28 14:27:15 INFO:	[hmmsearch]	177 of 252 task(s) completed
+2023-04-28 14:27:16 INFO:	[hmmsearch]	202 of 252 task(s) completed
+2023-04-28 14:27:16 INFO:	[hmmsearch]	202 of 252 task(s) completed
+2023-04-28 14:27:16 INFO:	[hmmsearch]	227 of 252 task(s) completed
+2023-04-28 14:27:16 INFO:	[hmmsearch]	252 of 252 task(s) completed
+2023-04-28 14:27:17 INFO:	Validating exons and removing overlapping matches
+2023-04-28 14:27:18 INFO:	0 candidate overlapping regions found
+2023-04-28 14:27:18 INFO:	3 exons in total
+2023-04-28 14:27:18 INFO:	Results:	C:1.2%[S:1.2%,D:0.0%],F:0.0%,M:98.8%,n:255	   
 
-2022-12-05 16:32:30 INFO:	bacteria_odb10 selected
+2023-04-28 14:27:18 INFO:	bacteria_odb10 selected
 
-2022-12-05 16:32:30 INFO:	***** Searching tree for chosen lineage to find best taxonomic match *****
+2023-04-28 14:27:18 INFO:	***** Searching tree for chosen lineage to find best taxonomic match *****
 
-2022-12-05 16:32:30 INFO:	Extract markers...
-2022-12-05 16:32:30 INFO:	Place the markers on the reference tree...
-2022-12-05 16:32:30 INFO:	Running 1 job(s) on sepp, starting at 12/05/2022 16:32:30
-2022-12-05 16:33:54 INFO:	[sepp]	1 of 1 task(s) completed
-2022-12-05 16:33:54 INFO:	Not enough markers were placed on the tree (11). Root lineage bacteria is kept
-2022-12-05 16:33:54 INFO:	
+2023-04-28 14:27:18 INFO:	Extract markers...
+2023-04-28 14:27:18 INFO:	Place the markers on the reference tree...
+2023-04-28 14:27:18 INFO:	Running 1 job(s) on sepp, starting at 04/28/2023 14:27:18
+2023-04-28 14:28:48 INFO:	[sepp]	1 of 1 task(s) completed
+2023-04-28 14:28:48 INFO:	Not enough markers were placed on the tree (11). Root lineage bacteria is kept
+2023-04-28 14:28:48 INFO:	
 
 	--------------------------------------------------
 	|Results from dataset bacteria_odb10              |
@@ -115,12 +117,12 @@
 	|97	Missing BUSCOs (M)                        |
 	|124	Total BUSCO groups searched               |
 	--------------------------------------------------
-2022-12-05 16:33:54 INFO:	BUSCO analysis done with WARNING(s). Total running time: 234 seconds
+2023-04-28 14:28:48 INFO:	BUSCO analysis done with WARNING(s). Total running time: 243 seconds
 
 ***** Summary of warnings: *****
-2022-12-05 16:29:58 WARNING:busco.BuscoConfig	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
+2023-04-28 14:24:43 WARNING:busco.BuscoConfig	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
 
-2022-12-05 16:33:54 INFO:	Results written in /busco_wd/test_bacteria
-2022-12-05 16:33:54 INFO:	For assistance with interpreting the results, please consult the userguide: https://busco.ezlab.org/busco_userguide.html
+2023-04-28 14:28:48 INFO:	Results written in /busco_wd/test_bacteria
+2023-04-28 14:28:48 INFO:	For assistance with interpreting the results, please consult the userguide: https://busco.ezlab.org/busco_userguide.html
 
-2022-12-05 16:33:54 INFO:	Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO
+2023-04-28 14:28:48 INFO:	Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO


=====================================
test_data/eukaryota/expected_log.txt
=====================================
@@ -1,148 +1,148 @@
-2022-12-05 16:33:55 INFO:	***** Start a BUSCO v5.4.4 analysis, current time: 12/05/2022 16:33:55 *****
-2022-12-05 16:33:55 INFO:	Configuring BUSCO with local environment
-2022-12-05 16:33:55 WARNING:	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
-2022-12-05 16:33:55 INFO:	Mode is genome
-2022-12-05 16:33:55 INFO:	Downloading information on latest versions of BUSCO data...
-2022-12-05 16:33:57 INFO:	Input file is /busco_wd/test_data/eukaryota/genome.fna
-2022-12-05 16:33:57 INFO:	No lineage specified. Running lineage auto selector.
+2023-04-28 14:28:49 INFO:	***** Start a BUSCO v5.4.7 analysis, current time: 04/28/2023 14:28:49 *****
+2023-04-28 14:28:49 INFO:	Configuring BUSCO with local environment
+2023-04-28 14:28:49 WARNING:	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
+2023-04-28 14:28:49 INFO:	Mode is genome
+2023-04-28 14:28:49 INFO:	'Force' option selected; overwriting previous results directory
+2023-04-28 14:28:50 INFO:	Downloading information on latest versions of BUSCO data...
+2023-04-28 14:28:52 INFO:	Input file is /busco_wd/test_data/eukaryota/genome.fna
+2023-04-28 14:28:52 INFO:	No lineage specified. Running lineage auto selector.
 
-2022-12-05 16:33:57 INFO:	***** Starting Auto Select Lineage *****
+2023-04-28 14:28:52 INFO:	***** Starting Auto Select Lineage *****
 	This process runs BUSCO on the generic lineage datasets for the domains archaea, bacteria and eukaryota. Once the optimal domain is selected, BUSCO automatically attempts to find the most appropriate BUSCO dataset to use based on phylogenetic placement.
 	--auto-lineage-euk and --auto-lineage-prok are also available if you know your input assembly is, or is not, an eukaryote. See the user guide for more information.
 	A reminder: Busco evaluations are valid when an appropriate dataset is used, i.e., the dataset belongs to the lineage of the species to test. Because of overlapping markers/spurious matches among domains, busco matches in another domain do not necessarily mean that your genome/proteome contains sequences from this domain. However, a high busco score in multiple domains might help you identify possible contaminations.
-2022-12-05 16:33:57 INFO:	Running BUSCO using lineage dataset archaea_odb10 (prokaryota, 2021-02-23)
-2022-12-05 16:33:57 INFO:	Running 1 job(s) on bbtools, starting at 12/05/2022 16:33:57
-2022-12-05 16:33:58 INFO:	[bbtools]	1 of 1 task(s) completed
-2022-12-05 16:33:58 INFO:	***** Run Prodigal on input to predict and extract genes *****
-2022-12-05 16:33:58 INFO:	Running Prodigal with genetic code 11 in single mode
-2022-12-05 16:33:58 INFO:	Running 1 job(s) on prodigal, starting at 12/05/2022 16:33:58
-2022-12-05 16:33:59 INFO:	[prodigal]	1 of 1 task(s) completed
-2022-12-05 16:33:59 INFO:	Genetic code 11 selected as optimal
-2022-12-05 16:33:59 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:33:59 INFO:	Running 194 job(s) on hmmsearch, starting at 12/05/2022 16:33:59
-2022-12-05 16:34:00 INFO:	[hmmsearch]	20 of 194 task(s) completed
-2022-12-05 16:34:00 INFO:	[hmmsearch]	39 of 194 task(s) completed
-2022-12-05 16:34:00 INFO:	[hmmsearch]	59 of 194 task(s) completed
-2022-12-05 16:34:00 INFO:	[hmmsearch]	78 of 194 task(s) completed
-2022-12-05 16:34:00 INFO:	[hmmsearch]	97 of 194 task(s) completed
-2022-12-05 16:34:01 INFO:	[hmmsearch]	117 of 194 task(s) completed
-2022-12-05 16:34:01 INFO:	[hmmsearch]	136 of 194 task(s) completed
-2022-12-05 16:34:01 INFO:	[hmmsearch]	156 of 194 task(s) completed
-2022-12-05 16:34:01 INFO:	[hmmsearch]	175 of 194 task(s) completed
-2022-12-05 16:34:01 INFO:	[hmmsearch]	194 of 194 task(s) completed
-2022-12-05 16:34:02 INFO:	Results:	C:1.0%[S:1.0%,D:0.0%],F:0.5%,M:98.5%,n:194	   
+2023-04-28 14:28:52 INFO:	Running BUSCO using lineage dataset archaea_odb10 (prokaryota, 2021-02-23)
+2023-04-28 14:28:52 INFO:	Running 1 job(s) on bbtools, starting at 04/28/2023 14:28:52
+2023-04-28 14:28:53 INFO:	[bbtools]	1 of 1 task(s) completed
+2023-04-28 14:28:53 INFO:	***** Run Prodigal on input to predict and extract genes *****
+2023-04-28 14:28:53 INFO:	Running Prodigal with genetic code 11 in single mode
+2023-04-28 14:28:53 INFO:	Running 1 job(s) on prodigal, starting at 04/28/2023 14:28:53
+2023-04-28 14:28:54 INFO:	[prodigal]	1 of 1 task(s) completed
+2023-04-28 14:28:54 INFO:	Genetic code 11 selected as optimal
+2023-04-28 14:28:54 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:28:54 INFO:	Running 194 job(s) on hmmsearch, starting at 04/28/2023 14:28:54
+2023-04-28 14:28:56 INFO:	[hmmsearch]	20 of 194 task(s) completed
+2023-04-28 14:28:56 INFO:	[hmmsearch]	39 of 194 task(s) completed
+2023-04-28 14:28:56 INFO:	[hmmsearch]	59 of 194 task(s) completed
+2023-04-28 14:28:57 INFO:	[hmmsearch]	78 of 194 task(s) completed
+2023-04-28 14:28:57 INFO:	[hmmsearch]	97 of 194 task(s) completed
+2023-04-28 14:28:57 INFO:	[hmmsearch]	117 of 194 task(s) completed
+2023-04-28 14:28:57 INFO:	[hmmsearch]	136 of 194 task(s) completed
+2023-04-28 14:28:57 INFO:	[hmmsearch]	156 of 194 task(s) completed
+2023-04-28 14:28:57 INFO:	[hmmsearch]	175 of 194 task(s) completed
+2023-04-28 14:28:57 INFO:	[hmmsearch]	194 of 194 task(s) completed
+2023-04-28 14:29:00 INFO:	Results:	C:1.0%[S:1.0%,D:0.0%],F:0.5%,M:98.5%,n:194	   
 
-2022-12-05 16:34:02 INFO:	Running BUSCO using lineage dataset bacteria_odb10 (prokaryota, 2020-03-06)
-2022-12-05 16:34:02 INFO:	Running 1 job(s) on bbtools, starting at 12/05/2022 16:34:02
-2022-12-05 16:34:03 INFO:	[bbtools]	1 of 1 task(s) completed
-2022-12-05 16:34:03 INFO:	***** Run Prodigal on input to predict and extract genes *****
-2022-12-05 16:34:03 INFO:	Genetic code 11 selected as optimal
-2022-12-05 16:34:03 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:34:03 INFO:	Running 124 job(s) on hmmsearch, starting at 12/05/2022 16:34:03
-2022-12-05 16:34:04 INFO:	[hmmsearch]	13 of 124 task(s) completed
-2022-12-05 16:34:04 INFO:	[hmmsearch]	25 of 124 task(s) completed
-2022-12-05 16:34:04 INFO:	[hmmsearch]	38 of 124 task(s) completed
-2022-12-05 16:34:04 INFO:	[hmmsearch]	50 of 124 task(s) completed
-2022-12-05 16:34:04 INFO:	[hmmsearch]	63 of 124 task(s) completed
-2022-12-05 16:34:04 INFO:	[hmmsearch]	75 of 124 task(s) completed
-2022-12-05 16:34:04 INFO:	[hmmsearch]	87 of 124 task(s) completed
-2022-12-05 16:34:04 INFO:	[hmmsearch]	100 of 124 task(s) completed
-2022-12-05 16:34:04 INFO:	[hmmsearch]	112 of 124 task(s) completed
-2022-12-05 16:34:04 INFO:	[hmmsearch]	124 of 124 task(s) completed
-2022-12-05 16:34:05 WARNING:	BUSCO did not find any match. Make sure to check the log files if this is unexpected.
-2022-12-05 16:34:05 INFO:	Results:	C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:124	   
+2023-04-28 14:29:00 INFO:	Running BUSCO using lineage dataset bacteria_odb10 (prokaryota, 2020-03-06)
+2023-04-28 14:29:00 INFO:	Running 1 job(s) on bbtools, starting at 04/28/2023 14:29:00
+2023-04-28 14:29:02 INFO:	[bbtools]	1 of 1 task(s) completed
+2023-04-28 14:29:02 INFO:	***** Run Prodigal on input to predict and extract genes *****
+2023-04-28 14:29:02 INFO:	Genetic code 11 selected as optimal
+2023-04-28 14:29:02 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:29:02 INFO:	Running 124 job(s) on hmmsearch, starting at 04/28/2023 14:29:02
+2023-04-28 14:29:03 INFO:	[hmmsearch]	13 of 124 task(s) completed
+2023-04-28 14:29:03 INFO:	[hmmsearch]	25 of 124 task(s) completed
+2023-04-28 14:29:03 INFO:	[hmmsearch]	50 of 124 task(s) completed
+2023-04-28 14:29:03 INFO:	[hmmsearch]	63 of 124 task(s) completed
+2023-04-28 14:29:03 INFO:	[hmmsearch]	75 of 124 task(s) completed
+2023-04-28 14:29:03 INFO:	[hmmsearch]	87 of 124 task(s) completed
+2023-04-28 14:29:03 INFO:	[hmmsearch]	100 of 124 task(s) completed
+2023-04-28 14:29:03 INFO:	[hmmsearch]	112 of 124 task(s) completed
+2023-04-28 14:29:03 INFO:	[hmmsearch]	124 of 124 task(s) completed
+2023-04-28 14:29:05 WARNING:	BUSCO did not find any match. Make sure to check the log files if this is unexpected.
+2023-04-28 14:29:05 INFO:	Results:	C:0.0%[S:0.0%,D:0.0%],F:0.0%,M:100.0%,n:124	   
 
-2022-12-05 16:34:05 INFO:	Running BUSCO using lineage dataset eukaryota_odb10 (eukaryota, 2020-09-10)
-2022-12-05 16:34:05 INFO:	Running 1 job(s) on bbtools, starting at 12/05/2022 16:34:05
-2022-12-05 16:34:06 INFO:	[bbtools]	1 of 1 task(s) completed
-2022-12-05 16:34:06 INFO:	Running 1 job(s) on metaeuk, starting at 12/05/2022 16:34:06
-2022-12-05 16:34:35 INFO:	[metaeuk]	1 of 1 task(s) completed
-2022-12-05 16:34:37 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:34:37 INFO:	Running 255 job(s) on hmmsearch, starting at 12/05/2022 16:34:37
-2022-12-05 16:34:38 INFO:	[hmmsearch]	26 of 255 task(s) completed
-2022-12-05 16:34:38 INFO:	[hmmsearch]	51 of 255 task(s) completed
-2022-12-05 16:34:39 INFO:	[hmmsearch]	77 of 255 task(s) completed
-2022-12-05 16:34:39 INFO:	[hmmsearch]	102 of 255 task(s) completed
-2022-12-05 16:34:39 INFO:	[hmmsearch]	128 of 255 task(s) completed
-2022-12-05 16:34:39 INFO:	[hmmsearch]	153 of 255 task(s) completed
-2022-12-05 16:34:39 INFO:	[hmmsearch]	179 of 255 task(s) completed
-2022-12-05 16:34:39 INFO:	[hmmsearch]	204 of 255 task(s) completed
-2022-12-05 16:34:40 INFO:	[hmmsearch]	255 of 255 task(s) completed
-2022-12-05 16:34:41 INFO:	Validating exons and removing overlapping matches
-2022-12-05 16:34:42 INFO:	0 candidate overlapping regions found
-2022-12-05 16:34:42 INFO:	51 exons in total
-2022-12-05 16:34:42 INFO:	Results:	C:19.2%[S:19.2%,D:0.0%],F:0.8%,M:80.0%,n:255	   
+2023-04-28 14:29:05 INFO:	Running BUSCO using lineage dataset eukaryota_odb10 (eukaryota, 2020-09-10)
+2023-04-28 14:29:05 INFO:	Running 1 job(s) on bbtools, starting at 04/28/2023 14:29:05
+2023-04-28 14:29:07 INFO:	[bbtools]	1 of 1 task(s) completed
+2023-04-28 14:29:07 INFO:	Running 1 job(s) on metaeuk, starting at 04/28/2023 14:29:07
+2023-04-28 14:29:49 INFO:	[metaeuk]	1 of 1 task(s) completed
+2023-04-28 14:29:51 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:29:51 INFO:	Running 255 job(s) on hmmsearch, starting at 04/28/2023 14:29:51
+2023-04-28 14:29:52 INFO:	[hmmsearch]	51 of 255 task(s) completed
+2023-04-28 14:29:53 INFO:	[hmmsearch]	77 of 255 task(s) completed
+2023-04-28 14:29:53 INFO:	[hmmsearch]	102 of 255 task(s) completed
+2023-04-28 14:29:53 INFO:	[hmmsearch]	128 of 255 task(s) completed
+2023-04-28 14:29:53 INFO:	[hmmsearch]	153 of 255 task(s) completed
+2023-04-28 14:29:53 INFO:	[hmmsearch]	179 of 255 task(s) completed
+2023-04-28 14:29:53 INFO:	[hmmsearch]	204 of 255 task(s) completed
+2023-04-28 14:29:54 INFO:	[hmmsearch]	230 of 255 task(s) completed
+2023-04-28 14:29:54 INFO:	[hmmsearch]	255 of 255 task(s) completed
+2023-04-28 14:29:56 INFO:	Validating exons and removing overlapping matches
+2023-04-28 14:29:57 INFO:	0 candidate overlapping regions found
+2023-04-28 14:29:57 INFO:	51 exons in total
+2023-04-28 14:29:57 INFO:	Results:	C:19.2%[S:19.2%,D:0.0%],F:0.8%,M:80.0%,n:255	   
 
-2022-12-05 16:34:42 INFO:	Extracting missing and fragmented buscos from the file refseq_db.faa...
-2022-12-05 16:34:57 INFO:	Running 1 job(s) on metaeuk, starting at 12/05/2022 16:34:57
-2022-12-05 16:35:17 INFO:	[metaeuk]	1 of 1 task(s) completed
-2022-12-05 16:35:18 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:35:18 INFO:	Running 206 job(s) on hmmsearch, starting at 12/05/2022 16:35:18
-2022-12-05 16:35:20 INFO:	[hmmsearch]	21 of 206 task(s) completed
-2022-12-05 16:35:20 INFO:	[hmmsearch]	42 of 206 task(s) completed
-2022-12-05 16:35:20 INFO:	[hmmsearch]	62 of 206 task(s) completed
-2022-12-05 16:35:20 INFO:	[hmmsearch]	83 of 206 task(s) completed
-2022-12-05 16:35:20 INFO:	[hmmsearch]	104 of 206 task(s) completed
-2022-12-05 16:35:20 INFO:	[hmmsearch]	124 of 206 task(s) completed
-2022-12-05 16:35:20 INFO:	[hmmsearch]	145 of 206 task(s) completed
-2022-12-05 16:35:20 INFO:	[hmmsearch]	165 of 206 task(s) completed
-2022-12-05 16:35:20 INFO:	[hmmsearch]	186 of 206 task(s) completed
-2022-12-05 16:35:20 INFO:	[hmmsearch]	206 of 206 task(s) completed
-2022-12-05 16:35:22 INFO:	Validating exons and removing overlapping matches
-2022-12-05 16:35:23 INFO:	0 candidate overlapping regions found
-2022-12-05 16:35:23 INFO:	51 exons in total
-2022-12-05 16:35:23 INFO:	Results:	C:19.2%[S:19.2%,D:0.0%],F:0.8%,M:80.0%,n:255	   
+2023-04-28 14:29:57 INFO:	Extracting missing and fragmented buscos from the file refseq_db.faa...
+2023-04-28 14:30:14 INFO:	Running 1 job(s) on metaeuk, starting at 04/28/2023 14:30:14
+2023-04-28 14:30:45 INFO:	[metaeuk]	1 of 1 task(s) completed
+2023-04-28 14:30:48 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:30:48 INFO:	Running 206 job(s) on hmmsearch, starting at 04/28/2023 14:30:48
+2023-04-28 14:30:50 INFO:	[hmmsearch]	21 of 206 task(s) completed
+2023-04-28 14:30:50 INFO:	[hmmsearch]	42 of 206 task(s) completed
+2023-04-28 14:30:50 INFO:	[hmmsearch]	62 of 206 task(s) completed
+2023-04-28 14:30:51 INFO:	[hmmsearch]	83 of 206 task(s) completed
+2023-04-28 14:30:51 INFO:	[hmmsearch]	104 of 206 task(s) completed
+2023-04-28 14:30:51 INFO:	[hmmsearch]	124 of 206 task(s) completed
+2023-04-28 14:30:51 INFO:	[hmmsearch]	145 of 206 task(s) completed
+2023-04-28 14:30:51 INFO:	[hmmsearch]	165 of 206 task(s) completed
+2023-04-28 14:30:51 INFO:	[hmmsearch]	186 of 206 task(s) completed
+2023-04-28 14:30:52 INFO:	[hmmsearch]	206 of 206 task(s) completed
+2023-04-28 14:30:54 INFO:	Validating exons and removing overlapping matches
+2023-04-28 14:30:56 INFO:	0 candidate overlapping regions found
+2023-04-28 14:30:56 INFO:	51 exons in total
+2023-04-28 14:30:56 INFO:	Results:	C:19.2%[S:19.2%,D:0.0%],F:0.8%,M:80.0%,n:255	   
 
-2022-12-05 16:35:23 INFO:	eukaryota_odb10 selected
+2023-04-28 14:30:56 INFO:	eukaryota_odb10 selected
 
-2022-12-05 16:35:23 INFO:	***** Searching tree for chosen lineage to find best taxonomic match *****
+2023-04-28 14:30:56 INFO:	***** Searching tree for chosen lineage to find best taxonomic match *****
 
-2022-12-05 16:35:23 INFO:	Extract markers...
-2022-12-05 16:35:23 INFO:	Place the markers on the reference tree...
-2022-12-05 16:35:23 INFO:	Running 1 job(s) on sepp, starting at 12/05/2022 16:35:23
-2022-12-05 16:38:57 INFO:	[sepp]	1 of 1 task(s) completed
-2022-12-05 16:38:58 INFO:	Lineage saccharomycetes is selected, supported by 18 markers out of 19
-2022-12-05 16:38:58 INFO:	Running BUSCO using lineage dataset saccharomycetes_odb10 (eukaryota, 2020-08-05)
-2022-12-05 16:38:58 INFO:	Running 1 job(s) on bbtools, starting at 12/05/2022 16:38:58
-2022-12-05 16:38:59 INFO:	[bbtools]	1 of 1 task(s) completed
-2022-12-05 16:38:59 INFO:	Running 1 job(s) on metaeuk, starting at 12/05/2022 16:38:59
-2022-12-05 16:39:06 INFO:	[metaeuk]	1 of 1 task(s) completed
-2022-12-05 16:39:07 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:39:07 INFO:	Running 2137 job(s) on hmmsearch, starting at 12/05/2022 16:39:07
-2022-12-05 16:39:12 INFO:	[hmmsearch]	214 of 2137 task(s) completed
-2022-12-05 16:39:14 INFO:	[hmmsearch]	428 of 2137 task(s) completed
-2022-12-05 16:39:16 INFO:	[hmmsearch]	642 of 2137 task(s) completed
-2022-12-05 16:39:18 INFO:	[hmmsearch]	855 of 2137 task(s) completed
-2022-12-05 16:39:20 INFO:	[hmmsearch]	1069 of 2137 task(s) completed
-2022-12-05 16:39:22 INFO:	[hmmsearch]	1283 of 2137 task(s) completed
-2022-12-05 16:39:23 INFO:	[hmmsearch]	1496 of 2137 task(s) completed
-2022-12-05 16:39:25 INFO:	[hmmsearch]	1710 of 2137 task(s) completed
-2022-12-05 16:39:27 INFO:	[hmmsearch]	1924 of 2137 task(s) completed
-2022-12-05 16:39:31 INFO:	[hmmsearch]	2137 of 2137 task(s) completed
-2022-12-05 16:39:34 INFO:	Validating exons and removing overlapping matches
-2022-12-05 16:39:36 INFO:	0 candidate overlapping regions found
-2022-12-05 16:39:36 INFO:	45 exons in total
-2022-12-05 16:39:36 INFO:	Extracting missing and fragmented buscos from the file refseq_db.faa...
-2022-12-05 16:39:41 INFO:	Running 1 job(s) on metaeuk, starting at 12/05/2022 16:39:41
-2022-12-05 16:39:48 INFO:	[metaeuk]	1 of 1 task(s) completed
-2022-12-05 16:39:50 INFO:	***** Run HMMER on gene sequences *****
-2022-12-05 16:39:50 INFO:	Running 2093 job(s) on hmmsearch, starting at 12/05/2022 16:39:50
-2022-12-05 16:39:54 INFO:	[hmmsearch]	210 of 2093 task(s) completed
-2022-12-05 16:39:55 INFO:	[hmmsearch]	419 of 2093 task(s) completed
-2022-12-05 16:39:56 INFO:	[hmmsearch]	628 of 2093 task(s) completed
-2022-12-05 16:39:57 INFO:	[hmmsearch]	838 of 2093 task(s) completed
-2022-12-05 16:39:58 INFO:	[hmmsearch]	1047 of 2093 task(s) completed
-2022-12-05 16:39:59 INFO:	[hmmsearch]	1256 of 2093 task(s) completed
-2022-12-05 16:40:01 INFO:	[hmmsearch]	1466 of 2093 task(s) completed
-2022-12-05 16:40:02 INFO:	[hmmsearch]	1675 of 2093 task(s) completed
-2022-12-05 16:40:02 INFO:	[hmmsearch]	1884 of 2093 task(s) completed
-2022-12-05 16:40:04 INFO:	[hmmsearch]	2093 of 2093 task(s) completed
-2022-12-05 16:40:06 INFO:	Validating exons and removing overlapping matches
-2022-12-05 16:40:08 INFO:	3 candidate overlapping regions found
-2022-12-05 16:40:08 INFO:	49 exons in total
-2022-12-05 16:40:10 INFO:	Results:	C:2.1%[S:2.1%,D:0.0%],F:0.0%,M:97.9%,n:2137	   
+2023-04-28 14:30:56 INFO:	Extract markers...
+2023-04-28 14:30:56 INFO:	Place the markers on the reference tree...
+2023-04-28 14:30:56 INFO:	Running 1 job(s) on sepp, starting at 04/28/2023 14:30:56
+2023-04-28 14:35:03 INFO:	[sepp]	1 of 1 task(s) completed
+2023-04-28 14:35:03 INFO:	Lineage saccharomycetes is selected, supported by 18 markers out of 19
+2023-04-28 14:35:04 INFO:	Running BUSCO using lineage dataset saccharomycetes_odb10 (eukaryota, 2020-08-05)
+2023-04-28 14:35:04 INFO:	Running 1 job(s) on bbtools, starting at 04/28/2023 14:35:04
+2023-04-28 14:35:06 INFO:	[bbtools]	1 of 1 task(s) completed
+2023-04-28 14:35:06 INFO:	Running 1 job(s) on metaeuk, starting at 04/28/2023 14:35:06
+2023-04-28 14:35:13 INFO:	[metaeuk]	1 of 1 task(s) completed
+2023-04-28 14:35:15 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:35:15 INFO:	Running 2137 job(s) on hmmsearch, starting at 04/28/2023 14:35:15
+2023-04-28 14:35:18 INFO:	[hmmsearch]	214 of 2137 task(s) completed
+2023-04-28 14:35:20 INFO:	[hmmsearch]	428 of 2137 task(s) completed
+2023-04-28 14:35:22 INFO:	[hmmsearch]	642 of 2137 task(s) completed
+2023-04-28 14:35:24 INFO:	[hmmsearch]	855 of 2137 task(s) completed
+2023-04-28 14:35:26 INFO:	[hmmsearch]	1069 of 2137 task(s) completed
+2023-04-28 14:35:28 INFO:	[hmmsearch]	1283 of 2137 task(s) completed
+2023-04-28 14:35:30 INFO:	[hmmsearch]	1496 of 2137 task(s) completed
+2023-04-28 14:35:32 INFO:	[hmmsearch]	1710 of 2137 task(s) completed
+2023-04-28 14:35:34 INFO:	[hmmsearch]	1924 of 2137 task(s) completed
+2023-04-28 14:35:38 INFO:	[hmmsearch]	2137 of 2137 task(s) completed
+2023-04-28 14:35:43 INFO:	Validating exons and removing overlapping matches
+2023-04-28 14:35:45 INFO:	0 candidate overlapping regions found
+2023-04-28 14:35:45 INFO:	45 exons in total
+2023-04-28 14:35:45 INFO:	Extracting missing and fragmented buscos from the file refseq_db.faa...
+2023-04-28 14:35:50 INFO:	Running 1 job(s) on metaeuk, starting at 04/28/2023 14:35:50
+2023-04-28 14:35:57 INFO:	[metaeuk]	1 of 1 task(s) completed
+2023-04-28 14:35:59 INFO:	***** Run HMMER on gene sequences *****
+2023-04-28 14:35:59 INFO:	Running 2093 job(s) on hmmsearch, starting at 04/28/2023 14:35:59
+2023-04-28 14:36:03 INFO:	[hmmsearch]	210 of 2093 task(s) completed
+2023-04-28 14:36:05 INFO:	[hmmsearch]	419 of 2093 task(s) completed
+2023-04-28 14:36:06 INFO:	[hmmsearch]	628 of 2093 task(s) completed
+2023-04-28 14:36:08 INFO:	[hmmsearch]	838 of 2093 task(s) completed
+2023-04-28 14:36:09 INFO:	[hmmsearch]	1047 of 2093 task(s) completed
+2023-04-28 14:36:10 INFO:	[hmmsearch]	1256 of 2093 task(s) completed
+2023-04-28 14:36:11 INFO:	[hmmsearch]	1466 of 2093 task(s) completed
+2023-04-28 14:36:12 INFO:	[hmmsearch]	1675 of 2093 task(s) completed
+2023-04-28 14:36:13 INFO:	[hmmsearch]	1884 of 2093 task(s) completed
+2023-04-28 14:36:14 INFO:	[hmmsearch]	2093 of 2093 task(s) completed
+2023-04-28 14:36:16 INFO:	Validating exons and removing overlapping matches
+2023-04-28 14:36:18 INFO:	3 candidate overlapping regions found
+2023-04-28 14:36:18 INFO:	49 exons in total
+2023-04-28 14:36:20 INFO:	Results:	C:2.1%[S:2.1%,D:0.0%],F:0.0%,M:97.9%,n:2137	   
 
-2022-12-05 16:40:10 INFO:	
+2023-04-28 14:36:20 INFO:	
 
 	--------------------------------------------------
 	|Results from generic domain eukaryota_odb10      |
@@ -167,13 +167,13 @@
 	|2091	Missing BUSCOs (M)                        |
 	|2137	Total BUSCO groups searched               |
 	--------------------------------------------------
-2022-12-05 16:40:10 INFO:	BUSCO analysis done with WARNING(s). Total running time: 373 seconds
+2023-04-28 14:36:20 INFO:	BUSCO analysis done with WARNING(s). Total running time: 449 seconds
 
 ***** Summary of warnings: *****
-2022-12-05 16:33:55 WARNING:busco.BuscoConfig	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
-2022-12-05 16:34:05 WARNING:busco.busco_tools.hmmer	BUSCO did not find any match. Make sure to check the log files if this is unexpected.
+2023-04-28 14:28:49 WARNING:busco.BuscoConfig	Running Auto Lineage Selector as no lineage dataset was specified. This will take a little longer than normal. If you know what lineage dataset you want to use, please specify this in the config file or using the -l (--lineage-dataset) flag in the command line.
+2023-04-28 14:29:05 WARNING:busco.busco_tools.hmmer	BUSCO did not find any match. Make sure to check the log files if this is unexpected.
 
-2022-12-05 16:40:10 INFO:	Results written in /busco_wd/test_eukaryota
-2022-12-05 16:40:10 INFO:	For assistance with interpreting the results, please consult the userguide: https://busco.ezlab.org/busco_userguide.html
+2023-04-28 14:36:20 INFO:	Results written in /busco_wd/test_eukaryota
+2023-04-28 14:36:20 INFO:	For assistance with interpreting the results, please consult the userguide: https://busco.ezlab.org/busco_userguide.html
 
-2022-12-05 16:40:10 INFO:	Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO
+2023-04-28 14:36:20 INFO:	Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO
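
Note on the expected logs above: apart from timestamps, the version string and the extra 'Force' line, the one-line "Results:" summaries are unchanged between the two runs. A minimal sketch for checking that programmatically, assuming the summary always has the shape seen in these logs (C, S, D, F, M percentages followed by n). The helper name parse_busco_summary is hypothetical and is not part of BUSCO:

```python
import re

# Pattern for the one-line summary exactly as it appears in the logs above,
# e.g. "C:2.1%[S:2.1%,D:0.0%],F:0.0%,M:97.9%,n:2137".
# Assumption: other BUSCO versions may format this line differently.
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the summary fields of a log line as a dict, or None if absent."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    # Percentages become floats; the number of BUSCO groups stays an integer.
    fields = {
        key: float(value)
        for key, value in match.groupdict().items()
        if key != "n"
    }
    fields["n"] = int(match.group("n"))
    return fields


# Two corresponding lines from the old and new expected logs.
old_line = "2022-12-05 16:40:10 INFO:\tResults:\tC:2.1%[S:2.1%,D:0.0%],F:0.0%,M:97.9%,n:2137\t   "
new_line = "2023-04-28 14:36:20 INFO:\tResults:\tC:2.1%[S:2.1%,D:0.0%],F:0.0%,M:97.9%,n:2137\t   "

old_summary = parse_busco_summary(old_line)
new_summary = parse_busco_summary(new_line)
```

Comparing the parsed dictionaries rather than the raw lines ignores the timestamp prefix, which is the only part that differs here.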


=====================================
tests/unittests/AutoLineage_unittests.py
=====================================
@@ -273,26 +273,24 @@ class TestAutoLineage(unittest.TestCase):
         mock_config3 = Mock()
         mock_config1.get.side_effect = [
             "test_input1",
-            "prok_tran",
-            "prokaryota",
             "test_lineage1",
             "test_lineage1",
         ]
         mock_config2.get.side_effect = [
             "test_input2",
-            "prok_tran",
-            "prokaryota",
             "test_lineage2",
             "test_lineage2",
         ]
         mock_config3.get.side_effect = [
             "test_input3",
-            "euk_tran",
-            "eukaryota",
             "test_lineage3",
             "test_lineage3",
         ]
 
+        mock_config1.update_mode.side_effect = [("prok_tran", "prokaryota")]
+        mock_config2.update_mode.side_effect = [("prok_tran", "prokaryota")]
+        mock_config3.update_mode.side_effect = [("euk_tran", "eukaryota")]
+
         mock_analysis1 = Mock()
         mock_analysis2 = Mock()
         mock_analysis3 = Mock()
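
Note on the hunk above: the mode and domain values move out of the config.get side_effect list into a separate update_mode side_effect that yields one tuple per call. A small sketch of how unittest.mock consumes such lists. The arguments passed to get() are placeholders, and reading the tuple as (mode, domain) is an assumption based on the removed lines, not on the BUSCO source:

```python
from unittest.mock import Mock

mock_config = Mock()

# Each call to get() returns the next item of the list, in order.
mock_config.get.side_effect = ["test_input1", "test_lineage1", "test_lineage1"]

# A single call to update_mode() returns the whole tuple at once.
mock_config.update_mode.side_effect = [("prok_tran", "prokaryota")]

# Placeholder arguments: a Mock accepts any arguments.
first = mock_config.get("section", "option")
mode, domain = mock_config.update_mode()
second = mock_config.get("section", "option")

# Once a side_effect list is exhausted, the next call raises StopIteration.
try:
    mock_config.update_mode()
    exhausted = False
except StopIteration:
    exhausted = True
```

Because the values are consumed strictly in call order, removing two get() calls from the code under test also requires removing the two matching entries from each list, which is what the hunk does.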


=====================================
tests/unittests/GenomeAnalysis_unittests.py
=====================================
@@ -7,8 +7,6 @@ class TestConfigManager(unittest.TestCase):
     def setUp(self) -> None:
         pass
 
-    # @patch('busco.analysis.GenomeAnalysis.BuscoAnalysis.config.get', return_value="test")
-    # @patch('busco.analysis.GenomeAnalysis.BuscoAnalysis.config.getboolean', return_value=True)
     @patch("busco.analysis.GenomeAnalysis.BuscoAnalysis.config")
     @patch("busco.analysis.BuscoAnalysis.os.path")
     @patch("busco.analysis.GenomeAnalysis.NucleotideAnalysis.check_nucleotide_file")
@@ -21,7 +19,6 @@ class TestConfigManager(unittest.TestCase):
         "busco.analysis.GenomeAnalysis.BuscoAnalysis.config.get",
         return_value="euk_genome_aug",
     )
-    # @patch('busco.analysis.GenomeAnalysis.BuscoAnalysis.config.getboolean')
     @patch("busco.analysis.GenomeAnalysis.BuscoAnalysis.config")
     @patch("busco.analysis.Analysis.logger.warning")
     @patch("busco.analysis.GenomeAnalysis.BBToolsRunner")
@@ -47,6 +44,7 @@ class TestConfigManager(unittest.TestCase):
         *args
     ):
         analysis = GenomeAnalysis.GenomeAnalysisEukaryotesAugustus()
+        analysis.domain = "eukaryota"
         analysis.init_tools()
         mock_hmmer.assert_called()
         mock_mkblast.assert_called()
@@ -71,6 +69,7 @@ class TestConfigManager(unittest.TestCase):
         self, mock_hmmer, mock_metaeuk, mock_bbtools, *args
     ):
         analysis = GenomeAnalysis.GenomeAnalysisEukaryotesMetaeuk()
+        analysis.domain = "eukaryota"
         analysis.init_tools()
         mock_hmmer.assert_called()
         mock_metaeuk.assert_called()
@@ -89,6 +88,7 @@ class TestConfigManager(unittest.TestCase):
         self, mock_hmmer, mock_prodigal, mock_bbtools, *args
     ):
         analysis = GenomeAnalysis.GenomeAnalysisProkaryotes()
+        analysis.domain = "prokaryota"
         analysis.init_tools()
         mock_hmmer.assert_called()
         mock_prodigal.assert_called()
@@ -122,16 +122,5 @@ class TestConfigManager(unittest.TestCase):
         mock_run_bbtools.assert_called()
         analysis.write_gff_files.assert_called()
 
-        # @patch('busco.GenomeAnalysis.GenomeAnalysisEukaryotesAugustus._rerun_analysis')
-        # @patch('busco.GenomeAnalysis.GenomeAnalysisEukaryotesAugustus.run_hmmer')
-        # @patch('busco.GenomeAnalysis.GenomeAnalysisEukaryotesAugustus._run_augustus')
-        # @patch('busco.GenomeAnalysis.BLASTAnalysis._run_tblastn')
-        # @patch('busco.GenomeAnalysis.BLASTAnalysis._run_mkblast')
-        # mock_mkblast.assert_called()
-        # mock_tblastn.assert_called()
-        # mock_augustus.assert_called()
-        # mock_hmmer.assert_called()
-        # mock_rerun.assert_called()
-
     def tearDown(self) -> None:
         pass



View it on GitLab: https://salsa.debian.org/med-team/busco/-/compare/15a2dbe93d50896b85ccaa51be4c9d5442f4879b...cfad236d2146476512425d61a6473bff672e77f1


