[med-svn] [Git][med-team/flye][upstream] New upstream version 2.9.6+dfsg

Étienne Mollier (@emollier) gitlab at salsa.debian.org
Wed Aug 13 21:30:59 BST 2025



Étienne Mollier pushed to branch upstream at Debian Med / flye


Commits:
1028a41a by Étienne Mollier at 2025-08-13T21:55:38+02:00
New upstream version 2.9.6+dfsg
- - - - -


14 changed files:

- README.md
- docs/NEWS.md
- docs/USAGE.md
- + flye/.main.py.swo
- flye/__build__.py
- flye/__version__.py
- flye/config/bin_cfg/asm_nano_hq.cfg
- flye/main.py
- flye/polishing/alignment.py
- flye/polishing/bubbles.py
- flye/tests/test_toy.py
- flye/utils/fasta_parser.py
- flye/utils/sam_parser.py
- flye/utils/utils.py


Changes:

=====================================
README.md
=====================================
@@ -3,7 +3,7 @@ Flye assembler
 
 [![BioConda Install](https://img.shields.io/conda/dn/bioconda/flye.svg?style=flag&label=BioConda%20install)](https://anaconda.org/bioconda/flye)
 
-### Version: 2.9.5
+### Version: 2.9.6
 
 Flye is a de novo assembler for single-molecule sequencing reads,
 such as those produced by PacBio and Oxford Nanopore Technologies.
@@ -16,6 +16,10 @@ Currently, Flye will produce collapsed assemblies of diploid genomes,
 represented by a single mosaic haplotype. To recover two phased haplotypes
 consider applying [HapDup](https://github.com/fenderglass/hapdup) after the assembly.
 
+If you are using Flye / metaFlye to assemble heterozygous bacterial genomes or metagenomes,
+you may consider using [strainy](https://github.com/katerinakazantseva/strainy) to
+call and quantify heterozygosity and reveal collapsed strains.
+
 Manuals
 -------
 
@@ -26,6 +30,12 @@ Manuals
 Latest updates
 --------------
 
+Flye 2.9.6 release (2 May 2025)
+==============================
+* Minor fix release, most assemblies should not change
+* Fixed rare race condition in polishing stage
+* R10 ONT parameters (3% error) are now default for --nano-hq
+
 Flye 2.9.5 release (27 Aug 2024)
 ===============================
 * Python 3.12 support, Python 2 dropped


=====================================
docs/NEWS.md
=====================================
@@ -1,3 +1,9 @@
+Flye 2.9.6 release (2 May 2025)
+==============================
+* Minor fix release, most assemblies should not change
+* Fixed rare race condition in polishing stage
+* R10 ONT parameters (3% error) are now default for --nano-hq
+
 Flye 2.9.5 release (27 Aug 2024)
 ===============================
 * Python 3.12 support, Python 2 dropped


=====================================
docs/USAGE.md
=====================================
@@ -124,14 +124,13 @@ The dataset was originally released by the
 
 ### Oxford Nanopore
 
-* The default mode for regular ONT data is `--nano-raw`. It works well for a good
-range of datasets, from old R7 pores to the most recent R9.x and R10.x. The
-expected error rate is 10-15%.
+* For R10 data, use `--nano-hq`. Expected error rate is <3%.
 
-* For the most recent ONT data basecalled with Guppy5+ SUP use the new `--nano-hq` mode.
-Expected error rate is <5%.
+* For the R9 data basecalled with Guppy5+ use the new `--nano-hq --read-error 0.05`.
+The expected error rate is <5%.
 
-* For Q20 data, use a combination of `--nano-hq` and `--read-error 0.03`.
+* For older ONT data (e.g. R7-older R9 chemistry) use  `--nano-raw`. The
+expected error rate is 10-15%.
 
 * If you have error-corrected ONT reads (with methods such as Canu), use `--nano-corr`.
 
@@ -153,7 +152,6 @@ Error could be adjusted via `--read-error`.
 * If you have error-corrected PacBio reads (with methods such as Canu), use `--pacbio-corr`.
 
 ### Consensus of multiple contig sets
-
 WARNING: this mode is being deprecated and will be removed in the future versions.
 This is to make the future maintenance of Flye easier. Instead, we suggest to use
 more specialized software, like [quickmerge](https://github.com/mahulchak/quickmerge).
@@ -202,6 +200,10 @@ It is sensitive to very short sequences and underrepresented organisms at low re
 
 For relatively complex single genomes, "regular" mode often outperforms metagenomic mode.
 
+If you are using Flye / metaFlye to assemble heterozygous bacterial genomes or metagenomes,
+you may consider using [strainy](https://github.com/katerinakazantseva/strainy) to
+call and quantify heterozygosity and reveal collapsed strains.
+
 ### Haplotype mode
 
 By default, Flye (and metaFlye) collapses graph structures caused by
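
Note on the ONT mode selection described in the USAGE.md hunk above: the sketch below shows one way to turn that guidance into a command line. The chemistry labels, the `CHEMISTRY_OPTIONS` table, the helper name and the file names are hypothetical and only illustrate the text; the Flye options themselves (`--nano-hq`, `--nano-raw`, `--read-error`, `--out-dir`, `--threads`) are the documented ones.

```python
import shlex

# Hypothetical mapping from a chemistry label to (read-type option, read error),
# following the updated USAGE.md wording. None means "keep the mode's default".
CHEMISTRY_OPTIONS = {
    "r10": ("--nano-hq", None),        # R10 data, expected error <3%
    "r9-guppy5": ("--nano-hq", 0.05),  # R9 basecalled with Guppy5+, error <5%
    "older": ("--nano-raw", None),     # older chemistries, error 10-15%
}


def build_flye_command(reads, out_dir, chemistry, threads=4):
    """Return the flye argument list for one reads file (illustrative helper)."""
    mode, read_error = CHEMISTRY_OPTIONS[chemistry]
    cmd = ["flye", mode, reads, "--out-dir", out_dir, "--threads", str(threads)]
    if read_error is not None:
        cmd += ["--read-error", str(read_error)]
    return cmd


if __name__ == "__main__":
    # Prints the command instead of running it, so Flye need not be installed.
    print(shlex.join(build_flye_command("reads.fastq.gz", "asm_out", "r9-guppy5")))
```

The list form can be passed directly to `subprocess.run` without `shell=True`, which avoids quoting problems with unusual file names.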


=====================================
flye/.main.py.swo
=====================================
Binary files /dev/null and b/flye/.main.py.swo differ


=====================================
flye/__build__.py
=====================================
@@ -1 +1 @@
-__build__ = 1801
+__build__ = 1802


=====================================
flye/__version__.py
=====================================
@@ -1 +1 @@
-__version__ = "2.9.5"
+__version__ = "2.9.6"


=====================================
flye/config/bin_cfg/asm_nano_hq.cfg
=====================================
@@ -19,9 +19,9 @@ maximum_overhang = 1500
 repeat_kmer_rate = 100
 
 #overlap similarity thresholds
-assemble_ovlp_divergence = 0.05
+assemble_ovlp_divergence = 0.03
 assemble_divergence_relative = 1
-repeat_graph_ovlp_divergence = 0.05
+repeat_graph_ovlp_divergence = 0.03
 read_align_ovlp_divergence = 0.10
 hpc_scoring_on = 1
 


=====================================
flye/main.py
=====================================
@@ -470,7 +470,8 @@ def _run(args):
     current_job = 0
     if args.resume or args.resume_from:
         if not os.path.exists(save_file):
-            raise ResumeException("Can't find save file")
+            raise ResumeException("Can't resume, please check if the output directory contains intermediate results from previous Flye run. "
+                                  "Otherwise, make a new run without --resume option")
 
         logger.info("Resuming previous run")
         if args.resume_from:
@@ -542,7 +543,7 @@ def _epilog():
     return ("Input reads can be in FASTA or FASTQ format, uncompressed\n"
             "or compressed with gz. Currently, PacBio (CLR, HiFi, corrected)\n"
             "and ONT reads (regular, HQ, corrected) are supported. Expected error rates are\n"
-            "<15% for PB CLR/regular ONT; <5% for ONT HQ, <3% for corrected, and <1% for HiFi. Note that Flye\n"
+            "<15% for PB CLR/regular ONT; <3% for ONT R10, <3% for corrected, and <1% for HiFi. Note that Flye\n"
             "was primarily developed to run on uncorrected reads. You may specify multiple\n"
             "files with reads (separated by spaces). Mixing different read\n"
             "types is not yet supported. The --meta option enables the mode\n"
@@ -597,13 +598,13 @@ def main():
                         help="PacBio HiFi reads (<1%% error)")
     read_group.add_argument("--nano-raw", dest="nano_raw", nargs="+",
                         default=None, metavar="path",
-                        help="ONT regular reads, pre-Guppy5 (<20%% error)")
+                        help="ONT reads with older chemistries, pre R9 Guppy5 (10-20%% error)")
     read_group.add_argument("--nano-corr", dest="nano_corrected", nargs="+",
                         default=None, metavar="path",
                         help="ONT reads that were corrected with other methods (<3%% error)")
     read_group.add_argument("--nano-hq", dest="nano_hq", nargs="+",
                         default=None, metavar="path",
-                        help="ONT high-quality reads: Guppy5+ SUP or Q20 (<5%% error)")
+                        help="ONT R10 reads, aka Q20 (<3%% error). For R9 Guppy5+, increase --read-error slightly")
     read_group.add_argument("--subassemblies", dest="subassemblies", nargs="+",
                         default=None, metavar="path",
                         help="[deprecated] high-quality contigs input")


=====================================
flye/polishing/alignment.py
=====================================
@@ -274,7 +274,7 @@ def _run_minimap(reference_file, reads_files, num_proc, reads_type, out_file):
                               "set -eo pipefail; " + " ".join(cmdline)],
                               stderr=open(stderr_file, "w"),
                               stdout=open(os.devnull, "w"))
-        subprocess.check_call(SAMTOOLS_BIN + " index -@ 4 " + "'" + out_file + "'", shell=True)
+        subprocess.check_call(SAMTOOLS_BIN + " index -c -@ 4 " + "'" + out_file + "'", shell=True)
         #os.remove(stderr_file)
 
     except (subprocess.CalledProcessError, OSError) as e:


=====================================
flye/polishing/bubbles.py
=====================================
@@ -97,7 +97,8 @@ def _thread_worker(aln_reader, chunk_feeder, contigs_info, err_mode,
             if bubbles_file_lock:
                 bubbles_file_lock.acquire()
 
-            _output_bubbles(ctg_bubbles, open(bubbles_file, "a"))
+            with open(bubbles_file, "a") as fout:
+                _output_bubbles(ctg_bubbles, fout)
             results_queue.put((ctg_id, len(ctg_bubbles), num_long_bubbles,
                                num_empty, num_long_branch, aln_errors,
                                mean_cov))
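
Note on the bubbles.py hunk above: this change appears to be the "rare race condition in polishing stage" fix mentioned in NEWS.md, although the commit does not say so explicitly. With a bare `open(...)` the handle is closed, and its buffer flushed, only when the object is garbage-collected, which may happen after the lock is released. The sketch below is a minimal illustration of the pattern, assuming threads and a `threading.Lock` rather than Flye's own worker setup; all names are hypothetical.

```python
import os
import tempfile
import threading


def append_records(path, records, lock):
    """Append records to a shared file.

    The file is closed (and therefore flushed) inside the locked region,
    so another worker never appends while buffered data is still pending.
    """
    with lock:
        with open(path, "a") as fout:
            for rec in records:
                fout.write(rec + "\n")


def run_demo(num_workers=8, per_worker=100):
    """Start several writers and return every line found in the shared file."""
    lock = threading.Lock()
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        threads = []
        for w in range(num_workers):
            recs = ["w{0}_{1}".format(w, i) for i in range(per_worker)]
            threads.append(threading.Thread(target=append_records,
                                            args=(path, recs, lock)))
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path) as fin:
            return [line.rstrip("\n") for line in fin]
    finally:
        os.remove(path)


if __name__ == "__main__":
    lines = run_demo()
    print(len(lines), len(set(lines)))
```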


=====================================
flye/tests/test_toy.py
=====================================
@@ -15,11 +15,10 @@ import os
 import sys
 import subprocess
 import shutil
-from distutils.spawn import find_executable
 
 
 def test_toy():
-    if not find_executable("flye"):
+    if not shutil.which("flye"):
         sys.exit("flye is not installed!")
 
     print("Running toy test:\n")


=====================================
flye/utils/fasta_parser.py
=====================================
@@ -64,16 +64,22 @@ def stream_sequence(filename):
             handle = io.BufferedReader(gz)
 
         if fastq:
-            for hdr, seq, _ in _read_fastq(handle):
+            for line, (hdr, seq, _) in enumerate(_read_fastq(handle)):
+                if b'@' in hdr:
+                    raise FastaError("Fasta/q sequence header has '@' symbol in file: {0}, entry {1}"
+                                     .format(filename, line))
                 if not _validate_seq(seq):
-                    raise FastaError("Invalid char while reading {0}"
-                                     .format(filename))
+                    raise FastaError("Invalid sequence symbol in file: {0}, entry {1}"
+                                     .format(filename, line))
                 yield _STR(hdr), _STR(_to_acgt_bytes(seq))
         else:
-            for hdr, seq in _read_fasta(handle):
+            for line, (hdr, seq) in enumerate(_read_fasta(handle)):
+                if b'@' in hdr:
+                    raise FastaError("Fasta/q sequence header has '@' symbol in file: {0}, entry {1}"
+                                     .format(filename, line))
                 if not _validate_seq(seq):
-                    raise FastaError("Invalid char while reading {0}"
-                                     .format(filename))
+                    raise FastaError("Invalid sequence symbol in file: {0}, entry {1}"
+                                     .format(filename, line))
                 yield _STR(hdr), _STR(_to_acgt_bytes(seq))
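
Note on the fasta_parser.py hunk above: the value reported as "entry" comes from `enumerate`, so it is a zero-based record index, not a line number in the file. The sketch below reproduces the same checks on an in-memory list of records. It is an illustration only: `check_entries` and `VALID_BASES` are hypothetical, and the alphabet is a simplified stand-in for whatever Flye's `_validate_seq` accepts.

```python
class FastaError(Exception):
    """Raised when a record fails validation (stand-in for Flye's exception)."""


# Simplified alphabet, an assumption for this sketch only.
VALID_BASES = frozenset(b"ACGTNacgtn")


def check_entries(entries, filename):
    """Yield (header, sequence) as str, raising FastaError on the first bad record.

    `entries` is an iterable of (header_bytes, sequence_bytes) tuples.
    """
    for index, (hdr, seq) in enumerate(entries):
        if b"@" in hdr:
            raise FastaError("Fasta/q sequence header has '@' symbol in file: "
                             "{0}, entry {1}".format(filename, index))
        # Iterating over bytes yields integers, matching the frozenset above.
        if not set(seq) <= VALID_BASES:
            raise FastaError("Invalid sequence symbol in file: {0}, entry {1}"
                             .format(filename, index))
        yield hdr.decode(), seq.decode()


if __name__ == "__main__":
    records = [(b"read1", b"ACGT"), (b"read@2", b"ACGT")]
    try:
        for hdr, seq in check_entries(records, "reads.fa"):
            print(hdr, seq)
    except FastaError as e:
        print("error:", e)
```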
 
     except IOError as e:


=====================================
flye/utils/sam_parser.py
=====================================
@@ -198,7 +198,7 @@ class SynchronizedSamReader(object):
         #check that alignment exists
         if not os.path.exists(sam_alignment):
             raise AlignmentException("Can't open {0}".format(sam_alignment))
-        if not os.path.exists(sam_alignment + ".bai"):
+        if not (os.path.exists(sam_alignment + ".bai") or os.path.exists(sam_alignment + ".csi")):
             raise AlignmentException("Bam not indexed: {0}".format(sam_alignment))
 
         #will not be changed during execution, each process has its own copy
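
Note on the alignment.py and sam_parser.py hunks above: `samtools index -c` writes a CSI index (`.csi`) instead of the default BAI (`.bai`), so the reader has to accept either file. BAI cannot address reference sequences longer than 2^29 - 1 bases (about 512 Mbp), which is presumably the motivation, though the commit does not state it. A minimal sketch of the relaxed check follows; the helper name `has_bam_index` is hypothetical.

```python
import os

# Index suffixes appended to the full BAM file name.
INDEX_SUFFIXES = (".bai", ".csi")


def has_bam_index(bam_path):
    """Return True if a BAI or CSI index file sits next to the BAM file."""
    return any(os.path.exists(bam_path + suffix) for suffix in INDEX_SUFFIXES)


if __name__ == "__main__":
    import tempfile

    workdir = tempfile.mkdtemp()
    bam = os.path.join(workdir, "aln.bam")
    open(bam, "w").close()           # empty placeholder, not a real BAM
    print(has_bam_index(bam))        # no index yet
    open(bam + ".csi", "w").close()  # pretend `samtools index -c` ran
    print(has_bam_index(bam))
```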


=====================================
flye/utils/utils.py
=====================================
@@ -6,6 +6,10 @@ from __future__ import absolute_import
 import os
 import signal
 import multiprocessing
+import logging
+
+
+logger = logging.getLogger()
 
 
 def which(program):



View it on GitLab: https://salsa.debian.org/med-team/flye/-/commit/1028a41ad1b7a1c105c56e8cca58ca117cc3185b
