[med-svn] [Git][med-team/flye][master] 6 commits: routine-update: New upstream version

Wed May 22 14:32:29 BST 2024


Alexandre Detiste pushed to branch master at Debian Med / flye


Commits:
533c6422 by Alexandre Detiste at 2024-05-22T14:18:46+02:00
routine-update: New upstream version

- - - - -
ebaf557e by Alexandre Detiste at 2024-05-22T14:18:47+02:00
New upstream version 2.9.4+dfsg
- - - - -
93796a3f by Alexandre Detiste at 2024-05-22T14:19:09+02:00
Update upstream source from tag 'upstream/2.9.4+dfsg'

Update to upstream version '2.9.4+dfsg'
with Debian dir 138944cd37f3ce82baf8095768a3e830ff178c7a
- - - - -
6e953a1b by Alexandre Detiste at 2024-05-22T14:19:10+02:00
routine-update: Standards-Version: 4.7.0

- - - - -
6640e6a4 by Alexandre Detiste at 2024-05-22T14:22:15+02:00
refresh patch

- - - - -
8bddd17f by Alexandre Detiste at 2024-05-22T15:31:27+02:00
release

- - - - -


16 changed files:

- README.md
- debian/changelog
- debian/control
- debian/patches/python3.12.patch
- docs/NEWS.md
- docs/USAGE.md
- flye/__build__.py
- flye/__version__.py
- flye/main.py
- flye/polishing/bubbles.py
- flye/polishing/polish.py
- flye/utils/sam_parser.py
- src/polishing/bubble_processor.cpp
- src/polishing/bubble_processor.h
- src/polishing/subs_matrix.cpp
- src/polishing/subs_matrix.h


Changes:

=====================================
README.md
=====================================
@@ -3,7 +3,7 @@ Flye assembler
 
 [![BioConda Install](https://img.shields.io/conda/dn/bioconda/flye.svg?style=flag&label=BioConda%20install)](https://anaconda.org/bioconda/flye)
 
-### Version: 2.9.3
+### Version: 2.9.4
 
 Flye is a de novo assembler for single-molecule sequencing reads,
 such as those produced by PacBio and Oxford Nanopore Technologies.
@@ -178,7 +178,7 @@ Publications
 Mikhail Kolmogorov, Derek M. Bickhart, Bahar Behsaz, Alexey Gurevich, Mikhail Rayko, Sung Bong
 Shin, Kristen Kuhn, Jeffrey Yuan, Evgeny Polevikov, Timothy P. L. Smith and Pavel A. Pevzner
 "metaFlye: scalable long-read metagenome assembly using repeat graphs", Nature Methods, 2020
-[doi:s41592-020-00971-x](https://doi.org/10.1038/s41592-020-00971-x)
+[doi:10.1038/s41592-020-00971-x](https://doi.org/10.1038/s41592-020-00971-x)
 
 Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin and Pavel Pevzner, 
 "Assembly of Long Error-Prone Reads Using Repeat Graphs", Nature Biotechnology, 2019


=====================================
debian/changelog
=====================================
@@ -1,3 +1,11 @@
+flye (2.9.4+dfsg-1) unstable; urgency=medium
+
+  * Team upload.
+  * New upstream version
+  * Standards-Version: 4.7.0 (routine-update)
+
+ -- Alexandre Detiste <tchet at debian.org>  Wed, 22 May 2024 14:21:21 +0200
+
 flye (2.9.3+dfsg2-1) unstable; urgency=medium
 
   [ Étienne Mollier ]


=====================================
debian/control
=====================================
@@ -12,7 +12,7 @@ Build-Depends: debhelper-compat (= 13),
                libminimap2-dev,
                samtools,
                zlib1g-dev
-Standards-Version: 4.6.2
+Standards-Version: 4.7.0
 Vcs-Browser: https://salsa.debian.org/med-team/flye
 Vcs-Git: https://salsa.debian.org/med-team/flye.git
 Homepage: https://github.com/fenderglass/Flye


=====================================
debian/patches/python3.12.patch
=====================================
@@ -4,11 +4,9 @@ Author: Andreas Tille <tille at debian.org>
 Last-Update: Fri, 02 Feb 2024 10:40:30 +0100
 
 
-diff --git a/flye/assembly/scaffolder.py b/flye/assembly/scaffolder.py
-index f52a70d..5d957b0 100644
 --- a/flye/assembly/scaffolder.py
 +++ b/flye/assembly/scaffolder.py
-@@ -12,7 +12,7 @@ import logging
+@@ -12,7 +12,7 @@
  
  import flye.utils.fasta_parser as fp
  import flye.config.py_cfg as cfg
@@ -17,11 +15,9 @@ index f52a70d..5d957b0 100644
  
  logger = logging.getLogger()
  
-diff --git a/flye/config/configurator.py b/flye/config/configurator.py
-index 3df64cb..28a7e1f 100644
 --- a/flye/config/configurator.py
 +++ b/flye/config/configurator.py
-@@ -12,7 +12,7 @@ import logging
+@@ -12,7 +12,7 @@
  
  import flye.utils.fasta_parser as fp
  import flye.config.py_cfg as cfg
@@ -30,11 +26,9 @@ index 3df64cb..28a7e1f 100644
  
  
  logger = logging.getLogger()
-diff --git a/flye/main.py b/flye/main.py
-index d069808..afd0c60 100644
 --- a/flye/main.py
 +++ b/flye/main.py
-@@ -33,7 +33,7 @@ import flye.utils.fasta_parser as fp
+@@ -33,7 +33,7 @@
  #import flye.trestle.trestle as tres
  #import flye.trestle.graph_resolver as tres_graph
  from flye.repeat_graph.repeat_graph import RepeatGraph
@@ -43,11 +37,9 @@ index d069808..afd0c60 100644
  
  logger = logging.getLogger()
  
-diff --git a/flye/polishing/alignment.py b/flye/polishing/alignment.py
-index c7ad442..09cfc65 100644
 --- a/flye/polishing/alignment.py
 +++ b/flye/polishing/alignment.py
-@@ -18,8 +18,8 @@ from copy import copy
+@@ -18,8 +18,8 @@
  import flye.utils.fasta_parser as fp
  from flye.utils.utils import which, get_median
  from flye.utils.sam_parser import AlignmentException
@@ -58,20 +50,18 @@ index c7ad442..09cfc65 100644
  
  
  logger = logging.getLogger()
-diff --git a/flye/polishing/bubbles.py b/flye/polishing/bubbles.py
-index 4a04bf9..e13c623 100644
 --- a/flye/polishing/bubbles.py
 +++ b/flye/polishing/bubbles.py
-@@ -10,7 +10,7 @@ from __future__ import absolute_import
+@@ -10,7 +10,7 @@
  from __future__ import division
  import logging
  from bisect import bisect
 -from flye.six.moves import range
 +from six.moves import range
  from collections import defaultdict
+ from queue import Queue
  
- import multiprocessing
-@@ -21,7 +21,7 @@ import flye.config.py_cfg as cfg
+@@ -22,7 +22,7 @@
  from flye.polishing.alignment import shift_gaps, get_uniform_alignments
  from flye.utils.sam_parser import SynchronizedSamReader, SynchonizedChunkManager
  from flye.utils.utils import process_in_parallel, get_median
@@ -80,11 +70,9 @@ index 4a04bf9..e13c623 100644
  
  
  logger = logging.getLogger()
-diff --git a/flye/polishing/consensus.py b/flye/polishing/consensus.py
-index 0e0befc..2aa3499 100644
 --- a/flye/polishing/consensus.py
 +++ b/flye/polishing/consensus.py
-@@ -10,8 +10,8 @@ from __future__ import absolute_import
+@@ -10,8 +10,8 @@
  from __future__ import division
  import logging
  from collections import defaultdict
@@ -95,7 +83,7 @@ index 0e0befc..2aa3499 100644
  
  import multiprocessing
  import traceback
-@@ -21,7 +21,7 @@ from flye.utils.sam_parser import SynchronizedSamReader, SynchonizedChunkManager
+@@ -21,7 +21,7 @@
  import flye.config.py_cfg as cfg
  import flye.utils.fasta_parser as fp
  from flye.utils.utils import process_in_parallel
@@ -104,11 +92,9 @@ index 0e0befc..2aa3499 100644
  
  logger = logging.getLogger()
  
-diff --git a/flye/polishing/polish.py b/flye/polishing/polish.py
-index 78060c1..c55e7b0 100644
 --- a/flye/polishing/polish.py
 +++ b/flye/polishing/polish.py
-@@ -21,8 +21,8 @@ from flye.polishing.bubbles import make_bubbles
+@@ -21,8 +21,8 @@
  import flye.utils.fasta_parser as fp
  from flye.utils.utils import which
  import flye.config.py_cfg as cfg
@@ -119,11 +105,9 @@ index 78060c1..c55e7b0 100644
  
  
  POLISH_BIN = "flye-modules"
-diff --git a/flye/short_plasmids/circular_sequences.py b/flye/short_plasmids/circular_sequences.py
-index 92a448b..c87ac86 100644
 --- a/flye/short_plasmids/circular_sequences.py
 +++ b/flye/short_plasmids/circular_sequences.py
-@@ -9,8 +9,8 @@ import flye.short_plasmids.unmapped_reads as unmapped
+@@ -9,8 +9,8 @@
  import flye.utils.fasta_parser as fp
  from flye.utils.sam_parser import read_paf, read_paf_grouped
  import logging
@@ -134,11 +118,9 @@ index 92a448b..c87ac86 100644
  
  logger = logging.getLogger()
  
-diff --git a/flye/short_plasmids/unmapped_reads.py b/flye/short_plasmids/unmapped_reads.py
-index fd218e5..bdbab3c 100644
 --- a/flye/short_plasmids/unmapped_reads.py
 +++ b/flye/short_plasmids/unmapped_reads.py
-@@ -9,8 +9,8 @@ import flye.utils.fasta_parser as fp
+@@ -9,8 +9,8 @@
  from flye.utils.sam_parser import read_paf_grouped
  import logging
  from collections import defaultdict
@@ -149,8 +131,6 @@ index fd218e5..bdbab3c 100644
  
  logger = logging.getLogger()
  
-diff --git a/flye/short_plasmids/utils.py b/flye/short_plasmids/utils.py
-index 0d1f817..75a0a3d 100644
 --- a/flye/short_plasmids/utils.py
 +++ b/flye/short_plasmids/utils.py
 @@ -2,7 +2,7 @@
@@ -162,11 +142,9 @@ index 0d1f817..75a0a3d 100644
  
  def find_connected_components(graph):
      def dfs(start_vertex, connected_components_counter):
-diff --git a/flye/trestle/divergence.py b/flye/trestle/divergence.py
-index e3b2644..f9c9862 100644
 --- a/flye/trestle/divergence.py
 +++ b/flye/trestle/divergence.py
-@@ -12,7 +12,7 @@ from __future__ import absolute_import
+@@ -12,7 +12,7 @@
  from __future__ import division
  import logging
  from collections import defaultdict
@@ -175,7 +153,7 @@ index e3b2644..f9c9862 100644
  
  import multiprocessing
  import os.path
-@@ -22,7 +22,7 @@ from flye.utils.sam_parser import SynchronizedSamReader, SynchonizedChunkManager
+@@ -22,7 +22,7 @@
  import flye.utils.fasta_parser as fp
  from flye.utils.utils import process_in_parallel
  import flye.config.py_cfg as config
@@ -184,11 +162,9 @@ index e3b2644..f9c9862 100644
  
  logger = logging.getLogger()
  
-diff --git a/flye/trestle/graph_resolver.py b/flye/trestle/graph_resolver.py
-index 9944831..b6ad6d7 100644
 --- a/flye/trestle/graph_resolver.py
 +++ b/flye/trestle/graph_resolver.py
-@@ -13,8 +13,8 @@ from collections import defaultdict
+@@ -13,8 +13,8 @@
  
  import flye.utils.fasta_parser as fp
  from flye.repeat_graph.graph_alignment import iter_alignments
@@ -199,11 +175,9 @@ index 9944831..b6ad6d7 100644
  
  logger = logging.getLogger()
  
-diff --git a/flye/trestle/trestle.py b/flye/trestle/trestle.py
-index 244e4ad..677090d 100644
 --- a/flye/trestle/trestle.py
 +++ b/flye/trestle/trestle.py
-@@ -25,8 +25,8 @@ import flye.polishing.polish as pol
+@@ -25,8 +25,8 @@
  
  import flye.trestle.divergence as div
  import flye.trestle.trestle_config as trestle_config
@@ -214,11 +188,9 @@ index 244e4ad..677090d 100644
  
  logger = logging.getLogger()
  
-diff --git a/flye/utils/fasta_parser.py b/flye/utils/fasta_parser.py
-index 66c0b15..54f7dca 100644
 --- a/flye/utils/fasta_parser.py
 +++ b/flye/utils/fasta_parser.py
-@@ -23,7 +23,7 @@ else:
+@@ -23,7 +23,7 @@
      _STR = bytes.decode
      _BYTES = str.encode
  
@@ -227,11 +199,9 @@ index 66c0b15..54f7dca 100644
  
  
  logger = logging.getLogger()
-diff --git a/flye/utils/sam_parser.py b/flye/utils/sam_parser.py
-index 0db41f0..a16bb6b 100644
 --- a/flye/utils/sam_parser.py
 +++ b/flye/utils/sam_parser.py
-@@ -32,8 +32,8 @@ else:
+@@ -32,8 +32,8 @@
      _STR = bytes.decode
      _BYTES = str.encode
  


=====================================
docs/NEWS.md
=====================================
@@ -1,3 +1,7 @@
+Flye 2.9.4 release (14 May 2024)
+===============================
+* Minor technical changes
+
 Flye 2.9.3 release (28 November 2023)
 ====================================
 * Disjointig step speedup for `--nano-hq` mode


=====================================
docs/USAGE.md
=====================================
@@ -316,7 +316,7 @@ Scaffold gaps are marked with `??` symbols, and `*` symbol denotes a
 terminal graph node.
 
 Alternative contigs (representing alternative haplotypes) will have the same
-alt. group ID. Primary contigs are marked by `*`. Note that the ouptut of
+alt. group ID. Primary contigs are marked by `*`. Note that the outptut of
 alternative contigs could be disabled via the `--no-alt-contigs` option.
 
 ## <a name="graph"></a> Repeat graph


=====================================
flye/__build__.py
=====================================
@@ -1 +1 @@
-__build__ = 1797
+__build__ = 1799


=====================================
flye/__version__.py
=====================================
@@ -1 +1 @@
-__version__ = "2.9.3"
+__version__ = "2.9.4"


=====================================
flye/main.py
=====================================
@@ -406,12 +406,13 @@ def _set_genome_size(args):
         args.genome_size = human2bytes(args.genome_size.upper())
 
 
-def _run_polisher_only(args):
+def _run_polisher_only(args, output_progress=True):
     """
     Runs standalone polisher
     """
-    logger.info("Running Flye polisher")
-    logger.debug("Cmd: %s", " ".join(sys.argv))
+    if output_progress:
+        logger.info("Running Flye polisher")
+        logger.debug("Cmd: %s", " ".join(sys.argv))
     bam_input = False
 
     for read_file in args.reads:
@@ -434,8 +435,9 @@ def _run_polisher_only(args):
 
     pol.polish(args.polish_target, args.reads, args.out_dir,
                args.num_iters, args.threads, args.platform,
-               args.read_type, output_progress=True)
-    logger.info("Done!")
+               args.read_type, output_progress)
+    if output_progress:
+        logger.info("Done!")
 
 
 def _run(args):


=====================================
flye/polishing/bubbles.py
=====================================
@@ -12,6 +12,7 @@ import logging
 from bisect import bisect
 from flye.six.moves import range
 from collections import defaultdict
+from queue import Queue
 
 import multiprocessing
 import traceback
@@ -93,11 +94,16 @@ def _thread_worker(aln_reader, chunk_feeder, contigs_info, err_mode,
             for b in ctg_bubbles:
                 b.position += ctg_region.start
 
-            with bubbles_file_lock:
-                _output_bubbles(ctg_bubbles, open(bubbles_file, "a"))
+            if bubbles_file_lock:
+                bubbles_file_lock.acquire()
+
+            _output_bubbles(ctg_bubbles, open(bubbles_file, "a"))
             results_queue.put((ctg_id, len(ctg_bubbles), num_long_bubbles,
                                num_empty, num_long_branch, aln_errors,
                                mean_cov))
+            
+            if bubbles_file_lock:
+                bubbles_file_lock.release()
 
             del profile
             del ctg_bubbles
@@ -116,20 +122,26 @@ def make_bubbles(alignment_path, contigs_info, contigs_path,
     CHUNK_SIZE = 1000000
 
     contigs_fasta = fp.read_sequence_dict(contigs_path)
-    manager = multiprocessing.Manager()
+    manager = None if num_proc == 1 else multiprocessing.Manager()
     aln_reader = SynchronizedSamReader(alignment_path, contigs_fasta, manager,
                                        cfg.vals["max_read_coverage"], use_secondary=True)
     chunk_feeder = SynchonizedChunkManager(contigs_fasta, manager, chunk_size=CHUNK_SIZE)
 
-    results_queue = manager.Queue()
-    error_queue = manager.Queue()
-    bubbles_out_lock = multiprocessing.Lock()
-    #bubbles_out_handle = open(bubbles_out, "w")
+    if manager:
+        results_queue = manager.Queue()
+        error_queue = manager.Queue()
+        bubbles_out_lock = multiprocessing.Lock()
 
-    process_in_parallel(_thread_worker, (aln_reader, chunk_feeder, contigs_info, err_mode,
+        process_in_parallel(_thread_worker, (aln_reader, chunk_feeder, contigs_info, err_mode,
                          results_queue, error_queue, bubbles_out, bubbles_out_lock), num_proc)
-    #_thread_worker(aln_reader, chunk_feeder, contigs_info, err_mode,
-    #               results_queue, error_queue, bubbles_out, bubbles_out_lock)
+    else:
+        results_queue = Queue()
+        error_queue = Queue()
+        bubbles_out_lock = None
+
+        _thread_worker(aln_reader, chunk_feeder, contigs_info, err_mode,
+                results_queue, error_queue, bubbles_out, bubbles_out_lock)
+        
     if not error_queue.empty():
         raise error_queue.get()
 
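The single-process fallback in `_thread_worker` above lets `None` stand in for `bubbles_file_lock` by calling `acquire()`/`release()` by hand. A bare pair like that leaves the lock held if the code between them raises. One alternative idiom uses `contextlib.nullcontext()`, a no-op context manager from the standard library, so that a single `with` block covers both the locked and the unlocked case and always releases the lock.

This is a minimal sketch under that assumption, not Flye's code. `write_results` and `out_path` are hypothetical names.

```python
import contextlib
import multiprocessing
import os
import queue
import tempfile


def write_results(lines, out_path, results_queue, lock=None):
    """Append lines to out_path and report how many were written.

    `lock` may be a multiprocessing lock (parallel run) or None (serial run).
    """
    # nullcontext() does nothing on enter/exit, so the same `with` statement
    # works whether or not a real lock was supplied, and a real lock is
    # released even if the body raises.
    guard = lock if lock is not None else contextlib.nullcontext()
    with guard:
        # Close the file while still holding the guard so another process
        # cannot interleave its writes with ours.
        with open(out_path, "a") as handle:
            for line in lines:
                handle.write(line + "\n")
        results_queue.put(len(lines))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "bubbles.txt")

    # Serial mode: a plain queue.Queue and no lock, like num_proc == 1 above.
    q = queue.Queue()
    write_results(["bubble_a", "bubble_b"], path, q)

    # A real multiprocessing lock goes through the same code path.
    write_results(["bubble_c"], path, q, lock=multiprocessing.Lock())

    print(q.get(), q.get())            # 2 1
    with open(path) as handle:
        print(handle.read().split())   # ['bubble_a', 'bubble_b', 'bubble_c']
```

Both `multiprocessing.Lock()` and the proxy returned by `multiprocessing.Manager().Lock()` support the `with` protocol, so the same guard works in either mode.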


=====================================
flye/polishing/polish.py
=====================================
@@ -104,6 +104,7 @@ def polish(contig_seqs, read_seqs, work_dir, num_iters, num_threads, read_platfo
                 logger.disabled = logger_state
             open(stats_file, "w").write("#seq_name\tlength\tcoverage\n")
             open(polished_file, "w")
+            gzip.open(bed_coverage, "wt")
             return polished_file, stats_file
 
         #####
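The added `gzip.open(bed_coverage, "wt")` sits in the early-return branch that already creates an empty stats file and an empty polished file. It therefore appears to guarantee that the gzipped coverage output also exists when there is nothing to polish. Opening a gzip file for writing and closing it without writing any data still yields a valid, non-zero-length gzip stream that decompresses to an empty string.

This is a small sketch of that behaviour using the standard `gzip` module. The helper `touch_outputs` and the file names are hypothetical.

```python
import gzip
import os
import tempfile


def touch_outputs(work_dir):
    """Create placeholder outputs for a run with nothing to polish (sketch)."""
    stats_file = os.path.join(work_dir, "polished_stats.txt")
    polished_file = os.path.join(work_dir, "polished.fasta")
    bed_coverage = os.path.join(work_dir, "coverage.bed.gz")

    # Stats file with only a header line.
    with open(stats_file, "w") as handle:
        handle.write("#seq_name\tlength\tcoverage\n")
    # Empty plain-text file.
    open(polished_file, "w").close()
    # Empty but valid gzip file: header and trailer are written on close.
    with gzip.open(bed_coverage, "wt"):
        pass
    return polished_file, stats_file, bed_coverage


if __name__ == "__main__":
    _, _, bed = touch_outputs(tempfile.mkdtemp())
    with gzip.open(bed, "rt") as handle:
        print(repr(handle.read()))   # ''
```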


=====================================
flye/utils/sam_parser.py
=====================================
@@ -137,8 +137,12 @@ class SynchonizedChunkManager(object):
         #will be shared between processes
         #self.shared_manager = multiprocessing.Manager()
         self.shared_num_jobs = multiprocessing.Value(ctypes.c_int, 0)
-        self.shared_lock = multiproc_manager.Lock()
-        self.shared_eof = multiprocessing.Value(ctypes.c_bool, False)
+        if multiproc_manager:
+            self.shared_lock = multiproc_manager.Lock()
+            self.shared_eof = multiprocessing.Value(ctypes.c_bool, False)
+        else:
+            self.shared_lock = None
+            self.shared_eof = ctypes.c_bool(False)
 
 
         for ctg_id in reference_fasta:
@@ -161,15 +165,22 @@ class SynchonizedChunkManager(object):
     def get_chunk(self):
         job_id = None
         while True:
-            with self.shared_lock:
-                if self.shared_eof.value:
-                    return None
-
-                job_id = self.shared_num_jobs.value
-                self.shared_num_jobs.value = self.shared_num_jobs.value + 1
-                if self.shared_num_jobs.value == len(self.fetch_list):
-                    self.shared_eof.value = True
-                break
+            if self.shared_lock:
+                self.shared_lock.acquire()
+
+            if self.shared_eof.value:
+                if self.shared_lock:
+                    self.shared_lock.release()
+                return None
+            
+            job_id = self.shared_num_jobs.value
+            self.shared_num_jobs.value = self.shared_num_jobs.value + 1
+            if self.shared_num_jobs.value == len(self.fetch_list):
+                self.shared_eof.value = True
+
+            if self.shared_lock:
+                self.shared_lock.release()
+            break
 
             time.sleep(0.01)
 
@@ -197,7 +208,7 @@ class SynchronizedSamReader(object):
         self.cigar_parser = re.compile(b"[0-9]+[MIDNSHP=X]")
 
         #self.shared_manager = multiprocessing.Manager()
-        self.ref_fasta = multiproc_manager.dict()
+        self.ref_fasta = dict() if multiproc_manager == None else multiproc_manager.dict()
         for (h, s) in iteritems(reference_fasta):
             self.ref_fasta[_BYTES(h)] = _BYTES(s)
 
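The `SynchonizedChunkManager` change works because both `ctypes.c_bool(False)` and `multiprocessing.Value(ctypes.c_bool, False)` expose a `.value` attribute. As a result, `get_chunk` can read and set the EOF flag the same way with or without a manager.

Below is a minimal sketch of the same job-claiming logic. It assumes a simplified, hypothetical `ChunkCounter` class (not Flye's class) that hands out job indices `0..n-1` and then `None`. It uses `contextlib.nullcontext` instead of manual `acquire()`/`release()`, so the lock is released on every return path.

```python
import contextlib
import ctypes
import multiprocessing


class ChunkCounter:
    """Hands out job indices 0..num_jobs-1, then None (hypothetical sketch)."""

    def __init__(self, num_jobs, multiproc_manager=None):
        self.num_jobs = num_jobs
        if multiproc_manager is not None:
            # Shared between worker processes.
            self.lock = multiproc_manager.Lock()
            self.next_job = multiprocessing.Value(ctypes.c_int, 0)
            self.eof = multiprocessing.Value(ctypes.c_bool, num_jobs == 0)
        else:
            # Serial mode: plain ctypes objects expose the same `.value`.
            self.lock = None
            self.next_job = ctypes.c_int(0)
            # Start at EOF when there is nothing to fetch.
            self.eof = ctypes.c_bool(num_jobs == 0)

    def get_chunk(self):
        guard = self.lock if self.lock is not None else contextlib.nullcontext()
        with guard:
            if self.eof.value:
                return None
            job_id = self.next_job.value
            self.next_job.value = job_id + 1
            if self.next_job.value == self.num_jobs:
                self.eof.value = True
            return job_id


if __name__ == "__main__":
    counter = ChunkCounter(3)
    print([counter.get_chunk() for _ in range(5)])   # [0, 1, 2, None, None]
```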


=====================================
src/polishing/bubble_processor.cpp
=====================================
@@ -21,14 +21,14 @@ namespace
 BubbleProcessor::BubbleProcessor(const std::string& subsMatPath,
 								 const std::string& hopoMatrixPath,
 								 bool showProgress, bool hopoEnabled):
+	_hopoEnabled(hopoEnabled),
 	_subsMatrix(subsMatPath),
-	_hopoMatrix(hopoMatrixPath),
+	_hopoMatrix(hopoMatrixPath, _hopoEnabled),
 	_generalPolisher(_subsMatrix),
 	_homoPolisher(_subsMatrix, _hopoMatrix),
 	_dinucFixer(_subsMatrix),
 	_verbose(false),
-	_showProgress(showProgress),
-	_hopoEnabled(hopoEnabled)
+	_showProgress(showProgress)
 {
 }
 


=====================================
src/polishing/bubble_processor.h
=====================================
@@ -37,6 +37,10 @@ private:
 
 	const int BUBBLES_CACHE = 100;
 
+	bool					  _verbose;
+	bool 					  _showProgress;
+	bool					  _hopoEnabled;
+
 	const SubstitutionMatrix  _subsMatrix;
 	const HopoMatrix 		  _hopoMatrix;
 	const GeneralPolisher 	  _generalPolisher;
@@ -50,7 +54,4 @@ private:
 	std::ifstream			  _bubblesFile;
 	std::ofstream			  _consensusFile;
 	std::ofstream			  _logFile;
-	bool					  _verbose;
-	bool 					  _showProgress;
-	bool					  _hopoEnabled;
 };


=====================================
src/polishing/subs_matrix.cpp
=====================================
@@ -215,8 +215,12 @@ std::string HopoMatrix::obsToStr(HopoMatrix::Observation obs)
 	return result;
 }*/
 
-HopoMatrix::HopoMatrix(const std::string& fileName)
+HopoMatrix::HopoMatrix(const std::string& fileName, bool hopoEnabled = true)
 {
+	if (!hopoEnabled)
+	{
+		return;
+	}
 	for (size_t i = 0; i < NUM_HOPO_STATES; ++i)
 	{
 		_observationProbs.emplace_back(NUM_HOPO_OBS, probToScore(MIN_HOPO_PROB));
@@ -256,7 +260,7 @@ void HopoMatrix::loadMatrix(const std::string& fileName)
 	{
 		observationsFreq.push_back(std::vector<size_t>(NUM_HOPO_OBS, 0));
 	}
-
+	
 	while (std::getline(fin, buffer))
 	{
 		if (buffer.empty()) continue;


=====================================
src/polishing/subs_matrix.h
=====================================
@@ -68,7 +68,7 @@ public:
 	};
 	typedef std::vector<Observation> ObsVector;
 
-	HopoMatrix(const std::string& fileName);
+	HopoMatrix(const std::string& fileName, bool hopoEnabled);
 	AlnScoreType getObsProb(State state, Observation observ) const
 		{return _observationProbs[state.id][observ.id];}
 	AlnScoreType getGenomeProb(State state) const



View it on GitLab: https://salsa.debian.org/med-team/flye/-/compare/07ad497f1d2c7f7e038e33152e35ecccff95a188...8bddd17f26b33c0c81ce847fdea6fe088fed0ca4
