[med-svn] [Git][med-team/flye][master] 9 commits: routine-update: New upstream version
Étienne Mollier (@emollier)
gitlab at salsa.debian.org
Wed Nov 29 21:50:19 GMT 2023
Étienne Mollier pushed to branch master at Debian Med / flye
Commits:
8189ce15 by Étienne Mollier at 2023-11-29T22:06:56+01:00
routine-update: New upstream version
- - - - -
eb1e7a71 by Étienne Mollier at 2023-11-29T22:07:23+01:00
New upstream version 2.9.3+dfsg
- - - - -
f4163cae by Étienne Mollier at 2023-11-29T22:07:42+01:00
Update upstream source from tag 'upstream/2.9.3+dfsg'
Update to upstream version '2.9.3+dfsg'
with Debian dir 25a4756ea470f9e1a4d400daf2d01782d200e637
- - - - -
41c36f36 by Étienne Mollier at 2023-11-29T22:07:46+01:00
routine-update: Build-Depends: s/dh-python/dh-sequence-python3/
- - - - -
9827782c by Étienne Mollier at 2023-11-29T22:12:14+01:00
gcc-13.patch: delete: applied upstream.
- - - - -
86c3c8a3 by Étienne Mollier at 2023-11-29T22:34:59+01:00
cppflags.patch: new: propagate CPPFLAGS.
Among other things, this allows injection of source fortification
macro, which participates to executables hardening.
- - - - -
1414642a by Étienne Mollier at 2023-11-29T22:40:30+01:00
d/rules: activate hardening flags.
- - - - -
828617c4 by Étienne Mollier at 2023-11-29T22:41:34+01:00
use_debian_packaged_libs.patch: forward not-needed.
- - - - -
5277dfe4 by Étienne Mollier at 2023-11-29T22:49:15+01:00
ready to upload to unstable.
- - - - -
25 changed files:
- README.md
- debian/changelog
- debian/control
- + debian/patches/cppflags.patch
- − debian/patches/gcc-13.patch
- debian/patches/series
- debian/patches/use_debian_packaged_libs.patch
- debian/rules
- docs/FAQ.md
- docs/NEWS.md
- flye/__build__.py
- flye/__version__.py
- flye/config/bin_cfg/asm_defaults.cfg
- flye/config/bin_cfg/asm_nano_hq.cfg
- flye/main.py
- flye/polishing/polish.py
- src/assemble/extender.cpp
- src/polishing/subs_matrix.h
- src/repeat_graph/haplotype_resolver.cpp
- src/repeat_graph/main_repeat.cpp
- src/repeat_graph/output_generator.cpp
- src/repeat_graph/repeat_graph.cpp
- src/repeat_graph/repeat_graph.h
- src/sequence/overlap.cpp
- src/sequence/sequence_container.h
Changes:
=====================================
README.md
=====================================
@@ -3,7 +3,7 @@ Flye assembler
[![BioConda Install](https://img.shields.io/conda/dn/bioconda/flye.svg?style=flag&label=BioConda%20install)](https://anaconda.org/bioconda/flye)
-### Version: 2.9.2
+### Version: 2.9.3
Flye is a de novo assembler for single-molecule sequencing reads,
such as those produced by PacBio and Oxford Nanopore Technologies.
@@ -26,6 +26,12 @@ Manuals
Latest updates
--------------
+### Flye 2.9.3 release (28 November 2023)
+* Disjointig step speedup for `--nano-hq` mode
+* Improved `--keep-haplotypes` mode preserves more heterozygous SVs
+* A few bug fixes
+
+
### Flye 2.9.2 release (18 March 2023)
* Update to minimap 2.24 + using HiFi and Kit14 parameters for faster alignment
* Fixed a few small bugs and corner cases
@@ -54,29 +60,6 @@ Latest updates
* Update to minimap 2.18
* Several rare bug fixes/other improvements
-### Flye 2.8.3 release (10 Feb 2021)
-* Reduced RAM consumption for some ultra-long ONT datasets
-* Fixed rare artificial sequence insertions on some ONT datasets
-* Assemblies should be largely identical to 2.8
-
-### Flye 2.8.2 release (12 Dec 2020)
-* Improvements in GFA output, much faster generation of large and tangled graphs
-* Speed improvements for graph simplification algorithms
-* A few minor bugs fixed
-* Assemblies should be largely identical to 2.8
-
-### Flye 2.8.1 release (02 Sep 2020)
-* Added a new option `--hifi-error` to control the expected error rate of HiFi reads (no other changes)
-
-### Flye 2.8 release (04 Aug 2020)
-* Improvements in contiguity and speed for PacBio HiFi mode
-* Using the `--meta` k-mer selection strategy in isolate assemblies as well.
-This strategy is more robust to drops in coverage/contamination and requires less memory
-* 1.5-2x RAM footprint reduction for large assemblies (e.g. human ONT assembly now uses 400-500 Gb)
-* Genome size parameter is no longer required (it is still needed for downsampling though `--asm-coverage`)
-* Flye now can occasionally use overlaps shorter than "minOverlap" parameter to close disjointing gaps
-* Various improvements and bugfixes
-
Repeat graph
------------
@@ -218,4 +201,4 @@ has already been answered.
If you are reporting a problem, please include the `flye.log` file and provide
details about your dataset.
-In case you prefer personal communication, please contact Mikhail at fenderglass at gmail.com.
+In case you prefer personal communication, please contact Mikhail at mikolmogorov at gmail.com.
=====================================
debian/changelog
=====================================
@@ -1,3 +1,16 @@
+flye (2.9.3+dfsg-1) unstable; urgency=medium
+
+ * New upstream version
+ * Build-Depends: s/dh-python/dh-sequence-python3/ (routine-update)
+ * gcc-13.patch: delete: applied upstream.
+ * cppflags.patch: new: propagate CPPFLAGS.
+ Among other things, this allows injection of source fortification
+ macro, which participates to executables hardening.
+ * d/rules: activate hardening flags.
+ * use_debian_packaged_libs.patch: forward not-needed.
+
+ -- Étienne Mollier <emollier at debian.org> Wed, 29 Nov 2023 22:45:17 +0100
+
flye (2.9.2+dfsg-2) unstable; urgency=medium
* gcc-13.patch: new: fix ftbfs with gcc-13. (Closes: #1037662)
=====================================
debian/control
=====================================
@@ -5,7 +5,7 @@ Uploaders: Andreas Tille <tille at debian.org>,
Section: science
Priority: optional
Build-Depends: debhelper-compat (= 13),
- dh-python,
+ dh-sequence-python3,
python3,
python3-setuptools,
liblemon-dev,
=====================================
debian/patches/cppflags.patch
=====================================
@@ -0,0 +1,65 @@
+Description: inject CPPFLAGS in compiler invocations.
+ Among other things, this allows the propagation of source fortification macro.
+Author: Étienne Mollier <emollier at debian.org>
+Forwarded: not-needed
+Last-Update: 2023-11-29
+---
+This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
+--- flye.orig/src/Makefile
++++ flye/src/Makefile
+@@ -13,7 +13,7 @@
+ release: CXXFLAGS += -O3 -DNDEBUG
+ release: flye-modules
+
+-SANITIZE_FLAGS += -D_GLIBCXX_SANITIZE_VECTOR -U_FORTIFY_SOURCE -fsanitize=address -fno-omit-frame-pointer -fsanitize=undefined
++SANITIZE_FLAGS += -D_GLIBCXX_SANITIZE_VECTOR -U_FORTIFY_SOURCE=2 -fsanitize=address -fno-omit-frame-pointer -fsanitize=undefined
+ #SANITIZE_FLAGS += -fsanitize=thread -fsanitize=leak -fsanitize=undefined
+ debug: CXXFLAGS += -Og ${SANITIZE_FLAGS}
+ #debug: CXXFLAGS += -D_GLIBCXX_DEBUG
+@@ -25,32 +25,32 @@
+ sequence_obj := ${patsubst %.cpp,%.o,${wildcard sequence/*.cpp}}
+
+ sequence/%.o: sequence/%.cpp sequence/*.h common/*.h
+- ${CXX} -c ${CXXFLAGS} $< -o $@
++ ${CXX} ${CPPFLAGS} -c ${CXXFLAGS} $< -o $@
+
+ #flye-assemble module
+ assemble_obj := ${patsubst %.cpp,%.o,${wildcard assemble/*.cpp}}
+
+ assemble/%.o: assemble/%.cpp assemble/*.h sequence/*.h common/*.h
+- ${CXX} -c ${CXXFLAGS} $< -o $@
++ ${CXX} ${CPPFLAGS} -c ${CXXFLAGS} $< -o $@
+
+ #flye-repeat module
+ repeat_obj := ${patsubst %.cpp,%.o,${wildcard repeat_graph/*.cpp}}
+
+ repeat_graph/%.o: repeat_graph/%.cpp repeat_graph/*.h sequence/*.h common/*.h
+- ${CXX} -c ${CXXFLAGS} $< -o $@
++ ${CXX} ${CPPFLAGS} -c ${CXXFLAGS} $< -o $@
+
+ #flye-contigger module
+ contigger_obj := ${patsubst %.cpp,%.o,${wildcard contigger/*.cpp}}
+
+ contigger/%.o: contigger/%.cpp repeat_graph/*.h sequence/*.h common/*.h
+- ${CXX} -c ${CXXFLAGS} $< -o $@
++ ${CXX} ${CPPFLAGS} -c ${CXXFLAGS} $< -o $@
+
+
+ #flye-polish module
+ polish_obj := ${patsubst %.cpp,%.o,${wildcard polishing/*.cpp}}
+
+ polishing/%.o: polishing/%.cpp bin/polisher.cpp polishing/*.h common/*h
+- ${CXX} -c ${CXXFLAGS} $< -o $@
++ ${CXX} ${CPPFLAGS} -c ${CXXFLAGS} $< -o $@
+
+ #main module
+ #main_obj := ${patsubst %.cpp,%.o,${wildcard main/*.cpp}}
+@@ -60,7 +60,7 @@
+
+ #main/%.o: main/%.cpp assemble/*.h sequence/*.h common/*.h repeat_graph/*.h contigger/*.h polishing/*.h
+ main.o: main.cpp
+- ${CXX} -c ${CXXFLAGS} $< -o $@
++ ${CXX} ${CPPFLAGS} -c ${CXXFLAGS} $< -o $@
+
+
+ clean:
=====================================
debian/patches/gcc-13.patch deleted
=====================================
@@ -1,27 +0,0 @@
-Description: fix build failure with gcc 13
-Author: Étienne Mollier <emollier at debian.org>
-Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1037662
-Forwarded: https://github.com/fenderglass/Flye/pull/621
-Last-Update: 2023-08-13
----
-This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
---- flye.orig/src/sequence/sequence_container.h
-+++ flye/src/sequence/sequence_container.h
-@@ -4,6 +4,7 @@
-
- #pragma once
-
-+#include <cstdint>
- #include <vector>
- #include <unordered_map>
- #include <string>
---- flye.orig/src/polishing/subs_matrix.h
-+++ flye/src/polishing/subs_matrix.h
-@@ -4,6 +4,7 @@
-
- #pragma once
-
-+#include <cstdint>
- #include <string>
- #include <fstream>
- #include <iostream>
=====================================
debian/patches/series
=====================================
@@ -1,2 +1,2 @@
use_debian_packaged_libs.patch
-gcc-13.patch
+cppflags.patch
=====================================
debian/patches/use_debian_packaged_libs.patch
=====================================
@@ -1,6 +1,7 @@
Author: Andreas Tille <tille at debian.org>
Last-Update: Fri, 05 Jun 2020 15:05:08 +0200
Description: use Debian packaged libminimap2 and liblemon
+Forwarded: not-needed
--- a/Makefile
+++ b/Makefile
=====================================
debian/rules
=====================================
@@ -5,11 +5,10 @@ export LC_ALL=C.UTF-8
include /usr/share/dpkg/default.mk
-# for hardening you might like to uncomment this:
-# export DEB_BUILD_MAINT_OPTIONS=hardening=+all
+export DEB_BUILD_MAINT_OPTIONS=hardening=+all
%:
- dh $@ --with python3 --buildsystem=pybuild
+ dh $@ --buildsystem=pybuild
override_dh_installchangelogs:
dh_installchangelogs docs/NEWS.md
=====================================
docs/FAQ.md
=====================================
@@ -236,10 +236,17 @@ flye --polish-target SEQ_TO_POLISH --pacbio-raw READS --iterations NUM_ITER --ou
You can also provide Bam file as input instead of reads, which will skip the read mapping step.
+
+Flye assembly of the same reads is slightly different from run to run
+---------------------------------------------------------------------
+
+Flye is not fully deterministic, and this would be very difficult to fix. See more info here: https://github.com/fenderglass/Flye/issues/509
+For test runs, one can use `--deterministic` option to make the output stable, at the expense of substantially slower runtimes.
+
My question is not listed, how do I get help?
---------------------------------------------
Please post your question to the [issue tracker](https://github.com/fenderglass/Flye/issues).
-In case you prefer personal communcation, you can contact Mikhail at fenderglass at gmail.com.
+In case you prefer personal communcation, you can contact Mikhail at mikolmogorov at gmail.com.
If you reporting a problem, please include the `flye.log` file and provide some
details about your dataset (if possible).
=====================================
docs/NEWS.md
=====================================
@@ -1,3 +1,9 @@
+Flye 2.9.3 release (28 November 2023)
+====================================
+* Disjointig step speedup for `--nano-hq` mode
+* Improved `--keep-haplotypes` mode preserves more heterozygous SVs
+* A few bug fixes
+
Flye 2.9.2 release (18 March 2023)
=================================
* Update to minimap 2.24 + using HiFi and Kit14 parameters for faster alignment
=====================================
flye/__build__.py
=====================================
@@ -1 +1 @@
-__build__ = 1786
+__build__ = 1797
=====================================
flye/__version__.py
=====================================
@@ -1 +1 @@
-__version__ = "2.9.2"
+__version__ = "2.9.3"
=====================================
flye/config/bin_cfg/asm_defaults.cfg
=====================================
@@ -8,6 +8,7 @@ meta_read_filter_kmer_freq = 100
chain_large_gap_penalty = 2
chain_small_gap_penalty = 0.5
chain_gap_jump_threshold = 100
+max_jump_gap = 500
#read assembly parameters
max_coverage_drop_rate = 5
@@ -17,6 +18,7 @@ chimera_overhang = 1000
min_reads_in_disjointig = 4
max_inner_reads = 10
max_inner_fraction = 0.25
+aggressive_dup_filter = 1
#repeat graph parameters
max_separation = 500
@@ -33,5 +35,5 @@ weak_detach_rate = 5
tip_coverage_rate = 2
tip_length_rate = 2
-output_gfa_before_rr = 0
+output_gfa_before_rr = 1
remove_alt_edges = 0
=====================================
flye/config/bin_cfg/asm_nano_hq.cfg
=====================================
@@ -6,7 +6,7 @@ low_cutoff_warning = 0
#k-mer selection
kmer_size = 17
use_minimizers = 1
-minimizer_window = 5
+minimizer_window = 10
reads_base_alignment = 1
=====================================
flye/main.py
=====================================
@@ -429,6 +429,9 @@ def _run_polisher_only(args):
if bam_input and len(args.reads) > 1:
raise ResumeException("Only single bam input supported")
+ if bam_input and args.num_iters > 1:
+ raise ResumeException("Bam input only supports single iteration. For multiple iterations, provide fastq instead")
+
pol.polish(args.polish_target, args.reads, args.out_dir,
args.num_iters, args.threads, args.platform,
args.read_type, output_progress=True)
@@ -678,19 +681,21 @@ def main():
if args.read_error and args.read_error > 1:
parser.error("--read-error expressed as a decimal fraction, e.g. 0.01 or 0.03")
- if args.read_error:
- hifi_str = "assemble_ovlp_divergence={0},repeat_graph_ovlp_divergence={0}".format(args.read_error)
+ def _add_extra_param(param):
if args.extra_params:
- args.extra_params += "," + hifi_str
+ args.extra_params += "," + param
else:
- args.extra_params = hifi_str
+ args.extra_params = param
+
+ if args.read_error:
+ hifi_str = "assemble_ovlp_divergence={0},repeat_graph_ovlp_divergence={0}".format(args.read_error)
+ _add_extra_param(hifi_str)
if args.no_alt_contigs:
- alt_params = "remove_alt_edges=1"
- if args.extra_params:
- args.extra_params += "," + alt_params
- else:
- args.extra_params = "remove_alt_edges=1"
+ _add_extra_param("remove_alt_edges=1")
+
+ if args.keep_haplotypes:
+ _add_extra_param("aggressive_dup_filter=0")
if args.pacbio_raw:
args.reads = args.pacbio_raw
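
Note on the flye/main.py hunk above: the refactoring replaces two copies of the "append to a comma-separated string" logic with one helper, and the new `--keep-haplotypes` branch reuses it. The following is a minimal standalone sketch of that pattern, not the upstream code: the helper takes `args` explicitly instead of closing over it, and `collect_extra_params` and the `SimpleNamespace` stand-in for the parsed arguments are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def add_extra_param(args, param):
    """Append `param` to the comma-separated string in args.extra_params."""
    if args.extra_params:
        args.extra_params += "," + param
    else:
        args.extra_params = param


def collect_extra_params(args):
    """Hypothetical wrapper mirroring the three branches shown in the diff."""
    if args.read_error:
        add_extra_param(
            args,
            "assemble_ovlp_divergence={0},repeat_graph_ovlp_divergence={0}".format(
                args.read_error
            ),
        )
    if args.no_alt_contigs:
        add_extra_param(args, "remove_alt_edges=1")
    if args.keep_haplotypes:
        # Turns off the duplicate filter that asm_defaults.cfg now enables.
        add_extra_param(args, "aggressive_dup_filter=0")
    return args.extra_params


# Stand-in for the argparse namespace; the values are made up.
example = SimpleNamespace(
    extra_params=None, read_error=0.03, no_alt_contigs=True, keep_haplotypes=True
)
print(collect_extra_params(example))
```

With the helper in place, each new option costs one line, which is how the `aggressive_dup_filter=0` override is added in this release.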
=====================================
flye/polishing/polish.py
=====================================
@@ -125,6 +125,8 @@ def polish(contig_seqs, read_seqs, work_dir, num_iters, num_threads, read_platfo
with open(stats_file, "w") as f:
f.write("#seq_name\tlength\tcoverage\n")
for ctg_id in contig_lengths:
+ if ctg_id not in coverage_stats:
+ coverage_stats[ctg_id] = 0
f.write("{0}\t{1}\t{2}\n".format(ctg_id,
contig_lengths[ctg_id], coverage_stats[ctg_id]))
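
Note on the flye/polishing/polish.py hunk above: the added check avoids a `KeyError` when a contig has no entry in `coverage_stats`, by reporting its coverage as 0. A minimal sketch of the same behaviour follows, assuming both arguments are plain dicts; `write_contig_stats` is a hypothetical name, and the sketch uses `dict.get` rather than inserting the missing key as the upstream code does.

```python
import io


def write_contig_stats(out, contig_lengths, coverage_stats):
    """Write one tab-separated line per contig; missing coverage becomes 0."""
    out.write("#seq_name\tlength\tcoverage\n")
    for ctg_id in contig_lengths:
        # dict.get supplies the default without modifying coverage_stats.
        coverage = coverage_stats.get(ctg_id, 0)
        out.write("{0}\t{1}\t{2}\n".format(ctg_id, contig_lengths[ctg_id], coverage))


# Made-up data: contig_2 has no coverage entry.
buffer = io.StringIO()
write_contig_stats(buffer, {"contig_1": 1000, "contig_2": 50}, {"contig_1": 37})
print(buffer.getvalue(), end="")
```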
=====================================
src/assemble/extender.cpp
=====================================
@@ -298,10 +298,10 @@ void Extender::assembleDisjointigs()
//int extRight = this->countRightExtensions(startOvlps);
if (_chimDetector.isChimeric(startRead, startOvlps) ||
- _readsContainer.seqLen(startRead) < _safeOverlap ||
- //std::max(extLeft, extRight) > maxStartExt ||
- //std::min(extLeft, extRight) < minStartExt ||
- numInnerOvlp > totalOverlaps / 2) return;
+ _readsContainer.seqLen(startRead) < _safeOverlap) return;
+
+ const bool aggressiveDupFilt = (int)Config::get("aggressive_dup_filter");
+ if (aggressiveDupFilt && numInnerOvlp > totalOverlaps / 2) return;
//Good to go!
ExtensionInfo exInfo = this->extendDisjointig(startRead);
=====================================
src/polishing/subs_matrix.h
=====================================
@@ -4,6 +4,7 @@
#pragma once
+#include <cstdint>
#include <string>
#include <fstream>
#include <iostream>
=====================================
src/repeat_graph/haplotype_resolver.cpp
=====================================
@@ -166,7 +166,7 @@ int HaplotypeResolver::findHeterozygousLoops()
//loop coverage should be roughly equal or less
if (loop.meanCoverage >
COV_MULT * std::min(entrancePath->meanCoverage,
- entrancePath->meanCoverage)) continue;
+ exitPath->meanCoverage)) continue;
//loop should not be longer than other branches
if (loop.length > std::max(entrancePath->length,
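
Note on the src/repeat_graph/haplotype_resolver.cpp hunk above: the old code took `std::min` of the entrance path coverage with itself, so the exit path was never considered. The sketch below shows how the two versions can disagree. The function names and the numbers are hypothetical, and the real value of `COV_MULT` is not shown in this diff.

```python
def loop_skipped_old(loop_cov, entrance_cov, exit_cov, cov_mult):
    """Pre-2.9.3 test: the exit coverage is ignored."""
    return loop_cov > cov_mult * min(entrance_cov, entrance_cov)


def loop_skipped_new(loop_cov, entrance_cov, exit_cov, cov_mult):
    """2.9.3 test: compare against the lower of entrance and exit coverage."""
    return loop_cov > cov_mult * min(entrance_cov, exit_cov)


# Made-up coverages: the exit path is much shallower than the entrance path.
print(loop_skipped_old(30, 40, 10, cov_mult=1.5))
print(loop_skipped_new(30, 40, 10, cov_mult=1.5))
```

The two tests only differ when the exit path has lower coverage than the entrance path.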
=====================================
src/repeat_graph/main_repeat.cpp
=====================================
@@ -190,7 +190,7 @@ int repeat_main(int argc, char** argv)
Logger::get().info() << "Building repeat graph";
SequenceContainer edgeSequences;
RepeatGraph rg(seqAssembly, &edgeSequences);
- rg.build();
+ rg.build(keepHaplotypes);
//rg.validateGraph();
Logger::get().info() << "Parsing reads";
@@ -261,7 +261,7 @@ int repeat_main(int argc, char** argv)
Logger::get().debug() << "[SIMPL] == Iteration " << iterNum << " ==";
actions += multInf.splitNodes();
- if (isMeta)
+ if (isMeta && !keepHaplotypes)
{
actions += multInf.disconnectMinorPaths();
}
@@ -277,7 +277,7 @@ int repeat_main(int argc, char** argv)
if (!actions) break;
}
- if (isMeta)
+ if (isMeta && !keepHaplotypes)
{
multInf.resolveForks();
}
=====================================
src/repeat_graph/output_generator.cpp
=====================================
@@ -106,13 +106,23 @@ void OutputGenerator::outputGfa(const std::vector<UnbranchingPath>& paths,
}
//make sure that if there are nodes with one incoming and one outgoing
- //edge, they are connected. Most relevant to the circular contigs
+ //edge, they are connected. Initialize those connections to zero.
+ //Most relevant to the circular contigs, but also to strange bubbles
for (auto& node : _graph.iterNodes())
{
- if (node->inEdges.size() == 1 && node->outEdges.size() == 1)
+ if (node->outEdges.size() == 1)
{
- //initialize to zero
- edgeConnections[node->inEdges.front()][node->outEdges.front()];
+ for (auto& inEdge : node->inEdges)
+ {
+ edgeConnections[inEdge][node->outEdges.front()]; //initialize to zero
+ }
+ }
+ if (node->inEdges.size() == 1)
+ {
+ for (auto& outEdge : node->outEdges)
+ {
+ edgeConnections[node->inEdges.front()][outEdge]; //initialize to zero
+ }
}
}
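
Note on the src/repeat_graph/output_generator.cpp hunk above: previously a zero-valued connection was created only for nodes with exactly one incoming and one outgoing edge. The new loop creates one for every incoming edge of a node with a single outgoing edge, and for every outgoing edge of a node with a single incoming edge. The following is a rough Python analogue, assuming each node is given as a pair of edge-name lists; `init_edge_connections` and the edge names are hypothetical.

```python
from collections import defaultdict


def init_edge_connections(nodes):
    """Return {in_edge: {out_edge: 0}} for the node shapes handled in 2.9.3."""
    connections = defaultdict(lambda: defaultdict(int))
    for in_edges, out_edges in nodes:
        if len(out_edges) == 1:
            for in_edge in in_edges:
                # Adding 0 creates the entry, like operator[] on the C++ map.
                connections[in_edge][out_edges[0]] += 0
        if len(in_edges) == 1:
            for out_edge in out_edges:
                connections[in_edges[0]][out_edge] += 0
    return {src: dict(dst) for src, dst in connections.items()}


# Two made-up nodes: a 2-in/1-out junction followed by a 1-in/2-out fork.
print(init_edge_connections([(["e1", "e2"], ["e3"]), (["e3"], ["e4", "e5"])]))
```

Neither node in this example has exactly one edge on both sides, so the pre-2.9.3 condition would have initialized no connections for them.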
=====================================
src/repeat_graph/repeat_graph.cpp
=====================================
@@ -75,7 +75,7 @@ std::unordered_set<GraphEdge*> GraphEdge::adjacentEdges()
return edges;
}
-void RepeatGraph::build()
+void RepeatGraph::build(bool keepHaplotypes)
{
//getting overlaps
VertexIndex asmIndex(_asmSeqs);
@@ -104,7 +104,10 @@ void RepeatGraph::build()
asmOverlaps.overlapDivergenceStats();
this->getGluepoints(asmOverlaps);
- this->collapseTandems();
+ if (!keepHaplotypes)
+ {
+ this->collapseTandems();
+ }
this->initializeEdges(asmOverlaps);
GraphProcessor proc(*this, _asmSeqs);
proc.simplify();
=====================================
src/repeat_graph/repeat_graph.h
=====================================
@@ -250,7 +250,7 @@ public:
{}
~RepeatGraph();
- void build();
+ void build(bool keepHaplotypes);
void updateEdgeSequences();
void storeGraph(const std::string& filename);
void loadGraph(const std::string& filename);
=====================================
src/sequence/overlap.cpp
=====================================
@@ -112,6 +112,7 @@ OverlapDetector::getSeqOverlaps(const FastaRecord& fastaRec,
static const float LG_GAP = (float)Config::get("chain_large_gap_penalty");
static const float SM_GAP = (float)Config::get("chain_small_gap_penalty");
static const int GAP_JUMP_THLD = (int)Config::get("chain_gap_jump_threshold");
+ static const int MAX_GAP = (int)Config::get("max_jump_gap");
//outSuggestChimeric = false;
int32_t curLen = fastaRec.sequence.length();
@@ -288,14 +289,15 @@ OverlapDetector::getSeqOverlaps(const FastaRecord& fastaRec,
{
int32_t curPrev = matchesList[j].curPos;
int32_t extPrev = matchesList[j].extPos;
+ int32_t jumpDiv = abs((curNext - curPrev) -
+ (extNext - extPrev));
if (0 < curNext - curPrev && curNext - curPrev < _maxJump &&
- 0 < extNext - extPrev && extNext - extPrev < _maxJump)
+ 0 < extNext - extPrev && extNext - extPrev < _maxJump &&
+ jumpDiv <= MAX_GAP)
{
int32_t matchScore =
std::min(std::min(curNext - curPrev, extNext - extPrev),
kmerSize);
- int32_t jumpDiv = abs((curNext - curPrev) -
- (extNext - extPrev));
//int32_t gapCost = jumpDiv ?
// kmerSize * jumpDiv + ilog2_32(jumpDiv) : 0;
int32_t gapCost = (jumpDiv > GAP_JUMP_THLD ? LG_GAP : SM_GAP) * jumpDiv;
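
Note on the src/sequence/overlap.cpp hunk above: `jumpDiv` is now computed before the chaining test, and two matches are chained only if the difference between their jumps on the two sequences is at most `max_jump_gap`. A small Python sketch of the predicate and the gap cost follows. The default penalties, threshold and `max_gap` are taken from asm_defaults.cfg as shown in this diff; the function names and the `max_jump` value in the example are hypothetical, since the value of `_maxJump` is not shown here.

```python
def can_chain(cur_prev, ext_prev, cur_next, ext_next, max_jump, max_gap=500):
    """Mirror the 2.9.3 chaining condition for a pair of k-mer matches."""
    cur_step = cur_next - cur_prev
    ext_step = ext_next - ext_prev
    jump_div = abs(cur_step - ext_step)
    return (
        0 < cur_step < max_jump
        and 0 < ext_step < max_jump
        and jump_div <= max_gap  # the condition added in 2.9.3
    )


def gap_cost(jump_div, large_penalty=2.0, small_penalty=0.5, threshold=100):
    """Gap cost as in the diff; int() truncates like the int32_t assignment."""
    penalty = large_penalty if jump_div > threshold else small_penalty
    return int(penalty * jump_div)


# Made-up positions: both steps are below max_jump, but they differ by 600.
print(can_chain(0, 0, 1000, 400, max_jump=1500))
print(can_chain(0, 0, 1000, 600, max_jump=1500))
print(gap_cost(50), gap_cost(400))
```

Before this change, a pair like the first one passed the test and was only penalized through the gap cost.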
=====================================
src/sequence/sequence_container.h
=====================================
@@ -4,6 +4,7 @@
#pragma once
+#include <cstdint>
#include <vector>
#include <unordered_map>
#include <string>
View it on GitLab: https://salsa.debian.org/med-team/flye/-/compare/e121632809d313f28d1c6d97ea79cc1979ab3750...5277dfe4899d5fd6c6f3973d4bb01a2159bcd60a