[med-svn] [Git][med-team/unicycler][master] 5 commits: d/watch: Fix watch regex
Nilesh Patra (@nilesh)
gitlab at salsa.debian.org
Sat Aug 7 20:51:24 BST 2021
Nilesh Patra pushed to branch master at Debian Med / unicycler
Commits:
ae1b45ca by Nilesh Patra at 2021-08-08T00:49:24+05:30
d/watch: Fix watch regex
- - - - -
4ffd09ab by Nilesh Patra at 2021-08-08T00:50:16+05:30
New upstream version 0.4.9+dfsg
- - - - -
098a4d74 by Nilesh Patra at 2021-08-08T00:50:32+05:30
Update upstream source from tag 'upstream/0.4.9+dfsg'
Update to upstream version '0.4.9+dfsg'
with Debian dir 84a461f8ce953a86d6b644be308ab43782a3c66e
- - - - -
aa01f79b by Nilesh Patra at 2021-08-08T00:54:44+05:30
d/p/{append_flags,spades.patch}: Refresh patches
- - - - -
50a1cc57 by Nilesh Patra at 2021-08-08T01:21:06+05:30
[skip ci] Interim changelog entry
- - - - -
12 changed files:
- README.md
- debian/changelog
- debian/patches/append_flags
- debian/patches/spades.patch
- debian/watch
- test/test_misc.py
- unicycler/assembly_graph.py
- unicycler/misc.py
- unicycler/settings.py
- unicycler/spades_func.py
- unicycler/unicycler.py
- unicycler/version.py
Changes:
=====================================
README.md
=====================================
@@ -10,6 +10,17 @@ And read about how we use it to complete bacterial genomes here:
+# A note on Trycycler
+
+[Trycycler](https://github.com/rrwick/Trycycler/wiki) is a newer tool that in many cases is a better choice than Unicycler. Here is a quick guide on whether you should use Unicycler or Trycycler to assemble your bacterial genome:
+* If you only have short reads, use Unicycler (Trycycler does not do short-read assembly).
+* If you only have long reads, Trycycler is a better choice. While Unicycler can do long-read-only assembly, its approach is somewhat out-of-date. Something else to consider is the depth of your long reads, as Trycycler prefers deeper read sets. If your long-read set is particularly shallow (~25× or less), then [Flye](https://github.com/fenderglass/Flye) might be your best option.
+* If you have both short and long reads (i.e. are doing a hybrid assembly), then Unicycler and [Trycycler+polishing](https://github.com/rrwick/Trycycler/wiki/Polishing-after-Trycycler) are both viable options. If you have lots of long reads (~100× depth or more), use Trycycler+polishing. If you have sparse long reads (~25× or less), use Unicycler. If your long-read depth falls between those values, it might be worth trying both approaches.
+
+You can read more on Trycycler's FAQ page: [Should I use Unicycler or Trycycler to assemble my bacterial genome?](https://github.com/rrwick/Trycycler/wiki/FAQ-and-miscellaneous-tips#should-i-use-unicycler-or-trycycler-to-assemble-my-bacterial-genome)
+
+
+
# Table of contents
* [Introduction](#introduction)
@@ -94,7 +105,7 @@ Reasons to __not__ use Unicycler:
* [ICC](https://software.intel.com/en-us/c-compilers) also works (though I don't know the minimum required version number)
* [setuptools](https://packaging.python.org/installing/#install-pip-setuptools-and-wheel) (only required for installation of Unicycler)
* For short-read or hybrid assembly:
- * [SPAdes](http://bioinf.spbau.ru/spades) v3.6.2 or later (`spades.py`)
+ * [SPAdes](http://bioinf.spbau.ru/spades) v3.6.2 – v3.13.0 (`spades.py`)
* For long-read or hybrid assembly:
* [Racon](https://github.com/isovic/racon) (`racon`)
* For polishing
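Note on the README hunk above: the Unicycler/Trycycler guidance can be read as a small decision table. The sketch below is only an illustration of that reading. The function name, the return strings, and the choice to treat ~25× and ~100× as hard cut-offs are assumptions; the README itself hedges both numbers.

```python
from typing import Optional

# Depth cut-offs quoted in the README note. Treating them as exact
# boundaries is an assumption made for this sketch.
SHALLOW_DEPTH = 25.0
DEEP_DEPTH = 100.0


def suggest_assembler(has_short_reads: bool, long_read_depth: Optional[float]) -> str:
    """Hypothetical helper: map a read set to the tool the README note suggests."""
    has_long_reads = long_read_depth is not None and long_read_depth > 0
    if not has_short_reads and not has_long_reads:
        raise ValueError("no reads given")
    if not has_long_reads:
        # Short reads only: Trycycler does not do short-read assembly.
        return "Unicycler"
    if not has_short_reads:
        # Long reads only: Trycycler, unless the read set is shallow.
        return "Flye" if long_read_depth <= SHALLOW_DEPTH else "Trycycler"
    # Hybrid read set: the choice depends on long-read depth.
    if long_read_depth >= DEEP_DEPTH:
        return "Trycycler+polishing"
    if long_read_depth <= SHALLOW_DEPTH:
        return "Unicycler"
    return "try both"
```

For example, `suggest_assembler(True, 60.0)` falls between the two cut-offs and returns `"try both"`.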
=====================================
debian/changelog
=====================================
@@ -1,3 +1,11 @@
+unicycler (0.4.9+dfsg-1) UNRELEASED; urgency=medium
+
+ * d/watch: Fix watch regex
+ * New upstream version 0.4.9+dfsg
+TODO: Error: Unicycler requires SPAdes v3.6.2 - v3.13.0
+
+ -- Nilesh Patra <nilesh at debian.org> Sun, 08 Aug 2021 00:50:46 +0530
+
unicycler (0.4.8+dfsg-2) unstable; urgency=medium
* Autopkgtest: Restrictions: skip-not-installable
=====================================
debian/patches/append_flags
=====================================
@@ -1,7 +1,7 @@
From: Michael R. Crusoe <michael.crusoe at gmail.com>
Subject: Inherit and use LDFLAGS and CPPFLAGS
---- unicycler.orig/Makefile
-+++ unicycler/Makefile
+--- a/Makefile
++++ b/Makefile
@@ -66,7 +66,7 @@
# These flags are required for the build to work.
=====================================
debian/patches/spades.patch
=====================================
@@ -4,7 +4,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
--- a/test/test_dependencies.py
+++ b/test/test_dependencies.py
-@@ -42,7 +42,7 @@ class TestDependencies(unittest.TestCase
+@@ -42,7 +42,7 @@
def test_spades_not_found(self):
stdout, stderr, ret_code = self.run_unicycler(['--spades_path', 'not_a_real_path'])
@@ -13,7 +13,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
self.assertTrue('could not find SPAdes' in stderr)
self.assertEqual(ret_code, 1)
-@@ -102,14 +102,14 @@ class TestDependencies(unittest.TestCase
+@@ -102,14 +102,14 @@
def test_no_rotate(self):
stdout, stderr, ret_code = self.run_unicycler(['--spades_path', 'not_a_real_path',
'--no_rotate'])
@@ -30,7 +30,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
self.assertTrue(bool(re.search(r'bowtie2-build\s+not used', stdout)))
self.assertTrue(bool(re.search(r'bowtie2\s+not used', stdout)))
self.assertTrue(bool(re.search(r'samtools\s+not used', stdout)))
-@@ -119,12 +119,12 @@ class TestDependencies(unittest.TestCase
+@@ -119,12 +119,12 @@
def test_verbosity_1(self):
stdout, stderr, ret_code = self.run_unicycler(['--spades_path', 'not_a_real_path',
'--verbosity', '1'])
@@ -47,7 +47,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
self.assertTrue(bool(re.search(r'Program\s+Version\s+Status\s+Path', stdout)))
--- a/test/overlap_removal_test.py
+++ b/test/overlap_removal_test.py
-@@ -94,7 +94,7 @@ def run_spades(out_dir):
+@@ -94,7 +94,7 @@
reads_2 = os.path.join(out_dir, 'reads_2.fastq')
reads_unpaired = os.path.join(out_dir, 'reads_unpaired.fastq')
@@ -58,7 +58,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
spades_cmd += ['-1', reads_1, '-2', reads_2]
--- a/test/test_misc.py
+++ b/test/test_misc.py
-@@ -391,14 +391,14 @@ class TestMiscFunctions(unittest.TestCas
+@@ -391,14 +391,14 @@
def test_spades_version_parsing_3(self):
spades_version_output = 'option -v not recognized\nSPAdes genome assembler v.3.5.0\n\n' \
@@ -77,16 +77,16 @@ Description: SPAdes is in Debian at /usr/bin/spades
self.assertEqual(version, '2.4.0')
--- a/README.md
+++ b/README.md
-@@ -94,7 +94,7 @@ Reasons to __not__ use Unicycler:
+@@ -105,7 +105,7 @@
* [ICC](https://software.intel.com/en-us/c-compilers) also works (though I don't know the minimum required version number)
* [setuptools](https://packaging.python.org/installing/#install-pip-setuptools-and-wheel) (only required for installation of Unicycler)
* For short-read or hybrid assembly:
-- * [SPAdes](http://bioinf.spbau.ru/spades) v3.6.2 or later (`spades.py`)
-+ * [SPAdes](http://bioinf.spbau.ru/spades) v3.6.2 or later (`spades`)
+- * [SPAdes](http://bioinf.spbau.ru/spades) v3.6.2 – v3.13.0 (`spades.py`)
++ * [SPAdes](http://bioinf.spbau.ru/spades) v3.6.2 – v3.13.0 (`spades`)
* For long-read or hybrid assembly:
* [Racon](https://github.com/isovic/racon) (`racon`)
* For polishing
-@@ -415,7 +415,7 @@ SPAdes assembly:
+@@ -426,7 +426,7 @@
These options control the short-read SPAdes assembly at the beginning of the Unicycler
pipeline.
@@ -97,7 +97,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
--min_kmer_frac MIN_KMER_FRAC Lowest k-mer size for SPAdes assembly, expressed as a fraction of
--- a/setup.py
+++ b/setup.py
-@@ -63,7 +63,7 @@ def missing_tool(tool_name):
+@@ -63,7 +63,7 @@
def tool_check():
# Check for required programs.
@@ -108,7 +108,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
for tool in tools:
--- a/unicycler/misc.py
+++ b/unicycler/misc.py
-@@ -124,7 +124,7 @@ def check_spades(spades_path):
+@@ -124,7 +124,7 @@
if not err.decode():
quit_with_error('SPAdes was found but does not produce output (make sure to use '
@@ -117,7 +117,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
def find_pilon(pilon_path, java_path, args):
-@@ -916,7 +916,7 @@ def spades_path_and_version(spades_path)
+@@ -902,7 +902,7 @@
def spades_version_from_spades_output(spades_output):
"""
@@ -128,7 +128,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
return re.search(r'v(\d+\.\d+\.\d+)', spades_output).group(1)
--- a/unicycler/unicycler.py
+++ b/unicycler/unicycler.py
-@@ -320,7 +320,7 @@ def get_arguments():
+@@ -320,7 +320,7 @@
'These options control the short-read SPAdes '
'assembly at the beginning of the Unicycler pipeline.'
if show_all_args else argparse.SUPPRESS)
@@ -137,7 +137,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
help='Path to the SPAdes executable'
if show_all_args else argparse.SUPPRESS)
spades_group.add_argument('--no_correct', action='store_true',
-@@ -756,7 +756,7 @@ def check_dependencies(args, short_reads
+@@ -764,7 +764,7 @@
spades_path, spades_version, spades_status = '', '', 'not used'
else:
spades_path, spades_version, spades_status = spades_path_and_version(args.spades_path)
@@ -146,7 +146,7 @@ Description: SPAdes is in Debian at /usr/bin/spades
if args.verbosity > 1:
spades_row.append(spades_path)
program_table.append(spades_row)
-@@ -864,7 +864,7 @@ def quit_if_dependency_problem(spades_st
+@@ -873,7 +873,7 @@
quit_with_error('SPAdes cannot run due to an incompatible Python version')
if spades_status == 'bad':
quit_with_error('SPAdes was found but does not produce output (make sure to use '
=====================================
debian/watch
=====================================
@@ -1,4 +1,4 @@
version=4
opts="repacksuffix=+dfsg,dversionmangle=s/\+dfsg//g,repack,compression=xz" \
- https://github.com/rrwick/Unicycler/releases .*/archive/v?@ANY_VERSION@@ARCHIVE_EXT@
+ https://github.com/rrwick/Unicycler/releases .*/archive/.*/v?@ANY_VERSION@@ARCHIVE_EXT@
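Note on the d/watch change: the extra `.*/` lets the pattern match links with additional path components between `/archive/` and the tag name. The Python sketch below shows the effect. The two hrefs are hypothetical examples, and the expansions used for `@ANY_VERSION@` and `@ARCHIVE_EXT@` are assumptions based on my reading of uscan(1); check them against the installed devscripts version.

```python
import re

# Assumed expansions of the uscan substitution strings (see uscan(1)).
ANY_VERSION = r"[-_]?(\d[\-+\.:\~\da-zA-Z]*)"
ARCHIVE_EXT = r"\.(?:tar\.xz|tar\.bz2|tar\.gz|zip|tgz|tbz|txz)"

# Old and new matching patterns from debian/watch, with the placeholders expanded.
OLD = re.compile(r".*/archive/v?" + ANY_VERSION + ARCHIVE_EXT, re.IGNORECASE)
NEW = re.compile(r".*/archive/.*/v?" + ANY_VERSION + ARCHIVE_EXT, re.IGNORECASE)

# Hypothetical hrefs: a flat archive link and one with a refs/tags/ prefix.
flat_href = "/rrwick/Unicycler/archive/v0.4.9.tar.gz"
nested_href = "/rrwick/Unicycler/archive/refs/tags/v0.4.9.tar.gz"

# fullmatch() stands in for the anchored match assumed for uscan.
old_hit = OLD.fullmatch(nested_href)
new_hit = NEW.fullmatch(nested_href)
version = new_hit.group(1) if new_hit else None
print(old_hit is None, version)
```

Under these assumptions the old pattern misses the nested link while the new one captures `0.4.9`. The new pattern requires at least one extra path component, so it no longer matches the flat form.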
=====================================
test/test_misc.py
=====================================
@@ -402,3 +402,36 @@ class TestMiscFunctions(unittest.TestCase):
'-o <output_dir>\n\nBasic options:'
version = unicycler.misc.spades_version_from_spades_output(spades_version_output)
self.assertEqual(version, '2.4.0')
+
+ def test_spades_version_status_1(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('2.4.0'), 'too old')
+
+ def test_spades_version_status_2(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('3.4.0'), 'too old')
+
+ def test_spades_version_status_3(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('3.6.0'), 'too old')
+
+ def test_spades_version_status_4(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('3.6.1'), 'too old')
+
+ def test_spades_version_status_5(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('3.6.2'), 'good')
+
+ def test_spades_version_status_6(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('3.7.0'), 'good')
+
+ def test_spades_version_status_7(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('3.9.9'), 'good')
+
+ def test_spades_version_status_8(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('3.13.0'), 'good')
+
+ def test_spades_version_status_9(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('3.13.1'), 'too new')
+
+ def test_spades_version_status_10(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('3.14.1'), 'too new')
+
+ def test_spades_version_status_11(self):
+ self.assertEqual(unicycler.misc.spades_status_from_version('4.0.0'), 'too new')
=====================================
unicycler/assembly_graph.py
=====================================
@@ -1643,6 +1643,8 @@ class AssemblyGraph(object):
# We'll search specifically for the middle segments as they should be easy to spot.
for middle in self.segments:
+ if self.segments[middle].get_length() > settings.MAX_SIMPLE_LOOP_SIZE:
+ continue
# A middle segment will always have exactly one connection on each end which connect
# to the same segment (the repeat segment).
=====================================
unicycler/misc.py
=====================================
@@ -891,23 +891,9 @@ def spades_path_and_version(spades_path):
if 'python version' in out and 'is not supported' in out:
return found_spades_path, '', 'Python problem'
- # Make sure SPAdes is 3.6.2+
+ # Make sure SPAdes is 3.6.2 - 3.13.0
try:
- major_version = int(version.split('.')[0])
- if major_version < 3:
- status = 'too old'
- else:
- minor_version = int(version.split('.')[1])
- if minor_version < 6:
- status = 'too old'
- elif minor_version > 6:
- status = 'good'
- else: # minor_version == 6
- patch_version = int(version.split('.')[2])
- if patch_version < 2:
- status = 'too old'
- else:
- status = 'good'
+ status = spades_status_from_version(version)
except (ValueError, IndexError):
version, status = '?', 'too old'
@@ -933,6 +919,36 @@ def spades_version_from_spades_output(spades_output):
return ''
+def spades_status_from_version(version):
+ major_version = int(version.split('.')[0])
+ if major_version < 3:
+ return 'too old'
+ if major_version >= 4:
+ return 'too new'
+
+ minor_version = int(version.split('.')[1])
+ if minor_version < 6:
+ return 'too old'
+ if minor_version > 13:
+ return 'too new'
+ assert 6 <= minor_version <= 13
+
+ patch_version = int(version.split('.')[2])
+ if 6 < minor_version < 13:
+ return 'good'
+ assert minor_version == 6 or minor_version == 13
+ if minor_version == 6:
+ if patch_version < 2:
+ return 'too old'
+ else:
+ return 'good'
+ if minor_version == 13:
+ if patch_version > 0:
+ return 'too new'
+ else:
+ return 'good'
+
+
def racon_path_and_version(racon_path):
found_racon_path = shutil.which(racon_path)
if found_racon_path is None:
@@ -941,7 +957,7 @@ def racon_path_and_version(racon_path):
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
out, _ = process.communicate()
out = out.decode().lower()
- if 'racon' not in out or 'options' not in out:
+ if 'racon' not in out or 'options' not in out:
return found_racon_path, '-', 'bad'
return found_racon_path, racon_version(found_racon_path), 'good'
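Note on `spades_status_from_version`: the same range check can be written as a tuple comparison, since Python compares tuples element by element. This is a sketch, not the upstream code. The function and constant names are hypothetical, and unlike the upstream function it raises `ValueError` for any version string that does not have exactly three numeric parts.

```python
# Supported SPAdes range, taken from the commit (v3.6.2 - v3.13.0).
MIN_SPADES = (3, 6, 2)
MAX_SPADES = (3, 13, 0)


def spades_status_sketch(version: str) -> str:
    """Hypothetical alternative: classify a 'major.minor.patch' string."""
    # Unpacking raises ValueError if there are not exactly three parts,
    # and int() raises ValueError if a part is not numeric.
    major, minor, patch = (int(part) for part in version.split('.'))
    parsed = (major, minor, patch)
    if parsed < MIN_SPADES:
        return 'too old'
    if parsed > MAX_SPADES:
        return 'too new'
    return 'good'
```

The caller in `spades_path_and_version` already catches `ValueError`, so a malformed version string would still end up reported as `'too old'` there.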
=====================================
unicycler/settings.py
=====================================
@@ -181,3 +181,5 @@ REQUIRED_MINIASM_ASSEMBLY_SIZE_FOR_BRIDGING = 0.5
# limits the amount of trimming it's willing to do. I.e. if miniasm trimmed more than this from a
# contig, Unicycler won't.
MAX_MINIASM_DEAD_END_TRIM_SIZE = 100
+
+MAX_SIMPLE_LOOP_SIZE = 10000
=====================================
unicycler/spades_func.py
=====================================
@@ -263,7 +263,8 @@ def spades_read_correction(short1, short2, unpaired, spades_dir, threads, spades
command += ['-1', short1, '-2', short2]
if using_unpaired_reads:
command += ['-s', unpaired]
- command += ['-o', read_correction_dir, '--threads', str(threads), '--only-error-correction']
+ command += ['-o', read_correction_dir, '--threads', str(threads), '--only-error-correction',
+ '--phred-offset', '33']
if spades_tmp_dir is not None:
command += ['--tmp-dir', spades_tmp_dir]
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
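Note on `--phred-offset 33`: the flag tells SPAdes that FASTQ quality characters are encoded as Phred+33 instead of leaving the offset to auto-detection. The commit does not say why it was added. As a reminder of what the encoding means, here is a minimal sketch; the helper name is hypothetical.

```python
from typing import List

PHRED_OFFSET = 33  # Offset passed to SPAdes in the hunk above.


def phred33_scores(quality_line: str) -> List[int]:
    """Hypothetical helper: decode a FASTQ quality string encoded as Phred+33."""
    # Each quality character stores (score + 33) as its ASCII code.
    return [ord(ch) - PHRED_OFFSET for ch in quality_line]
```

For example, `phred33_scores("!5I")` gives `[0, 20, 40]`, since `!`, `5` and `I` have ASCII codes 33, 53 and 73.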
=====================================
unicycler/unicycler.py
=====================================
@@ -162,16 +162,16 @@ def main():
bridges += create_simple_long_read_bridges(graph, args.out, args.keep, args.threads,
read_dict, long_read_filename, scoring_scheme,
anchor_segments)
+ if not args.no_long_read_alignment:
+ read_names, min_scaled_score, min_alignment_length = \
+ align_long_reads_to_assembly_graph(graph, anchor_segments, args, full_command,
+ read_dict, read_names, long_read_filename)
- read_names, min_scaled_score, min_alignment_length = \
- align_long_reads_to_assembly_graph(graph, anchor_segments, args, full_command,
- read_dict, read_names, long_read_filename)
-
- expected_linear_seqs = args.linear_seqs > 0
- bridges += create_long_read_bridges(graph, read_dict, read_names, anchor_segments,
- args.verbosity, min_scaled_score, args.threads,
- scoring_scheme, min_alignment_length,
- expected_linear_seqs, args.min_bridge_qual)
+ expected_linear_seqs = args.linear_seqs > 0
+ bridges += create_long_read_bridges(graph, read_dict, read_names, anchor_segments,
+ args.verbosity, min_scaled_score, args.threads,
+ scoring_scheme, min_alignment_length,
+ expected_linear_seqs, args.min_bridge_qual)
if short_reads_available:
seg_nums_used_in_bridges = graph.apply_bridges(bridges, args.verbosity,
@@ -370,8 +370,8 @@ def get_arguments():
if show_all_args else argparse.SUPPRESS)
miniasm_group.add_argument('--existing_long_read_assembly', type=str, default=None,
help='A pre-prepared long read assembly for the sample in GFA '
- 'format. If this option is used, Unicycler will skip the '
- 'miniasm/Racon steps and instead use the given assembly '
+ 'or FASTA format. If this option is used, Unicycler will skip '
+ 'the miniasm/Racon steps and instead use the given assembly '
'(default: perform long read assembly using miniasm/Racon)'
if show_all_args else argparse.SUPPRESS)
@@ -466,6 +466,10 @@ def get_arguments():
'These options control the alignment of long reads to '
'the assembly graph.'
if show_all_args else argparse.SUPPRESS)
+ align_group.add_argument('--no_long_read_alignment', action='store_true',
+ help='Skip long-read-alignment-bases bridging (default: use '
+ 'long-read alignments to produce bridges)'
+ if show_all_args else argparse.SUPPRESS)
add_aligning_arguments(align_group, show_all_args)
# If no arguments were used, print the entire help (argparse default is to just give an error
@@ -839,7 +843,8 @@ def check_dependencies(args, short_reads_available, long_reads_available):
for i, row in enumerate(program_table):
if 'not used' in row:
row_colours[i] = 'dim'
- elif 'too old' in row or 'not found' in row or 'bad' in row or 'Python problem' in row:
+ elif ('too old' in row or 'too new' in row or 'not found' in row or 'bad' in row or
+ 'Python problem' in row):
row_colours[i] = 'red'
print_table(program_table, alignments='LLLL', row_colour=row_colours, max_col_width=60,
@@ -862,8 +867,8 @@ def quit_if_dependency_problem(spades_status, racon_status, makeblastdb_status,
log.log('')
if spades_status == 'not found':
quit_with_error('could not find SPAdes at ' + args.spades_path)
- if spades_status == 'too old':
- quit_with_error('Unicycler requires SPAdes v3.6.2 or higher')
+ if spades_status == 'too old' or spades_status == 'too new':
+ quit_with_error('Unicycler requires SPAdes v3.6.2 - v3.13.0')
if spades_status == 'Python problem':
quit_with_error('SPAdes cannot run due to an incompatible Python version')
if spades_status == 'bad':
=====================================
unicycler/version.py
=====================================
@@ -13,4 +13,4 @@ details. You should have received a copy of the GNU General Public License along
not, see <http://www.gnu.org/licenses/>.
"""
-__version__ = '0.4.8'
+__version__ = '0.4.9'
View it on GitLab: https://salsa.debian.org/med-team/unicycler/-/compare/8c22dd061ff0c4bff44216b4ebb8234ac55884b6...50a1cc570d350eefe4ebd1fe44137a2c2f520817