[med-svn] [Git][med-team/parsnp][master] 4 commits: New upstream version 2.0.3+dfsg
Étienne Mollier (@emollier)
gitlab at salsa.debian.org
Tue Jan 30 19:56:53 GMT 2024
Étienne Mollier pushed to branch master at Debian Med / parsnp
Commits:
21582a1c by Étienne Mollier at 2024-01-30T20:51:55+01:00
New upstream version 2.0.3+dfsg
- - - - -
7c7cdb18 by Étienne Mollier at 2024-01-30T20:51:55+01:00
routine-update: New upstream version
- - - - -
98bc4e43 by Étienne Mollier at 2024-01-30T20:51:56+01:00
Update upstream source from tag 'upstream/2.0.3+dfsg'
Update to upstream version '2.0.3+dfsg'
with Debian dir c79880210dd7c7845c80b2abffe9026363ab1fdc
- - - - -
c555def1 by Étienne Mollier at 2024-01-30T20:55:37+01:00
routine-update: Ready to upload to unstable
- - - - -
5 changed files:
- README.md
- debian/changelog
- debian/patches/proper_calls_to_tools.patch
- debian/patches/py3-parsnp-libs.patch
- parsnp
Changes:
=====================================
README.md
=====================================
@@ -14,7 +14,7 @@ conda install parsnp
## From source
-To build Parsnp from source, users must have automake 1.15, autoconf, and libtool installed. Parsnp also requires RaxML, Phipack, Harvest-tools, and numpy. Some additional features require Mash, FastANI and FastTree. All of these packages are available via Conda (many on the Bioconda channel).
+To build Parsnp from source, users must have automake 1.15, autoconf, and libtool installed. Parsnp also requires RaxML (or FastTree), Harvest-tools, and numpy. Some additional features require pySPOA, Mash, FastANI, and Phipack. All of these packages are available via Conda (many on the Bioconda channel).
### Build instructions
First, you must build the Muscle library
@@ -58,10 +58,26 @@ parsnp -r <reference_fasta> -d <genomes>
```
For example,
```
-./parsnp -g examples/mers_virus/ref/England1.gbk -d examples/mers_virus/genomes/*.fna -c
+parsnp -r examples/mers_virus/ref/England1.fna -d examples/mers_virus/genomes/*.fna -o examples-out
```
+
+## Partition mode
+Parsnp 2 includes a new mode which can be activated with `--partition`. This mode randomly splits the input genomes up into groups of *p* genomes each, where *p* defaults to 50 and can be changed with `--partition-size=p`. Parsnp is then run independently on each group, and the resulting alignment of each group is merged into a single alignment of all input genomes. This mode is intended for large datasets, as it reduces the computational requirements.
+
+```
+parsnp -r examples/mers_virus/ref/England1.fna -d examples/mers_virus/genomes/*.fna --partition --partition-size 10 -o examples-out-partitioned
+```
+
More examples can be found in the [readthedocs tutorial](https://harvest.readthedocs.io/en/latest/content/parsnp/tutorial.html)
+## Output files
+* `parsnp.xmfa` is the core-genome alignment.
+* `parsnp.ggr` is the compressed representation of the alignment generated by the harvest-toolkit. This file can be used to visualize alignments with Gingr.
+* `parsnp.snps.mblocks` is the core-SNP signature of each sequence in fasta format. This is the file which is used to generate `parsnp.tree`
+* `parsnp.tree` is the resulting phylogeny.
+* If run in partition mode, Parsnp will produce a `partition` folder in the output directory, which contains the output of each of the partitioned runs.
+
+
## Misc
=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+parsnp (2.0.3+dfsg-1) unstable; urgency=medium
+
+ * New upstream version
+
+ -- Étienne Mollier <emollier at debian.org> Tue, 30 Jan 2024 20:53:44 +0100
+
parsnp (2.0.2+dfsg-1) unstable; urgency=medium
* New upstream version 2.0.2+dfsg
=====================================
debian/patches/proper_calls_to_tools.patch
=====================================
@@ -5,7 +5,7 @@ Forwarded: not-needed
--- parsnp.orig/parsnp
+++ parsnp/parsnp
-@@ -145,7 +145,7 @@
+@@ -146,7 +146,7 @@
def run_phipack(query,seqlen,workingdir):
currdir = os.getcwd()
os.chdir(workingdir)
@@ -14,7 +14,7 @@ Forwarded: not-needed
run_command(command, 1)
os.chdir(currdir)
-@@ -643,7 +643,7 @@
+@@ -644,7 +644,7 @@
missing = True
logger.critical("{} not in system path!".format(exe))
if use_phipack:
@@ -23,7 +23,7 @@ Forwarded: not-needed
if shutil.which(exe) is None:
missing = True
logger.critical("{} not in system path!".format(exe))
-@@ -658,7 +658,7 @@
+@@ -659,7 +659,7 @@
logger.critical("No fasttree executable found in system path!".format(exe))
missing = missing or (not has_fasttree)
else:
@@ -32,7 +32,7 @@ Forwarded: not-needed
if shutil.which(exe) is None:
missing = True
logger.critical("{} not in system path!".format(exe))
-@@ -975,7 +975,7 @@
+@@ -976,7 +976,7 @@
logger.debug("Writing .ini file")
if xtrafast or 1:
args.extend = False
@@ -41,7 +41,7 @@ Forwarded: not-needed
inifiled = inifiled.replace("$REF", ref)
inifiled = inifiled.replace("$EXTEND", "%d" % (args.extend))
inifiled = inifiled.replace("$ANCHORS", str(args.min_anchor_length))
-@@ -1074,7 +1074,7 @@
+@@ -1075,7 +1075,7 @@
if not os.path.exists(inifile):
logger.error("ini file %s does not exist!\n"%(inifile))
sys.exit(1)
@@ -50,7 +50,7 @@ Forwarded: not-needed
# with open(f"{outputDir}/parsnpAligner.out", 'w') as stdout_f, open(f"{outputDir}/parsnpAligner.err", 'w') as stderr_f:
# rc = run_command(command, ignorerc=1, stdout=stdout_f, stderr=stderr_f, prepend_time=True)
rc = run_logged_command(command=command, ignorerc=1, label="parsnp-aligner", outputDir=outputDir)
-@@ -1295,10 +1295,10 @@
+@@ -1301,10 +1301,10 @@
logger.info("Recruiting genomes...")
if use_parsnp_mumi:
if not inifile_exists:
@@ -63,7 +63,7 @@ Forwarded: not-needed
run_logged_command(command=command, outputDir=outputDir, label="parsnp-mumi")
# Takes eeach sequence and computes its mumi distance to the reference
try:
-@@ -1736,7 +1736,7 @@
+@@ -1740,7 +1740,7 @@
break
if not use_fasttree:
with TemporaryDirectory() as raxml_output_dir:
=====================================
debian/patches/py3-parsnp-libs.patch
=====================================
@@ -19,7 +19,7 @@ This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
--- parsnp.orig/parsnp
+++ parsnp/parsnp
-@@ -12,7 +12,7 @@
+@@ -13,7 +13,7 @@
from tempfile import TemporaryDirectory
import re
import logging
@@ -28,7 +28,7 @@ This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
import argparse
import signal
from multiprocessing import Pool
-@@ -21,7 +21,7 @@
+@@ -22,7 +22,7 @@
from pathlib import Path
@@ -36,4 +36,4 @@ This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
+import parsnp.extend as ext
from tqdm import tqdm
- __version__ = "2.0.2"
+ __version__ = "2.0.3"
=====================================
parsnp
=====================================
@@ -8,6 +8,7 @@
import os, sys, string, random, subprocess, time, operator, math, datetime, numpy #pysam
from collections import defaultdict
import shutil
+import multiprocessing
import shlex
from tempfile import TemporaryDirectory
import re
@@ -24,7 +25,7 @@ from pathlib import Path
import extend as ext
from tqdm import tqdm
-__version__ = "2.0.2"
+__version__ = "2.0.3"
reroot_tree = True #use --midpoint-reroot
random_seeded = random.Random(42)
@@ -1210,8 +1211,13 @@ SETTINGS:
(len(outputDir)+17)*"*"))
# If we requested more threads than available, give a warning.
- if len(os.sched_getaffinity(0)) < threads:
- logger.warning("You have asked to use more threads than you have available on your machine. This may lead to serious performance degredation with RAxML.")
+ try:
+ if len(os.sched_getaffinity(0)) < threads:
+ logger.warning("You have asked to use more threads than you have available on your machine. This may lead to serious performance degredation with RAxML.")
+ except AttributeError:
+ if multiprocessing.cpu_count() < threads:
+ logger.warning("You have asked to use more threads than you have available on your machine. This may lead to serious performance degredation with RAxML.")
+
logger.info("<<Parsnp started>>")
#1)read fasta files (contigs/scaffolds/finished/DBs/dirs)
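The hunk above wraps the `os.sched_getaffinity` call in a `try`/`except AttributeError`, since Python provides that function only on some Unix platforms, and falls back to `multiprocessing.cpu_count()`. A minimal sketch of the same pattern factored into helpers follows; the names `available_cpus` and `oversubscribed` are hypothetical and are not part of parsnp.

```python
import multiprocessing
import os


def available_cpus():
    """Return the number of CPUs this process may use (hypothetical helper)."""
    try:
        # Where supported, this respects the process's CPU affinity mask.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # os.sched_getaffinity is missing on this platform:
        # fall back to the total CPU count reported by the system.
        return multiprocessing.cpu_count()


def oversubscribed(threads):
    """True if more threads are requested than CPUs are available."""
    return available_cpus() < threads
```

Factoring the check this way would let the warning be logged once instead of duplicating the message in both branches, assuming the two branches are meant to behave identically.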
@@ -1446,6 +1452,7 @@ SETTINGS:
ref = auto_ref
finalfiles = sorted(finalfiles)
+ random_seeded.shuffle(finalfiles)
totseqs = len(finalfiles)
#initiate parallelPhiPack tasks
@@ -1614,19 +1621,16 @@ SETTINGS:
else:
import partition
- full_query_list_path = f"{outputDir}/config/input-list.txt"
- with open(full_query_list_path, 'w') as input_list_handle:
- random_seeded.shuffle(finalfiles)
- for qf in finalfiles:
- input_list_handle.write(qf + "\n")
-
if len(finalfiles) % args.partition_size == 1:
logger.warning("Incrementing partition size by 1 to avoid having a remainder partition of size 1")
args.partition_size += 1
partition_output_dir = f"{outputDir}/partition"
partition_list_dir = f"{partition_output_dir}/input-lists"
os.makedirs(partition_list_dir, exist_ok=True)
- run_command(f"split -l {args.partition_size} -a 5 --additional-suffix '.txt' {full_query_list_path} {partition_list_dir}/{partition.CHUNK_PREFIX}-")
+ for partition_idx in range(math.ceil(len(finalfiles) / args.partition_size)):
+ with open(f"{partition_list_dir}/{partition.CHUNK_PREFIX}-{partition_idx:010}.txt", 'w') as part_out:
+ for qf in finalfiles[partition_idx*args.partition_size : (partition_idx+1)*args.partition_size]:
+ part_out.write(f"{qf}\n")
chunk_label_parser = re.compile(f'{partition.CHUNK_PREFIX}-(.*).txt')
chunk_labels = []
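The last hunk replaces the external `split` call with a pure-Python loop that slices the shuffled `finalfiles` list into partitions and writes one list file per partition. A minimal sketch of that slicing and naming logic, under the assumption that it can be isolated from the file I/O; the function names and the `"chunk"` prefix are hypothetical (the real prefix comes from `partition.CHUNK_PREFIX`, whose value is not shown in this diff).

```python
import math


def adjusted_partition_size(n_items, size):
    """Mirror the check in the hunk: a remainder of exactly 1 bumps the size by one."""
    return size + 1 if n_items % size == 1 else size


def partition_chunks(items, size):
    """Slice items into consecutive chunks of at most `size` elements."""
    n_chunks = math.ceil(len(items) / size)
    return [items[i * size:(i + 1) * size] for i in range(n_chunks)]


def chunk_filename(prefix, idx):
    """Build a list-file name with a 10-digit zero-padded index, as in the hunk."""
    return f"{prefix}-{idx:010}.txt"


if __name__ == "__main__":
    genomes = [f"genome_{i}.fna" for i in range(11)]
    size = adjusted_partition_size(len(genomes), 10)
    for idx, chunk in enumerate(partition_chunks(genomes, size)):
        print(chunk_filename("chunk", idx), len(chunk))
```

With 11 inputs and a requested size of 10, the size is bumped to 11 and a single partition results. Note that one increment does not rule out a remainder of 1 in every case: 7 inputs with size 2 become size 3, which still leaves one genome in the last partition.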
View it on GitLab: https://salsa.debian.org/med-team/parsnp/-/compare/3ea7771b43f49527e8dde63b94492984dfc05865...c555def109bf8eb1e074936a2f80bf1e0a4ffa3c