[med-svn] [Git][med-team/parsnp][upstream] New upstream version 2.0.3+dfsg

Étienne Mollier (@emollier) gitlab at salsa.debian.org
Tue Jan 30 19:57:05 GMT 2024



Étienne Mollier pushed to branch upstream at Debian Med / parsnp


Commits:
21582a1c by Étienne Mollier at 2024-01-30T20:51:55+01:00
New upstream version 2.0.3+dfsg
- - - - -


2 changed files:

- README.md
- parsnp


Changes:

=====================================
README.md
=====================================
@@ -14,7 +14,7 @@ conda install parsnp
 
 ## From source
 
-To build Parsnp from source, users must have automake 1.15, autoconf, and libtool installed. Parsnp also requires RaxML, Phipack, Harvest-tools, and numpy. Some additional features require Mash, FastANI and FastTree. All of these packages are available via Conda (many on the Bioconda channel).
+To build Parsnp from source, users must have automake 1.15, autoconf, and libtool installed. Parsnp also requires RAxML (or FastTree), Harvest-tools, and numpy. Some additional features require pySPOA, Mash, FastANI, and PhiPack. All of these packages are available via Conda (many on the Bioconda channel).
 
 ### Build instructions
 First, you must build the Muscle library
@@ -58,10 +58,26 @@ parsnp -r <reference_fasta> -d <genomes>
 ```
 For example, 
 ```
-./parsnp -g examples/mers_virus/ref/England1.gbk -d examples/mers_virus/genomes/*.fna -c
+parsnp -r examples/mers_virus/ref/England1.fna -d examples/mers_virus/genomes/*.fna -o examples-out
 ```
+
+## Partition mode
+Parsnp 2 includes a new mode, activated with `--partition`. This mode randomly splits the input genomes into groups of *p* genomes each, where *p* defaults to 50 and can be changed with `--partition-size=p`. Parsnp is then run independently on each group, and the resulting per-group alignments are merged into a single alignment of all input genomes. This mode is intended for large datasets, as it reduces the computational requirements (a rough sketch of the grouping step follows the example below).
+
+```
+parsnp -r examples/mers_virus/ref/England1.fna -d examples/mers_virus/genomes/*.fna --partition --partition-size 10 -o examples-out-partitioned
+```
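
As a rough illustration of the grouping step, here is a minimal Python sketch mirroring the logic in the parsnp script (not a supported API; the genome list and partition size below are placeholders):

```
import math
import random

genomes = [f"genome_{i:03}.fna" for i in range(105)]  # placeholder input list
partition_size = 10

# Genomes are shuffled with a fixed seed before being grouped.
random.Random(42).shuffle(genomes)

# Avoid a remainder partition of size 1; Parsnp bumps the size by one in that case.
if len(genomes) % partition_size == 1:
    partition_size += 1

partitions = [
    genomes[i * partition_size:(i + 1) * partition_size]
    for i in range(math.ceil(len(genomes) / partition_size))
]
# Parsnp aligns each partition independently and then merges the
# per-partition alignments into a single core-genome alignment.
```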
+
 More examples can be found in the [readthedocs tutorial](https://harvest.readthedocs.io/en/latest/content/parsnp/tutorial.html)
 
+## Output files
+* `parsnp.xmfa` is the core-genome alignment.
+* `parsnp.ggr` is the compressed representation of the alignment generated by Harvest-tools. This file can be used to visualize alignments with Gingr.
+* `parsnp.snps.mblocks` is the core-SNP signature of each sequence in FASTA format. This is the file used to generate `parsnp.tree`.
+* `parsnp.tree` is the resulting phylogeny (a small example of loading it follows this list).
+* If run in partition mode, Parsnp will produce a `partition` folder in the output directory, containing the output of each partitioned run.
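
The tree is a standard Newick file, so it can be inspected with common tooling. A minimal sketch, assuming Biopython is installed and the default output paths from the example above:

```
from Bio import Phylo

# Load the Newick phylogeny produced by Parsnp and print an ASCII rendering.
tree = Phylo.read("examples-out/parsnp.tree", "newick")
Phylo.draw_ascii(tree)
print("Number of taxa:", tree.count_terminals())
```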
+
+
 
 ## Misc
 


=====================================
parsnp
=====================================
@@ -8,6 +8,7 @@
 import os, sys, string, random, subprocess, time, operator, math, datetime, numpy #pysam
 from collections import defaultdict
 import shutil
+import multiprocessing
 import shlex
 from tempfile import TemporaryDirectory
 import re
@@ -24,7 +25,7 @@ from pathlib import Path
 import extend as ext
 from tqdm import tqdm
 
-__version__ = "2.0.2"
+__version__ = "2.0.3"
 reroot_tree = True #use --midpoint-reroot
 random_seeded = random.Random(42)
 
@@ -1210,8 +1211,13 @@ SETTINGS:
         (len(outputDir)+17)*"*"))
 
     # If we requested more threads than available, give a warning.
-    if len(os.sched_getaffinity(0)) < threads:
-        logger.warning("You have asked to use more threads than you have available on your machine. This may lead to serious performance degredation with RAxML.")
+    try:
+        if len(os.sched_getaffinity(0)) < threads:
+            logger.warning("You have asked to use more threads than you have available on your machine. This may lead to serious performance degradation with RAxML.")
+    except AttributeError:
+        if multiprocessing.cpu_count() < threads:
+            logger.warning("You have asked to use more threads than you have available on your machine. This may lead to serious performance degradation with RAxML.")
+
     logger.info("<<Parsnp started>>")
 
     #1)read fasta files (contigs/scaffolds/finished/DBs/dirs)
@@ -1446,6 +1452,7 @@ SETTINGS:
         ref = auto_ref
 
     finalfiles = sorted(finalfiles)
+    random_seeded.shuffle(finalfiles)
     totseqs = len(finalfiles)
 
     #initiate parallelPhiPack tasks
@@ -1614,19 +1621,16 @@ SETTINGS:
     else:
         import partition 
 
-        full_query_list_path = f"{outputDir}/config/input-list.txt"
-        with open(full_query_list_path, 'w') as input_list_handle:
-            random_seeded.shuffle(finalfiles)
-            for qf in finalfiles:
-                input_list_handle.write(qf + "\n")
-
         if len(finalfiles) % args.partition_size == 1:
             logger.warning("Incrementing partition size by 1 to avoid having a remainder partition of size 1")
             args.partition_size += 1
         partition_output_dir = f"{outputDir}/partition"
         partition_list_dir = f"{partition_output_dir}/input-lists"
         os.makedirs(partition_list_dir, exist_ok=True)
-        run_command(f"split -l {args.partition_size} -a 5 --additional-suffix '.txt' {full_query_list_path} {partition_list_dir}/{partition.CHUNK_PREFIX}-")
+        for partition_idx in range(math.ceil(len(finalfiles) / args.partition_size)):
+            with open(f"{partition_list_dir}/{partition.CHUNK_PREFIX}-{partition_idx:010}.txt", 'w') as part_out:
+                for qf in finalfiles[partition_idx*args.partition_size : (partition_idx+1)*args.partition_size]:
+                    part_out.write(f"{qf}\n")
 
         chunk_label_parser = re.compile(f'{partition.CHUNK_PREFIX}-(.*).txt')
         chunk_labels = []



View it on GitLab: https://salsa.debian.org/med-team/parsnp/-/commit/21582a1c9c7ad96b6515593f187ebf4ab4d1a5f8
