[med-svn] [Git][med-team/parsnp][master] 4 commits: New upstream version 2.0.3+dfsg

Étienne Mollier (@emollier) gitlab at salsa.debian.org
Tue Jan 30 19:56:53 GMT 2024



Étienne Mollier pushed to branch master at Debian Med / parsnp


Commits:
21582a1c by Étienne Mollier at 2024-01-30T20:51:55+01:00
New upstream version 2.0.3+dfsg
- - - - -
7c7cdb18 by Étienne Mollier at 2024-01-30T20:51:55+01:00
routine-update: New upstream version

- - - - -
98bc4e43 by Étienne Mollier at 2024-01-30T20:51:56+01:00
Update upstream source from tag 'upstream/2.0.3+dfsg'

Update to upstream version '2.0.3+dfsg'
with Debian dir c79880210dd7c7845c80b2abffe9026363ab1fdc
- - - - -
c555def1 by Étienne Mollier at 2024-01-30T20:55:37+01:00
routine-update: Ready to upload to unstable

- - - - -


5 changed files:

- README.md
- debian/changelog
- debian/patches/proper_calls_to_tools.patch
- debian/patches/py3-parsnp-libs.patch
- parsnp


Changes:

=====================================
README.md
=====================================
@@ -14,7 +14,7 @@ conda install parsnp
 
 ## From source
 
-To build Parsnp from source, users must have automake 1.15, autoconf, and libtool installed. Parsnp also requires RaxML, Phipack, Harvest-tools, and numpy. Some additional features require Mash, FastANI and FastTree. All of these packages are available via Conda (many on the Bioconda channel).
+To build Parsnp from source, users must have automake 1.15, autoconf, and libtool installed. Parsnp also requires RaxML (or FastTree), Harvest-tools, and numpy. Some additional features require  pySPOA, Mash, FastANI, and Phipack. All of these packages are available via Conda (many on the Bioconda channel).
 
 ### Build instructions
 First, you must build the Muscle library
@@ -58,10 +58,26 @@ parsnp -r <reference_fasta> -d <genomes>
 ```
 For example, 
 ```
-./parsnp -g examples/mers_virus/ref/England1.gbk -d examples/mers_virus/genomes/*.fna -c
+parsnp -r examples/mers_virus/ref/England1.fna -d examples/mers_virus/genomes/*.fna -o examples-out
 ```
+
+## Partition mode
+Parsnp 2 includes a new mode which can be activated with `--partition`. This mode randomly splits the input genomes up into groups of *p* genomes each, where *p* defaults to 50 and can be changed with `--partition-size=p`. Parsnp is then run independently on each group, and the resulting alignment of each group is merged into a single alignment of all input genomes. This mode is intended for large datasets, as it reduces the computational requirements. 
+
+```
+parsnp -r examples/mers_virus/ref/England1.fna -d examples/mers_virus/genomes/*.fna --partition --partition-size 10 -o examples-out-partitioned
+```
+
 More examples can be found in the [readthedocs tutorial](https://harvest.readthedocs.io/en/latest/content/parsnp/tutorial.html)
 
+## Output files
+* `parsnp.xmfa` is the core-genome alignment.
+* `parsnp.ggr` is the compressed representation of the alignment generated by the harvest-toolkit. This file can be used to visualize alignments with Gingr.
+* `parsnp.snps.mblocks` is the core-SNP signature of each sequence in fasta format. This is the file which is used to generate `parsnp.tree`
+* `parsnp.tree` is the resulting phylogeny.
+* If run in partition mode, Parsnp will produce a `partition` folder in the output directory, which contains the output of each of the partitioned runs. 
+
+
 
 ## Misc
 


=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+parsnp (2.0.3+dfsg-1) unstable; urgency=medium
+
+  * New upstream version
+
+ -- Étienne Mollier <emollier at debian.org>  Tue, 30 Jan 2024 20:53:44 +0100
+
 parsnp (2.0.2+dfsg-1) unstable; urgency=medium
 
   * New upstream version 2.0.2+dfsg


=====================================
debian/patches/proper_calls_to_tools.patch
=====================================
@@ -5,7 +5,7 @@ Forwarded: not-needed
 
 --- parsnp.orig/parsnp
 +++ parsnp/parsnp
-@@ -145,7 +145,7 @@
+@@ -146,7 +146,7 @@
  def run_phipack(query,seqlen,workingdir):
      currdir = os.getcwd()
      os.chdir(workingdir)
@@ -14,7 +14,7 @@ Forwarded: not-needed
      run_command(command, 1)
      os.chdir(currdir)
  
-@@ -643,7 +643,7 @@
+@@ -644,7 +644,7 @@
          missing = True
          logger.critical("{} not in system path!".format(exe))
      if use_phipack:
@@ -23,7 +23,7 @@ Forwarded: not-needed
          if shutil.which(exe) is None:
              missing = True
              logger.critical("{} not in system path!".format(exe))
-@@ -658,7 +658,7 @@
+@@ -659,7 +659,7 @@
              logger.critical("No fasttree executable found in system path!".format(exe))
          missing = missing or (not has_fasttree)
      else:
@@ -32,7 +32,7 @@ Forwarded: not-needed
          if shutil.which(exe) is None:
              missing = True
              logger.critical("{} not in system path!".format(exe))
-@@ -975,7 +975,7 @@
+@@ -976,7 +976,7 @@
      logger.debug("Writing .ini file")
      if xtrafast or 1:
          args.extend = False
@@ -41,7 +41,7 @@ Forwarded: not-needed
      inifiled = inifiled.replace("$REF", ref)
      inifiled = inifiled.replace("$EXTEND", "%d" % (args.extend))
      inifiled = inifiled.replace("$ANCHORS", str(args.min_anchor_length))
-@@ -1074,7 +1074,7 @@
+@@ -1075,7 +1075,7 @@
      if not os.path.exists(inifile):
          logger.error("ini file %s does not exist!\n"%(inifile))
          sys.exit(1)
@@ -50,7 +50,7 @@ Forwarded: not-needed
      # with open(f"{outputDir}/parsnpAligner.out", 'w') as stdout_f, open(f"{outputDir}/parsnpAligner.err", 'w') as stderr_f:
          # rc = run_command(command, ignorerc=1, stdout=stdout_f, stderr=stderr_f, prepend_time=True)
      rc = run_logged_command(command=command, ignorerc=1, label="parsnp-aligner", outputDir=outputDir)
-@@ -1295,10 +1295,10 @@
+@@ -1301,10 +1301,10 @@
          logger.info("Recruiting genomes...")
          if use_parsnp_mumi:
              if not inifile_exists:
@@ -63,7 +63,7 @@ Forwarded: not-needed
              run_logged_command(command=command, outputDir=outputDir, label="parsnp-mumi")
              # Takes eeach sequence and computes its mumi distance to the reference
              try:
-@@ -1736,7 +1736,7 @@
+@@ -1740,7 +1740,7 @@
                  break
          if not use_fasttree:
              with TemporaryDirectory() as raxml_output_dir:


=====================================
debian/patches/py3-parsnp-libs.patch
=====================================
@@ -19,7 +19,7 @@ This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
  
 --- parsnp.orig/parsnp
 +++ parsnp/parsnp
-@@ -12,7 +12,7 @@
+@@ -13,7 +13,7 @@
  from tempfile import TemporaryDirectory
  import re
  import logging
@@ -28,7 +28,7 @@ This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
  import argparse
  import signal
  from multiprocessing import Pool
-@@ -21,7 +21,7 @@
+@@ -22,7 +22,7 @@
  from pathlib import Path
  
  
@@ -36,4 +36,4 @@ This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
 +import parsnp.extend as ext
  from tqdm import tqdm
  
- __version__ = "2.0.2"
+ __version__ = "2.0.3"


=====================================
parsnp
=====================================
@@ -8,6 +8,7 @@
 import os, sys, string, random, subprocess, time, operator, math, datetime, numpy #pysam
 from collections import defaultdict
 import shutil
+import multiprocessing
 import shlex
 from tempfile import TemporaryDirectory
 import re
@@ -24,7 +25,7 @@ from pathlib import Path
 import extend as ext
 from tqdm import tqdm
 
-__version__ = "2.0.2"
+__version__ = "2.0.3"
 reroot_tree = True #use --midpoint-reroot
 random_seeded = random.Random(42)
 
@@ -1210,8 +1211,13 @@ SETTINGS:
         (len(outputDir)+17)*"*"))
 
     # If we requested more threads than available, give a warning.
-    if len(os.sched_getaffinity(0)) < threads:
-        logger.warning("You have asked to use more threads than you have available on your machine. This may lead to serious performance degredation with RAxML.")
+    try:
+        if len(os.sched_getaffinity(0)) < threads:
+            logger.warning("You have asked to use more threads than you have available on your machine. This may lead to serious performance degredation with RAxML.")
+    except AttributeError:
+        if multiprocessing.cpu_count() < threads:
+            logger.warning("You have asked to use more threads than you have available on your machine. This may lead to serious performance degredation with RAxML.")
+
     logger.info("<<Parsnp started>>")
 
     #1)read fasta files (contigs/scaffolds/finished/DBs/dirs)
@@ -1446,6 +1452,7 @@ SETTINGS:
         ref = auto_ref
 
     finalfiles = sorted(finalfiles)
+    random_seeded.shuffle(finalfiles)
     totseqs = len(finalfiles)
 
     #initiate parallelPhiPack tasks
@@ -1614,19 +1621,16 @@ SETTINGS:
     else:
         import partition 
 
-        full_query_list_path = f"{outputDir}/config/input-list.txt"
-        with open(full_query_list_path, 'w') as input_list_handle:
-            random_seeded.shuffle(finalfiles)
-            for qf in finalfiles:
-                input_list_handle.write(qf + "\n")
-
         if len(finalfiles) % args.partition_size == 1:
             logger.warning("Incrementing partition size by 1 to avoid having a remainder partition of size 1")
             args.partition_size += 1
         partition_output_dir = f"{outputDir}/partition"
         partition_list_dir = f"{partition_output_dir}/input-lists"
         os.makedirs(partition_list_dir, exist_ok=True)
-        run_command(f"split -l {args.partition_size} -a 5 --additional-suffix '.txt' {full_query_list_path} {partition_list_dir}/{partition.CHUNK_PREFIX}-")
+        for partition_idx in range(math.ceil(len(finalfiles) / args.partition_size)):
+            with open(f"{partition_list_dir}/{partition.CHUNK_PREFIX}-{partition_idx:010}.txt", 'w') as part_out:
+                for qf in finalfiles[partition_idx*args.partition_size : (partition_idx+1)*args.partition_size]:
+                    part_out.write(f"{qf}\n")
 
         chunk_label_parser = re.compile(f'{partition.CHUNK_PREFIX}-(.*).txt')
         chunk_labels = []
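
The last hunk above replaces the external `split` call with a pure-Python loop that writes one input list per partition. A minimal standalone sketch of that chunking logic follows. The function name `partition_inputs` and the sample file names are hypothetical, and the sketch returns the chunks instead of writing them to `partition_list_dir`. It also seeds a fresh `random.Random`, whereas parsnp reuses its module-level `random_seeded` instance, so the exact ordering may differ from a real run.

```python
import math
import random


def partition_inputs(files, partition_size=50, seed=42):
    """Return (label, chunk) pairs, loosely mirroring the hunk above.

    Hypothetical helper for illustration only; not part of parsnp.
    """
    # Sort first, then shuffle with a fixed seed, so the split is reproducible.
    files = sorted(files)
    random.Random(seed).shuffle(files)

    # Same rule as the diff: avoid a remainder partition holding one genome.
    if len(files) % partition_size == 1:
        partition_size += 1

    chunks = []
    for idx in range(math.ceil(len(files) / partition_size)):
        chunk = files[idx * partition_size:(idx + 1) * partition_size]
        # Zero-padded to 10 digits, like the {partition_idx:010} label above.
        chunks.append((f"{idx:010}", chunk))
    return chunks


# 21 inputs with a requested size of 10 would leave a remainder of 1,
# so the size is bumped to 11 and two partitions are produced.
genomes = [f"genome_{i:02}.fna" for i in range(21)]
parts = partition_inputs(genomes, partition_size=10)
sizes = [len(chunk) for _, chunk in parts]
print(sizes)
```

Slicing past the end of a list is safe in Python, so the final partition simply holds whatever is left over.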

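The thread-count hunk earlier in the parsnp diff wraps `os.sched_getaffinity(0)` in a `try`/`except AttributeError` and falls back to `multiprocessing.cpu_count()`. The commit does not state the motivation; presumably it targets platforms where `os.sched_getaffinity` is missing, since Python only provides it on some Unix systems. The same pattern can be sketched as a small helper. The names `available_cpus` and `oversubscribed` are hypothetical and do not appear in parsnp.

```python
import multiprocessing
import os


def available_cpus():
    """Number of CPUs this process may use, with a portable fallback."""
    try:
        # Respects the CPU affinity mask of the current process (pid 0).
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # os.sched_getaffinity is absent on this platform; fall back to
        # the total number of CPUs in the machine.
        return multiprocessing.cpu_count()


def oversubscribed(threads):
    """True when more threads are requested than CPUs are available."""
    return threads > available_cpus()


if oversubscribed(10**6):
    print("more threads requested than CPUs available")
```

Folding both branches into one helper would also avoid repeating the warning string, which the upstream change duplicates in each branch.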


View it on GitLab: https://salsa.debian.org/med-team/parsnp/-/compare/3ea7771b43f49527e8dde63b94492984dfc05865...c555def109bf8eb1e074936a2f80bf1e0a4ffa3c
