[med-svn] [Git][med-team/bcalm][upstream] New upstream version 2.2.3

David Parsons gitlab at salsa.debian.org
Thu Aug 20 15:05:33 BST 2020



David Parsons pushed to branch upstream at Debian Med / bcalm


Commits:
81349d42 by David Parsons at 2020-08-20T15:57:55+02:00
New upstream version 2.2.3
- - - - -


3 changed files:

- VERSION
- bidirected-graphs-in-bcalm2/bidirected-graphs-in-bcalm2.md
- + scripts/abundance_stats.py


Changes:

=====================================
VERSION
=====================================
@@ -1 +1 @@
-v2.2.2
+v2.2.3


=====================================
bidirected-graphs-in-bcalm2/bidirected-graphs-in-bcalm2.md
=====================================
@@ -8,7 +8,7 @@ There are several ways of representing bi-directed graphs. Here we describe the
 
 ![Fig1](fig1.png)
 
-Note that we draw the fromSign next to the "from" vertex, and the toSign next to the "to" vertex. In other words, the following graph is equivalent to the one above:
+Note that we draw the fromSign next to the "from" vertex, and the toSign next to the "to" vertex. In other words, the following two graphs are equivalent:
 
 ![Fig2](fig2.png)
 


=====================================
scripts/abundance_stats.py
=====================================
@@ -0,0 +1,42 @@
+#!/usr/bin/env python
+import sys, os
+if len(sys.argv) < 2:
+    print("prints some abundance statistics of a unitigs FASTA file produced by BCALM")
+    exit("arguments: unitigs.fa")
+
+# https://www.biostars.org/p/710/#1412
+from itertools import groupby
+def fasta_iter(fasta_name):
+    """
+    given a fasta file. yield tuples of header, sequence
+    """
+    fh = open(fasta_name)
+    # ditch the boolean (x[0]) and just keep the header or sequence since
+    # we know they alternate.
+    faiter = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
+    for header in faiter:
+        # drop the ">"
+        header = next(header)[1:].strip()
+        # join all sequence lines to one.
+        seq = "".join(s.strip() for s in next(faiter))
+        yield header, seq
+
+unitigs = sys.argv[1]
+abundances = []
+from collections import defaultdict
+totsize = defaultdict(int)
+for header, unitig in fasta_iter(unitigs):
+    for field in header.split():
+        if field.startswith("km:f:"):
+            abundance = field.split(":")[-1]
+            #print(abundance)
+            abundance = int(float(abundance)) # truncate to int (drops the fractional part)
+            abundances += [abundance]
+            totsize[abundance] += len(unitig)
+
+from collections import Counter
+c = Counter(abundances)
+print("'value' : 'number of unitigs having this mean abundance value' : 'total size of unitigs having this mean abundance'")
+for val in sorted(list(c)):
+    print(val,":",c[val],':',totsize[val])
+
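
To make the tallying in scripts/abundance_stats.py easier to follow, here is a minimal sketch of the same counting logic applied to in-memory records instead of a FASTA file. The function name abundance_stats and the three sample records are hypothetical and made up for illustration; only the "km:f:" field prefix and the int(float(...)) conversion come from the script above.

```python
from collections import Counter, defaultdict


def abundance_stats(records):
    """Tally unitig count and total length per truncated mean abundance.

    `records` is an iterable of (header, sequence) pairs, like the ones
    yielded by fasta_iter() in the script. Hypothetical helper.
    """
    counts = Counter()
    total_size = defaultdict(int)
    for header, seq in records:
        for field in header.split():
            if field.startswith("km:f:"):
                # int(float(...)) truncates toward zero, as in the script
                abundance = int(float(field.split(":")[-1]))
                counts[abundance] += 1
                total_size[abundance] += len(seq)
    return counts, dict(total_size)


# Made-up records for illustration only (not real BCALM output)
records = [
    ("0 LN:i:5 km:f:2.0", "ACGTA"),
    ("1 LN:i:4 km:f:2.7", "ACGT"),
    ("2 LN:i:6 km:f:10.5", "ACGTAC"),
]
counts, total_size = abundance_stats(records)
for val in sorted(counts):
    print(val, ":", counts[val], ":", total_size[val])
```

With these sample records, 2.0 and 2.7 both truncate to 2, so the loop prints "2 : 2 : 9" followed by "10 : 1 : 6".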



View it on GitLab: https://salsa.debian.org/med-team/bcalm/-/commit/81349d42b89623b54a6a4fd4b3a443348a8586e3


