[med-svn] [Git][med-team/lamassemble][master] 6 commits: New upstream version 1.6.0
Nilesh Patra (@nilesh)
gitlab at salsa.debian.org
Mon May 8 20:06:33 BST 2023
Nilesh Patra pushed to branch master at Debian Med / lamassemble
Commits:
e8aa9065 by Nilesh Patra at 2023-05-09T00:29:02+05:30
New upstream version 1.6.0
- - - - -
a775faca by Nilesh Patra at 2023-05-09T00:29:03+05:30
Update upstream source from tag 'upstream/1.6.0'
Update to upstream version '1.6.0'
with Debian dir 67b42c04ebe80b5d6fa1f78c0632aa2153623a62
- - - - -
6ae7aa46 by Nilesh Patra at 2023-05-09T00:30:56+05:30
Re-diff patch
- - - - -
c0179c25 by Nilesh Patra at 2023-05-09T00:36:17+05:30
Minor fixes
- - - - -
7f361f04 by Nilesh Patra at 2023-05-09T00:36:17+05:30
Bump Standards-Version to 4.6.2 (no changes needed)
- - - - -
dad66114 by Nilesh Patra at 2023-05-09T00:36:17+05:30
Interim d/ch
- - - - -
8 changed files:
- README.md
- debian/changelog
- debian/control
- debian/copyright
- debian/patches/set_package_field.patch
- debian/source/lintian-overrides
- lamassemble
- setup.py
Changes:
=====================================
README.md
=====================================
@@ -11,7 +11,7 @@ from huge tandem repeats, wrong parts of the reads may get merged.
interface](https://mafft.cbrc.jp/alignment/server/index-rawreads.html).
**Usage option 2:** run it on your computer. You can install it from
-[bioconda][]:
+[Debian Med][] or [bioconda][]:
conda install -c bioconda lamassemble
@@ -26,7 +26,7 @@ on your computer, i.e. put them in your `PATH`. You can install
After installing, you can run it like this:
- lamassemble last-train.mat sequences.fx > consensus.fa
+ lamassemble last-train-file sequences.fx > consensus.fa
* The sequence file may be in fasta or fastq format (it makes no difference).
* It's OK to use gzipped (`.gz`) files.
@@ -34,19 +34,20 @@ After installing, you can run it like this:
You need to give it a file made by `last-train`, with the rates of
insertions, deletions, and substitutions in the reads.
-## Included `last-train` files
+## Built-in `last-train` files
-The `train` directory has these files:
+These files (with case-insensitive names) probably won't be ideal for
+your data, but they might be good enough:
-* `promethion.mat`: from human DNA sequenced with PromethION R9.4,
+* `promethion-2019`: from human DNA sequenced with PromethION R9.4,
base-called with Guppy 1.4.0 ([De Coster et al. Genome
Res. 2019](https://www.ncbi.nlm.nih.gov/pubmed/31186302),
ERR2631604).
-* `promethion-rna.mat`: from human RNA (direct RNA), sequenced with
+* `promethion-rna-2019`: from human RNA (direct RNA), sequenced with
PromethION R9.4, base-called with Albacore.
-* `sequel-II-CLR.mat`: from human DNA sequenced with PacBio Sequel II,
+* `sequel-II-CLR-2019`: from human DNA sequenced with PacBio Sequel II,
Continuous Long Reads ([Wenger et
al. Nat. Biotechnol. 2019](https://www.ncbi.nlm.nih.gov/pubmed/31406327),
SRR9972588).
@@ -96,6 +97,11 @@ similarities by increasing option `-m` (and/or decreasing `-W`).
- `-c`, `--consensus`: just make a consensus, of already-aligned sequences.
+- `-f FMT`, `--format=FMT`: consensus output format, fasta/fa or
+ fastq/fq. Fastq shows the error probability of each base (assuming
+ the alignment is correct, so over-optimistic). The format name is
+ case-insensitive.
+
- `-g G`, `--gap-max=G`: make the consensus sequence from alignment
columns with <= G% gaps.
@@ -144,3 +150,4 @@ similarities by increasing option `-m` (and/or decreasing `-W`).
bit, but increase run time.
[bioconda]: https://bioconda.github.io/user/install.html
+[Debian Med]: https://www.debian.org/devel/debian-med/
=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+lamassemble (1.6.0-1) UNRELEASED; urgency=medium
+
+ * New upstream version 1.6.0
+ * Bump Standards-Version to 4.6.2 (no changes needed)
+
+ -- Nilesh Patra <nilesh at debian.org> Tue, 09 May 2023 00:35:55 +0530
+
lamassemble (1.4.2-5) unstable; urgency=medium
* Team Upload.
=====================================
debian/control
=====================================
@@ -4,12 +4,12 @@ Priority: optional
Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
Uploaders:
Nilesh Patra <nilesh at debian.org>,
-Build-Depends:
+Build-Depends:
debhelper-compat (= 13),
dh-python,
python3-all,
python3-setuptools,
-Standards-Version: 4.6.1
+Standards-Version: 4.6.2
Vcs-Browser: https://salsa.debian.org/med-team/lamassemble
Vcs-Git: https://salsa.debian.org/med-team/lamassemble.git
Homepage: https://gitlab.com/mcfrith/lamassemble
=====================================
debian/copyright
=====================================
@@ -7,7 +7,7 @@ Copyright: 2019-2022 Martin Frith
License: Expat
Files: debian/*
-Copyright: 2022 Nilesh Patra <nilesh at debian.org>
+Copyright: 2022-2023 Nilesh Patra <nilesh at debian.org>
License: Expat
Files: debian/tests/data/*
=====================================
debian/patches/set_package_field.patch
=====================================
@@ -7,7 +7,7 @@ Last-Update: 2022-10-23
import setuptools
+from setuptools import find_packages
- commitInfo = " (HEAD -> master, tag: 1.4.2)".strip("( )").split()
+ commitInfo = " (HEAD -> master, tag: 1.6.0)".strip("( )").split()
version = commitInfo[commitInfo.index("tag:") + 1].rstrip(",")
@@ -6,6 +7,7 @@
setuptools.setup(
=====================================
debian/source/lintian-overrides
=====================================
@@ -1,4 +1,4 @@
-# False postive, this is a data file
+# False positive, this is a data file
lamassemble source: very-long-line-length-in-source-file 43984 > 512 [debian/tests/data/lama-tests.out:31]
# Testing triggers bash
lamassemble source: test-leaves-python-version-untested [debian/tests/run-unit-test]
=====================================
lamassemble
=====================================
@@ -5,7 +5,6 @@
from __future__ import print_function
import collections
-import functools
import gzip
import itertools
import logging
@@ -48,7 +47,52 @@ def nameAndSeqFromFastx(seqLines):
##### Routines for getting score parameters from a last-train file:
-def parametersFromLastTrain(lines):
+builtinTrainFiles = {
+ "promethion-2019": """
+# scale of score parameters: 4.5512
+# delOpenProb: 0.0369615
+# insOpenProb: 0.0340916
+# delExtendProb: 0.439744
+# insExtendProb: 0.403943
+# probability matrix (query letters = columns, reference letters = rows):
+# A C G T
+# A 0.278908 0.00109169 0.012899 0.00107855
+# C 0.00154506 0.197869 0.00044938 0.00272089
+# G 0.0272926 0.000508552 0.177789 0.000859018
+# T 0.00126293 0.00328421 0.000545484 0.291896
+""",
+ "promethion-rna-2019" :"""
+# scale of score parameters: 4.5512
+# delOpenProb: 0.0578071
+# insOpenProb: 0.0297262
+# delExtendProb: 0.438808
+# insExtendProb: 0.380129
+# probability matrix (query letters = columns, reference letters = rows):
+# A C G T
+# A 0.246603 0.00134 0.00742782 0.00762015
+# C 0.00168584 0.221462 0.000388905 0.0143456
+# G 0.00693157 0.000421241 0.239649 0.000805103
+# T 0.00803073 0.0131941 0.000617237 0.229477
+""",
+ "sequel-ii-clr-2019": """
+# scale of score parameters: 4.5512
+# delOpenProb: 0.0329087
+# insOpenProb: 0.0343556
+# delExtendProb: 0.0674455
+# insExtendProb: 0.288643
+# probability matrix (query letters = columns, reference letters = rows):
+# A C G T
+# A 0.292742 0.00340172 0.000378204 0.00239594
+# C 0.000849175 0.20731 0.000104201 0.00037284
+# G 0.0013137 0.000304338 0.187333 0.00680248
+# T 0.0135957 0.000785525 0.00475765 0.277554
+"""
+}
+
+def parametersFromLastTrain(trainFile):
+ builtin = builtinTrainFiles.get(trainFile.lower())
+ lines = builtin.splitlines() if builtin else openFile(trainFile)
+
alphabetSize = 4
scale = delOpenProb = delExtendProb = insOpenProb = insExtendProb = -1
probMatrix = range(alphabetSize)
@@ -242,12 +286,19 @@ def columnScore(priorScores, scoreMatrix, column, fwdBaseIndex):
revScore = strandScore(scoreMatrix[revBaseIndex], column, "tgca")
return priorScores[fwdBaseIndex] + fwdScore + revScore
+def asciiFromInvProb(invProb):
+ errProb = 1 - 1 / invProb
+ s = int(math.floor(-10 * math.log10(max(errProb, 1e-10))))
+ return chr(min(s + 33, 126))
+
def consensusCol(priorScores, scoreMatrix, column):
bases = "acgt"
column = column.replace("U", "T") # keep
- func = functools.partial(columnScore, priorScores, scoreMatrix, column)
- m = max(range(len(bases)), key=func)
- return bases[m]
+ scores = [columnScore(priorScores, scoreMatrix, column, i)
+ for i in range(len(bases))]
+ j, m = max(enumerate(scores), key=itemgetter(1))
+ invProb = sum(math.exp(i - m) for i in scores)
+ return bases[j], invProb
def consensusSeq(opts, probMatrix, alignedSequences):
if not alignedSequences:
@@ -259,11 +310,16 @@ def consensusSeq(opts, probMatrix, alignedSequences):
rows = [seq for name, seq in alignedSequences]
cols = alignmentColumnsForConsensus(opts, rows)
- return "".join(consensusCol(priorScores, scoreMatrix, i) for i in cols)
+ return [consensusCol(priorScores, scoreMatrix, i) for i in cols]
def printConsensusSeq(opts, probMatrix, alignedSequences):
- s = consensusSeq(opts, probMatrix, alignedSequences)
- printFasta(opts.name, s, sys.stdout)
+ c = consensusSeq(opts, probMatrix, alignedSequences)
+ seq = "".join(i[0] for i in c)
+ if opts.format.lower() in ("fastq", "fq"):
+ qual = "".join(asciiFromInvProb(i[1]) for i in c)
+ print("@" + opts.name, seq, "+", qual, sep="\n")
+ else:
+ print(">" + opts.name, seq, sep="\n")
##### Routines for aligning the sequences:
@@ -620,7 +676,7 @@ def main(opts, trainFile, seqFile):
logLevel = logging.INFO if opts.verbose else logging.WARNING
logging.basicConfig(format="%(filename)s: %(message)s", level=logLevel)
- scale, probMatrix, gapProbs = parametersFromLastTrain(openFile(trainFile))
+ scale, probMatrix, gapProbs = parametersFromLastTrain(trainFile)
probMatrix = list(matrixWithComplementSymmetricRows(probMatrix))
fastx = fastxInput(openFile(seqFile))
@@ -657,6 +713,8 @@ if __name__ == "__main__":
help="print an alignment, not a consensus")
op.add_option("-c", "--consensus", action="store_true",
help="just make a consensus, of already-aligned sequences")
+ op.add_option("-f", "--format", metavar="FMT", default="fasta", help=
+ "output format: fasta/fa or fastq/fq (default=%default)")
op.add_option("-g", "--gap-max", metavar="G", type="float", default=50,
help="use alignment columns with <= G% gaps "
"(default=%default)")
=====================================
setup.py
=====================================
@@ -1,6 +1,6 @@
import setuptools
-commitInfo = " (HEAD -> master, tag: 1.4.2)".strip("( )").split()
+commitInfo = " (HEAD -> master, tag: 1.6.0)".strip("( )").split()
version = commitInfo[commitInfo.index("tag:") + 1].rstrip(",")
setuptools.setup(
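Note on the new fastq output: the consensusCol and asciiFromInvProb changes above turn the per-column scores into a quality character. Below is a minimal standalone sketch of that arithmetic, not the packaged code. The function names inv_prob_of_best and phred_char are hypothetical, and the sketch assumes the column scores are natural-log scores, as the math.exp call in the diff suggests.

```python
import math

def inv_prob_of_best(scores):
    # 1 / P(best base): sum of exp(score - max score) over all candidate bases.
    # Assumes the scores are natural-log scores (hypothetical helper name).
    m = max(scores)
    return sum(math.exp(s - m) for s in scores)

def phred_char(inv_prob):
    # Error probability of the chosen base, floored at 1e-10 to avoid log(0).
    err_prob = 1 - 1 / inv_prob
    q = int(math.floor(-10 * math.log10(max(err_prob, 1e-10))))
    # Phred+33 encoding, capped at '~' (ASCII 126).
    return chr(min(q + 33, 126))

# Four equally likely bases: error probability 0.75, quality 1.
print(phred_char(inv_prob_of_best([0.0, 0.0, 0.0, 0.0])))
# One base favoured by 10 log units over the other three.
print(phred_char(inv_prob_of_best([10.0, 0.0, 0.0, 0.0])))
```

As the README hunk says, these qualities assume the alignment is correct, so they are likely over-optimistic.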
View it on GitLab: https://salsa.debian.org/med-team/lamassemble/-/compare/df225c1df7a26c6c2be16e2e9173f76da0f2984d...dad661141f8126898ec4e33f18f305a11eac8fa2
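Note on the built-in last-train names: parametersFromLastTrain now lower-cases its argument and checks it against builtinTrainFiles before treating it as a file path. The following is a rough sketch of that lookup order only. The names BUILTIN_TRAIN_FILES and train_file_lines are hypothetical, the table entries are shortened to two lines each, and plain open() stands in for the script's openFile helper (which also handles gzipped input according to the README).

```python
# Shortened stand-in for builtinTrainFiles; keys are stored in lower case.
BUILTIN_TRAIN_FILES = {
    "promethion-2019": "# delOpenProb: 0.0369615\n# insOpenProb: 0.0340916",
    "sequel-ii-clr-2019": "# delOpenProb: 0.0329087\n# insOpenProb: 0.0343556",
}

def train_file_lines(name_or_path):
    # Built-in names are matched case-insensitively and take precedence.
    builtin = BUILTIN_TRAIN_FILES.get(name_or_path.lower())
    if builtin:
        return builtin.splitlines()
    # Anything else is treated as a path to a last-train file.
    with open(name_or_path) as f:
        return f.read().splitlines()

# The README spelling "sequel-II-CLR-2019" resolves to the lower-case key.
print(train_file_lines("sequel-II-CLR-2019")[0])
```

One consequence of this order, if the sketch matches the real behaviour: a local file whose name equals a built-in name in any letter case would be shadowed by the built-in parameters.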