[med-svn] [Git][med-team/last-align][master] 4 commits: New upstream version 1454
Nilesh Patra (@nilesh)
gitlab at salsa.debian.org
Mon Jun 19 14:58:59 BST 2023
Nilesh Patra pushed to branch master at Debian Med / last-align
Commits:
d6583835 by Nilesh Patra at 2023-05-24T15:06:21+05:30
New upstream version 1454
- - - - -
455804a1 by Nilesh Patra at 2023-06-19T19:16:30+05:30
New upstream version 1456
- - - - -
beb6be55 by Nilesh Patra at 2023-06-19T19:16:34+05:30
Update upstream source from tag 'upstream/1456'
Update to upstream version '1456'
with Debian dir c5fd912158df1e579ecc85cada12ea60e474ba91
- - - - -
c54288e3 by Nilesh Patra at 2023-06-19T19:17:29+05:30
Upload to unstable
- - - - -
8 changed files:
- bin/last-train
- bin/parallel-fasta
- bin/parallel-fastq
- debian/changelog
- doc/last-cookbook.rst
- doc/last-train.rst
- src/makefile
- test/last-train-test.out
Changes:
=====================================
bin/last-train
=====================================
@@ -854,9 +854,15 @@ def doTraining(opts, args):
ss = scoresAndScaleFunc(outerScale, matParams, delRatios, insRatios)
matScores, delCosts, insCosts, scale, rowFreqs, colFreqs = ss
if not opts.codon:
+ rowSum = sum(rowFreqs)
+ colSum = sum(colFreqs)
pid = sum(math.exp(matScores[i][i] / scale) * rowFreqs[i] * colFreqs[i]
- for i in range(len(matScores))) / sum(colFreqs)
+ for i in range(len(matScores))) / colSum
+ rowProbs = [i / rowSum for i in rowFreqs]
+ colProbs = [i / colSum for i in colFreqs]
print("# substitution percent identity: {0:.6}".format(100 * pid))
+ print("# ref letter %:", *(format(100 * i, "#.3") for i in rowProbs))
+ print("# qry letter %:", *(format(100 * i, "#.3") for i in colProbs))
if opts.X: print("#last -X", opts.X)
if opts.R: print("#last -R", opts.R)
if opts.Q: print("#last -Q", opts.Q)
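Note on the hunk above: the new `# ref letter %:` and `# qry letter %:` lines come from normalizing `rowFreqs` and `colFreqs` by their sums and printing each value with the `"#.3"` format spec. The sketch below reproduces only that normalization and formatting step. The helper name `letter_percentages` and the sample frequencies are hypothetical, not taken from last-train.

```python
def letter_percentages(freqs):
    """Normalize raw letter frequencies and format them as percentages.

    Hypothetical helper: mirrors the rowProbs/colProbs computation and the
    "#.3" format spec used in the hunk above, outside of last-train itself.
    """
    total = sum(freqs)
    probs = [f / total for f in freqs]
    # "#.3" keeps three significant digits and retains a trailing zero,
    # which matches values such as "29.0" in the updated test output.
    return [format(100 * p, "#.3") for p in probs]


if __name__ == "__main__":
    # Made-up frequencies, for illustration only.
    print("# ref letter %:", *letter_percentages([2, 1, 1]))
```

With the made-up input `[2, 1, 1]` this prints `# ref letter %: 50.0 25.0 25.0`.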
=====================================
bin/parallel-fasta
=====================================
@@ -1,3 +1,3 @@
#! /bin/sh
-exec parallel --round --recstart '>' "$@"
+exec parallel --pipe --round --recstart '>' "$@"
=====================================
bin/parallel-fastq
=====================================
@@ -1,3 +1,3 @@
#! /bin/sh
-exec parallel --round -L8 "$@"
+exec parallel --pipe --round -L8 "$@"
=====================================
debian/changelog
=====================================
@@ -1,3 +1,9 @@
+last-align (1456-1) unstable; urgency=medium
+
+ * New upstream version 1456
+
+ -- Nilesh Patra <nilesh at debian.org> Mon, 19 Jun 2023 19:17:17 +0530
+
last-align (1454-1) unstable; urgency=medium
* New upstream version 1454. Re-diff patches
=====================================
doc/last-cookbook.rst
=====================================
@@ -313,6 +313,11 @@ use, add lastdb seeding_ option ``-uMAM4`` or or ``-uMAM8``. To
increase them even more, add lastal_ option ``-m100`` (or as high as
you can bear).
+Aligning distantly-related genomes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+See https://github.com/mcfrith/last-genome-alignments
+
Large reference sequences
-------------------------
=====================================
doc/last-train.rst
=====================================
@@ -3,25 +3,41 @@ last-train
last-train finds the rates (probabilities) of insertion, deletion, and
substitutions between two sets of sequences. It thereby finds
-suitable substitution and gap scores for aligning them.
+suitable substitution and gap scores for aligning them. You can use
+it like this::
-It (probabilistically) aligns the sequences using some initial score
-parameters, then estimates better score parameters based on the
-alignments, and repeats this procedure until the parameters stop
-changing.
+ lastdb mydb reference.fasta
+ last-train mydb queries.fasta > my.train
-The usage is like this::
+last-train can read .gz files, or from pipes::
- lastdb mydb reference.fasta
- last-train mydb queries.fasta
+ bzcat queries.fasta.bz2 | last-train mydb > my.train
-last-train prints a summary of each alignment step, followed by the
-final score parameters, in a format that can be read by `lastal's -p
-option <doc/lastal.rst>`_.
+How it works
+------------
-last-train can read .gz files, or from pipes::
+1. For sake of speed, last-train just uses some pseudo-random chunks
+ of ``queries.fasta``.
+
+2. It starts with an initial guess for substitution and gap
+ parameters.
+
+3. Using these parameters, it finds similar segments between the
+ chunks and ``reference.fasta``.
+
+ If one part of the chunks matches several parts of
+ ``reference.fasta``, only the best matches are kept.
+
+4. It gets substitution and gap parameters from these similar
+ segments.
+
+5. It uses these parameters to find similar segments more accurately,
+ then gets parameters again, and repeats until the result stops
+ changing.
- bzcat queries.fasta.bz2 | last-train mydb
+last-train prints a summary of each iteration, followed by the final
+score parameters in a format that can be read by `lastal's -p option
+<doc/lastal.rst>`_.
Options
-------
@@ -44,9 +60,10 @@ Training options
--gapsym
Force the insertion costs to equal the deletion costs.
--pid=PID
- Ignore alignments with > PID% identity (matches / [matches +
- mismatches]). This aims to optimize the parameters for
- low-similarity alignments (similarly to the BLOSUM matrices).
+ Ignore similar segments with > PID% identity (matches /
+ [matches + mismatches]). This aims to optimize the parameters
+ for low-similarity alignments (similarly to the BLOSUM
+ matrices).
--postmask=NUMBER
By default, last-train ignores alignments of mostly-lowercase
sequence (by using `last-postmask <doc/last-postmask.rst>`_).
@@ -185,6 +202,7 @@ Bugs
* last-train can fail for various reasons, e.g. if the sequences are
too dissimilar. If it fails to find any alignments, you could try
- reducing the alignment significance_ threshold with option ``-D``.
+ increasing the sample number, or reducing the alignment
+ significance_ threshold with option ``-D``.
.. _significance: doc/last-evalues.rst
=====================================
src/makefile
=====================================
@@ -143,7 +143,7 @@ ScoreMatrixData.hh: ../data/*.mat
../build/mat-inc.sh ../data/*.mat > $@
VERSION1 = git describe --dirty
-VERSION2 = echo ' (HEAD -> main, tag: 1454) ' | sed -e 's/.*tag: *//' -e 's/[,) ].*//'
+VERSION2 = echo ' (HEAD -> main, tag: 1456) ' | sed -e 's/.*tag: *//' -e 's/[,) ].*//'
VERSION = \"`test -e ../.git && $(VERSION1) || $(VERSION2)`\"
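Note on the hunk above: `VERSION2` is the fallback used when `../.git` does not exist, and its two `sed` substitutions strip everything up to `tag:` and then everything from the first comma, closing parenthesis, or space onward. The Python sketch below is a rough equivalent of those two substitutions for illustration only. The function name `tag_from_ref_names` is hypothetical, and the makefile itself uses `sed`, not Python.

```python
import re


def tag_from_ref_names(ref_names):
    """Extract the tag from a string like ' (HEAD -> main, tag: 1456) '.

    Hypothetical helper mirroring the two sed expressions in VERSION2:
      s/.*tag: *//     drop everything up to and including 'tag:' and spaces
      s/[,) ].*//      drop everything from the first ',', ')' or ' ' onward
    """
    stripped = re.sub(r".*tag: *", "", ref_names, count=1)
    return re.sub(r"[,) ].*", "", stripped, count=1)


if __name__ == "__main__":
    print(tag_from_ref_names(" (HEAD -> main, tag: 1456) "))
```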
=====================================
test/last-train-test.out
=====================================
@@ -391,6 +391,8 @@ TEST last-train -m1 /tmp/last-train-test < ../examples/mouseMito.fa
# T -112 -68 -215 89
# substitution percent identity: 72.6161
+# ref letter %: 28.4 30.3 13.4 27.9
+# qry letter %: 30.8 25.2 12.5 31.5
#last -t4.4363
#last -a 26
#last -A 24
@@ -840,6 +842,8 @@ TEST last-train -m1 -C2 --revsym /tmp/last-train-test ../examples/mouseMito.fa
# T -111 -77 -126 82
# substitution percent identity: 71.8801
+# ref letter %: 29.0 21.0 21.0 29.0
+# qry letter %: 33.1 16.9 16.9 33.1
#last -t4.6475
#last -a 26
#last -A 23
@@ -1246,6 +1250,8 @@ TEST last-train -m1 -k16 --matsym --gapsym /tmp/last-train-test ../examples/mous
# T -111 -44 -186 89
# substitution percent identity: 73.3262
+# ref letter %: 30.6 27.0 13.0 29.4
+# qry letter %: 30.6 27.0 13.0 29.4
#last -t4.41768
#last -a 26
#last -A 26
@@ -1738,6 +1744,8 @@ TEST last-train -Q1 /tmp/last-train-test bs100.fastq
# T -497 -228 -437 61
# substitution percent identity: 76.2866
+# ref letter %: 31.0 23.6 19.4 26.0
+# qry letter %: 31.0 0.0154 19.4 49.5
#last -Q 1
#last -t4.28179
#last -a 39
View it on GitLab: https://salsa.debian.org/med-team/last-align/-/compare/829496a40533203b1fc92ef1ae52dd96456d83a5...c54288e30a886d3327dd283a89ac44a04019155c