[med-svn] [Git][med-team/last-align][master] 5 commits: New upstream version 1080
Charles Plessy
gitlab at salsa.debian.org
Wed Aug 12 12:28:13 BST 2020
Charles Plessy pushed to branch master at Debian Med / last-align
Commits:
1db35c0c by Charles Plessy at 2020-08-12T11:11:56+00:00
New upstream version 1080
- - - - -
455686f1 by Charles Plessy at 2020-08-12T11:11:56+00:00
routine-update: New upstream version
- - - - -
94c46d31 by Charles Plessy at 2020-08-12T11:11:57+00:00
Update upstream source from tag 'upstream/1080'
Update to upstream version '1080'
with Debian dir 790a640288529045514b78c4a24d5a8f1c34dc30
- - - - -
0e985d37 by Charles Plessy at 2020-08-12T11:12:38+00:00
Refreshed patches
- - - - -
3004da21 by Charles Plessy at 2020-08-12T11:21:47+00:00
routine-update: Ready to upload to unstable
- - - - -
7 changed files:
- ChangeLog.txt
- debian/changelog
- debian/patches/2to3.patch
- scripts/last-dotplot
- scripts/maf-convert
- src/split/cbrc_split_aligner.cc
- src/version.hh
Changes:
=====================================
ChangeLog.txt
=====================================
@@ -1,8 +1,69 @@
+2020-07-31 Martin C. Frith <Martin C. Frith>
+
+ * src/split/cbrc_split_aligner.cc, test/last-split-test.out, test
+ /last-split-test.sh, test/spliceWithGap.maf:
+ last-split: catch spliced-alignment math overflow
+ [de7890c559b8] [tip]
+
+ * scripts/last-dotplot:
+ last-dotplot: fix bug in annotation names limit
+ [5aaae537a930]
+
+ * scripts/last-dotplot:
+ last-dotplot: read BED-like files more robustly
+ [579d74b5066f]
+
+ * scripts/last-dotplot:
+ last-dotplot: show annotation names!
+ [5209deca661c]
+
+ * scripts/last-dotplot:
+ last-dotplot: implementing annotation names...
+ [3b8d229f8359]
+
+ * scripts/last-dotplot:
+ last-dotplot: implementing annotation names...
+ [70b0e512d696]
+
+ * scripts/last-dotplot:
+ last-dotplot: implementing annotation names...
+ [170ab6c64728]
+
+ * scripts/last-dotplot:
+ last-dotplot: read annotations a bit faster
+ [8ec299b42e28]
+
+ * scripts/last-dotplot:
+ last-dotplot: refactor
+ [e43ecf1c3fb6]
+
+2020-07-30 Martin C. Frith <Martin C. Frith>
+
+ * scripts/last-dotplot:
+ last-dotplot: refactor
+ [c5d3cf8c1058]
+
+ * scripts/last-dotplot:
+ last-dotplot: refactor
+ [6cf9358f80da]
+
+ * scripts/last-dotplot:
+ last-dotplot: refactor
+ [da8b081ec496]
+
+ * scripts/last-dotplot:
+ last-dotplot: refactor
+ [a29f9cec16e1]
+
+ * scripts/maf-convert:
+ maf-convert: warn instead of crash on NaN
+ [eb2abd424c87]
+
2020-06-11 Martin C. Frith <Martin C. Frith>
* doc/last-train.txt, scripts/last-train, src/lastal.cc:
Make inconsistent last-train/lastal -Q an error
- [f2071b3c3173] [tip]
+ [f2071b3c3173]
* doc/last-train.txt, doc/lastal.txt, doc/lastdb.txt, scripts/last-
train, src/LastalArguments.cc, src/LastdbArguments.cc,
=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+last-align (1080-1) unstable; urgency=medium
+
+ * New upstream version
+ * Refreshed patches
+
+ -- Charles Plessy <plessy at debian.org> Wed, 12 Aug 2020 11:12:45 +0000
+
last-align (1066-1) unstable; urgency=medium
* Team upload.
=====================================
debian/patches/2to3.patch
=====================================
@@ -5,15 +5,15 @@ Last-Update: Fri, 22 Nov 2019 11:23:44 +0100
Index: last-align/scripts/last-dotplot
===================================================================
---- last-align.orig/scripts/last-dotplot
-+++ last-align/scripts/last-dotplot
+--- last-align.orig/scripts/last-dotplot 2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/last-dotplot 2020-08-12 11:12:17.628834864 +0000
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#! /usr/bin/python3
# Author: Martin C. Frith 2008
# SPDX-License-Identifier: GPL-3.0-or-later
-@@ -264,12 +264,12 @@ def mergedRanges(ranges):
+@@ -264,12 +264,12 @@
yield oldBeg, maxEnd
def mergedRangesPerSeq(coverDict):
@@ -28,7 +28,7 @@ Index: last-align/scripts/last-dotplot
def trimmed(seqRanges, coverDict, minAlignedBases, maxGapFrac, endPad, midPad):
maxEndGapFrac, maxMidGapFrac = twoValuesFromOption(maxGapFrac, ",")
-@@ -310,7 +310,7 @@ def rangesWithStrandInfo(seqRanges, stra
+@@ -310,7 +310,7 @@
def natural_sort_key(my_string):
'''Return a sort key for "natural" ordering, e.g. chr9 < chr10.'''
parts = re.split(r'(\d+)', my_string)
@@ -37,7 +37,7 @@ Index: last-align/scripts/last-dotplot
return parts
def nameKey(oneSeqRanges):
-@@ -579,7 +579,7 @@ def drawJoins(im, alignments, bpPerPix,
+@@ -581,7 +581,7 @@
def expandedSeqDict(seqDict):
'''Allow lookup by short sequence names, e.g. chr7 as well as hg19.chr7.'''
newDict = seqDict.copy()
@@ -46,8 +46,8 @@ Index: last-align/scripts/last-dotplot
if "." in name:
base = name.split(".")[-1]
if base in newDict: # an ambiguous case was found:
-@@ -613,7 +613,7 @@ def readBed(fileName, rangeDict):
- yield layer, color, seqName, beg, end
+@@ -621,7 +621,7 @@
+ yield layer, color, seqName, beg, end, itemName
def commaSeparatedInts(text):
- return map(int, text.rstrip(",").split(","))
@@ -57,8 +57,8 @@ Index: last-align/scripts/last-dotplot
for line in myOpen(fileName):
Index: last-align/scripts/last-map-probs
===================================================================
---- last-align.orig/scripts/last-map-probs
-+++ last-align/scripts/last-map-probs
+--- last-align.orig/scripts/last-map-probs 2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/last-map-probs 2020-08-12 11:12:17.628834864 +0000
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
@@ -67,15 +67,15 @@ Index: last-align/scripts/last-map-probs
Index: last-align/scripts/last-postmask
===================================================================
---- last-align.orig/scripts/last-postmask
-+++ last-align/scripts/last-postmask
+--- last-align.orig/scripts/last-postmask 2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/last-postmask 2020-08-12 11:12:17.628834864 +0000
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Copyright 2014 Martin C. Frith
-@@ -37,7 +37,7 @@ def complement(base):
+@@ -37,7 +37,7 @@
def fastScoreMatrix(rowHeads, colHeads, matrix, deleteCost, insertCost):
matrixLen = 128
@@ -84,7 +84,7 @@ Index: last-align/scripts/last-postmask
fastMatrix = [[defaultScore for i in range(matrixLen)]
for j in range(matrixLen)]
for i, x in enumerate(rowHeads):
-@@ -111,7 +111,7 @@ def doOneFile(lines):
+@@ -111,7 +111,7 @@
if i.startswith("B="): bIns = int(i[2:])
if i.startswith("e="): minScore = int(i[2:])
if i.startswith("S="): strandParam = int(i[2:])
@@ -95,15 +95,15 @@ Index: last-align/scripts/last-postmask
rowHeads.append(fields[1])
Index: last-align/scripts/last-train
===================================================================
---- last-align.orig/scripts/last-train
-+++ last-align/scripts/last-train
+--- last-align.orig/scripts/last-train 2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/last-train 2020-08-12 11:12:17.628834864 +0000
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Copyright 2015 Martin C. Frith
# SPDX-License-Identifier: GPL-3.0-or-later
-@@ -249,7 +249,7 @@ def writeScoreMatrix(outFile, matrix, pr
+@@ -249,7 +249,7 @@
writeMatrixBody(outFile, prefix, alphabet, matrix, "%6s")
def matProbsFromCounts(counts, opts):
@@ -114,8 +114,8 @@ Index: last-align/scripts/last-train
if opts.matsym: # symmetrize the substitution matrix
Index: last-align/scripts/maf-convert
===================================================================
---- last-align.orig/scripts/maf-convert
-+++ last-align/scripts/maf-convert
+--- last-align.orig/scripts/maf-convert 2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/maf-convert 2020-08-12 11:12:17.628834864 +0000
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
@@ -124,15 +124,15 @@ Index: last-align/scripts/maf-convert
# Seems to work with Python 2.x, x>=6
Index: last-align/scripts/maf-cut
===================================================================
---- last-align.orig/scripts/maf-cut
-+++ last-align/scripts/maf-cut
+--- last-align.orig/scripts/maf-cut 2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/maf-cut 2020-08-12 11:12:17.628834864 +0000
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#! /usr/bin/python3
# Author: Martin C. Frith 2018
from __future__ import print_function
-@@ -71,12 +71,12 @@ def cutMafRecords(mafLines, alnBeg, alnE
+@@ -71,12 +71,12 @@
def mafFieldWidths(mafRecords):
sRecords = (i for i in mafRecords if i[0] == "s")
@@ -150,15 +150,15 @@ Index: last-align/scripts/maf-cut
def cutOneMaf(cutRange, mafLines):
Index: last-align/scripts/maf-join
===================================================================
---- last-align.orig/scripts/maf-join
-+++ last-align/scripts/maf-join
+--- last-align.orig/scripts/maf-join 2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/maf-join 2020-08-12 11:12:17.628834864 +0000
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Copyright 2009, 2010, 2011 Martin C. Frith
-@@ -229,7 +229,7 @@ def nextWindow(window, theInput, referen
+@@ -229,7 +229,7 @@
while True:
maf = theInput.peek()
if maf.after(referenceMaf): break
@@ -169,15 +169,15 @@ Index: last-align/scripts/maf-join
Index: last-align/scripts/maf-swap
===================================================================
---- last-align.orig/scripts/maf-swap
-+++ last-align/scripts/maf-swap
+--- last-align.orig/scripts/maf-swap 2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/maf-swap 2020-08-12 11:12:17.628834864 +0000
@@ -1,4 +1,4 @@
-#! /usr/bin/env python
+#!/usr/bin/python3
# Read MAF-format alignments, and write them, after moving the Nth
# sequence to the top in each alignment.
-@@ -69,12 +69,12 @@ def flippedMafRecords(mafLines):
+@@ -69,12 +69,12 @@
def sLineFieldWidths(mafLines):
sLines = (i for i in mafLines if i[0] == "s")
=====================================
scripts/last-dotplot
=====================================
@@ -393,6 +393,13 @@ def allSortedRanges(opts, alignments, alignmentsB,
t2 = mySortedRanges(seqRangesB2, oB2, 1, alignmentsB, s1)
return s1 + t1, s2 + t2
+def sizesPerText(texts, font, textDraw):
+ sizes = 0, 0
+ for t in texts:
+ if textDraw is not None:
+ sizes = textDraw.textsize(t, font=font)
+ yield t, sizes
+
def prettyNum(n):
t = str(n)
groups = []
@@ -420,21 +427,17 @@ def labelText(seqRange, labelOpt):
return seqName + ":" + prettyNum(beg) + "-" + prettyNum(end)
return seqName
-def rangeLabels(seqRanges, labelOpt, font, fontsize, image_mode, textRot):
- if fontsize:
- image_size = 1, 1
- im = Image.new(image_mode, image_size)
- draw = ImageDraw.Draw(im)
+def rangeLabels(seqRanges, labelOpt, font, textDraw, textRot):
x = y = 0
for r in seqRanges:
text = labelText(r, labelOpt)
- if fontsize:
- x, y = draw.textsize(text, font=font)
+ if textDraw is not None:
+ x, y = textDraw.textsize(text, font=font)
if textRot:
x, y = y, x
yield text, x, y, r[3]
-def dataFromRanges(sortedRanges, font, fontSize, imageMode, labelOpt, textRot):
+def dataFromRanges(sortedRanges, font, textDraw, labelOpt, textRot):
for seqName, rangeBeg, rangeEnd, strandNum in sortedRanges:
out = [seqName, str(rangeBeg), str(rangeEnd)]
if strandNum > 0:
@@ -442,8 +445,7 @@ def dataFromRanges(sortedRanges, font, fontSize, imageMode, labelOpt, textRot):
logging.info("\t".join(out))
logging.info("")
rangeSizes = [e - b for n, b, e, s in sortedRanges]
- labs = list(rangeLabels(sortedRanges, labelOpt, font, fontSize,
- imageMode, textRot))
+ labs = list(rangeLabels(sortedRanges, labelOpt, font, textDraw, textRot))
margin = max(i[2] for i in labs)
# xxx the margin may be too big, because some labels may get omitted
return rangeSizes, labs, margin
@@ -591,10 +593,16 @@ def readBed(fileName, rangeDict):
for line in myOpen(fileName):
w = line.split()
if not w: continue
+ if not w[1].isdigit():
+ w = w[1:]
seqName = w[0]
if seqName not in rangeDict: continue
+ seqRanges = rangeDict[seqName]
beg = int(w[1])
end = int(w[2])
+ if all(beg >= i[2] or end <= i[1] for i in seqRanges):
+ continue
+ itemName = w[3] if len(w) > 3 and w[3] != "." else ""
layer = 900
color = "#fbf"
if len(w) > 4:
@@ -605,33 +613,38 @@ def readBed(fileName, rangeDict):
color = "rgb(" + w[8] + ")"
else:
strand = w[5]
- isRev = rangeDict[seqName][0][2]
+ isRev = (seqRanges[0][3] > 1)
if strand == "+" and not isRev or strand == "-" and isRev:
color = "#ffe8e8"
if strand == "-" and not isRev or strand == "+" and isRev:
color = "#e8e8ff"
- yield layer, color, seqName, beg, end
+ yield layer, color, seqName, beg, end, itemName
def commaSeparatedInts(text):
return map(int, text.rstrip(",").split(","))
def readGenePred(opts, fileName, rangeDict):
for line in myOpen(fileName):
- fields = line.split()
+ fields = line.split() # xxx tab ???
if not fields: continue
- if fields[2] not in "+-": fields = fields[1:]
+ geneName = fields[12 if len(fields) > 12 else 0] # XXX ???
+ if fields[2] not in "+-":
+ fields = fields[1:]
seqName = fields[1]
if seqName not in rangeDict: continue
+ seqRanges = rangeDict[seqName]
#strand = fields[2]
cdsBeg = int(fields[5])
cdsEnd = int(fields[6])
exonBegs = commaSeparatedInts(fields[8])
exonEnds = commaSeparatedInts(fields[9])
for beg, end in zip(exonBegs, exonEnds):
- yield 300, opts.exon_color, seqName, beg, end
+ if all(beg >= i[2] or end <= i[1] for i in seqRanges):
+ continue
+ yield 300, opts.exon_color, seqName, beg, end, geneName
b = max(beg, cdsBeg)
e = min(end, cdsEnd)
- if b < e: yield 400, opts.cds_color, seqName, b, e
+ if b < e: yield 400, opts.cds_color, seqName, b, e, ""
def readRmsk(fileName, rangeDict):
for line in myOpen(fileName):
@@ -642,6 +655,7 @@ def readRmsk(fileName, rangeDict):
beg = int(fields[6])
end = int(fields[7])
strand = fields[9]
+ repeatName = fields[10]
repeatClass = fields[11]
elif len(fields) == 15: # .out
seqName = fields[4]
@@ -649,15 +663,19 @@ def readRmsk(fileName, rangeDict):
beg = int(fields[5]) - 1
end = int(fields[6])
strand = fields[8]
+ repeatName = fields[9]
repeatClass = fields[10].split("/")[0]
else:
continue
+ seqRanges = rangeDict[seqName]
+ if all(beg >= i[2] or end <= i[1] for i in seqRanges):
+ continue
if repeatClass in ("Low_complexity", "Simple_repeat", "Satellite"):
- yield 200, "#fbf", seqName, beg, end
- elif (strand == "+") != rangeDict[seqName][0][2]:
- yield 100, "#ffe8e8", seqName, beg, end
+ yield 200, "#fbf", seqName, beg, end, repeatName
+ elif (strand == "+") != (seqRanges[0][3] > 1):
+ yield 100, "#ffe8e8", seqName, beg, end, repeatName
else:
- yield 100, "#e8e8ff", seqName, beg, end
+ yield 100, "#e8e8ff", seqName, beg, end, repeatName
def isExtraFirstGapField(fields):
return fields[4].isdigit()
@@ -674,12 +692,13 @@ def readGaps(opts, fileName, rangeDict):
end = int(w[2])
beg = end - int(w[5]) # zero-based coordinate
if w[7] == "yes":
- yield 3000, opts.bridged_color, seqName, beg, end
+ yield 3000, opts.bridged_color, seqName, beg, end, ""
else:
- yield 2000, opts.unbridged_color, seqName, beg, end
+ yield 2000, opts.unbridged_color, seqName, beg, end, ""
-def bedBoxes(beds, rangeDict, margin, edge, isTop, bpPerPix):
- for layer, color, seqName, bedBeg, bedEnd in beds:
+def bedBoxes(beds, rangeDict, limit, isTop, bpPerPix, textSizes):
+ cover = [(limit, limit)]
+ for layer, color, seqName, bedBeg, bedEnd, name in reversed(beds):
for rangeBeg, rangeEnd, isReverseRange, origin in rangeDict[seqName]:
beg = max(bedBeg, rangeBeg)
end = min(bedEnd, rangeEnd)
@@ -688,27 +707,34 @@ def bedBoxes(beds, rangeDict, margin, edge, isTop, bpPerPix):
beg, end = -end, -beg
if layer <= 1000:
# include partly-covered pixels
- b = (origin + beg) // bpPerPix
- e = div_ceil(origin + end, bpPerPix)
+ pixBeg = (origin + beg) // bpPerPix
+ pixEnd = div_ceil(origin + end, bpPerPix)
else:
# exclude partly-covered pixels
- b = div_ceil(origin + beg, bpPerPix)
- e = (origin + end) // bpPerPix
- if e <= b: continue
+ pixBeg = div_ceil(origin + beg, bpPerPix)
+ pixEnd = (origin + end) // bpPerPix
+ if pixEnd <= pixBeg: continue
if bedEnd >= rangeEnd: # include partly-covered end pixels
if isReverseRange:
- b = (origin + beg) // bpPerPix
+ pixBeg = (origin + beg) // bpPerPix
else:
- e = div_ceil(origin + end, bpPerPix)
- if isTop:
- box = b, margin, e, edge
+ pixEnd = div_ceil(origin + end, bpPerPix)
+ textWidth, textHeight = textSizes[name]
+ nameBeg = (pixBeg + pixEnd - textHeight) // 2
+ nameEnd = nameBeg + textHeight
+ if name and all(e <= nameBeg or b >= nameEnd for b, e in cover):
+ cover.append((nameBeg, nameEnd))
else:
- box = margin, b, edge, e
- yield layer, color, box
+ name = ""
+ yield layer, color, isTop, pixBeg, pixEnd, name, nameBeg, textWidth
-def drawAnnotations(im, boxes):
+def drawAnnotations(im, boxes, tMargin, bMarginBeg, lMargin, rMarginBeg):
# xxx use partial transparency for different-color overlaps?
- for layer, color, box in boxes:
+ for layer, color, isTop, beg, end, name, nameBeg, nameLen in boxes:
+ if isTop:
+ box = beg, tMargin, end, bMarginBeg
+ else:
+ box = lMargin, beg, rMarginBeg, end
im.paste(color, box)
def placedLabels(labels, rangePixBegs, rangePixLens, beg, end):
@@ -760,6 +786,20 @@ def axisImage(labels, rangePixBegs, rangePixLens, textRot,
draw.text(position, text, font=font, fill=fill)
return im
+def annoTextImage(opts, image_mode, font, margin, length, boxes, isLeftAlign):
+ image_size = margin, length
+ im = Image.new(image_mode, image_size, opts.margin_color)
+ draw = ImageDraw.Draw(im)
+ for layer, color, isTop, beg, end, name, nameBeg, nameLen in boxes:
+ xPosition = 0 if isLeftAlign else margin - nameLen
+ position = xPosition, nameBeg
+ draw.text(position, name, font=font, fill="black")
+ return im
+
+def rangesPerSeq(sortedRanges):
+ for seqName, group in itertools.groupby(sortedRanges, itemgetter(0)):
+ yield seqName, sorted(group)
+
def rangesWithOrigins(sortedRanges, rangePixBegs, rangePixLens, bpPerPix):
for i, j, k in zip(sortedRanges, rangePixBegs, rangePixLens):
seqName, rangeBeg, rangeEnd, strandNum = i
@@ -770,10 +810,10 @@ def rangesWithOrigins(sortedRanges, rangePixBegs, rangePixLens, bpPerPix):
origin = bpPerPix * j - rangeBeg
yield seqName, (rangeBeg, rangeEnd, isReverseRange, origin)
-def rangesPerSeq(sortedRanges, rangePixBegs, rangePixLens, bpPerPix):
+def rangesAndOriginsPerSeq(sortedRanges, rangePixBegs, rangePixLens, bpPerPix):
a = rangesWithOrigins(sortedRanges, rangePixBegs, rangePixLens, bpPerPix)
- for k, v in itertools.groupby(a, itemgetter(0)):
- yield k, sorted(i[1] for i in v)
+ for seqName, group in itertools.groupby(a, itemgetter(0)):
+ yield seqName, sorted(i[1] for i in group)
def getFont(opts):
if opts.fontfile:
@@ -813,6 +853,18 @@ def remainingSequenceRanges(seqRanges, alignments, seqIndex):
remainingSequences = set(i[seqIndex] for i in alignments)
return [i for i in seqRanges if i[0] in remainingSequences]
+def readAnnotations(opts, font, textDraw, sortedRanges,
+ bedFile, repFile, geneFile, gapFile):
+ rangeDict = expandedSeqDict(dict(rangesPerSeq(sortedRanges)))
+ annots = sorted(itertools.chain(readBed(bedFile, rangeDict),
+ readRmsk(repFile, rangeDict),
+ readGenePred(opts, geneFile, rangeDict),
+ readGaps(opts, gapFile, rangeDict)))
+ names = set(i[5] for i in annots)
+ textSizes = dict(sizesPerText(names, font, textDraw))
+ margin = max(i[0] for i in textSizes.values()) if textSizes else 0
+ return annots, textSizes, margin
+
def lastDotplot(opts, args):
logLevel = logging.INFO if opts.verbose else logging.WARNING
logging.basicConfig(format="%(filename)s: %(message)s", level=logLevel)
@@ -867,32 +919,46 @@ def lastDotplot(opts, args):
cutRanges1, cutRangesB1, cutRanges2, cutRangesB2)
sortedRanges1, sortedRanges2 = sortOut
+ textDraw = None
+ if opts.fontsize:
+ textDraw = ImageDraw.Draw(Image.new(image_mode, (1, 1)))
+
textRot1 = "vertical".startswith(opts.rot1)
- i1 = dataFromRanges(sortedRanges1, font,
- opts.fontsize, image_mode, opts.labels1, textRot1)
+ i1 = dataFromRanges(sortedRanges1, font, textDraw, opts.labels1, textRot1)
rangeSizes1, labelData1, tMargin = i1
textRot2 = "horizontal".startswith(opts.rot2)
- i2 = dataFromRanges(sortedRanges2, font,
- opts.fontsize, image_mode, opts.labels2, textRot2)
+ i2 = dataFromRanges(sortedRanges2, font, textDraw, opts.labels2, textRot2)
rangeSizes2, labelData2, lMargin = i2
- maxPixels1 = opts.width - lMargin
- maxPixels2 = opts.height - tMargin
+ logging.info("reading annotations...")
+
+ a1 = readAnnotations(opts, font, textDraw, sortedRanges1,
+ opts.bed1, opts.rmsk1, opts.genePred1, opts.gap1)
+ annots1, annoTextSizes1, bMargin = a1
+
+ a2 = readAnnotations(opts, font, textDraw, sortedRanges2,
+ opts.bed2, opts.rmsk2, opts.genePred2, opts.gap2)
+ annots2, annoTextSizes2, rMargin = a2
+
+ maxPixels1 = opts.width - lMargin - rMargin
+ maxPixels2 = opts.height - tMargin - bMargin
bpPerPix1 = get_bp_per_pix(rangeSizes1, opts.border_pixels, maxPixels1)
bpPerPix2 = get_bp_per_pix(rangeSizes2, opts.border_pixels, maxPixels2)
bpPerPix = max(bpPerPix1, bpPerPix2)
logging.info("bp per pixel = " + str(bpPerPix))
p1 = pixelData(rangeSizes1, bpPerPix, opts.border_pixels, lMargin)
- rangePixBegs1, rangePixLens1, width = p1
- rangeDict1 = dict(rangesPerSeq(sortedRanges1, rangePixBegs1,
- rangePixLens1, bpPerPix))
+ rangePixBegs1, rangePixLens1, rMarginBeg = p1
+ width = rMarginBeg + rMargin
+ rangeDict1 = dict(rangesAndOriginsPerSeq(sortedRanges1, rangePixBegs1,
+ rangePixLens1, bpPerPix))
p2 = pixelData(rangeSizes2, bpPerPix, opts.border_pixels, tMargin)
- rangePixBegs2, rangePixLens2, height = p2
- rangeDict2 = dict(rangesPerSeq(sortedRanges2, rangePixBegs2,
- rangePixLens2, bpPerPix))
+ rangePixBegs2, rangePixLens2, bMarginBeg = p2
+ height = bMarginBeg + bMargin
+ rangeDict2 = dict(rangesAndOriginsPerSeq(sortedRanges2, rangePixBegs2,
+ rangePixLens2, bpPerPix))
logging.info("width: " + str(width))
logging.info("height: " + str(height))
@@ -902,30 +968,21 @@ def lastDotplot(opts, args):
hits = alignmentPixels(width, height, allAlignments, bpPerPix,
rangeDict1, rangeDict2)
- logging.info("reading annotations...")
-
rangeDict1 = expandedSeqDict(rangeDict1)
- beds1 = itertools.chain(readBed(opts.bed1, rangeDict1),
- readRmsk(opts.rmsk1, rangeDict1),
- readGenePred(opts, opts.genePred1, rangeDict1),
- readGaps(opts, opts.gap1, rangeDict1))
- b1 = bedBoxes(beds1, rangeDict1, tMargin, height, True, bpPerPix)
-
rangeDict2 = expandedSeqDict(rangeDict2)
- beds2 = itertools.chain(readBed(opts.bed2, rangeDict2),
- readRmsk(opts.rmsk2, rangeDict2),
- readGenePred(opts, opts.genePred2, rangeDict2),
- readGaps(opts, opts.gap2, rangeDict2))
- b2 = bedBoxes(beds2, rangeDict2, lMargin, width, False, bpPerPix)
- boxes = sorted(itertools.chain(b1, b2))
+ boxes1 = list(bedBoxes(annots1, rangeDict1, rMarginBeg, True, bpPerPix,
+ annoTextSizes1))
+ boxes2 = list(bedBoxes(annots2, rangeDict2, bMarginBeg, False, bpPerPix,
+ annoTextSizes2))
+ boxes = sorted(itertools.chain(boxes1, boxes2))
logging.info("drawing...")
image_size = width, height
im = Image.new(image_mode, image_size, opts.background_color)
- drawAnnotations(im, boxes)
+ drawAnnotations(im, boxes, tMargin, bMarginBeg, lMargin, rMarginBeg)
joinA, joinB = twoValuesFromOption(opts.join, ":")
if joinA in "13":
@@ -957,12 +1014,20 @@ def lastDotplot(opts, args):
im.paste(axis1, (0, 0))
im.paste(axis2, (0, 0))
+ annoImage1 = annoTextImage(opts, image_mode, font, bMargin, width,
+ boxes1, False)
+ annoImage1 = annoImage1.transpose(Image.ROTATE_90)
+ annoImage2 = annoTextImage(opts, image_mode, font, rMargin, height,
+ boxes2, True)
+ im.paste(annoImage1, (0, bMarginBeg))
+ im.paste(annoImage2, (rMarginBeg, 0))
+
for i in rangePixBegs1[1:]:
- box = i - opts.border_pixels, tMargin, i, height
+ box = i - opts.border_pixels, tMargin, i, bMarginBeg
im.paste(opts.border_color, box)
for i in rangePixBegs2[1:]:
- box = lMargin, i - opts.border_pixels, width, i
+ box = lMargin, i - opts.border_pixels, rMarginBeg, i
im.paste(opts.border_color, box)
im.save(args[1])
=====================================
scripts/maf-convert
=====================================
@@ -12,6 +12,7 @@ from itertools import *
import collections
import functools
import gzip
+import logging
import math
import operator
import optparse
@@ -479,9 +480,15 @@ mapqMaximum = "254"
mapqMaximumNum = float(mapqMaximum)
def mapqFromProb(probString):
- try: p = float(probString)
- except ValueError: raise Exception("bad probability: " + probString)
- if p < 0 or p > 1: raise Exception("bad probability: " + probString)
+ try:
+ p = float(probString)
+ except ValueError:
+ raise ValueError("bad probability: " + probString)
+ if math.isnan(p):
+ logging.warning("bad probability: " + probString)
+ return mapqMissing
+ if p < 0 or p > 1:
+ raise ValueError("bad probability: " + probString)
if p == 0: return mapqMaximum
phred = -10 * math.log(p, 10)
if phred >= mapqMaximumNum: return mapqMaximum
@@ -936,6 +943,8 @@ def mafConvertOneFile(opts, formatName, lines):
raise Exception("unknown format: " + formatName)
def mafConvert(opts, args):
+ logging.basicConfig(format="%(filename)s: %(message)s")
+
formatName = args[0].lower()
fileNames = args[1:]
=====================================
src/split/cbrc_split_aligner.cc
=====================================
@@ -13,6 +13,8 @@
#include <sstream>
#include <stdexcept>
+#include <float.h>
+
static void err(const std::string& s) {
throw std::runtime_error(s);
}
@@ -696,6 +698,11 @@ void SplitAligner::backwardSplice() {
if (alns[i].qend == j) p += endprob;
p = p * Sexp[ij*2-1] * rescale;
+ // XXX p can overflow to inf. This can happen if there is
+ // a large unaligned part in the middle of the query
+ // sequence. Then, in forwardSplice, Fmat may underflow
+ // to 0, so the subsequent rescales are all 1.
+
Bmat[ij - 1] = p;
//if (alns[i].qstart == j-1) zB += p;
pSum += p * spliceEndProb(ij - 1);
@@ -713,7 +720,9 @@ SplitAligner::marginalProbs(unsigned queryBeg, unsigned alnNum,
unsigned j = queryBeg;
for (unsigned pos = alnBeg; pos < alnEnd; ++pos) {
size_t ij = matrixRowOrigins[i] + j;
- if (alns[i].qalign[pos] == '-') {
+ if (Bmat[ij] > DBL_MAX) { // can happen for spliced alignment
+ output.push_back(0);
+ } else if (alns[i].qalign[pos] == '-') {
double value = Fmat[ij] * Bmat[ij] * Sexp[ij*2] * cell(rescales, j);
output.push_back(value);
} else {
@@ -1113,6 +1122,7 @@ void SplitAligner::flipSpliceSignals() {
}
double SplitAligner::spliceSignalStrandLogOdds() const {
+ // XXX if Bmat overflowed to inf, then I think this is unreliable
assert(rescales.size() == rescalesRev.size());
double logOdds = 0;
for (unsigned j = 0; j < rescales.size(); ++j) {
=====================================
src/version.hh
=====================================
@@ -1 +1 @@
-"1066"
+"1080"
View it on GitLab: https://salsa.debian.org/med-team/last-align/-/compare/b412e2ccdedbc7b1f1b5cd6ea79fbb11b76da3b9...3004da21edf89f66332d7591e9ba5f047431185f
--
View it on GitLab: https://salsa.debian.org/med-team/last-align/-/compare/b412e2ccdedbc7b1f1b5cd6ea79fbb11b76da3b9...3004da21edf89f66332d7591e9ba5f047431185f
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20200812/dcd4f3a2/attachment-0001.html>
More information about the debian-med-commit
mailing list