[med-svn] [Git][med-team/last-align][master] 5 commits: New upstream version 1080

Charles Plessy gitlab at salsa.debian.org
Wed Aug 12 12:28:13 BST 2020



Charles Plessy pushed to branch master at Debian Med / last-align


Commits:
1db35c0c by Charles Plessy at 2020-08-12T11:11:56+00:00
New upstream version 1080
- - - - -
455686f1 by Charles Plessy at 2020-08-12T11:11:56+00:00
routine-update: New upstream version

- - - - -
94c46d31 by Charles Plessy at 2020-08-12T11:11:57+00:00
Update upstream source from tag 'upstream/1080'

Update to upstream version '1080'
with Debian dir 790a640288529045514b78c4a24d5a8f1c34dc30
- - - - -
0e985d37 by Charles Plessy at 2020-08-12T11:12:38+00:00
Refreshed patches

- - - - -
3004da21 by Charles Plessy at 2020-08-12T11:21:47+00:00
routine-update: Ready to upload to unstable

- - - - -


7 changed files:

- ChangeLog.txt
- debian/changelog
- debian/patches/2to3.patch
- scripts/last-dotplot
- scripts/maf-convert
- src/split/cbrc_split_aligner.cc
- src/version.hh


Changes:

=====================================
ChangeLog.txt
=====================================
@@ -1,8 +1,69 @@
+2020-07-31  Martin C. Frith  <Martin C. Frith>
+
+	* src/split/cbrc_split_aligner.cc, test/last-split-test.out, test
+	/last-split-test.sh, test/spliceWithGap.maf:
+	last-split: catch spliced-alignment math overflow
+	[de7890c559b8] [tip]
+
+	* scripts/last-dotplot:
+	last-dotplot: fix bug in annotation names limit
+	[5aaae537a930]
+
+	* scripts/last-dotplot:
+	last-dotplot: read BED-like files more robustly
+	[579d74b5066f]
+
+	* scripts/last-dotplot:
+	last-dotplot: show annotation names!
+	[5209deca661c]
+
+	* scripts/last-dotplot:
+	last-dotplot: implementing annotation names...
+	[3b8d229f8359]
+
+	* scripts/last-dotplot:
+	last-dotplot: implementing annotation names...
+	[70b0e512d696]
+
+	* scripts/last-dotplot:
+	last-dotplot: implementing annotation names...
+	[170ab6c64728]
+
+	* scripts/last-dotplot:
+	last-dotplot: read annotations a bit faster
+	[8ec299b42e28]
+
+	* scripts/last-dotplot:
+	last-dotplot: refactor
+	[e43ecf1c3fb6]
+
+2020-07-30  Martin C. Frith  <Martin C. Frith>
+
+	* scripts/last-dotplot:
+	last-dotplot: refactor
+	[c5d3cf8c1058]
+
+	* scripts/last-dotplot:
+	last-dotplot: refactor
+	[6cf9358f80da]
+
+	* scripts/last-dotplot:
+	last-dotplot: refactor
+	[da8b081ec496]
+
+	* scripts/last-dotplot:
+	last-dotplot: refactor
+	[a29f9cec16e1]
+
+	* scripts/maf-convert:
+	maf-convert: warn instead of crash on NaN
+	[eb2abd424c87]
+
 2020-06-11  Martin C. Frith  <Martin C. Frith>
 
 	* doc/last-train.txt, scripts/last-train, src/lastal.cc:
 	Make inconsistent last-train/lastal -Q an error
-	[f2071b3c3173] [tip]
+	[f2071b3c3173]
 
 	* doc/last-train.txt, doc/lastal.txt, doc/lastdb.txt, scripts/last-
 	train, src/LastalArguments.cc, src/LastdbArguments.cc,


=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+last-align (1080-1) unstable; urgency=medium
+
+  * New upstream version
+  * Refreshed patches
+
+ -- Charles Plessy <plessy at debian.org>  Wed, 12 Aug 2020 11:12:45 +0000
+
 last-align (1066-1) unstable; urgency=medium
 
   * Team upload.


=====================================
debian/patches/2to3.patch
=====================================
@@ -5,15 +5,15 @@ Last-Update: Fri, 22 Nov 2019 11:23:44 +0100
 
 Index: last-align/scripts/last-dotplot
 ===================================================================
---- last-align.orig/scripts/last-dotplot
-+++ last-align/scripts/last-dotplot
+--- last-align.orig/scripts/last-dotplot	2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/last-dotplot	2020-08-12 11:12:17.628834864 +0000
 @@ -1,4 +1,4 @@
 -#! /usr/bin/env python
 +#! /usr/bin/python3
  # Author: Martin C. Frith 2008
  # SPDX-License-Identifier: GPL-3.0-or-later
  
-@@ -264,12 +264,12 @@ def mergedRanges(ranges):
+@@ -264,12 +264,12 @@
      yield oldBeg, maxEnd
  
  def mergedRangesPerSeq(coverDict):
@@ -28,7 +28,7 @@ Index: last-align/scripts/last-dotplot
  
  def trimmed(seqRanges, coverDict, minAlignedBases, maxGapFrac, endPad, midPad):
      maxEndGapFrac, maxMidGapFrac = twoValuesFromOption(maxGapFrac, ",")
-@@ -310,7 +310,7 @@ def rangesWithStrandInfo(seqRanges, stra
+@@ -310,7 +310,7 @@
  def natural_sort_key(my_string):
      '''Return a sort key for "natural" ordering, e.g. chr9 < chr10.'''
      parts = re.split(r'(\d+)', my_string)
@@ -37,7 +37,7 @@ Index: last-align/scripts/last-dotplot
      return parts
  
  def nameKey(oneSeqRanges):
-@@ -579,7 +579,7 @@ def drawJoins(im, alignments, bpPerPix,
+@@ -581,7 +581,7 @@
  def expandedSeqDict(seqDict):
      '''Allow lookup by short sequence names, e.g. chr7 as well as hg19.chr7.'''
      newDict = seqDict.copy()
@@ -46,8 +46,8 @@ Index: last-align/scripts/last-dotplot
          if "." in name:
              base = name.split(".")[-1]
              if base in newDict:  # an ambiguous case was found:
-@@ -613,7 +613,7 @@ def readBed(fileName, rangeDict):
-         yield layer, color, seqName, beg, end
+@@ -621,7 +621,7 @@
+         yield layer, color, seqName, beg, end, itemName
  
  def commaSeparatedInts(text):
 -    return map(int, text.rstrip(",").split(","))
@@ -57,8 +57,8 @@ Index: last-align/scripts/last-dotplot
      for line in myOpen(fileName):
 Index: last-align/scripts/last-map-probs
 ===================================================================
---- last-align.orig/scripts/last-map-probs
-+++ last-align/scripts/last-map-probs
+--- last-align.orig/scripts/last-map-probs	2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/last-map-probs	2020-08-12 11:12:17.628834864 +0000
 @@ -1,4 +1,4 @@
 -#! /usr/bin/env python
 +#!/usr/bin/python3
@@ -67,15 +67,15 @@ Index: last-align/scripts/last-map-probs
  
 Index: last-align/scripts/last-postmask
 ===================================================================
---- last-align.orig/scripts/last-postmask
-+++ last-align/scripts/last-postmask
+--- last-align.orig/scripts/last-postmask	2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/last-postmask	2020-08-12 11:12:17.628834864 +0000
 @@ -1,4 +1,4 @@
 -#! /usr/bin/env python
 +#!/usr/bin/python3
  
  # Copyright 2014 Martin C. Frith
  
-@@ -37,7 +37,7 @@ def complement(base):
+@@ -37,7 +37,7 @@
  
  def fastScoreMatrix(rowHeads, colHeads, matrix, deleteCost, insertCost):
      matrixLen = 128
@@ -84,7 +84,7 @@ Index: last-align/scripts/last-postmask
      fastMatrix = [[defaultScore for i in range(matrixLen)]
                    for j in range(matrixLen)]
      for i, x in enumerate(rowHeads):
-@@ -111,7 +111,7 @@ def doOneFile(lines):
+@@ -111,7 +111,7 @@
                      if i.startswith("B="): bIns = int(i[2:])
                      if i.startswith("e="): minScore = int(i[2:])
                      if i.startswith("S="): strandParam = int(i[2:])
@@ -95,15 +95,15 @@ Index: last-align/scripts/last-postmask
                  rowHeads.append(fields[1])
 Index: last-align/scripts/last-train
 ===================================================================
---- last-align.orig/scripts/last-train
-+++ last-align/scripts/last-train
+--- last-align.orig/scripts/last-train	2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/last-train	2020-08-12 11:12:17.628834864 +0000
 @@ -1,4 +1,4 @@
 -#! /usr/bin/env python
 +#!/usr/bin/python3
  # Copyright 2015 Martin C. Frith
  # SPDX-License-Identifier: GPL-3.0-or-later
  
-@@ -249,7 +249,7 @@ def writeScoreMatrix(outFile, matrix, pr
+@@ -249,7 +249,7 @@
      writeMatrixBody(outFile, prefix, alphabet, matrix, "%6s")
  
  def matProbsFromCounts(counts, opts):
@@ -114,8 +114,8 @@ Index: last-align/scripts/last-train
      if opts.matsym:  # symmetrize the substitution matrix
 Index: last-align/scripts/maf-convert
 ===================================================================
---- last-align.orig/scripts/maf-convert
-+++ last-align/scripts/maf-convert
+--- last-align.orig/scripts/maf-convert	2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/maf-convert	2020-08-12 11:12:17.628834864 +0000
 @@ -1,4 +1,4 @@
 -#! /usr/bin/env python
 +#!/usr/bin/python3
@@ -124,15 +124,15 @@ Index: last-align/scripts/maf-convert
  # Seems to work with Python 2.x, x>=6
 Index: last-align/scripts/maf-cut
 ===================================================================
---- last-align.orig/scripts/maf-cut
-+++ last-align/scripts/maf-cut
+--- last-align.orig/scripts/maf-cut	2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/maf-cut	2020-08-12 11:12:17.628834864 +0000
 @@ -1,4 +1,4 @@
 -#! /usr/bin/env python
 +#! /usr/bin/python3
  # Author: Martin C. Frith 2018
  
  from __future__ import print_function
-@@ -71,12 +71,12 @@ def cutMafRecords(mafLines, alnBeg, alnE
+@@ -71,12 +71,12 @@
  
  def mafFieldWidths(mafRecords):
      sRecords = (i for i in mafRecords if i[0] == "s")
@@ -150,15 +150,15 @@ Index: last-align/scripts/maf-cut
  def cutOneMaf(cutRange, mafLines):
 Index: last-align/scripts/maf-join
 ===================================================================
---- last-align.orig/scripts/maf-join
-+++ last-align/scripts/maf-join
+--- last-align.orig/scripts/maf-join	2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/maf-join	2020-08-12 11:12:17.628834864 +0000
 @@ -1,4 +1,4 @@
 -#! /usr/bin/env python
 +#!/usr/bin/python3
  
  # Copyright 2009, 2010, 2011 Martin C. Frith
  
-@@ -229,7 +229,7 @@ def nextWindow(window, theInput, referen
+@@ -229,7 +229,7 @@
          while True:
              maf = theInput.peek()
              if maf.after(referenceMaf): break
@@ -169,15 +169,15 @@ Index: last-align/scripts/maf-join
  
 Index: last-align/scripts/maf-swap
 ===================================================================
---- last-align.orig/scripts/maf-swap
-+++ last-align/scripts/maf-swap
+--- last-align.orig/scripts/maf-swap	2020-08-12 11:12:17.632834797 +0000
++++ last-align/scripts/maf-swap	2020-08-12 11:12:17.628834864 +0000
 @@ -1,4 +1,4 @@
 -#! /usr/bin/env python
 +#!/usr/bin/python3
  
  # Read MAF-format alignments, and write them, after moving the Nth
  # sequence to the top in each alignment.
-@@ -69,12 +69,12 @@ def flippedMafRecords(mafLines):
+@@ -69,12 +69,12 @@
  
  def sLineFieldWidths(mafLines):
      sLines = (i for i in mafLines if i[0] == "s")


=====================================
scripts/last-dotplot
=====================================
@@ -393,6 +393,13 @@ def allSortedRanges(opts, alignments, alignmentsB,
     t2 = mySortedRanges(seqRangesB2, oB2, 1, alignmentsB, s1)
     return s1 + t1, s2 + t2
 
+def sizesPerText(texts, font, textDraw):
+    sizes = 0, 0
+    for t in texts:
+        if textDraw is not None:
+            sizes = textDraw.textsize(t, font=font)
+        yield t, sizes
+
 def prettyNum(n):
     t = str(n)
     groups = []
@@ -420,21 +427,17 @@ def labelText(seqRange, labelOpt):
         return seqName + ":" + prettyNum(beg) + "-" + prettyNum(end)
     return seqName
 
-def rangeLabels(seqRanges, labelOpt, font, fontsize, image_mode, textRot):
-    if fontsize:
-        image_size = 1, 1
-        im = Image.new(image_mode, image_size)
-        draw = ImageDraw.Draw(im)
+def rangeLabels(seqRanges, labelOpt, font, textDraw, textRot):
     x = y = 0
     for r in seqRanges:
         text = labelText(r, labelOpt)
-        if fontsize:
-            x, y = draw.textsize(text, font=font)
+        if textDraw is not None:
+            x, y = textDraw.textsize(text, font=font)
             if textRot:
                 x, y = y, x
         yield text, x, y, r[3]
 
-def dataFromRanges(sortedRanges, font, fontSize, imageMode, labelOpt, textRot):
+def dataFromRanges(sortedRanges, font, textDraw, labelOpt, textRot):
     for seqName, rangeBeg, rangeEnd, strandNum in sortedRanges:
         out = [seqName, str(rangeBeg), str(rangeEnd)]
         if strandNum > 0:
@@ -442,8 +445,7 @@ def dataFromRanges(sortedRanges, font, fontSize, imageMode, labelOpt, textRot):
         logging.info("\t".join(out))
     logging.info("")
     rangeSizes = [e - b for n, b, e, s in sortedRanges]
-    labs = list(rangeLabels(sortedRanges, labelOpt, font, fontSize,
-                            imageMode, textRot))
+    labs = list(rangeLabels(sortedRanges, labelOpt, font, textDraw, textRot))
     margin = max(i[2] for i in labs)
     # xxx the margin may be too big, because some labels may get omitted
     return rangeSizes, labs, margin
@@ -591,10 +593,16 @@ def readBed(fileName, rangeDict):
     for line in myOpen(fileName):
         w = line.split()
         if not w: continue
+        if not w[1].isdigit():
+            w = w[1:]
         seqName = w[0]
         if seqName not in rangeDict: continue
+        seqRanges = rangeDict[seqName]
         beg = int(w[1])
         end = int(w[2])
+        if all(beg >= i[2] or end <= i[1] for i in seqRanges):
+            continue
+        itemName = w[3] if len(w) > 3 and w[3] != "." else ""
         layer = 900
         color = "#fbf"
         if len(w) > 4:
@@ -605,33 +613,38 @@ def readBed(fileName, rangeDict):
                     color = "rgb(" + w[8] + ")"
                 else:
                     strand = w[5]
-                    isRev = rangeDict[seqName][0][2]
+                    isRev = (seqRanges[0][3] > 1)
                     if strand == "+" and not isRev or strand == "-" and isRev:
                         color = "#ffe8e8"
                     if strand == "-" and not isRev or strand == "+" and isRev:
                         color = "#e8e8ff"
-        yield layer, color, seqName, beg, end
+        yield layer, color, seqName, beg, end, itemName
 
 def commaSeparatedInts(text):
     return map(int, text.rstrip(",").split(","))
 
 def readGenePred(opts, fileName, rangeDict):
     for line in myOpen(fileName):
-        fields = line.split()
+        fields = line.split()  # xxx tab ???
         if not fields: continue
-        if fields[2] not in "+-": fields = fields[1:]
+        geneName = fields[12 if len(fields) > 12 else 0]  # XXX ???
+        if fields[2] not in "+-":
+            fields = fields[1:]
         seqName = fields[1]
         if seqName not in rangeDict: continue
+        seqRanges = rangeDict[seqName]
         #strand = fields[2]
         cdsBeg = int(fields[5])
         cdsEnd = int(fields[6])
         exonBegs = commaSeparatedInts(fields[8])
         exonEnds = commaSeparatedInts(fields[9])
         for beg, end in zip(exonBegs, exonEnds):
-            yield 300, opts.exon_color, seqName, beg, end
+            if all(beg >= i[2] or end <= i[1] for i in seqRanges):
+                continue
+            yield 300, opts.exon_color, seqName, beg, end, geneName
             b = max(beg, cdsBeg)
             e = min(end, cdsEnd)
-            if b < e: yield 400, opts.cds_color, seqName, b, e
+            if b < e: yield 400, opts.cds_color, seqName, b, e, ""
 
 def readRmsk(fileName, rangeDict):
     for line in myOpen(fileName):
@@ -642,6 +655,7 @@ def readRmsk(fileName, rangeDict):
             beg = int(fields[6])
             end = int(fields[7])
             strand = fields[9]
+            repeatName = fields[10]
             repeatClass = fields[11]
         elif len(fields) == 15:  # .out
             seqName = fields[4]
@@ -649,15 +663,19 @@ def readRmsk(fileName, rangeDict):
             beg = int(fields[5]) - 1
             end = int(fields[6])
             strand = fields[8]
+            repeatName = fields[9]
             repeatClass = fields[10].split("/")[0]
         else:
             continue
+        seqRanges = rangeDict[seqName]
+        if all(beg >= i[2] or end <= i[1] for i in seqRanges):
+            continue
         if repeatClass in ("Low_complexity", "Simple_repeat", "Satellite"):
-            yield 200, "#fbf", seqName, beg, end
-        elif (strand == "+") != rangeDict[seqName][0][2]:
-            yield 100, "#ffe8e8", seqName, beg, end
+            yield 200, "#fbf", seqName, beg, end, repeatName
+        elif (strand == "+") != (seqRanges[0][3] > 1):
+            yield 100, "#ffe8e8", seqName, beg, end, repeatName
         else:
-            yield 100, "#e8e8ff", seqName, beg, end
+            yield 100, "#e8e8ff", seqName, beg, end, repeatName
 
 def isExtraFirstGapField(fields):
     return fields[4].isdigit()
@@ -674,12 +692,13 @@ def readGaps(opts, fileName, rangeDict):
         end = int(w[2])
         beg = end - int(w[5])  # zero-based coordinate
         if w[7] == "yes":
-            yield 3000, opts.bridged_color, seqName, beg, end
+            yield 3000, opts.bridged_color, seqName, beg, end, ""
         else:
-            yield 2000, opts.unbridged_color, seqName, beg, end
+            yield 2000, opts.unbridged_color, seqName, beg, end, ""
 
-def bedBoxes(beds, rangeDict, margin, edge, isTop, bpPerPix):
-    for layer, color, seqName, bedBeg, bedEnd in beds:
+def bedBoxes(beds, rangeDict, limit, isTop, bpPerPix, textSizes):
+    cover = [(limit, limit)]
+    for layer, color, seqName, bedBeg, bedEnd, name in reversed(beds):
         for rangeBeg, rangeEnd, isReverseRange, origin in rangeDict[seqName]:
             beg = max(bedBeg, rangeBeg)
             end = min(bedEnd, rangeEnd)
@@ -688,27 +707,34 @@ def bedBoxes(beds, rangeDict, margin, edge, isTop, bpPerPix):
                 beg, end = -end, -beg
             if layer <= 1000:
                 # include partly-covered pixels
-                b = (origin + beg) // bpPerPix
-                e = div_ceil(origin + end, bpPerPix)
+                pixBeg = (origin + beg) // bpPerPix
+                pixEnd = div_ceil(origin + end, bpPerPix)
             else:
                 # exclude partly-covered pixels
-                b = div_ceil(origin + beg, bpPerPix)
-                e = (origin + end) // bpPerPix
-                if e <= b: continue
+                pixBeg = div_ceil(origin + beg, bpPerPix)
+                pixEnd = (origin + end) // bpPerPix
+                if pixEnd <= pixBeg: continue
                 if bedEnd >= rangeEnd:  # include partly-covered end pixels
                     if isReverseRange:
-                        b = (origin + beg) // bpPerPix
+                        pixBeg = (origin + beg) // bpPerPix
                     else:
-                        e = div_ceil(origin + end, bpPerPix)
-            if isTop:
-                box = b, margin, e, edge
+                        pixEnd = div_ceil(origin + end, bpPerPix)
+            textWidth, textHeight = textSizes[name]
+            nameBeg = (pixBeg + pixEnd - textHeight) // 2
+            nameEnd = nameBeg + textHeight
+            if name and all(e <= nameBeg or b >= nameEnd for b, e in cover):
+                cover.append((nameBeg, nameEnd))
             else:
-                box = margin, b, edge, e
-            yield layer, color, box
+                name = ""
+            yield layer, color, isTop, pixBeg, pixEnd, name, nameBeg, textWidth
 
-def drawAnnotations(im, boxes):
+def drawAnnotations(im, boxes, tMargin, bMarginBeg, lMargin, rMarginBeg):
     # xxx use partial transparency for different-color overlaps?
-    for layer, color, box in boxes:
+    for layer, color, isTop, beg, end, name, nameBeg, nameLen in boxes:
+        if isTop:
+            box = beg, tMargin, end, bMarginBeg
+        else:
+            box = lMargin, beg, rMarginBeg, end
         im.paste(color, box)
 
 def placedLabels(labels, rangePixBegs, rangePixLens, beg, end):
@@ -760,6 +786,20 @@ def axisImage(labels, rangePixBegs, rangePixLens, textRot,
         draw.text(position, text, font=font, fill=fill)
     return im
 
+def annoTextImage(opts, image_mode, font, margin, length, boxes, isLeftAlign):
+    image_size = margin, length
+    im = Image.new(image_mode, image_size, opts.margin_color)
+    draw = ImageDraw.Draw(im)
+    for layer, color, isTop, beg, end, name, nameBeg, nameLen in boxes:
+        xPosition = 0 if isLeftAlign else margin - nameLen
+        position = xPosition, nameBeg
+        draw.text(position, name, font=font, fill="black")
+    return im
+
+def rangesPerSeq(sortedRanges):
+    for seqName, group in itertools.groupby(sortedRanges, itemgetter(0)):
+        yield seqName, sorted(group)
+
 def rangesWithOrigins(sortedRanges, rangePixBegs, rangePixLens, bpPerPix):
     for i, j, k in zip(sortedRanges, rangePixBegs, rangePixLens):
         seqName, rangeBeg, rangeEnd, strandNum = i
@@ -770,10 +810,10 @@ def rangesWithOrigins(sortedRanges, rangePixBegs, rangePixLens, bpPerPix):
             origin = bpPerPix * j - rangeBeg
         yield seqName, (rangeBeg, rangeEnd, isReverseRange, origin)
 
-def rangesPerSeq(sortedRanges, rangePixBegs, rangePixLens, bpPerPix):
+def rangesAndOriginsPerSeq(sortedRanges, rangePixBegs, rangePixLens, bpPerPix):
     a = rangesWithOrigins(sortedRanges, rangePixBegs, rangePixLens, bpPerPix)
-    for k, v in itertools.groupby(a, itemgetter(0)):
-        yield k, sorted(i[1] for i in v)
+    for seqName, group in itertools.groupby(a, itemgetter(0)):
+        yield seqName, sorted(i[1] for i in group)
 
 def getFont(opts):
     if opts.fontfile:
@@ -813,6 +853,18 @@ def remainingSequenceRanges(seqRanges, alignments, seqIndex):
     remainingSequences = set(i[seqIndex] for i in alignments)
     return [i for i in seqRanges if i[0] in remainingSequences]
 
+def readAnnotations(opts, font, textDraw, sortedRanges,
+                    bedFile, repFile, geneFile, gapFile):
+    rangeDict = expandedSeqDict(dict(rangesPerSeq(sortedRanges)))
+    annots = sorted(itertools.chain(readBed(bedFile, rangeDict),
+                                    readRmsk(repFile, rangeDict),
+                                    readGenePred(opts, geneFile, rangeDict),
+                                    readGaps(opts, gapFile, rangeDict)))
+    names = set(i[5] for i in annots)
+    textSizes = dict(sizesPerText(names, font, textDraw))
+    margin = max(i[0] for i in textSizes.values()) if textSizes else 0
+    return annots, textSizes, margin
+
 def lastDotplot(opts, args):
     logLevel = logging.INFO if opts.verbose else logging.WARNING
     logging.basicConfig(format="%(filename)s: %(message)s", level=logLevel)
@@ -867,32 +919,46 @@ def lastDotplot(opts, args):
                               cutRanges1, cutRangesB1, cutRanges2, cutRangesB2)
     sortedRanges1, sortedRanges2 = sortOut
 
+    textDraw = None
+    if opts.fontsize:
+        textDraw = ImageDraw.Draw(Image.new(image_mode, (1, 1)))
+
     textRot1 = "vertical".startswith(opts.rot1)
-    i1 = dataFromRanges(sortedRanges1, font,
-                        opts.fontsize, image_mode, opts.labels1, textRot1)
+    i1 = dataFromRanges(sortedRanges1, font, textDraw, opts.labels1, textRot1)
     rangeSizes1, labelData1, tMargin = i1
 
     textRot2 = "horizontal".startswith(opts.rot2)
-    i2 = dataFromRanges(sortedRanges2, font,
-                        opts.fontsize, image_mode, opts.labels2, textRot2)
+    i2 = dataFromRanges(sortedRanges2, font, textDraw, opts.labels2, textRot2)
     rangeSizes2, labelData2, lMargin = i2
 
-    maxPixels1 = opts.width  - lMargin
-    maxPixels2 = opts.height - tMargin
+    logging.info("reading annotations...")
+
+    a1 = readAnnotations(opts, font, textDraw, sortedRanges1,
+                         opts.bed1, opts.rmsk1, opts.genePred1, opts.gap1)
+    annots1, annoTextSizes1, bMargin = a1
+
+    a2 = readAnnotations(opts, font, textDraw, sortedRanges2,
+                         opts.bed2, opts.rmsk2, opts.genePred2, opts.gap2)
+    annots2, annoTextSizes2, rMargin = a2
+
+    maxPixels1 = opts.width  - lMargin - rMargin
+    maxPixels2 = opts.height - tMargin - bMargin
     bpPerPix1 = get_bp_per_pix(rangeSizes1, opts.border_pixels, maxPixels1)
     bpPerPix2 = get_bp_per_pix(rangeSizes2, opts.border_pixels, maxPixels2)
     bpPerPix = max(bpPerPix1, bpPerPix2)
     logging.info("bp per pixel = " + str(bpPerPix))
 
     p1 = pixelData(rangeSizes1, bpPerPix, opts.border_pixels, lMargin)
-    rangePixBegs1, rangePixLens1, width = p1
-    rangeDict1 = dict(rangesPerSeq(sortedRanges1, rangePixBegs1,
-                                   rangePixLens1, bpPerPix))
+    rangePixBegs1, rangePixLens1, rMarginBeg = p1
+    width = rMarginBeg + rMargin
+    rangeDict1 = dict(rangesAndOriginsPerSeq(sortedRanges1, rangePixBegs1,
+                                             rangePixLens1, bpPerPix))
 
     p2 = pixelData(rangeSizes2, bpPerPix, opts.border_pixels, tMargin)
-    rangePixBegs2, rangePixLens2, height = p2
-    rangeDict2 = dict(rangesPerSeq(sortedRanges2, rangePixBegs2,
-                                   rangePixLens2, bpPerPix))
+    rangePixBegs2, rangePixLens2, bMarginBeg = p2
+    height = bMarginBeg + bMargin
+    rangeDict2 = dict(rangesAndOriginsPerSeq(sortedRanges2, rangePixBegs2,
+                                             rangePixLens2, bpPerPix))
 
     logging.info("width:  " + str(width))
     logging.info("height: " + str(height))
@@ -902,30 +968,21 @@ def lastDotplot(opts, args):
     hits = alignmentPixels(width, height, allAlignments, bpPerPix,
                            rangeDict1, rangeDict2)
 
-    logging.info("reading annotations...")
-
     rangeDict1 = expandedSeqDict(rangeDict1)
-    beds1 = itertools.chain(readBed(opts.bed1, rangeDict1),
-                            readRmsk(opts.rmsk1, rangeDict1),
-                            readGenePred(opts, opts.genePred1, rangeDict1),
-                            readGaps(opts, opts.gap1, rangeDict1))
-    b1 = bedBoxes(beds1, rangeDict1, tMargin, height, True, bpPerPix)
-
     rangeDict2 = expandedSeqDict(rangeDict2)
-    beds2 = itertools.chain(readBed(opts.bed2, rangeDict2),
-                            readRmsk(opts.rmsk2, rangeDict2),
-                            readGenePred(opts, opts.genePred2, rangeDict2),
-                            readGaps(opts, opts.gap2, rangeDict2))
-    b2 = bedBoxes(beds2, rangeDict2, lMargin, width, False, bpPerPix)
 
-    boxes = sorted(itertools.chain(b1, b2))
+    boxes1 = list(bedBoxes(annots1, rangeDict1, rMarginBeg, True, bpPerPix,
+                           annoTextSizes1))
+    boxes2 = list(bedBoxes(annots2, rangeDict2, bMarginBeg, False, bpPerPix,
+                           annoTextSizes2))
+    boxes = sorted(itertools.chain(boxes1, boxes2))
 
     logging.info("drawing...")
 
     image_size = width, height
     im = Image.new(image_mode, image_size, opts.background_color)
 
-    drawAnnotations(im, boxes)
+    drawAnnotations(im, boxes, tMargin, bMarginBeg, lMargin, rMarginBeg)
 
     joinA, joinB = twoValuesFromOption(opts.join, ":")
     if joinA in "13":
@@ -957,12 +1014,20 @@ def lastDotplot(opts, args):
         im.paste(axis1, (0, 0))
         im.paste(axis2, (0, 0))
 
+        annoImage1 = annoTextImage(opts, image_mode, font, bMargin, width,
+                                   boxes1, False)
+        annoImage1 = annoImage1.transpose(Image.ROTATE_90)
+        annoImage2 = annoTextImage(opts, image_mode, font, rMargin, height,
+                                   boxes2, True)
+        im.paste(annoImage1, (0, bMarginBeg))
+        im.paste(annoImage2, (rMarginBeg, 0))
+
     for i in rangePixBegs1[1:]:
-        box = i - opts.border_pixels, tMargin, i, height
+        box = i - opts.border_pixels, tMargin, i, bMarginBeg
         im.paste(opts.border_color, box)
 
     for i in rangePixBegs2[1:]:
-        box = lMargin, i - opts.border_pixels, width, i
+        box = lMargin, i - opts.border_pixels, rMarginBeg, i
         im.paste(opts.border_color, box)
 
     im.save(args[1])


=====================================
scripts/maf-convert
=====================================
@@ -12,6 +12,7 @@ from itertools import *
 import collections
 import functools
 import gzip
+import logging
 import math
 import operator
 import optparse
@@ -479,9 +480,15 @@ mapqMaximum = "254"
 mapqMaximumNum = float(mapqMaximum)
 
 def mapqFromProb(probString):
-    try: p = float(probString)
-    except ValueError: raise Exception("bad probability: " + probString)
-    if p < 0 or p > 1: raise Exception("bad probability: " + probString)
+    try:
+        p = float(probString)
+    except ValueError:
+        raise ValueError("bad probability: " + probString)
+    if math.isnan(p):
+        logging.warning("bad probability: " + probString)
+        return mapqMissing
+    if p < 0 or p > 1:
+        raise ValueError("bad probability: " + probString)
     if p == 0: return mapqMaximum
     phred = -10 * math.log(p, 10)
     if phred >= mapqMaximumNum: return mapqMaximum
@@ -936,6 +943,8 @@ def mafConvertOneFile(opts, formatName, lines):
         raise Exception("unknown format: " + formatName)
 
 def mafConvert(opts, args):
+    logging.basicConfig(format="%(filename)s: %(message)s")
+
     formatName = args[0].lower()
     fileNames = args[1:]
 


=====================================
src/split/cbrc_split_aligner.cc
=====================================
@@ -13,6 +13,8 @@
 #include <sstream>
 #include <stdexcept>
 
+#include <float.h>
+
 static void err(const std::string& s) {
   throw std::runtime_error(s);
 }
@@ -696,6 +698,11 @@ void SplitAligner::backwardSplice() {
 	    if (alns[i].qend == j) p += endprob;
 	    p = p * Sexp[ij*2-1] * rescale;
 
+	    // XXX p can overflow to inf.  This can happen if there is
+	    // a large unaligned part in the middle of the query
+	    // sequence.  Then, in forwardSplice, Fmat may underflow
+	    // to 0, so the subsequent rescales are all 1.
+
 	    Bmat[ij - 1] = p;
 	    //if (alns[i].qstart == j-1) zB += p;
 	    pSum += p * spliceEndProb(ij - 1);
@@ -713,7 +720,9 @@ SplitAligner::marginalProbs(unsigned queryBeg, unsigned alnNum,
   unsigned j = queryBeg;
   for (unsigned pos = alnBeg; pos < alnEnd; ++pos) {
     size_t ij = matrixRowOrigins[i] + j;
-    if (alns[i].qalign[pos] == '-') {
+    if (Bmat[ij] > DBL_MAX) {  // can happen for spliced alignment
+      output.push_back(0);
+    } else if (alns[i].qalign[pos] == '-') {
       double value = Fmat[ij] * Bmat[ij] * Sexp[ij*2] * cell(rescales, j);
       output.push_back(value);
     } else {
@@ -1113,6 +1122,7 @@ void SplitAligner::flipSpliceSignals() {
 }
 
 double SplitAligner::spliceSignalStrandLogOdds() const {
+  // XXX if Bmat overflowed to inf, then I think this is unreliable
   assert(rescales.size() == rescalesRev.size());
   double logOdds = 0;
   for (unsigned j = 0; j < rescales.size(); ++j) {


=====================================
src/version.hh
=====================================
@@ -1 +1 @@
-"1066"
+"1080"



View it on GitLab: https://salsa.debian.org/med-team/last-align/-/compare/b412e2ccdedbc7b1f1b5cd6ea79fbb11b76da3b9...3004da21edf89f66332d7591e9ba5f047431185f

-- 
View it on GitLab: https://salsa.debian.org/med-team/last-align/-/compare/b412e2ccdedbc7b1f1b5cd6ea79fbb11b76da3b9...3004da21edf89f66332d7591e9ba5f047431185f
You're receiving this email because of your account on salsa.debian.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20200812/dcd4f3a2/attachment-0001.html>


More information about the debian-med-commit mailing list