[med-svn] [Git][med-team/mapdamage][upstream] New upstream version 2.2.0+dfsg
Andreas Tille
gitlab at salsa.debian.org
Fri Dec 13 13:06:28 GMT 2019
Andreas Tille pushed to branch upstream at Debian Med / mapdamage
Commits:
2d75da74 by Andreas Tille at 2019-12-13T10:54:26Z
New upstream version 2.2.0+dfsg
- - - - -
7 changed files:
- README.md
- mapdamage/mp_test.py
- mapdamage/rescale.py
- + mapdamage/tests/probs/Stats_out_MCMC_correct_prob.csv
- + mapdamage/tests/test.rescaled.correct.sam
- mapdamage/version.py
- setup.py
Changes:
=====================================
README.md
=====================================
@@ -1,19 +1,35 @@
## mapDamage
-[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/mapdamage2/README.html) [![Conda](https://img.shields.io/conda/dn/bioconda/mapdamage2.svg)](https://anaconda.org/bioconda/mapdamage2/files)
+[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/mapdamage2/README.html) [![Conda](https://img.shields.io/conda/dn/bioconda/mapdamage2.svg)](https://anaconda.org/bioconda/mapdamage2/files) ![Conda](https://anaconda.org/bioconda/mapdamage2/badges/latest_release_date.svg) ![Conda](https://anaconda.org/bioconda/mapdamage2/badges/version.svg) [![Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.](https://www.repostatus.org/badges/latest/inactive.svg)](https://www.repostatus.org/#inactive)
+
+
+#### `bioconda` installation
+
+- python3 version **2.2.0**
+```
+conda install -c bioconda mapdamage2=2.2.0
+```
+
+- python3 version **2.2.0** **with** R and 4 mandatory packages for the Bayesian inference:
+```
+conda install -c bioconda mapdamage2=2.2.0=pyr36_1
+```
---
### Important
-Users with versions dating prior to June 12 2013 please update. A nasty bug that caused the statistical part of `mapDamage` to use half of the data for estimation of the damage parameters, sorry for the inconvenience.
+
+- From version `2.2.0` the `master` branch is requiring **python3** as `python2` is not supported from 2020-01-01.
+
+- Users with versions dating prior to June 12 2013 please update. A nasty bug that caused the statistical part of `mapDamage` to use half of the data for estimation of the damage parameters, sorry for the inconvenience.
### Introduction
Complete documentation, instructions, examples, screenshots and FAQ are available at [this address](http://ginolhac.github.io/mapDamage/).
-[mapDamage2](http://geogenetics.ku.dk/publications/mapdamage2.0/) is a computational framework written in **Python3** and **R**, which tracks and quantifies DNA damage patterns
+[mapDamage2](https://geogenetics.ku.dk/publications/mapdamage2.0/) is a computational framework written in **Python3** and **R**, which tracks and quantifies DNA damage patterns
among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
-`mapDamage` was developed at the [Centre for GeoGenetics](http://geogenetics.ku.dk/) by the [Orlando Group ](http://geogenetics.ku.dk/research/research_groups/palaeomix_group/).
+`mapDamage` was developed at the [Centre for GeoGenetics](https://geogenetics.ku.dk/) by the [Orlando Group ](https://geogenetics.ku.dk/research_groups/palaeomix_group/).
### Citation
@@ -29,15 +45,30 @@ Ginolhac A, Rasmussen M, Gilbert MT, Willerslev E, Orlando L.
http://bioinformatics.oxfordjournals.org/content/27/15/2153](http://bioinformatics.oxfordjournals.org/content/27/15/2153)
-### Test
+### Test the no-stats part and rescaling
-in the package, you can test `mapDamage` by running:
+you can test `mapDamage` by running:
```
cd mapDamage/mapdamage/
python3 mp_test.py
```
+should return
+```
+Started with the command: /usr/local/bin/mapDamage -i tests/test.bam -r tests/fake1.fasta -d tests/results --no-stats
+ Reading from 'tests/test.bam'
+ Writing results to 'tests/results/'
+pdf tests/results/Fragmisincorporation_plot.pdf generated
+additional tests/results/Length_plot.pdf generated
+Successful run
+.
+----------------------------------------------------------------------
+Ran 2 tests in 3.357s
+
+OK
+```
+
### Contact
Please report bugs and suggest possible improvements to Aurélien Ginolhac, Mikkel Schubert or Hákon Jónsson by email:
=====================================
mapdamage/mp_test.py
=====================================
@@ -1,12 +1,28 @@
import unittest
import subprocess
-
+import pysam
+import mapdamage
+import optparse
+import filecmp
def read_nocomment(dnacomp):
with open(dnacomp, 'r') as f:
a = f.readlines()
return [x for x in a if not x.startswith('#')]
+def mock_options(filename,rescale_out,folder):
+ """Make the options object with nice values for testing"""
+ return optparse.Values({
+ "filename":filename,
+ "rescale_out":rescale_out,
+ "verbose":True,
+ "folder":folder,
+ "rescale_length_5p":12, # default values as in --seq-length
+ "rescale_length_3p":12, # default values as in --seq-length
+ "quiet":True
+ })
+
+
class testCases(unittest.TestCase):
def test_no_stats(self):
@@ -14,5 +30,31 @@ class testCases(unittest.TestCase):
subprocess.run(["mapDamage", "-i", "tests/test.bam", "-r", "tests/fake1.fasta", "-d", "tests/results", "--no-stats"], check = True)
self.assertTrue(read_nocomment("tests/dnacomp.txt") == read_nocomment("tests/results/dnacomp.txt"))
+class testRescaling(unittest.TestCase):
+ def test_single_end_file(self):
+ """Test, rescaling BAM file"""
+ #
+ # The expected substition frequencies before and after scaling using the scaled qualities as probalities:
+ # CT 0.06226411977920493 0.04163524443356556
+ # TC 0.020395286584806528 0.020395286584806528
+ # GA 0.04400459948304954 0.03794905109091021
+ # AG 0.05355350777642983 0.05355350777642983
+ # Quality metrics before and after scaling
+ # CT-Q0 5 5
+ # CT-Q10 5 2
+ # CT-Q20 3 2
+ # CT-Q30 3 2
+ # CT-Q40 0 0
+ # GA-Q0 5 5
+ # GA-Q10 5 4
+ # GA-Q20 1 0
+ # GA-Q30 1 0
+ # GA-Q40 0 0
+ options = mock_options("tests/test.bam","tests/test.rescaled.sam","tests/probs/")
+ ref = pysam.Fastafile("tests/fake1.fasta")
+ mapdamage.rescale.rescale_qual(ref,options,debug=True)
+ self.assertTrue(filecmp.cmp("tests/test.rescaled.sam","tests/test.rescaled.correct.sam"))
+
+
if __name__=='__main__':
unittest.main()
=====================================
mapdamage/rescale.py
=====================================
@@ -19,8 +19,8 @@ def phred_char_to_pval(ch):
def get_corr_prob(folder, rescale_length_5p, rescale_length_3p):
"""
- Reads the damage probability correction file, returns a
- dictionary with this structure
+ Reads the damage probability correction file, returns a
+ dictionary with this structure
position (one based) - CT - probability
- GA - probability
"""
@@ -28,21 +28,21 @@ def get_corr_prob(folder, rescale_length_5p, rescale_length_3p):
if not os.path.isfile(full_path):
sys.exit("Missing file, the file \n\tStats_out_MCMC_correct_prob.csv\nshould be in the folder\n\t"+folder+"\nDid you run the MCMC estimates of the parameters?")
try:
- fi_handle = csv.DictReader(open(full_path))
- corr_prob = {}
- for line in fi_handle:
- if (line["Position"] in corr_prob):
- sys.exit('This file has multiple position definitions %s, line %d: %s' % \
- (folder, fi_handle.line_num, corr_prob[line["Position"]]))
- else:
- corr_prob[int(line["Position"])] = {'C.T':float(line["C.T"]), 'G.A':float(line["G.A"])}
-
- # Exclude probabilities for positions outside of user-specified region
- for key in list(corr_prob.keys()):
- if key < -rescale_length_3p or key > rescale_length_5p:
- corr_prob.pop(key)
+ with open(full_path) as fi:
+ fi_handle = csv.DictReader(fi)
+ corr_prob = {}
+ for line in fi_handle:
+ if (line["Position"] in corr_prob):
+ sys.exit('This file has multiple position definitions %s, line %d: %s' % \
+ (folder, fi_handle.line_num, corr_prob[line["Position"]]))
+ else:
+ corr_prob[int(line["Position"])] = {'C.T':float(line["C.T"]), 'G.A':float(line["G.A"])}
- return corr_prob
+ # Exclude probabilities for positions outside of user-specified region
+ for key in list(corr_prob.keys()):
+ if key < -rescale_length_3p or key > rescale_length_5p:
+ corr_prob.pop(key)
+ return corr_prob
except csv.Error as e:
sys.exit('File %s, line %d: %s' % (os.path.join(folder,"Stats_out_MCMC_correct_prob.csv"), \
fi_handle.line_num, e))
@@ -50,11 +50,11 @@ def get_corr_prob(folder, rescale_length_5p, rescale_length_3p):
def corr_this_base(corr_prob, nt_seq, nt_ref, pos, length,direction="both"):
"""
- The position specific damaging correction, using the input
- corr_prob dictionary holding the damage correcting values
- nt_seq nucleotide in the sequence
+ The position specific damaging correction, using the input
+ corr_prob dictionary holding the damage correcting values
+ nt_seq nucleotide in the sequence
nt_ref nucleotide in the reference
- pos relative position from the 5' end
+ pos relative position from the 5' end
length length of the sequence
direction which end to consider the rescaling
returns the correction probability for this particular set
@@ -71,8 +71,8 @@ def corr_this_base(corr_prob, nt_seq, nt_ref, pos, length,direction="both"):
else:
# other transitions/transversions are not affected by damage
return 0
-
- back_pos = pos-length-1
+
+ back_pos = pos-length-1
# position from 3' end
if pos in corr_prob:
@@ -90,7 +90,7 @@ def corr_this_base(corr_prob, nt_seq, nt_ref, pos, length,direction="both"):
if direction == "forward":
return p5_corr
elif direction == "backward":
- return p3_corr
+ return p3_corr
elif direction == "both":
if pos < abs(back_pos) :
# then we use the forward correction
@@ -151,7 +151,7 @@ def record_subs(subs,nt_seq,nt_ref,nt_qual,nt_newqual,prob_corr):
else:
sub_type = "NN"
if (sub_type != "NN"):
- # record only transitions
+ # record only transitions
subs[sub_type+"-before"][int(ord(nt_qual))-33] += 1
subs[sub_type+"-after"][int(ord(nt_newqual))-33] += 1
if (nt_ref in ["A","C","G","T"]):
@@ -175,7 +175,7 @@ def print_subs(subs):
if subs["C"]!=0:
# the special case of no substitutions
print(("\tCT\t"+str(subs["CT-pvals_before"]/subs["C"])+"\t\t"+str(subs["CT-pvals"]/subs["C"])))
- else:
+ else:
print("\tCT\tNA\t\tNA")
if subs["T"]!=0:
print(("\tTC\t"+str(subs["TC-pvals"]/subs["T"])+"\t\t"+str(subs["TC-pvals"]/subs["T"])))
@@ -200,19 +200,19 @@ def print_subs(subs):
print(("\tGA-Q20 \t"+str(subs["GA-before-Q20"])+"\t\t"+str(subs["GA-after-Q20"])))
print(("\tGA-Q30 \t"+str(subs["GA-before-Q30"])+"\t\t"+str(subs["GA-after-Q30"])))
print(("\tGA-Q40 \t"+str(subs["GA-before-Q40"])+"\t\t"+str(subs["GA-after-Q40"])))
-
+
def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="both"):
"""
bam a pysam bam object
read a pysam read object
ref a pysam fasta ref file
- reflengths a dictionary holding the length of the references
- subs a dictionary holding the corrected number of substition before and after scaling
+ reflengths a dictionary holding the length of the references
+ subs a dictionary holding the corrected number of substition before and after scaling
corr_prob dictionary from get_corr_prob
returns a read with rescaled quality score
-
- Iterates through the read and reference, rescales the quality
+
+ Iterates through the read and reference, rescales the quality
according to corr_prob
"""
if not debug:
@@ -238,7 +238,7 @@ def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="b
pos_on_read = 0
number_of_rescaled_bases = 0.0
for (i, nt_seq, nt_ref, nt_qual) in zip(range(length_align), seq, refseq, qual):
- # rescale the quality according to the triplet position,
+ # rescale the quality according to the triplet position,
# pair of the reference and the sequence
if ((nt_seq == "T" and nt_ref =="C") or (nt_seq == "A" and nt_ref =="G")):
# need to rescale this subs.
@@ -250,13 +250,13 @@ def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="b
else:
# don't rescale, other bases
newp = 1 - phred_char_to_pval(nt_qual)
- newq = nt_qual
+ newq = nt_qual
if pos_on_read < length_read:
- new_qual[pos_on_read] = newq
+ new_qual[pos_on_read] = newq
record_subs(subs,nt_seq,nt_ref,nt_qual,new_qual[pos_on_read],newp)
if nt_seq != "-":
pos_on_read += 1
- # done with the aligned portion of the read
+ # done with the aligned portion of the read
else:
if not debug:
logger.warning("Warning: The aligment of the read is longer than the actual read %s",(read.qname))
@@ -266,7 +266,7 @@ def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="b
if read.is_reverse:
new_qual = new_qual[::-1]
if (read.cigar[0][0] == 4):
- # check for soft clipping at forward end
+ # check for soft clipping at forward end
new_qual = read.qual[0:read.cigar[0][1]] + new_qual
if (read.cigar[-1][0] == 4):
# the same backwards
@@ -285,7 +285,7 @@ def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="b
def rescale_qual(ref, options,debug=False):
- """
+ """
ref a pysam fasta ref file
bam_filename name of a BAM/SAM file to read
fi file containing the csv with correction probabilities
@@ -319,20 +319,20 @@ def rescale_qual(ref, options,debug=False):
pass
elif not hit.qual and not debug:
logger.warning("Cannot rescale base PHRED scores for read '%s'; no scores assigned." % hit.qname)
- elif hit.is_paired :
+ elif hit.is_paired :
if first_pair and not debug:
- # assuming the ends are non-overlapping
+ # assuming the ends are non-overlapping
logger.warning("Warning! Assuming the pairs are non-overlapping, facing inwards and correctly paired.")
first_pair=False
#5p --------------> 3p
#3p <-------------- 5p
# pair 1 (inwards)
- #5p ---->
+ #5p ---->
# <---- 5p
# A B
- # pair 2 (outwards), this happens if the reference is RC this is not supported
+ # pair 2 (outwards), this happens if the reference is RC this is not supported
# ----> 3p
- #3p <----
+ #3p <----
# A B
# Correct outwards pairs from the 3p and inwards pairs with the 5p end
if ((not hit.is_reverse) and hit.mate_is_reverse and (hit.pnext>hit.pos) and hit.tid==hit.mrnm):
=====================================
mapdamage/tests/probs/Stats_out_MCMC_correct_prob.csv
=====================================
@@ -0,0 +1,25 @@
+"","Position","C.T","G.A"
+"1",1,0.653616777018436,0
+"2",2,0.574821404505089,0
+"3",3,0.524660717195785,0
+"4",4,0.491676371805506,0
+"5",5,0.46919312975022,0
+"6",6,0.453361082263795,0
+"7",7,0.441882878430246,0
+"8",8,0.433345603590333,0
+"9",9,0.426853067214069,0
+"10",10,0.4218187429273,0
+"11",11,0.417847465539491,0
+"12",12,0.414666179579865,0
+"13",-12,0,0.427459648772127
+"14",-11,0,0.430678304009036
+"15",-10,0,0.434694598982698
+"16",-9,0,0.439783285536968
+"17",-8,0,0.446341431282045
+"18",-7,0,0.454957458647015
+"19",-6,0,0.466528745659899
+"20",-5,0,0.482466321045583
+"21",-4,0,0.505055143009322
+"22",-3,0,0.538101706832429
+"23",-2,0,0.588153916474448
+"24",-1,0,0.666277684522864
=====================================
mapdamage/tests/test.rescaled.correct.sam
=====================================
@@ -0,0 +1,11 @@
+ at SQ SN:fake1 LN:201
+ at PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:bwa samse -f input.sam fake1.fasta - input.fastq
+fake1:50-100 0 fake1 69 37 31M * 0 0 CCATGTCGGGCAGGCTGGTCTCGAACTCCTG ////////E6/EE/E/A////E//AAAE/EE XT:A:U NM:i:0 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:31 MR:f:0
+fake1:50-100b 0 fake1 69 37 31M * 0 0 CTATGTCGGGCAGGCTGGTCTCGAACTCCTG /#//////E6/EE/E/A////E//AAAE/EE XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:1C29 MR:f:0.57482
+fake1:50-100c 4 * 0 0 * * 0 0 CGGTAGAGATGGAGTTTCACCATGTCGGGCAAG //////////////A////////////E6/EE/
+fake1:50-100d 4 * 0 0 * * 0 0 TAATAGAGATGGAGTTTCACCATGTCGTGCAAG //////////////A////////////E6/EE/
+fake1:70-120 0 fake1 70 37 38M * 0 0 TATGTCGGGCAGGCTGGTCTCGAACTCCTGACCTCAGA #////6EE//AA6E//E66E/EAEEEEEAEEEAAEEA# XT:A:U NM:i:2 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:0C36G0 MR:f:1.31989
+fake1:80-130 16 fake1 80 25 41M * 0 0 GAGCTGGTCTCGAACTCCTGACCTCAGGCGATCTGCCTGTC AAABBCCDDFEEEEEAEEEEA/EE/E/AAE//A//E/A/// XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:0A0G37C1 MR:f:0
+fake1:80-130b 16 fake1 80 25 51M * 0 0 AGGCCGGTCTCGAACTCCTGACCTCAGGCGATCTGCCTGCCTTAACCTCCC AAABBCCDDFEEEEEAEEEEA/EE/E/AAE//A//E/A///E/A$/EE66/ XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:4T37C1G6 MR:f:0.44188
+fake1:80-130c 16 fake1 80 37 51M * 0 0 AGGCTGGTCTCGAACTCCTGACCTCAGGCGATCTGCCTGCCTTAGCCCCCC AAABBCCDDFEEEEEAEEEEA/EE/E/AAE//A//E/A///E/A//EE66/ XT:A:U NM:i:2 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:42C4T3 MR:f:0
+fake1:80-130d 16 fake1 80 25 51M * 0 0 AGGCTGGTCTCGAACTCCTGACCTCAGACGATCTGCCTGCCTTAGCCCCCC AAABBCCDDFEEEEEAEEEEA/EE/E/AAE//A//E/A///E/A//EE66/ XT:A:U NM:i:3 X0:i:1 X1:i:0 XM:i:3 XO:i:0 XG:i:0 MD:Z:27G14C4T3 MR:f:0
=====================================
mapdamage/version.py
=====================================
@@ -2,4 +2,4 @@
try:
from mapdamage._version import __version__
except ImportError:
- __version__ = "2.1.0"
+ __version__ = "2.2.0"
=====================================
setup.py
=====================================
@@ -4,8 +4,13 @@
from distutils.core import setup
from distutils.command.install import install as DistutilsInstall
import os
+import sys
import subprocess
+if sys.version_info < (3, 5):
+ print("At least Python 3.5 is required.\n", file=sys.stderr)
+ exit(1)
+
def compile_seqtk():
"""Compiling the seqtk toolkit"""
old_wd = os.getcwd()
@@ -55,11 +60,11 @@ class compileInstall(DistutilsInstall):
setup(
cmdclass={'install': compileInstall},
name='mapdamage',
- version='2.1.0',
- author='Aurélien Ginolhac, Mikkel Schubert, Hákon Jónsson',
+ version='2.2.0',
+ author='Aurélien Ginolhac, Mikkel Schubert, Ãkon Jónsson',
author_email='MSchubert at snm.ku.dk, jonsson.hakon at gmail.com',
packages=['mapdamage'],
- package_data={'mapdamage': ['Rscripts/*.R','Rscripts/stats/*.R','seqtk/seqtk']},
+ package_data={'mapdamage': ['Rscripts/*.R','Rscripts/stats/*.R','tests/*','seqtk/seqtk']},
scripts=['bin/mapDamage'],
url='https://github.com/ginolhac/mapDamage',
license='LICENSE.txt',
View it on GitLab: https://salsa.debian.org/med-team/mapdamage/commit/2d75da747584b1166c8a9200b964ea44c346a9ed
--
View it on GitLab: https://salsa.debian.org/med-team/mapdamage/commit/2d75da747584b1166c8a9200b964ea44c346a9ed
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20191213/2112964d/attachment-0001.html>
More information about the debian-med-commit
mailing list