[med-svn] [Git][med-team/mapdamage][upstream] New upstream version 2.2.0+dfsg

Fri Dec 13 13:06:28 GMT 2019


Andreas Tille pushed to branch upstream at Debian Med / mapdamage


Commits:
2d75da74 by Andreas Tille at 2019-12-13T10:54:26Z
New upstream version 2.2.0+dfsg
- - - - -


7 changed files:

- README.md
- mapdamage/mp_test.py
- mapdamage/rescale.py
- + mapdamage/tests/probs/Stats_out_MCMC_correct_prob.csv
- + mapdamage/tests/test.rescaled.correct.sam
- mapdamage/version.py
- setup.py


Changes:

=====================================
README.md
=====================================
@@ -1,19 +1,35 @@
 ## mapDamage
 
-[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/mapdamage2/README.html) [![Conda](https://img.shields.io/conda/dn/bioconda/mapdamage2.svg)](https://anaconda.org/bioconda/mapdamage2/files)
+[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/mapdamage2/README.html) [![Conda](https://img.shields.io/conda/dn/bioconda/mapdamage2.svg)](https://anaconda.org/bioconda/mapdamage2/files) ![Conda](https://anaconda.org/bioconda/mapdamage2/badges/latest_release_date.svg) ![Conda](https://anaconda.org/bioconda/mapdamage2/badges/version.svg) [![Project Status: Inactive – The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.](https://www.repostatus.org/badges/latest/inactive.svg)](https://www.repostatus.org/#inactive)
+
+
+#### `bioconda` installation
+
+- python3 version **2.2.0**
+```
+conda install -c bioconda mapdamage2=2.2.0
+```
+
+- python3 version **2.2.0** **with** R and 4 mandatory packages for the Bayesian inference:
+```
+conda install -c bioconda mapdamage2=2.2.0=pyr36_1
+```
 
 ---
 
 ### Important
-Users with versions dating prior to June 12 2013 please update. A nasty bug that caused the statistical part of `mapDamage` to use half of the data for estimation of the damage parameters, sorry for the inconvenience.
+
+- From version `2.2.0` the `master` branch is requiring **python3** as `python2` is not supported from 2020-01-01.
+
+- Users with versions dating prior to June 12 2013 please update. A nasty bug that caused the statistical part of `mapDamage` to use half of the data for estimation of the damage parameters, sorry for the inconvenience.
 
 ### Introduction
 Complete documentation, instructions, examples, screenshots and FAQ are available at [this address](http://ginolhac.github.io/mapDamage/).
 
-[mapDamage2](http://geogenetics.ku.dk/publications/mapdamage2.0/) is a computational framework written in **Python3** and **R**, which tracks and quantifies DNA damage patterns
+[mapDamage2](https://geogenetics.ku.dk/publications/mapdamage2.0/) is a computational framework written in **Python3** and **R**, which tracks and quantifies DNA damage patterns
 among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
 
-`mapDamage` was developed at the [Centre for GeoGenetics](http://geogenetics.ku.dk/) by the [Orlando Group ](http://geogenetics.ku.dk/research/research_groups/palaeomix_group/).
+`mapDamage` was developed at the [Centre for GeoGenetics](https://geogenetics.ku.dk/) by the [Orlando Group ](https://geogenetics.ku.dk/research_groups/palaeomix_group/).
 
 
 ### Citation
@@ -29,15 +45,30 @@ Ginolhac A, Rasmussen M, Gilbert MT, Willerslev E, Orlando L.
 http://bioinformatics.oxfordjournals.org/content/27/15/2153](http://bioinformatics.oxfordjournals.org/content/27/15/2153)
 
 
-### Test
+### Test the no-stats part and rescaling
 
-in the package, you can test `mapDamage` by running:
+you can test `mapDamage` by running:
 
 ```
 cd mapDamage/mapdamage/
 python3 mp_test.py
 ```
 
+should return
+```
+Started with the command: /usr/local/bin/mapDamage -i tests/test.bam -r tests/fake1.fasta -d tests/results --no-stats
+	Reading from 'tests/test.bam'
+	Writing results to 'tests/results/'
+pdf tests/results/Fragmisincorporation_plot.pdf generated
+additional tests/results/Length_plot.pdf generated
+Successful run
+.
+----------------------------------------------------------------------
+Ran 2 tests in 3.357s
+
+OK
+```
+
 
 ### Contact
 Please report bugs and suggest possible improvements to Aurélien Ginolhac, Mikkel Schubert or Hákon Jónsson by email:


=====================================
mapdamage/mp_test.py
=====================================
@@ -1,12 +1,28 @@
 import unittest
 import subprocess
-
+import pysam
+import mapdamage
+import optparse
+import filecmp
 
 def read_nocomment(dnacomp):
     with open(dnacomp, 'r') as f:
         a = f.readlines()
         return [x for x in a if not x.startswith('#')]
 
+def mock_options(filename,rescale_out,folder):
+    """Make the options object with nice values for testing"""
+    return optparse.Values({
+        "filename":filename,
+        "rescale_out":rescale_out,
+        "verbose":True,
+        "folder":folder,
+        "rescale_length_5p":12, # default values as in --seq-length
+        "rescale_length_3p":12, # default values as in --seq-length
+        "quiet":True
+        })
+
+
 class testCases(unittest.TestCase):
 
     def test_no_stats(self):
@@ -14,5 +30,31 @@ class testCases(unittest.TestCase):
         subprocess.run(["mapDamage",  "-i",  "tests/test.bam", "-r", "tests/fake1.fasta", "-d", "tests/results", "--no-stats"], check = True)
         self.assertTrue(read_nocomment("tests/dnacomp.txt") == read_nocomment("tests/results/dnacomp.txt"))
 
+class testRescaling(unittest.TestCase):
+    def test_single_end_file(self):
+        """Test, rescaling BAM file"""
+        #
+	    # The expected substition frequencies before and after scaling using the scaled qualities as probalities:
+	    # CT	0.06226411977920493		0.04163524443356556
+	    # TC	0.020395286584806528		0.020395286584806528
+	    # GA	0.04400459948304954		0.03794905109091021
+	    # AG	0.05355350777642983		0.05355350777642983
+	    # Quality metrics before and after scaling
+	    # CT-Q0 	5		5
+	    # CT-Q10 	5		2
+	    # CT-Q20 	3		2
+	    # CT-Q30 	3		2
+	    # CT-Q40 	0		0
+	    # GA-Q0 	5		5
+	    # GA-Q10 	5		4
+	    # GA-Q20 	1		0
+	    # GA-Q30 	1		0
+	    # GA-Q40 	0		0
+        options = mock_options("tests/test.bam","tests/test.rescaled.sam","tests/probs/")
+        ref = pysam.Fastafile("tests/fake1.fasta")
+        mapdamage.rescale.rescale_qual(ref,options,debug=True)
+        self.assertTrue(filecmp.cmp("tests/test.rescaled.sam","tests/test.rescaled.correct.sam"))
+
+
 if  __name__=='__main__':
     unittest.main()


=====================================
mapdamage/rescale.py
=====================================
@@ -19,8 +19,8 @@ def phred_char_to_pval(ch):
 
 def get_corr_prob(folder, rescale_length_5p, rescale_length_3p):
     """
-    Reads the damage probability correction file, returns a 
-    dictionary with this structure 
+    Reads the damage probability correction file, returns a
+    dictionary with this structure
     position (one based)  -  CT  -  probability
                           -  GA  -  probability
     """
@@ -28,21 +28,21 @@ def get_corr_prob(folder, rescale_length_5p, rescale_length_3p):
     if not os.path.isfile(full_path):
         sys.exit("Missing file, the file \n\tStats_out_MCMC_correct_prob.csv\nshould be in the folder\n\t"+folder+"\nDid you run the MCMC estimates of the parameters?")
     try:
-        fi_handle = csv.DictReader(open(full_path))
-        corr_prob = {}
-        for line in fi_handle:
-            if (line["Position"] in corr_prob):
-                sys.exit('This file has multiple position definitions %s, line %d: %s' % \
-                    (folder, fi_handle.line_num, corr_prob[line["Position"]]))
-            else:
-                corr_prob[int(line["Position"])] = {'C.T':float(line["C.T"]), 'G.A':float(line["G.A"])}
-
-        # Exclude probabilities for positions outside of user-specified region
-        for key in list(corr_prob.keys()):
-            if key < -rescale_length_3p or key > rescale_length_5p:
-                corr_prob.pop(key)
+        with open(full_path) as fi:
+            fi_handle = csv.DictReader(fi)
+            corr_prob = {}
+            for line in fi_handle:
+                if (line["Position"] in corr_prob):
+                    sys.exit('This file has multiple position definitions %s, line %d: %s' % \
+                        (folder, fi_handle.line_num, corr_prob[line["Position"]]))
+                else:
+                    corr_prob[int(line["Position"])] = {'C.T':float(line["C.T"]), 'G.A':float(line["G.A"])}
 
-        return corr_prob
+            # Exclude probabilities for positions outside of user-specified region
+            for key in list(corr_prob.keys()):
+                if key < -rescale_length_3p or key > rescale_length_5p:
+                    corr_prob.pop(key)
+            return corr_prob
     except csv.Error as e:
         sys.exit('File %s, line %d: %s' % (os.path.join(folder,"Stats_out_MCMC_correct_prob.csv"), \
             fi_handle.line_num, e))
@@ -50,11 +50,11 @@ def get_corr_prob(folder, rescale_length_5p, rescale_length_3p):
 
 def corr_this_base(corr_prob, nt_seq, nt_ref, pos, length,direction="both"):
     """
-    The position specific damaging correction, using the input 
-    corr_prob dictionary holding the damage correcting values 
-    nt_seq nucleotide in the sequence 
+    The position specific damaging correction, using the input
+    corr_prob dictionary holding the damage correcting values
+    nt_seq nucleotide in the sequence
     nt_ref nucleotide in the reference
-    pos relative position from the 5' end 
+    pos relative position from the 5' end
     length length of the sequence
     direction which end to consider the rescaling
     returns the correction probability for this particular set
@@ -71,8 +71,8 @@ def corr_this_base(corr_prob, nt_seq, nt_ref, pos, length,direction="both"):
     else:
         # other transitions/transversions are not affected by damage
         return 0
-    
-    back_pos = pos-length-1 
+
+    back_pos = pos-length-1
     # position from 3' end
 
     if pos in corr_prob:
@@ -90,7 +90,7 @@ def corr_this_base(corr_prob, nt_seq, nt_ref, pos, length,direction="both"):
     if direction == "forward":
         return p5_corr
     elif direction == "backward":
-        return p3_corr 
+        return p3_corr
     elif direction == "both":
         if pos < abs(back_pos) :
             # then we use the forward correction
@@ -151,7 +151,7 @@ def record_subs(subs,nt_seq,nt_ref,nt_qual,nt_newqual,prob_corr):
     else:
         sub_type = "NN"
     if (sub_type != "NN"):
-        # record only transitions 
+        # record only transitions
         subs[sub_type+"-before"][int(ord(nt_qual))-33] += 1
         subs[sub_type+"-after"][int(ord(nt_newqual))-33] += 1
     if (nt_ref in ["A","C","G","T"]):
@@ -175,7 +175,7 @@ def print_subs(subs):
     if subs["C"]!=0:
         # the special case of no substitutions
         print(("\tCT\t"+str(subs["CT-pvals_before"]/subs["C"])+"\t\t"+str(subs["CT-pvals"]/subs["C"])))
-    else: 
+    else:
         print("\tCT\tNA\t\tNA")
     if subs["T"]!=0:
         print(("\tTC\t"+str(subs["TC-pvals"]/subs["T"])+"\t\t"+str(subs["TC-pvals"]/subs["T"])))
@@ -200,19 +200,19 @@ def print_subs(subs):
     print(("\tGA-Q20 \t"+str(subs["GA-before-Q20"])+"\t\t"+str(subs["GA-after-Q20"])))
     print(("\tGA-Q30 \t"+str(subs["GA-before-Q30"])+"\t\t"+str(subs["GA-after-Q30"])))
     print(("\tGA-Q40 \t"+str(subs["GA-before-Q40"])+"\t\t"+str(subs["GA-after-Q40"])))
-    
+
 
 def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="both"):
     """
     bam              a pysam bam object
     read             a pysam read object
     ref              a pysam fasta ref file
-    reflengths       a dictionary holding the length of the references 
-    subs             a dictionary holding the corrected number of substition before and after scaling 
+    reflengths       a dictionary holding the length of the references
+    subs             a dictionary holding the corrected number of substition before and after scaling
     corr_prob dictionary from get_corr_prob
     returns a read with rescaled quality score
-    
-    Iterates through the read and reference, rescales the quality 
+
+    Iterates through the read and reference, rescales the quality
     according to corr_prob
     """
     if not debug:
@@ -238,7 +238,7 @@ def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="b
     pos_on_read = 0
     number_of_rescaled_bases = 0.0
     for (i, nt_seq, nt_ref, nt_qual) in zip(range(length_align), seq, refseq, qual):
-        # rescale the quality according to the triplet position, 
+        # rescale the quality according to the triplet position,
         # pair of the reference and the sequence
         if ((nt_seq == "T" and nt_ref =="C") or (nt_seq == "A" and nt_ref =="G")):
             # need to rescale this subs.
@@ -250,13 +250,13 @@ def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="b
         else:
             # don't rescale, other bases
             newp = 1 - phred_char_to_pval(nt_qual)
-            newq = nt_qual 
+            newq = nt_qual
         if pos_on_read < length_read:
-            new_qual[pos_on_read] = newq 
+            new_qual[pos_on_read] = newq
             record_subs(subs,nt_seq,nt_ref,nt_qual,new_qual[pos_on_read],newp)
             if nt_seq != "-":
                 pos_on_read += 1
-            # done with the aligned portion of the read 
+            # done with the aligned portion of the read
         else:
             if not debug:
                 logger.warning("Warning: The aligment of the read is longer than the actual read %s",(read.qname))
@@ -266,7 +266,7 @@ def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="b
     if read.is_reverse:
         new_qual = new_qual[::-1]
     if (read.cigar[0][0] == 4):
-        # check for soft clipping at forward end 
+        # check for soft clipping at forward end
         new_qual = read.qual[0:read.cigar[0][1]] + new_qual
     if (read.cigar[-1][0] == 4):
         # the same backwards
@@ -285,7 +285,7 @@ def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="b
 
 
 def rescale_qual(ref, options,debug=False):
-    """    
+    """
     ref                a pysam fasta ref file
     bam_filename       name of a BAM/SAM file to read
     fi                 file containing the csv with correction probabilities
@@ -319,20 +319,20 @@ def rescale_qual(ref, options,debug=False):
             pass
         elif not hit.qual and not debug:
             logger.warning("Cannot rescale base PHRED scores for read '%s'; no scores assigned." % hit.qname)
-        elif hit.is_paired : 
+        elif hit.is_paired :
             if first_pair and not debug:
-                # assuming the ends are non-overlapping 
+                # assuming the ends are non-overlapping
                 logger.warning("Warning! Assuming the pairs are non-overlapping, facing inwards and correctly paired.")
                 first_pair=False
             #5p --------------> 3p
             #3p <-------------- 5p
             # pair 1 (inwards)
-            #5p ----> 
+            #5p ---->
             #             <---- 5p
             #     A         B
-            # pair 2 (outwards), this happens if the reference is RC this is not supported 
+            # pair 2 (outwards), this happens if the reference is RC this is not supported
             #             ----> 3p
-            #3p <----         
+            #3p <----
             #     A         B
             # Correct outwards pairs from the 3p and inwards pairs with the 5p end
             if ((not hit.is_reverse) and hit.mate_is_reverse and (hit.pnext>hit.pos) and hit.tid==hit.mrnm):


=====================================
mapdamage/tests/probs/Stats_out_MCMC_correct_prob.csv
=====================================
@@ -0,0 +1,25 @@
+"","Position","C.T","G.A"
+"1",1,0.653616777018436,0
+"2",2,0.574821404505089,0
+"3",3,0.524660717195785,0
+"4",4,0.491676371805506,0
+"5",5,0.46919312975022,0
+"6",6,0.453361082263795,0
+"7",7,0.441882878430246,0
+"8",8,0.433345603590333,0
+"9",9,0.426853067214069,0
+"10",10,0.4218187429273,0
+"11",11,0.417847465539491,0
+"12",12,0.414666179579865,0
+"13",-12,0,0.427459648772127
+"14",-11,0,0.430678304009036
+"15",-10,0,0.434694598982698
+"16",-9,0,0.439783285536968
+"17",-8,0,0.446341431282045
+"18",-7,0,0.454957458647015
+"19",-6,0,0.466528745659899
+"20",-5,0,0.482466321045583
+"21",-4,0,0.505055143009322
+"22",-3,0,0.538101706832429
+"23",-2,0,0.588153916474448
+"24",-1,0,0.666277684522864


=====================================
mapdamage/tests/test.rescaled.correct.sam
=====================================
@@ -0,0 +1,11 @@
+ at SQ	SN:fake1	LN:201
+ at PG	ID:bwa	PN:bwa	VN:0.7.17-r1188	CL:bwa samse -f input.sam fake1.fasta - input.fastq
+fake1:50-100	0	fake1	69	37	31M	*	0	0	CCATGTCGGGCAGGCTGGTCTCGAACTCCTG	////////E6/EE/E/A////E//AAAE/EE	XT:A:U	NM:i:0	X0:i:1	X1:i:0	XM:i:0	XO:i:0	XG:i:0	MD:Z:31	MR:f:0
+fake1:50-100b	0	fake1	69	37	31M	*	0	0	CTATGTCGGGCAGGCTGGTCTCGAACTCCTG	/#//////E6/EE/E/A////E//AAAE/EE	XT:A:U	NM:i:1	X0:i:1	X1:i:0	XM:i:1	XO:i:0	XG:i:0	MD:Z:1C29	MR:f:0.57482
+fake1:50-100c	4	*	0	0	*	*	0	0	CGGTAGAGATGGAGTTTCACCATGTCGGGCAAG	//////////////A////////////E6/EE/
+fake1:50-100d	4	*	0	0	*	*	0	0	TAATAGAGATGGAGTTTCACCATGTCGTGCAAG	//////////////A////////////E6/EE/
+fake1:70-120	0	fake1	70	37	38M	*	0	0	TATGTCGGGCAGGCTGGTCTCGAACTCCTGACCTCAGA	#////6EE//AA6E//E66E/EAEEEEEAEEEAAEEA#	XT:A:U	NM:i:2	X0:i:1	X1:i:0	XM:i:2	XO:i:0	XG:i:0	MD:Z:0C36G0	MR:f:1.31989
+fake1:80-130	16	fake1	80	25	41M	*	0	0	GAGCTGGTCTCGAACTCCTGACCTCAGGCGATCTGCCTGTC	AAABBCCDDFEEEEEAEEEEA/EE/E/AAE//A//E/A///	XT:A:U	NM:i:3	X0:i:1	X1:i:0	XM:i:3	XO:i:0	XG:i:0	MD:Z:0A0G37C1	MR:f:0
+fake1:80-130b	16	fake1	80	25	51M	*	0	0	AGGCCGGTCTCGAACTCCTGACCTCAGGCGATCTGCCTGCCTTAACCTCCC	AAABBCCDDFEEEEEAEEEEA/EE/E/AAE//A//E/A///E/A$/EE66/	XT:A:U	NM:i:3	X0:i:1	X1:i:0	XM:i:3	XO:i:0	XG:i:0	MD:Z:4T37C1G6	MR:f:0.44188
+fake1:80-130c	16	fake1	80	37	51M	*	0	0	AGGCTGGTCTCGAACTCCTGACCTCAGGCGATCTGCCTGCCTTAGCCCCCC	AAABBCCDDFEEEEEAEEEEA/EE/E/AAE//A//E/A///E/A//EE66/	XT:A:U	NM:i:2	X0:i:1	X1:i:0	XM:i:2	XO:i:0	XG:i:0	MD:Z:42C4T3	MR:f:0
+fake1:80-130d	16	fake1	80	25	51M	*	0	0	AGGCTGGTCTCGAACTCCTGACCTCAGACGATCTGCCTGCCTTAGCCCCCC	AAABBCCDDFEEEEEAEEEEA/EE/E/AAE//A//E/A///E/A//EE66/	XT:A:U	NM:i:3	X0:i:1	X1:i:0	XM:i:3	XO:i:0	XG:i:0	MD:Z:27G14C4T3	MR:f:0


=====================================
mapdamage/version.py
=====================================
@@ -2,4 +2,4 @@
 try:
     from mapdamage._version import __version__
 except ImportError:
-    __version__ = "2.1.0"
+    __version__ = "2.2.0"


=====================================
setup.py
=====================================
@@ -4,8 +4,13 @@
 from distutils.core import setup
 from distutils.command.install import install as DistutilsInstall
 import os
+import sys
 import subprocess
 
+if sys.version_info < (3, 5):
+    print("At least Python 3.5 is required.\n", file=sys.stderr)
+    exit(1)
+
 def compile_seqtk():
     """Compiling the seqtk toolkit"""
     old_wd = os.getcwd()
@@ -55,11 +60,11 @@ class compileInstall(DistutilsInstall):
 setup(
     cmdclass={'install': compileInstall},
     name='mapdamage',
-    version='2.1.0',
-    author='Aurélien Ginolhac, Mikkel Schubert, Hákon Jónsson',
+    version='2.2.0',
+    author='AurÃ©lien Ginolhac, Mikkel Schubert, Ãkon JÃ³nsson',
     author_email='MSchubert at snm.ku.dk, jonsson.hakon at gmail.com',
     packages=['mapdamage'],
-    package_data={'mapdamage': ['Rscripts/*.R','Rscripts/stats/*.R','seqtk/seqtk']},
+    package_data={'mapdamage': ['Rscripts/*.R','Rscripts/stats/*.R','tests/*','seqtk/seqtk']},
     scripts=['bin/mapDamage'],
     url='https://github.com/ginolhac/mapDamage',
     license='LICENSE.txt',



View it on GitLab: https://salsa.debian.org/med-team/mapdamage/commit/2d75da747584b1166c8a9200b964ea44c346a9ed

-- 
View it on GitLab: https://salsa.debian.org/med-team/mapdamage/commit/2d75da747584b1166c8a9200b964ea44c346a9ed
You're receiving this email because of your account on salsa.debian.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20191213/2112964d/attachment-0001.html>