[med-svn] [Git][med-team/mapdamage][master] 5 commits: New upstream version 2.1.0+dfsg
Andreas Tille
gitlab at salsa.debian.org
Tue Oct 1 16:40:33 BST 2019
Andreas Tille pushed to branch master at Debian Med / mapdamage
Commits:
a8ef77dc by Andreas Tille at 2019-10-01T15:13:12Z
New upstream version 2.1.0+dfsg
- - - - -
0ba24d6d by Andreas Tille at 2019-10-01T15:13:12Z
New upstream version
- - - - -
fc460225 by Andreas Tille at 2019-10-01T15:13:12Z
Update upstream source from tag 'upstream/2.1.0+dfsg'
Update to upstream version '2.1.0+dfsg'
with Debian dir 955646f472dfa57207935caf795a978a466f52c9
- - - - -
f5ef3364 by Andreas Tille at 2019-10-01T15:33:41Z
Adapt patches - upstream took over 2to3.patch
- - - - -
43e9ade9 by Andreas Tille at 2019-10-01T15:34:39Z
New upstream version closes Python3 bug
- - - - -
21 changed files:
- README.md
- bin/mapDamage
- debian/changelog
- − debian/patches/2to3.patch
- debian/patches/fix_quantile_na_rm.patch
- debian/patches/series
- debian/patches/use_debian_packaged_seqtk.patch
- mapdamage/Rscripts/mapDamage.R
- mapdamage/Rscripts/stats/function.R
- mapdamage/Rscripts/stats/main.R
- mapdamage/__init__.py
- mapdamage/align.py
- mapdamage/composition.py
- mapdamage/parseoptions.py
- mapdamage/rescale.py
- mapdamage/rescale_test.py
- mapdamage/rscript.py
- mapdamage/seq.py
- mapdamage/tables.py
- mapdamage/version.py
- setup.py
Changes:
=====================================
README.md
=====================================
@@ -1,18 +1,18 @@
### Important
-Users with versions dating prior to June 12 2013 please update. A nasty bug that caused the statistical part of mapDamage to use half of the data for estimation of the damage parameters, sorry for the inconvenience.
+Users with versions dating prior to June 12 2013 please update. A nasty bug that caused the statistical part of `mapDamage` to use half of the data for estimation of the damage parameters, sorry for the inconvenience.
### Introduction
Complete documentation, instructions, examples, screenshots and FAQ are available at [this address](http://ginolhac.github.io/mapDamage/).
-[mapDamage2](http://geogenetics.ku.dk/publications/mapdamage2.0/) is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns
-among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
+[mapDamage2](http://geogenetics.ku.dk/publications/mapdamage2.0/) is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns
+among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms.
-Mapdamage is developed at the [Centre for GeoGenetics](http://geogenetics.ku.dk/) by the [Orlando Group ](http://geogenetics.ku.dk/research/research_groups/palaeomix_group/).
+`mapDamage` is developed at the [Centre for GeoGenetics](http://geogenetics.ku.dk/) by the [Orlando Group ](http://geogenetics.ku.dk/research/research_groups/palaeomix_group/).
### Web interface
Anna Kostikova from [insideDNA](https://insidedna.me) implemented a web interface for mapDamage.
-Users can adjust the cloud ressources in terms of RAM/CPU to perform their analysis. She wrote a [tutorial](https://insidedna.me/tutorials/view/Analysis-ancient-DNA-samples-using-mapDamage) explaining how to use this [web interface (sign up required)](https://insidedna.me/app#/tools/100648/)
+Users can adjust the cloud resources in terms of RAM/CPU to perform their analysis. She wrote a [tutorial](https://insidedna.me/tutorials/view/Analysis-ancient-DNA-samples-using-mapDamage) explaining how to use this [web interface (sign up required)](https://insidedna.me/app#/tools/100648/)
### Citation
@@ -31,5 +31,4 @@ http://bioinformatics.oxfordjournals.org/content/27/15/2153](http://bioinformati
### Contact
Please report bugs and suggest possible improvements to Aurélien Ginolhac, Mikkel Schubert or Hákon Jónsson by email:
-aginolhac at snm.ku.dk, MSchubert at snm.ku.dk or jonsson.hakon at gmail.com.
-
+ginolhac at gmail.com, mikkelsch at gmail.com or jonsson.hakon at gmail.com.
=====================================
bin/mapDamage
=====================================
@@ -1,8 +1,6 @@
-#!/usr/bin/python
+#!/usr/bin/env python3
# -*- coding: utf-8 -*-
-from __future__ import print_function
-
import logging
import random
import time
@@ -37,7 +35,7 @@ plot and quantify damage patterns from a SAM/BAM file
:Date: November 2012
:Type: tool
:Input: SAM/BAM
-:Output: tabulated tables, pdf
+:Output: tabulated tables, pdf
"""
# check if pysam if available
@@ -45,7 +43,7 @@ MODULE = "pysam"
URL = "http://code.google.com/p/pysam/"
try:
__import__(MODULE)
-except ImportError, e:
+except ImportError as e:
sys.stderr.write("Error: Could not import required module '%s':\n\t- %s\n" % (MODULE, e))
sys.stderr.write(" If module is not installed, please download from '%s'.\n" % URL)
sys.stderr.write(" A local install may be performed using the following command:\n")
@@ -98,7 +96,7 @@ def _downsample_to_fixed_number(bamfile, options):
if index >= downsample_to:
continue
sample[index] = record
- return filter(None, sample)
+ return [_f for _f in sample if _f]
def _read_bamfile(bamfile, options):
@@ -116,12 +114,12 @@ def _read_bamfile(bamfile, options):
def _check_mm_option():
"""
- As the user can override the system wide mapdamage modules with
- the --mapdamage-modules, it has to happen before option parsing
+ As the user can override the system wide mapdamage modules with
+ the --mapdamage-modules, it has to happen before option parsing
in mapdamage.parseoptions
"""
path_to_mm = None
- for nr,arg in zip(xrange(len(sys.argv)),sys.argv):
+ for nr,arg in zip(list(range(len(sys.argv))),sys.argv):
if arg.startswith("--mapdamage-modules"):
try:
if "=" in arg:
@@ -130,7 +128,7 @@ def _check_mm_option():
path_to_mm = arg_p[1]
else:
# the option is of the format --mapdamage-modules AAAA
- path_to_mm = sys.argv[nr+1]
+ path_to_mm = sys.argv[nr+1]
break
except IndexError as e:
raise SystemExit("Must specify a path to --mapdamage-modules")
@@ -201,7 +199,7 @@ def main():
mapdamage.composition.get_base_comp(options.ref,path_to_basecomp)
mapdamage.rscript.run_stats(options)
return 0
-
+
# fetch all references and associated lengths in nucleotides
try:
ref = pysam.Fastafile(options.ref)
@@ -230,7 +228,7 @@ def main():
in_bam = pysam.Samfile(options.filename)
- reflengths = dict(zip(in_bam.references, in_bam.lengths))
+ reflengths = dict(list(zip(in_bam.references, in_bam.lengths)))
# check if references in SAM/BAM are the same in the fasta reference file
fai_lengths = mapdamage.seq.read_fasta_index(options.ref + ".fai")
if not fai_lengths:
@@ -322,7 +320,7 @@ def main():
# compute base composition for genomic regions
mapdamage.composition.count_ref_comp(read, chrom, before, after, dnacomp)
-
+
if options.fasta:
mapdamage.seq.write_fasta(read, chrom, seq, refseq, min(coordinate), max(coordinate), before, after, fhfasta)
@@ -357,7 +355,7 @@ def main():
if mapdamage.tables.check_table_and_warn_if_dmg_freq_is_low(options.folder):
options.no_stats = True
-
+
# run the Bayesian estimation
if not options.no_stats:
# before running the Bayesian estimation get the base composition
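Note for readers following the port: the bin/mapDamage hunks above replace Python 2-only constructs with their Python 3 equivalents. The following is a minimal, self-contained illustration of the two recurring patterns, not code taken from mapDamage:

    # Python 3 spells exception binding "except ExcType as name";
    # the Python 2-only "except ExcType, name" form is a syntax error.
    try:
        import pysam
    except ImportError as error:
        raise SystemExit("Could not import pysam: %s" % (error,))

    # filter() returns a lazy iterator in Python 3, so code that needs a
    # real list (here: dropping empty slots from a downsampled set of
    # records) is rewritten as a list comprehension.
    sample = ["read1", None, "read2", None]
    kept = [record for record in sample if record]   # ["read1", "read2"]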
=====================================
debian/changelog
=====================================
@@ -1,6 +1,6 @@
-mapdamage (2.0.9+dfsg-2) UNRELEASED; urgency=medium
+mapdamage (2.1.0+dfsg-1) UNRELEASED; urgency=medium
- * Use 2to3 to port to Python3
+ * New upstream version
Closes: #936989
* debhelper-compat 12
* Standards-Version: 4.4.0
@@ -8,7 +8,7 @@ mapdamage (2.0.9+dfsg-2) UNRELEASED; urgency=medium
* buildsystem=pybuild
* python3-pysam <!nocheck>
- -- Andreas Tille <tille at debian.org> Tue, 03 Sep 2019 11:02:34 +0200
+ -- Andreas Tille <tille at debian.org> Tue, 01 Oct 2019 17:13:12 +0200
mapdamage (2.0.9+dfsg-1) unstable; urgency=medium
=====================================
debian/patches/2to3.patch deleted
=====================================
@@ -1,423 +0,0 @@
-Description: Use 2to3 to port to Python3
-Bug-Debian: https://bugs.debian.org/936989
-Author: Andreas Tille <tille at debian.org>
-Last-Update: Tue, 03 Sep 2019 11:02:34 +0200
-
---- a/bin/mapDamage
-+++ b/bin/mapDamage
-@@ -1,7 +1,7 @@
--#!/usr/bin/python
-+#!/usr/bin/python3
- # -*- coding: utf-8 -*-
-
--from __future__ import print_function
-+
-
- import logging
- import random
-@@ -45,7 +45,7 @@ MODULE = "pysam"
- URL = "http://code.google.com/p/pysam/"
- try:
- __import__(MODULE)
--except ImportError, e:
-+except ImportError as e:
- sys.stderr.write("Error: Could not import required module '%s':\n\t- %s\n" % (MODULE, e))
- sys.stderr.write(" If module is not installed, please download from '%s'.\n" % URL)
- sys.stderr.write(" A local install may be performed using the following command:\n")
-@@ -98,7 +98,7 @@ def _downsample_to_fixed_number(bamfile,
- if index >= downsample_to:
- continue
- sample[index] = record
-- return filter(None, sample)
-+ return [_f for _f in sample if _f]
-
-
- def _read_bamfile(bamfile, options):
-@@ -121,7 +121,7 @@ def _check_mm_option():
- in mapdamage.parseoptions
- """
- path_to_mm = None
-- for nr,arg in zip(xrange(len(sys.argv)),sys.argv):
-+ for nr,arg in zip(range(len(sys.argv)),sys.argv):
- if arg.startswith("--mapdamage-modules"):
- try:
- if "=" in arg:
-@@ -230,7 +230,7 @@ def main():
- in_bam = pysam.Samfile(options.filename)
-
-
-- reflengths = dict(zip(in_bam.references, in_bam.lengths))
-+ reflengths = dict(list(zip(in_bam.references, in_bam.lengths)))
- # check if references in SAM/BAM are the same in the fasta reference file
- fai_lengths = mapdamage.seq.read_fasta_index(options.ref + ".fai")
- if not fai_lengths:
---- a/mapdamage/__init__.py
-+++ b/mapdamage/__init__.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
-
- import mapdamage.parseoptions
- import mapdamage.seq
---- a/mapdamage/align.py
-+++ b/mapdamage/align.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
-
- import itertools
- import mapdamage
-@@ -84,7 +84,7 @@ def get_mis(read, seq, refseq, ref, leng
- std = '-' if read.is_reverse else '+'
- subtable = tab[ref][end][std]
-
-- for (i, nt_seq, nt_ref) in itertools.izip(xrange(length), seq, refseq):
-+ for (i, nt_seq, nt_ref) in zip(range(length), seq, refseq):
- if (nt_seq in "ACGT-") and (nt_ref in "ACGT-"):
- if nt_ref != "-":
- # record base composition in the reference, only A, C, G, T
---- a/mapdamage/composition.py
-+++ b/mapdamage/composition.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
-
- import mapdamage
- import itertools
-@@ -10,8 +10,8 @@ def count_ref_comp(read, chrom, before,
- """ record basae composition in external genomic regions """
- std = '-' if read.is_reverse else '+'
-
-- _update_table(comp[chrom]['5p'][std], before, xrange(-len(before), 0))
-- _update_table(comp[chrom]['3p'][std], after, xrange(1, len(after) + 1))
-+ _update_table(comp[chrom]['5p'][std], before, range(-len(before), 0))
-+ _update_table(comp[chrom]['3p'][std], after, range(1, len(after) + 1))
-
-
- def count_read_comp(read, chrom, length, comp):
-@@ -20,12 +20,12 @@ def count_read_comp(read, chrom, length,
- if read.is_reverse:
- std, seq = '-', mapdamage.seq.revcomp(seq)
-
-- _update_table(comp[chrom]['5p'][std], seq, xrange(1, length + 1))
-- _update_table(comp[chrom]['3p'][std], reversed(seq), xrange(-1, - length - 1, -1))
-+ _update_table(comp[chrom]['5p'][std], seq, range(1, length + 1))
-+ _update_table(comp[chrom]['3p'][std], reversed(seq), range(-1, - length - 1, -1))
-
-
- def _update_table(table, sequence, indices):
-- for (index, nt) in itertools.izip(indices, sequence):
-+ for (index, nt) in zip(indices, sequence):
- if nt in table:
- table[nt][index] += 1
-
-@@ -47,7 +47,7 @@ def get_base_comp(filename,destination=F
- bases["C"] = bases["C"] + int(tabs[3])
- bases["G"] = bases["G"] + int(tabs[4])
- bases["T"] = bases["T"] + int(tabs[5])
-- except (OSError, ValueError), error:
-+ except (OSError, ValueError) as error:
- sys.stderr.write("Error: Seqtk failed: %s\n" % (error,))
- sys.exit(1)
- # get the base frequencies
---- a/mapdamage/parseoptions.py
-+++ b/mapdamage/parseoptions.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
-
- from optparse import OptionParser, OptionGroup, SUPPRESS_HELP
- import os
-@@ -235,12 +235,12 @@ def options():
-
- if os.path.isdir(options.folder):
- if not options.quiet and not options.plot_only:
-- print("Warning, %s already exists" % options.folder)
-+ print(("Warning, %s already exists" % options.folder))
- if options.plot_only:
- if not file_exist(options.folder+"/dnacomp.txt") or not file_exist(options.folder+"/misincorporation.txt"):
- parser.error('folder %s is not a valid result folder' % options.folder)
- else:
-- os.makedirs(options.folder, mode = 0750)
-+ os.makedirs(options.folder, mode = 0o750)
- if options.plot_only or options.stats_only or options.rescale_only:
- sys.stderr.write("Error, %s does not exist while plot/stats/rescale only was used\n" % options.folder)
- return None
---- a/mapdamage/rescale.py
-+++ b/mapdamage/rescale.py
-@@ -31,14 +31,14 @@ def get_corr_prob(folder, rescale_length
- fi_handle = csv.DictReader(open(full_path))
- corr_prob = {}
- for line in fi_handle:
-- if (corr_prob.has_key(line["Position"])):
-+ if (line["Position"] in corr_prob):
- sys.exit('This file has multiple position definitions %s, line %d: %s' % \
- (folder, fi_handle.line_num, corr_prob[line["Position"]]))
- else:
- corr_prob[int(line["Position"])] = {'C.T':float(line["C.T"]), 'G.A':float(line["G.A"])}
-
- # Exclude probabilities for positions outside of user-specified region
-- for key in corr_prob.keys():
-+ for key in list(corr_prob.keys()):
- if key < -rescale_length_3p or key > rescale_length_5p:
- corr_prob.pop(key)
-
-@@ -75,13 +75,13 @@ def corr_this_base(corr_prob, nt_seq, nt
- back_pos = pos-length-1
- # position from 3' end
-
-- if corr_prob.has_key(pos):
-+ if pos in corr_prob:
- p5_corr = corr_prob[pos][subs]
- # correction from 5' end
- else:
- p5_corr = 0
-
-- if corr_prob.has_key(back_pos):
-+ if back_pos in corr_prob:
- p3_corr = corr_prob[back_pos][subs]
- # correction from 3' end
- else:
-@@ -104,7 +104,7 @@ def corr_this_base(corr_prob, nt_seq, nt
-
- def initialize_subs():
- """Initialize a substitution table, to track the expected substitution counts"""
-- per_qual = dict(zip(range(0,130),[0]*130))
-+ per_qual = dict(list(zip(list(range(0,130)),[0]*130)))
- subs = {"CT-before":per_qual.copy(),\
- "TC-before":per_qual.copy(),\
- "GA-before":per_qual.copy(),\
-@@ -164,7 +164,7 @@ def qual_summary_subs(subs):
- for qv in subs[i]:
- if qv >= lv :
- key = i+"-Q"+str(lv)
-- if subs.has_key(key):
-+ if key in subs:
- subs[key] += subs[i][qv]
- else:
- subs[key] = subs[i][qv]
-@@ -174,32 +174,32 @@ def print_subs(subs):
- print("\tThe expected substition frequencies before and after scaling using the scaled qualities as probalities:")
- if subs["C"]!=0:
- # the special case of no substitutions
-- print("\tCT\t"+str(subs["CT-pvals_before"]/subs["C"])+"\t\t"+str(subs["CT-pvals"]/subs["C"]))
-+ print(("\tCT\t"+str(subs["CT-pvals_before"]/subs["C"])+"\t\t"+str(subs["CT-pvals"]/subs["C"])))
- else:
- print("\tCT\tNA\t\tNA")
- if subs["T"]!=0:
-- print("\tTC\t"+str(subs["TC-pvals"]/subs["T"])+"\t\t"+str(subs["TC-pvals"]/subs["T"]))
-+ print(("\tTC\t"+str(subs["TC-pvals"]/subs["T"])+"\t\t"+str(subs["TC-pvals"]/subs["T"])))
- else:
- print("\tTC\tNA\t\tNA")
- if subs["G"]!=0:
-- print("\tGA\t"+str(subs["GA-pvals_before"]/subs["G"])+"\t\t"+str(subs["GA-pvals"]/subs["G"]))
-+ print(("\tGA\t"+str(subs["GA-pvals_before"]/subs["G"])+"\t\t"+str(subs["GA-pvals"]/subs["G"])))
- else:
- print("\tGA\tNA\t\tNA")
- if subs["A"]!=0:
-- print("\tAG\t"+str(subs["AG-pvals"]/subs["A"])+"\t\t"+str(subs["AG-pvals"]/subs["A"]))
-+ print(("\tAG\t"+str(subs["AG-pvals"]/subs["A"])+"\t\t"+str(subs["AG-pvals"]/subs["A"])))
- else:
- print("\tAG\tNA\t\tNA")
- print("\tQuality metrics before and after scaling")
-- print("\tCT-Q0 \t"+str(subs["CT-before-Q0"])+"\t\t"+str(subs["CT-after-Q0"]))
-- print("\tCT-Q10 \t"+str(subs["CT-before-Q10"])+"\t\t"+str(subs["CT-after-Q10"]))
-- print("\tCT-Q20 \t"+str(subs["CT-before-Q20"])+"\t\t"+str(subs["CT-after-Q20"]))
-- print("\tCT-Q30 \t"+str(subs["CT-before-Q30"])+"\t\t"+str(subs["CT-after-Q30"]))
-- print("\tCT-Q40 \t"+str(subs["CT-before-Q40"])+"\t\t"+str(subs["CT-after-Q40"]))
-- print("\tGA-Q0 \t"+str(subs["GA-before-Q0"])+"\t\t"+str(subs["GA-after-Q0"]))
-- print("\tGA-Q10 \t"+str(subs["GA-before-Q10"])+"\t\t"+str(subs["GA-after-Q10"]))
-- print("\tGA-Q20 \t"+str(subs["GA-before-Q20"])+"\t\t"+str(subs["GA-after-Q20"]))
-- print("\tGA-Q30 \t"+str(subs["GA-before-Q30"])+"\t\t"+str(subs["GA-after-Q30"]))
-- print("\tGA-Q40 \t"+str(subs["GA-before-Q40"])+"\t\t"+str(subs["GA-after-Q40"]))
-+ print(("\tCT-Q0 \t"+str(subs["CT-before-Q0"])+"\t\t"+str(subs["CT-after-Q0"])))
-+ print(("\tCT-Q10 \t"+str(subs["CT-before-Q10"])+"\t\t"+str(subs["CT-after-Q10"])))
-+ print(("\tCT-Q20 \t"+str(subs["CT-before-Q20"])+"\t\t"+str(subs["CT-after-Q20"])))
-+ print(("\tCT-Q30 \t"+str(subs["CT-before-Q30"])+"\t\t"+str(subs["CT-after-Q30"])))
-+ print(("\tCT-Q40 \t"+str(subs["CT-before-Q40"])+"\t\t"+str(subs["CT-after-Q40"])))
-+ print(("\tGA-Q0 \t"+str(subs["GA-before-Q0"])+"\t\t"+str(subs["GA-after-Q0"])))
-+ print(("\tGA-Q10 \t"+str(subs["GA-before-Q10"])+"\t\t"+str(subs["GA-after-Q10"])))
-+ print(("\tGA-Q20 \t"+str(subs["GA-before-Q20"])+"\t\t"+str(subs["GA-after-Q20"])))
-+ print(("\tGA-Q30 \t"+str(subs["GA-before-Q30"])+"\t\t"+str(subs["GA-after-Q30"])))
-+ print(("\tGA-Q40 \t"+str(subs["GA-before-Q40"])+"\t\t"+str(subs["GA-after-Q40"])))
-
-
- def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="both"):
-@@ -237,7 +237,7 @@ def rescale_qual_read(bam, read, ref, co
- new_qual = [-100]*length_read
- pos_on_read = 0
- number_of_rescaled_bases = 0.0
-- for (i, nt_seq, nt_ref, nt_qual) in itertools.izip(xrange(length_align), seq, refseq, qual):
-+ for (i, nt_seq, nt_ref, nt_qual) in zip(range(length_align), seq, refseq, qual):
- # rescale the quality according to the triplet position,
- # pair of the reference and the sequence
- if ((nt_seq == "T" and nt_ref =="C") or (nt_seq == "A" and nt_ref =="G")):
---- a/mapdamage/rescale_test.py
-+++ b/mapdamage/rescale_test.py
-@@ -1,6 +1,9 @@
- import unittest
- import optparse
--import rescale
-+try:
-+ from . import rescale
-+except:
-+ from mapdamage import rescale
- import pysam
- import filecmp
-
---- a/mapdamage/rscript.py
-+++ b/mapdamage/rscript.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
-
- import os
- import subprocess
-@@ -35,11 +35,11 @@ def plot(opt):
- script = construct_path("mapDamage.R")
- call = ["Rscript", script, fcomp, title, opt.refplot, fmut, opt.readplot, \
- opt.ymax, opt.folder, opt.title, __version__]
-- code = subprocess.call(map(str, call))
-+ code = subprocess.call(list(map(str, call)))
-
- if code == 0:
- if not opt.quiet:
-- print("pdf %s generated" % title)
-+ print(("pdf %s generated" % title))
- return 0
- else:
- print("Error: plotting with R failed")
-@@ -58,7 +58,7 @@ def opt_plots(opt):
- script = construct_path("lengths.R")
- call = ["Rscript", script, flength, output, fmut, opt.length, \
- opt.title, __version__, bool_to_int(opt.quiet)]
-- code = subprocess.call(map(str, call))
-+ code = subprocess.call(list(map(str, call)))
- if code == 0:
- return 0
- else:
-@@ -90,11 +90,11 @@ def check_R_lib():
- if len(missing_lib) > 1:
- # Grammar Nazi has arrived
- last_ele = missing_lib.pop()
-- print ("Missing the following R libraries '" + "', '".join(missing_lib) \
-- + "' and '" + last_ele + "'")
-+ print(("Missing the following R libraries '" + "', '".join(missing_lib) \
-+ + "' and '" + last_ele + "'"))
- return 1
- elif len(missing_lib) == 1:
-- print ("Missing the following R library "+missing_lib[0])
-+ print(("Missing the following R library "+missing_lib[0]))
- return 1
- else :
- # No missing libraries
---- a/mapdamage/seq.py
-+++ b/mapdamage/seq.py
-@@ -1,9 +1,9 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
- import sys
- import string
-
- # from Martin Kircher, to complement DNA
--TABLE = string.maketrans('TGCAMRWSYKVHDBtgcamrwsykvhdb', \
-+TABLE = str.maketrans('TGCAMRWSYKVHDBtgcamrwsykvhdb', \
- 'ACGTKYWSRMBDHVacgtkywsrmbdhv')
-
- LETTERS = ("A", "C", "G", "T")
---- a/mapdamage/tables.py
-+++ b/mapdamage/tables.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
-
- import os
- import collections
-@@ -16,7 +16,7 @@ def initialize_mut(ref, length):
- for std in ('+','-'):
- tab_std = tab_end[std] = {}
- for mut in mapdamage.seq.HEADER:
-- tab_std[mut] = dict.fromkeys(xrange(length), 0)
-+ tab_std[mut] = dict.fromkeys(range(length), 0)
-
- return tab
-
-@@ -26,8 +26,8 @@ def print_mut(mut, opt, out):
-
-
- def initialize_comp(ref, around, length):
-- keys = {"3p" : range(-length, 0) + range(1, around + 1),
-- "5p" : range(-around, 0) + range(1, length + 1)}
-+ keys = {"3p" : list(range(-length, 0)) + list(range(1, around + 1)),
-+ "5p" : list(range(-around, 0)) + list(range(1, length + 1))}
-
- tab = {}
- for contig in ref:
-@@ -74,8 +74,8 @@ def check_table_and_warn_if_dmg_freq_is_
- total = 0.0
- for filename in ("5pCtoT_freq.txt", "3pGtoA_freq.txt"):
- if not os.path.exists(folder+"/"+filename):
-- print("Error: Required table has not been created ('%s'), bayesian computation cannot be performed" \
-- % filename)
-+ print(("Error: Required table has not been created ('%s'), bayesian computation cannot be performed" \
-+ % filename))
- return True
-
- with open(os.path.join(folder, filename)) as handle:
-@@ -85,12 +85,12 @@ def check_table_and_warn_if_dmg_freq_is_
- total += float(freq[1])
- break
- else:
-- print("Error: Could not find pos = 1 in table '%s', bayesian computation cannot be performed" \
-- % filename)
-+ print(("Error: Could not find pos = 1 in table '%s', bayesian computation cannot be performed" \
-+ % filename))
- return True
-
- if total < 0.01:
-- print("Warning: DNA damage levels are too low, the Bayesian computation should not be performed (%f < 0.01)\n" % total)
-+ print(("Warning: DNA damage levels are too low, the Bayesian computation should not be performed (%f < 0.01)\n" % total))
-
- return False
-
-@@ -103,9 +103,9 @@ def _print_freq_table(table, columns, op
- out.write("# Chr: reference from sam/bam header, End: from which termini of DNA sequences, Std: strand of reads\n")
- out.write("Chr\tEnd\tStd\tPos\t%s\n" % ("\t".join(columns)))
-
-- for (reference, ends) in sorted(table.iteritems()):
-- for (end, strands) in sorted(ends.iteritems()):
-- for (strand, subtable) in sorted(strands.iteritems()):
-+ for (reference, ends) in sorted(table.items()):
-+ for (end, strands) in sorted(ends.items()):
-+ for (strand, subtable) in sorted(strands.items()):
- subtable["Total"] = {}
- for index in sorted(subtable[columns[0]]):
- subtable["Total"][index] = sum(subtable[letter][index] for letter in mapdamage.seq.LETTERS)
---- a/mapdamage/version.py
-+++ b/mapdamage/version.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
- try:
- from mapdamage._version import __version__
- except ImportError:
---- a/setup.py
-+++ b/setup.py
-@@ -1,4 +1,4 @@
--#!/usr/bin/env python
-+#!/usr/bin/python3
- # -*- coding: utf-8 -*-
-
- from distutils.core import setup
-@@ -14,9 +14,9 @@ def setup_version():
- try:
- version = subprocess.check_output(("git", "describe", "--always", "--tags", "--dirty"))
- with open(os.path.join("mapdamage", "_version.py"), "w") as handle:
-- handle.write("#!/usr/bin/env python\n")
-+ handle.write("#!/usr/bin/python3\n")
- handle.write("__version__ = %r\n" % (version.strip(),))
-- except (subprocess.CalledProcessError, OSError), error:
-+ except (subprocess.CalledProcessError, OSError) as error:
- raise SystemExit("Could not determine mapDamage version: %s" % (error,))
-
-
=====================================
debian/patches/fix_quantile_na_rm.patch
=====================================
@@ -4,11 +4,9 @@ Description: change calcSampleStats such as any NA and Nan's are
Last-Update: 2017-03-28
Bug: https://github.com/ginolhac/mapDamage/issues/18
-Index: mapdamage/mapdamage/Rscripts/stats/function.R
-===================================================================
---- mapdamage.orig/mapdamage/Rscripts/stats/function.R 2017-03-29 00:02:46.538470957 -0400
-+++ mapdamage/mapdamage/Rscripts/stats/function.R 2017-03-29 00:11:09.141025051 -0400
-@@ -595,8 +595,8 @@
+--- a/mapdamage/Rscripts/stats/function.R
++++ b/mapdamage/Rscripts/stats/function.R
+@@ -595,8 +595,8 @@ calcSampleStats <- function(da,X){
pos=da[,"Pos"],
mea=apply(X,1,mean),
med=apply(X,1,median),
=====================================
debian/patches/series
=====================================
@@ -1,3 +1,2 @@
use_debian_packaged_seqtk.patch
fix_quantile_na_rm.patch
-2to3.patch
=====================================
debian/patches/use_debian_packaged_seqtk.patch
=====================================
@@ -27,23 +27,24 @@ Description: Use Debian packaged seqtk
def setup_version():
if not os.path.exists(".git"):
# Release version, no .git folder
-@@ -40,16 +25,7 @@ class compileInstall(DistutilsInstall):
+@@ -40,17 +25,8 @@ class compileInstall(DistutilsInstall):
def run(self):
self.record=""
setup_version()
- compile_seqtk()
DistutilsInstall.run(self)
-- # fixing the permission problem of seqtk
+ # fixing the permission problem of seqtk
- files = self.get_outputs()
- for fi in files:
- if fi.endswith("seqtk/seqtk"):
-- os.chmod(fi,0755)
+- os.chmod(fi,0o755)
+-
-
-
-
-
setup(
+ cmdclass={'install': compileInstall},
@@ -59,7 +35,7 @@ setup(
author='Aurélien Ginolhac, Mikkel Schubert, Hákon Jónsson',
author_email='MSchubert at snm.ku.dk, jonsson.hakon at gmail.com',
@@ -55,7 +56,7 @@ Description: Use Debian packaged seqtk
license='LICENSE.txt',
--- a/bin/mapDamage
+++ b/bin/mapDamage
-@@ -151,7 +151,7 @@ def main():
+@@ -149,7 +149,7 @@ def main():
sys.path.insert(0,path_to_mm)
import mapdamage
@@ -66,12 +67,12 @@ Description: Use Debian packaged seqtk
"been intalled properly or current user does not\n"
--- a/mapdamage/composition.py
+++ b/mapdamage/composition.py
-@@ -35,7 +35,7 @@ def get_base_comp(filename,destination=F
+@@ -33,7 +33,7 @@ def get_base_comp(filename,destination=F
Gets the basecomposition of all the sequences in filename
and returns the value to destination if given.
"""
-- path_to_seqtk = mapdamage.rscript.construct_path("seqtk",folder="seqtk")
-+ path_to_seqtk = "/usr/bin/seqtk"
+- path_to_seqtk = mapdamage.rscript.construct_path("seqtk",folder="seqtk")
++ path_to_seqtk = "/usr/bin/seqtk"
bases = {"A":0,"C":0,"G":0,"T":0}
alp = ["A","C","G","T"]
try:
=====================================
mapdamage/Rscripts/mapDamage.R
=====================================
@@ -33,20 +33,20 @@ draw.open.rect <- function(xleft, ybottom, xright, ytop, padding = 0) {
} else {
xpoints <- c(xright, xleft - padding, xleft - padding, xright)
}
-
+
lines(xpoints, c(ytop, ytop, ybottom, ybottom), col = "darkgrey")
}
plot.base.composition <- function(tbl, base, color, around, ylabels.at = c(), xlabels = FALSE) {
xcoords <- c(-around:-1, 1:around)
-
+
plot.axis <- function(yaxis.at) {
axis(side = yaxis.at, labels = (yaxis.at == ylabels.at), line = 0, las = 2, cex.axis = 0.8)
if ((yaxis.at == 2) && (yaxis.at == ylabels.at)) {
mtext("Frequency", side = 2, line = 2.5, cex = 0.6)
}
-
+
if (xlabels) {
axis(side = 1, labels = xcoords, at = xcoords, las = 2, cex.axis = 0.6)
} else {
@@ -64,7 +64,7 @@ plot.base.composition <- function(tbl, base, color, around, ylabels.at = c(), xl
ycoords <- NULL
for (i in xcoords) {
- ycoords <- append(ycoords, mean(subtbl[(subtbl$Pos==i), base]/subtbl$Total[(subtbl$Pos==i)],na.rm=T))
+ ycoords <- append(ycoords, mean(subtbl[(subtbl$Pos==i), base]/subtbl$Total[(subtbl$Pos==i)],na.rm=T))
}
points(xcoords, ycoords, pch = 20, col = color, type = "b")
}
@@ -80,7 +80,7 @@ plot.base.composition <- function(tbl, base, color, around, ylabels.at = c(), xl
plot.frequencies("3p")
plot.axis(4)
draw.open.rect(-around, 0, 0, 0.5)
-
+
par(mar = c(1, 2, 1, 2))
}
@@ -103,8 +103,8 @@ calculate.mutation.table <- function(filename, length)
write.mutation.table <- function(tbl, end, mismatch, filename)
{
- columns <- c("pos", sprintf("5p%s", mismatch))
- tbl[is.na(tbl)] <- 0 # Python doesn't like NAs
+ columns <- c("pos", sprintf("%s%s", end, mismatch))
+ tbl[is.na(tbl)] <- 0 # Python doesn't like NAs
write.table(tbl[tbl$End == end, c("Pos", mismatch)],
row.names = FALSE, col.names = columns,
sep = "\t", quote = FALSE,
@@ -117,12 +117,12 @@ plot.mutations <- function(end, axis.at, start.i, end.i, modifier) {
subtable <- tbl[tbl$End == end, c("Pos", mismatches)]
rates <- rowSums(subtable) - subtable$Pos
subtable <- aggregate(list(Rate = rates), list(Pos = subtable$Pos), sum)
-
+
lines(subtable$Pos * modifier, subtable$Rate,
xlim = c(1, OPT.LENGTH), ylim = c(0, OPT.YMAX),
col = color, lwd = width)
}
-
+
plot(NA, xlim = c(start.i, end.i), ylim=c(0, OPT.YMAX), col="grey", lwd = 1, type = "l", xlab = "", ylab = "", axes = FALSE)
axis(side = 1, labels = start.i:end.i, at = start.i:end.i, las = 2, cex.axis = 0.8)
axis(side = axis.at, labels = TRUE, las = 2, cex.axis = 0.8)
@@ -130,11 +130,11 @@ plot.mutations <- function(end, axis.at, start.i, end.i, modifier) {
mtext("Frequency", side = 2, line = 2.5, cex = 0.6)
}
draw.open.rect(start.i, OPT.YMAX, end.i, -0.01, padding = 0.5)
-
+
for (mismatch in MISMATCHES) {
do.plot(mut, end, modifier, mismatch, "grey", 1)
}
-
+
do.plot(mut, end, modifier, CLIPPING, "orange", 1)
do.plot(mut, end, modifier, DELETIONS, "green", 1)
do.plot(mut, end, modifier, INSERTIONS, "purple", 1)
=====================================
mapdamage/Rscripts/stats/function.R
=====================================
@@ -1,17 +1,17 @@
#Various useful functions
getPmat <- function(tmu,tv_ti_ratio,acgt){
- #Returns the evolutionary substitution matrix
+ #Returns the evolutionary substitution matrix
if (sum(acgt>=1)!=0 || sum(acgt<=0)!=0){
write("The ACGT frequencies must be in the range 0 to 1",stderr())
- stop()
+ stop()
}
if (all.equal(sum(acgt),1)!=TRUE){
write("The ACGT frequencies do not sum to 1",stderr())
- stop()
+ stop()
}
if (tv_ti_ratio<=0){
write("The transversion and transtition ratio cannot go under 0",stderr())
- stop()
+ stop()
}
#Returns the substitution probability matrix.
if (identical(tv_ti_ratio,1) && identical(acgt,c(.25,.25,.25,.25))){
@@ -23,7 +23,7 @@ getPmat <- function(tmu,tv_ti_ratio,acgt){
E <- diag(exp(r$values))
# Q The eigen vector change of basis
- # M -> M
+ # M -> M
#
# ^ ^
# |B^-1 |B
@@ -32,8 +32,8 @@ getPmat <- function(tmu,tv_ti_ratio,acgt){
out <- solve(a=t(B),b= E %*% t(B))#Little trick to avoid numerical difficulties
rownames(out) <- c("A","C","G","T")
colnames(out) <- c("A","C","G","T")
- return(out)
- }
+ return(out)
+ }
}
jukesCantorPmat <- function(tmu){
@@ -49,22 +49,22 @@ qmatHKY85 <- function(tmu,tv_ti,acgt){
# | pi_a*tv_ti pi_c pi_g * tv_ti sum_4 |
#returns this matrix
#This could be replaced with the analytical formulas for the substitutions probabilities.
-
+
Qmat <- matrix(rep(acgt,4),ncol=4,byrow=TRUE)
diag(Qmat) <- 0
#Adjusting the transversions versus transitions
- #- * - *
- #* - * -
- #- * - *
- #* - * -
+ #- * - *
+ #* - * -
+ #- * - *
+ #* - * -
Qmat[1,2] <- tv_ti*Qmat[1,2]
Qmat[1,4] <- tv_ti*Qmat[1,4]
Qmat[2,1] <- tv_ti*Qmat[2,1]
Qmat[2,3] <- tv_ti*Qmat[2,3]
-
+
Qmat[3,2] <- tv_ti*Qmat[3,2]
Qmat[3,4] <- tv_ti*Qmat[3,4]
@@ -73,12 +73,12 @@ qmatHKY85 <- function(tmu,tv_ti,acgt){
diag(Qmat) <- -apply(Qmat,1,sum)
- Qmat <- tmu * Qmat
+ Qmat <- tmu * Qmat
return(Qmat)
}
metroDesc <- function(lpr,lol){
- # the logic in the Metropolis-Hastings step
+ # the logic in the Metropolis-Hastings step
stopifnot(!is.na(lpr))
stopifnot(!is.na(lol))
if (log(runif(1))<lpr-lol){
@@ -89,7 +89,7 @@ metroDesc <- function(lpr,lol){
}
genOverHang <- function(la){
- #Currently not used in the main code but could used to generate
+ #Currently not used in the main code but could used to generate
#overhangs like Philip
r <- runif(1)
i <- -1
@@ -143,7 +143,7 @@ seqProbVecLambda <- function(lambda,lambda_disp,m,fo_only=NA,re_only=NA){
}
}
-#The following is an MC simulation code to mimic the
+#The following is an MC simulation code to mimic the
#nick frequency part in the model from Philip
seqProbVecNuWithLengths<- cxxfunction(methods::signature(
I_la="numeric",
@@ -162,7 +162,7 @@ seqProbVecNuWithLengths<- cxxfunction(methods::signature(
double r = ((double) rand() / (RAND_MAX));
int i = -1;
double p = 0;
- double term = -500;
+ double term = -500;
while (p<r){
if (i==-1){
term = 1;
@@ -175,7 +175,7 @@ seqProbVecNuWithLengths<- cxxfunction(methods::signature(
return(i);
}
',body= '
- srand(time(0));
+ srand(time(0));
Rcpp::NumericVector la(I_la);
Rcpp::NumericVector la_disp(I_la_disp);
Rcpp::NumericVector nu(I_nu);
@@ -199,7 +199,7 @@ seqProbVecNuWithLengths<- cxxfunction(methods::signature(
double left_o_hang = genOverHang(la[0],la_disp[0]);
double right_o_hang = genOverHang(la[0],la_disp[0]);
double o_hang = left_o_hang+right_o_hang;
-//
+//
if (o_hang>=les(i)){
//Single stranded sequence
for (int j = 0; j <les(i);j++){
@@ -211,8 +211,8 @@ seqProbVecNuWithLengths<- cxxfunction(methods::signature(
for (int j = 0; j <(les[i]-right_o_hang);j++){
output(j) = output(j)+1;
}
- //The right overhang is always G>A for the double stranded but
- //Here we will make the assumption that the pattern is symmetric
+ //The right overhang is always G>A for the double stranded but
+ //Here we will make the assumption that the pattern is symmetric
//for practical reasons We can\'t do that ....
}else {
Rcpp::NumericVector sa = floor(runif(1,0,les[i]-o_hang))+left_o_hang;
@@ -258,7 +258,7 @@ pDam <- function(th,ded,des,la,nu,lin){
}
logLikFunOneBaseSlow <- function(Gen,S,Theta,deltad,deltas,laVec,nuVec,m,lin){
- #Calculates the log likelihood using R, a C++ version is available
+ #Calculates the log likelihood using R, a C++ version is available
ll <- 0
for (i in 1:length(laVec)){
#Get the damage probabilities
@@ -268,7 +268,7 @@ logLikFunOneBaseSlow <- function(Gen,S,Theta,deltad,deltas,laVec,nuVec,m,lin){
return(ll)
}
-#The same logic as in logLikFunOneBaseSlow except using a compiled code
+#The same logic as in logLikFunOneBaseSlow except using a compiled code
#to do the hard work
logLikFunOneBaseFast <- cxxfunction(methods::signature(
I_Gen="numeric",
@@ -333,7 +333,7 @@ return(ret);
logLikAll <- function(dat,Theta,deltad,deltas,laVec,nuVec,m){
- #Calculates the logLikelihood for all the bases by calling
+ #Calculates the logLikelihood for all the bases by calling
# logLikFunOneBaseFast for each base
if (deltad<0 || deltad>1 || deltas<0 || deltas>1 ){
return(-Inf)
@@ -341,8 +341,8 @@ logLikAll <- function(dat,Theta,deltad,deltas,laVec,nuVec,m){
#A,C,G and T
deb <- 0
-
- Asub <- dat[,"A.C"]+dat[,"A.G"]+dat[,"A.T"]
+
+ Asub <- dat[,"A.C"]+dat[,"A.G"]+dat[,"A.T"]
ALL <- logLikFunOneBaseFast(dat[,"A"],cbind(dat[,"A"]-Asub,dat[,"A.C"],dat[,"A.G"],dat[,"A.T"]),Theta,deltad,deltas,laVec,nuVec,m,1)
if (deb){
ALLSlow <- logLikFunOneBaseSlow(dat[,"A"],cbind(dat[,"A"]-Asub,dat[,"A.C"],dat[,"A.G"],dat[,"A.T"]),Theta,deltad,deltas,laVec,nuVec,m,1)
@@ -355,7 +355,7 @@ logLikAll <- function(dat,Theta,deltad,deltas,laVec,nuVec,m){
CLLSlow <- logLikFunOneBaseSlow(dat[,"C"],cbind(dat[,"C.A"],dat[,"C"]-Csub,dat[,"C.G"],dat[,"C.T"]),Theta,deltad,deltas,laVec,nuVec,m,2)
stopifnot(all.equal(CLL,CLLSlow))
}
-
+
Gsub <- dat[,"G.A"]+dat[,"G.C"]+dat[,"G.T"]
GLL <- logLikFunOneBaseFast(dat[,"G"],cbind(dat[,"G.A"],dat[,"G.C"],dat[,"G"]-Gsub,dat[,"G.T"]),Theta,deltad,deltas,laVec,nuVec,m,3)
@@ -363,7 +363,7 @@ logLikAll <- function(dat,Theta,deltad,deltas,laVec,nuVec,m){
GLLSlow <- logLikFunOneBaseSlow(dat[,"G"],cbind(dat[,"G.A"],dat[,"G.C"],dat[,"G"]-Gsub,dat[,"G.T"]),Theta,deltad,deltas,laVec,nuVec,m,3)
stopifnot(all.equal(CLL,CLLSlow))
}
-
+
Tsub <- dat[,"T.A"]+dat[,"T.C"]+dat[,"T.G"]
TLL <- logLikFunOneBaseFast(dat[,"T"],cbind(dat[,"T.A"],dat[,"T.C"],dat[,"T.G"],dat[,"T"]-Tsub),Theta,deltad,deltas,laVec,nuVec,m,4)
if (deb){
@@ -375,7 +375,7 @@ logLikAll <- function(dat,Theta,deltad,deltas,laVec,nuVec,m){
getParams <- function(cp){
- #Utility function nice to update the MCMC iterations matrix
+ #Utility function nice to update the MCMC iterations matrix
return(c(cp$Theta,cp$Rho,cp$DeltaD,cp$DeltaS,cp$Lambda,cp$LambdaRight,cp$LambdaDisp,cp$Nu))
}
@@ -385,7 +385,7 @@ plotTrace<- function(dat,main,k=111){
}
plotEverything <- function(mcmcOut,hi=0,pl,thin=100){
- #Plots the MCMC traceplot in the form of a running median and
+ #Plots the MCMC traceplot in the form of a running median and
#histogram of the MCMC iterations
if (sum(c(cu_pa$same_overhangs==FALSE,
cu_pa$fix_disp==FALSE,
@@ -438,7 +438,7 @@ plotEverything <- function(mcmcOut,hi=0,pl,thin=100){
}
accRat <- function(da){
- #A rough measure of the acceptance ratio
+ #A rough measure of the acceptance ratio
return(length(unique(da))/length(da))
}
@@ -467,7 +467,7 @@ adjustPropVar <- function(mcmc,propVar){
}
runGibbs <- function(cu_pa,iter){
- #Sampling over the conditional posterior distribution
+ #Sampling over the conditional posterior distribution
#for the parameters.
esti <- matrix(nrow=iter,ncol=9)
colnames(esti) <- c("Theta","Rho","DeltaD","DeltaS","Lambda","LambdaRight","LambdaDisp","Nu","LogLik")
@@ -492,7 +492,7 @@ runGibbs <- function(cu_pa,iter){
#Update the nu parameter by via MC estimation
cu_pa<-updateNu(cu_pa)
}
- esti[i,c(1:8)] <- getParams(cu_pa)
+ esti[i,c(1:8)] <- getParams(cu_pa)
esti[i,"LogLik"] <- logLikAll(cu_pa$dat,cu_pa$ThetaMat,cu_pa$DeltaD,cu_pa$DeltaS,cu_pa$laVec,cu_pa$nuVec,cu_pa$m)
if (! (i %% 1000) && cu_pa$verbose){
cat("MCMC-Iter\t",i,"\t",esti[i,"LogLik"],"\n")
@@ -504,7 +504,7 @@ runGibbs <- function(cu_pa,iter){
simPredCheck <- function(da,output){
#Simulates one draw from the posterior predictive distribution
- #and the probability of a C>T substitution because of a cytosine
+ #and the probability of a C>T substitution because of a cytosine
#demnation.
bases <- da[,c("A","C","G","T")]
#Constructing the lambda vector
@@ -541,12 +541,12 @@ simPredCheck <- function(da,output){
output$cu_pa$mLe,
output$cu_pa$forward_only,
output$cu_pa$nuSamples,
- output$cu_pa$ds_protocol)
+ output$cu_pa$ds_protocol)
nuVec <- c(nuVec,rev(1-nuVec))
}else {
nuVec <- output$cu_pa$nuVec
}
-
+
#Sample the other parameters
des <- sample(output$out[,"DeltaS"],1)
ded <- sample(output$out[,"DeltaD"],1)
@@ -562,10 +562,10 @@ simPredCheck <- function(da,output){
subs <- matrix(NA,nrow=nrow(output$cu_pa$dat),ncol=4+length(coln))
colnames(subs) <- c("A","C","G","T",coln)
#
- damProb <- rep(NA,nrow(output$cu_pa$dat))
- damProbGA <- damProb
+ damProb <- rep(NA,nrow(output$cu_pa$dat))
+ damProbGA <- damProb
for (i in 1:nrow(output$cu_pa$dat)){
- #Construct the site specific probabilities
+ #Construct the site specific probabilities
pct <- nuVec[i]*(laVec[i]*des+ded*(1-laVec[i]))
pga <- (1-nuVec[i])*(laVec[i]*des+ded*(1-laVec[i]))
pDamMat <- matrix(c(
@@ -574,11 +574,11 @@ simPredCheck <- function(da,output){
pga,0,1-pga,0,
0,0,0,1
),nrow=4,byrow=TRUE)
- ThetapDam <- pDamMat %*% pmat
+ ThetapDam <- pDamMat %*% pmat
#Calculate the probability C.T due to cytosine demanation
- damProb[i] <- ptransCC*pct/(ptransCC*pct+ptransCT)
- #Do not forget the reverse complement
- damProbGA[i] <- ptransGG*pga/(ptransGG*pga+ptransGA)
+ damProb[i] <- ptransCC*pct/(ptransCC*pct+ptransCT)
+ #Do not forget the reverse complement
+ damProbGA[i] <- ptransGG*pga/(ptransGG*pga+ptransGA)
#Then draw from a multinomial distribution
subs[i,c("A.C","A.G","A.T")] <- t(rmultinom(1,output$cu_pa$dat[i,"A"],ThetapDam[1,]))[-1]/output$cu_pa$dat[i,"A"]
subs[i,c("C.A","C.G","C.T")] <- t(rmultinom(1,output$cu_pa$dat[i,"C"],ThetapDam[2,]))[-2]/output$cu_pa$dat[i,"C"]
@@ -602,7 +602,7 @@ calcSampleStats <- function(da,X){
postPredCheck <- function(da,output,samples=10000){
#Plots the 95% posterior predictive intervals with the data as lines.
- #Returns the site specific probability of a C>T or G>A substitution
+ #Returns the site specific probability of a C>T or G>A substitution
#because of a cytosine demnation.
CTs <- matrix(NA,nrow=nrow(da),ncol=samples)
GAs <- matrix(NA,nrow=nrow(da),ncol=samples)
@@ -635,28 +635,28 @@ postPredCheck <- function(da,output,samples=10000){
}
CTsStats <- calcSampleStats(da,CTs)
GAsStats <- calcSampleStats(da,GAs)
- REsStats <- calcSampleStats(da,REs)
+ REsStats <- calcSampleStats(da,REs)
#Plotting the posterior predictive intervals
p <- ggplot()+
- geom_point(aes(x,mea,colour="C->T",aes_string="subs"),data=CTsStats)+
+ geom_point(aes(x,mea,colour="C->T"),data=CTsStats)+
geom_point(aes(x,mea,colour="G->A"),data=GAsStats)+
geom_point(aes(x,mea,colour="Others"),data=REsStats)+
- geom_errorbar(aes(x=x,y=med,ymin=loCI,ymax=hiCI,color="C->T"),data=CTsStats)+
- geom_errorbar(aes(x=x,y=med,ymin=loCI,ymax=hiCI,color="G->A"),data=GAsStats)+
- geom_errorbar(aes(x=x,y=med,ymin=loCI,ymax=hiCI,color="Others"),data=REsStats)+
+ geom_errorbar(aes(x=x,ymin=loCI,ymax=hiCI,color="C->T"),data=CTsStats)+
+ geom_errorbar(aes(x=x,ymin=loCI,ymax=hiCI,color="G->A"),data=GAsStats)+
+ geom_errorbar(aes(x=x,ymin=loCI,ymax=hiCI,color="Others"),data=REsStats)+
geom_line(aes(oneBased,C.T/C),color="red",data=data.frame(da))+
geom_line(aes(oneBased,G.A/G),color="green",data=data.frame(da))+
geom_line(aes(oneBased,((A.C+A.G+A.T)/A+(C.A+C.G)/C+(G.C+G.T)/G+(T.A+T.C+T.G)/T)/10),color="blue",data=data.frame(da))+
- ylab("Substitution rate")+
- xlab("Relative position")+
- scale_x_continuous(breaks=bres,labels=labs)+
- labs(colour = "Subs. type")+
- ggtitle("Posterior prediction intervals")
+ labs(y = "Substitution rate",
+ x = "Relative position",
+ colour = "Subs. type",
+ title = "Posterior prediction intervals")+
+ scale_x_continuous(breaks=bres,labels=labs)
if (output$cu_pa$use_bw_theme){
p <- p+theme_bw()
}
plot(p)
- #The correcting probabilities
+ #The correcting probabilities
coProbs <- cbind(da[,"Pos"],apply(C2TProbs,1,mean),apply(G2AProbs,1,mean))
colnames(coProbs) <- c("Position","C.T","G.A")
return(coProbs)
@@ -686,9 +686,8 @@ writeMCMC <- function(out,filename){
qua <- apply(out$out[,parameters],2,quantile,seq(from=0,to=1,by=.025))
acc <- apply(out$out[,parameters],2,accRat)
summStat <- rbind(mea,std,acc,qua)
- rownames(summStat)[1] <- "Mean"
- rownames(summStat)[2] <- "Std."
- rownames(summStat)[3] <- "Acceptance ratio"
+ rownames(summStat)[1] <- "Mean"
+ rownames(summStat)[2] <- "Std."
+ rownames(summStat)[3] <- "Acceptance ratio"
write.csv(summStat,paste(filename,"_summ_stat.csv",sep=""))
}
-
=====================================
mapdamage/Rscripts/stats/main.R
=====================================
@@ -1,12 +1,12 @@
#The main work flow of the package
#Load the libraries
-suppressMessages(library(inline)) #Already checked the libraries
-suppressMessages(library(ggplot2)) #thus ignoring any messages from them
-suppressMessages(library(Rcpp))
-suppressMessages(library(gam))
-suppressMessages(library(RcppGSL))
+suppressPackageStartupMessages(library(inline)) #Already checked the libraries
+suppressPackageStartupMessages(library(ggplot2)) #thus ignoring any messages from them
+suppressPackageStartupMessages(library(Rcpp))
+suppressPackageStartupMessages(library(gam))
+suppressPackageStartupMessages(library(RcppGSL))
-#Miscellaneous functions
+#Miscellaneous functions
source(paste(path_to_mapDamage_stats,"function.R",sep=""))
#Prior and proposal distributions for the parameters
@@ -18,7 +18,7 @@ source(paste(path_to_mapDamage_stats,"postConditonal.R",sep=""))
#functions for the grid search
source(paste(path_to_mapDamage_stats,"start.R",sep=""))
-#functions for loading the data
+#functions for loading the data
source(paste(path_to_mapDamage_stats,"data.R",sep=""))
@@ -39,14 +39,14 @@ if (forward_only && reverse_only){
stop()
}
-fow_dat <-readMapDamData(path_to_dat)
+fow_dat <-readMapDamData(path_to_dat)
rev_dat <-readMapDamData(path_to_dat,forward=0)
if (forward_only){
#Taking only the forward part
- dat <- fow_dat[1:sub_length,]
+ dat <- fow_dat[1:sub_length,]
}else if (reverse_only){
#Taking only the reverse part
- dat <- rev_dat[sub_length:1,]
+ dat <- rev_dat[sub_length:1,]
}else {
#Using both ends
dat <- joinFowAndRev(fow_dat,rev_dat,sub_length)
@@ -107,7 +107,7 @@ cu_pa$ThetaMat <- getPmat(cu_pa$Theta,cu_pa$Rho,cu_pa$acgt)
#######################################################
#
-# Setting the lambda vector
+# Setting the lambda vector
#
#######################################################
@@ -127,17 +127,17 @@ if (!cu_pa$same_overhangs){
#######################################################
#
-# Setting the nu vector
+# Setting the nu vector
#
#######################################################
if (!cu_pa$ds_protocol & cu_pa$nuSamples!=0){
- #Using the single stranded protocol with nu samples which makes
+ #Using the single stranded protocol with nu samples which makes
#no sense
write("Silly to use the MC estimation for nu_i if using the single stranded protocol", stderr())
stop()
}else if (cu_pa$nuSamples!=0 & cu_pa$fix_nu) {
- #Using the fixed nu parameter with nu samples which makes no sense
+ #Using the fixed nu parameter with nu samples which makes no sense
write("Silly to use the MC estimation for nu_i and use fix_nu ", stderr())
stop()
}else if (cu_pa$nuSamples!=0){
@@ -164,7 +164,7 @@ if (!cu_pa$ds_protocol & cu_pa$nuSamples!=0){
cu_pa$nuVec <- c(rep(1,nrow(dat)/2),rep(0,nrow(dat)/2))
}
}else {
- #This is for a non linear nick frequency, assumes the G>T and G>A are
+ #This is for a non linear nick frequency, assumes the G>T and G>A are
#mostly due to DNA damage patterns do not use for low damage datasets
te<-(dat[,"C.T"]/dat[,"C"])/(dat[,"G.A"]/dat[,"G"]+dat[,"C.T"]/dat[,"C"])
if (sum(is.na(te) )!=0 ){
@@ -177,7 +177,7 @@ if (!cu_pa$ds_protocol & cu_pa$nuSamples!=0){
cu_pa$nuVec <- c(rep(1,nrow(dat)/2),rep(0,nrow(dat)/2))
}
} else {
- #The substitutes seem to be okay estimate the nick frequency using GAM
+ #The substitutes seem to be okay estimate the nick frequency using GAM
if (cu_pa$forward_only || cu_pa$reverse_only){
if (cu_pa$use_raw_nick_freq){
#Use the frequency
@@ -202,14 +202,14 @@ if (!cu_pa$ds_protocol & cu_pa$nuSamples!=0){
}
#######################################################
#
-# Finding an "optimal" starting place
+# Finding an "optimal" starting place
#
#######################################################
if (grid_iter!=0){
#Start at random places and optimize the likelihood function
- cu_pa <- gridSearch(cu_pa,grid_iter)
-}
+ cu_pa <- gridSearch(cu_pa,grid_iter)
+}
#Calculate the log likelihood in the beginning
if (cu_pa$same_overhangs){
@@ -267,11 +267,11 @@ if (out_file_base!=""){
pdf(paste(out_file_base,"_MCMC_hist.pdf",sep=""))
plotEverything(mcmcOut,hi=1)
dev.off()
- #Posterior predictive plot
+ #Posterior predictive plot
pdf(paste(out_file_base,"_MCMC_post_pred.pdf",sep=""))
siteProb <- postPredCheck(dat,mcmcOut)
dev.off()
- #Write correcting probabilities
+ #Write correcting probabilities
write.csv(siteProb,paste(out_file_base,"_MCMC_correct_prob.csv",sep=""))
} else {
cat("Plotting\n")
=====================================
mapdamage/__init__.py
=====================================
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
import mapdamage.parseoptions
import mapdamage.seq
=====================================
mapdamage/align.py
=====================================
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
import itertools
import mapdamage
@@ -16,11 +14,11 @@ import mapdamage
#X 8 sequence mismatch
-def get_coordinates(read):
+def get_coordinates(read):
""" return external coordinates of aligned read bases """
fivep = read.aend if read.is_reverse else read.pos
threep = read.pos if read.is_reverse else read.aend
-
+
return fivep, threep
@@ -41,9 +39,9 @@ def get_around(coord, chrom, reflengths, length, ref):
def align(cigarlist, seq, ref):
- """ insert gaps according to the cigar string
- deletion: gaps to be inserted into read sequences,
- insertions: gaps to be inserted into reference sequence """
+ """ insert gaps according to the cigar string
+ deletion: gaps to be inserted into read sequences,
+ insertions: gaps to be inserted into reference sequence """
lref = list(ref)
for nbr, idx in parse_cigar(cigarlist, 1):
lref[idx:idx] = ["-"] * nbr
@@ -56,9 +54,9 @@ def align(cigarlist, seq, ref):
def align_with_qual(cigarlist, seq, qual, threshold, ref):
- """ insert gaps according to the cigar string
- deletion: gaps to be inserted into read sequences and qualities,
- insertions: gaps to be inserted into reference sequence """
+ """ insert gaps according to the cigar string
+ deletion: gaps to be inserted into read sequences and qualities,
+ insertions: gaps to be inserted into reference sequence """
lref = list(ref)
for nbr, idx in parse_cigar(cigarlist, 1):
lref[idx:idx] = ["-"] * nbr
@@ -84,7 +82,7 @@ def get_mis(read, seq, refseq, ref, length, tab, end):
std = '-' if read.is_reverse else '+'
subtable = tab[ref][end][std]
- for (i, nt_seq, nt_ref) in itertools.izip(xrange(length), seq, refseq):
+ for (i, nt_seq, nt_ref) in zip(range(length), seq, refseq):
if (nt_seq in "ACGT-") and (nt_ref in "ACGT-"):
if nt_ref != "-":
# record base composition in the reference, only A, C, G, T
@@ -106,7 +104,7 @@ def parse_cigar(cigarlist, ope):
for operation, length in cigarlist:
if operation == ope:
coordinate.append([length, tlength])
- if operation in oplist:
+ if operation in oplist:
tlength += length
return coordinate
@@ -127,8 +125,3 @@ def record_soft_clipping(read, tab, length):
end = '5p' if read.is_reverse else '3p'
update_table(end, strand, nbases)
-
-
-
-
-
=====================================
mapdamage/composition.py
=====================================
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
import mapdamage
import itertools
import csv
@@ -10,8 +8,8 @@ def count_ref_comp(read, chrom, before, after, comp):
""" record basae composition in external genomic regions """
std = '-' if read.is_reverse else '+'
- _update_table(comp[chrom]['5p'][std], before, xrange(-len(before), 0))
- _update_table(comp[chrom]['3p'][std], after, xrange(1, len(after) + 1))
+ _update_table(comp[chrom]['5p'][std], before, range(-len(before), 0))
+ _update_table(comp[chrom]['3p'][std], after, range(1, len(after) + 1))
def count_read_comp(read, chrom, length, comp):
@@ -20,12 +18,12 @@ def count_read_comp(read, chrom, length, comp):
if read.is_reverse:
std, seq = '-', mapdamage.seq.revcomp(seq)
- _update_table(comp[chrom]['5p'][std], seq, xrange(1, length + 1))
- _update_table(comp[chrom]['3p'][std], reversed(seq), xrange(-1, - length - 1, -1))
+ _update_table(comp[chrom]['5p'][std], seq, range(1, length + 1))
+ _update_table(comp[chrom]['3p'][std], reversed(seq), range(-1, - length - 1, -1))
def _update_table(table, sequence, indices):
- for (index, nt) in itertools.izip(indices, sequence):
+ for (index, nt) in zip(indices, sequence):
if nt in table:
table[nt][index] += 1
@@ -35,7 +33,7 @@ def get_base_comp(filename,destination=False):
Gets the basecomposition of all the sequences in filename
and returns the value to destination if given.
"""
- path_to_seqtk = mapdamage.rscript.construct_path("seqtk",folder="seqtk")
+ path_to_seqtk = mapdamage.rscript.construct_path("seqtk",folder="seqtk")
bases = {"A":0,"C":0,"G":0,"T":0}
alp = ["A","C","G","T"]
try:
@@ -47,7 +45,7 @@ def get_base_comp(filename,destination=False):
bases["C"] = bases["C"] + int(tabs[3])
bases["G"] = bases["G"] + int(tabs[4])
bases["T"] = bases["T"] + int(tabs[5])
- except (OSError, ValueError), error:
+ except (OSError, ValueError) as error:
sys.stderr.write("Error: Seqtk failed: %s\n" % (error,))
sys.exit(1)
# get the base frequencies
@@ -75,7 +73,7 @@ def read_base_comp(filename):
lp = li.split()
if first_line:
header = lp
- first_line = False
+ first_line = False
else:
body = lp
bases = {}
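The composition.py hunks show the other staple of the port: itertools.izip and xrange no longer exist in Python 3, and the builtin zip() and range() are already lazy, so they are drop-in replacements. A short illustrative sketch (positions and sequence are made-up values, not mapDamage data):

    # zip() and range() in Python 3 behave like itertools.izip and xrange
    # in Python 2: they yield items lazily instead of building lists.
    positions = range(1, 6)        # was xrange(1, 6)
    sequence = "ACGTA"
    for index, nt in zip(positions, sequence):   # was itertools.izip(...)
        print(index, nt)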
=====================================
mapdamage/parseoptions.py
=====================================
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
from optparse import OptionParser, OptionGroup, SUPPRESS_HELP
import os
import sys
@@ -38,7 +36,7 @@ def check_py_version():
return None
-def options():
+def options():
parser = OptionParser("%prog [options] -i BAMfile -r reference.fasta\n\nUse option -h or --help for help", version=__version__, \
epilog="report bugs to aginolhac at snm.ku.dk, MSchubert at snm.ku.dk or jonsson.hakon at gmail.com")
@@ -153,7 +151,7 @@ def options():
if not check_py_version():
return None
- # if the user wants to check the R packages then do that before the option parsing
+ # if the user wants to check the R packages then do that before the option parsing
if options.check_R_packages:
if check_R_lib():
sys.exit(1)
@@ -224,7 +222,7 @@ def options():
# check destination for rescaled bam
if not options.rescale_out and (options.rescale or options.rescale_only):
- # if there are mulitiple bam files to rescale then pick first one as
+ # if there are mulitiple bam files to rescale then pick first one as
# the name of the rescaled file
if isinstance(options.filename,list):
basename = os.path.basename(options.filename[0])
@@ -235,12 +233,12 @@ def options():
if os.path.isdir(options.folder):
if not options.quiet and not options.plot_only:
- print("Warning, %s already exists" % options.folder)
+ print(("Warning, %s already exists" % options.folder))
if options.plot_only:
if not file_exist(options.folder+"/dnacomp.txt") or not file_exist(options.folder+"/misincorporation.txt"):
parser.error('folder %s is not a valid result folder' % options.folder)
else:
- os.makedirs(options.folder, mode = 0750)
+ os.makedirs(options.folder, mode = 0o750)
if options.plot_only or options.stats_only or options.rescale_only:
sys.stderr.write("Error, %s does not exist while plot/stats/rescale only was used\n" % options.folder)
return None
@@ -279,4 +277,3 @@ def options():
return options
-
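The os.makedirs change above is purely syntactic: Python 3 dropped the bare octal literal, so the mode 0750 must be written 0o750 (same permission bits, rwxr-x---). A one-line sketch with a hypothetical folder name, not taken from mapDamage:

    import os

    # 0o750 is the Python 3 spelling of the octal mode 0750 (rwxr-x---);
    # the folder name here is illustrative only.
    os.makedirs("results_example", mode=0o750)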
=====================================
mapdamage/rescale.py
=====================================
@@ -31,14 +31,14 @@ def get_corr_prob(folder, rescale_length_5p, rescale_length_3p):
fi_handle = csv.DictReader(open(full_path))
corr_prob = {}
for line in fi_handle:
- if (corr_prob.has_key(line["Position"])):
+ if (line["Position"] in corr_prob):
sys.exit('This file has multiple position definitions %s, line %d: %s' % \
(folder, fi_handle.line_num, corr_prob[line["Position"]]))
else:
corr_prob[int(line["Position"])] = {'C.T':float(line["C.T"]), 'G.A':float(line["G.A"])}
# Exclude probabilities for positions outside of user-specified region
- for key in corr_prob.keys():
+ for key in list(corr_prob.keys()):
if key < -rescale_length_3p or key > rescale_length_5p:
corr_prob.pop(key)
@@ -75,13 +75,13 @@ def corr_this_base(corr_prob, nt_seq, nt_ref, pos, length,direction="both"):
back_pos = pos-length-1
# position from 3' end
- if corr_prob.has_key(pos):
+ if pos in corr_prob:
p5_corr = corr_prob[pos][subs]
# correction from 5' end
else:
p5_corr = 0
- if corr_prob.has_key(back_pos):
+ if back_pos in corr_prob:
p3_corr = corr_prob[back_pos][subs]
# correction from 3' end
else:
@@ -104,7 +104,7 @@ def corr_this_base(corr_prob, nt_seq, nt_ref, pos, length,direction="both"):
def initialize_subs():
"""Initialize a substitution table, to track the expected substitution counts"""
- per_qual = dict(zip(range(0,130),[0]*130))
+ per_qual = dict(list(zip(list(range(0,130)),[0]*130)))
subs = {"CT-before":per_qual.copy(),\
"TC-before":per_qual.copy(),\
"GA-before":per_qual.copy(),\
@@ -164,7 +164,7 @@ def qual_summary_subs(subs):
for qv in subs[i]:
if qv >= lv :
key = i+"-Q"+str(lv)
- if subs.has_key(key):
+ if key in subs:
subs[key] += subs[i][qv]
else:
subs[key] = subs[i][qv]
@@ -174,32 +174,32 @@ def print_subs(subs):
print("\tThe expected substition frequencies before and after scaling using the scaled qualities as probalities:")
if subs["C"]!=0:
# the special case of no substitutions
- print("\tCT\t"+str(subs["CT-pvals_before"]/subs["C"])+"\t\t"+str(subs["CT-pvals"]/subs["C"]))
+ print(("\tCT\t"+str(subs["CT-pvals_before"]/subs["C"])+"\t\t"+str(subs["CT-pvals"]/subs["C"])))
else:
print("\tCT\tNA\t\tNA")
if subs["T"]!=0:
- print("\tTC\t"+str(subs["TC-pvals"]/subs["T"])+"\t\t"+str(subs["TC-pvals"]/subs["T"]))
+ print(("\tTC\t"+str(subs["TC-pvals"]/subs["T"])+"\t\t"+str(subs["TC-pvals"]/subs["T"])))
else:
print("\tTC\tNA\t\tNA")
if subs["G"]!=0:
- print("\tGA\t"+str(subs["GA-pvals_before"]/subs["G"])+"\t\t"+str(subs["GA-pvals"]/subs["G"]))
+ print(("\tGA\t"+str(subs["GA-pvals_before"]/subs["G"])+"\t\t"+str(subs["GA-pvals"]/subs["G"])))
else:
print("\tGA\tNA\t\tNA")
if subs["A"]!=0:
- print("\tAG\t"+str(subs["AG-pvals"]/subs["A"])+"\t\t"+str(subs["AG-pvals"]/subs["A"]))
+ print(("\tAG\t"+str(subs["AG-pvals"]/subs["A"])+"\t\t"+str(subs["AG-pvals"]/subs["A"])))
else:
print("\tAG\tNA\t\tNA")
print("\tQuality metrics before and after scaling")
- print("\tCT-Q0 \t"+str(subs["CT-before-Q0"])+"\t\t"+str(subs["CT-after-Q0"]))
- print("\tCT-Q10 \t"+str(subs["CT-before-Q10"])+"\t\t"+str(subs["CT-after-Q10"]))
- print("\tCT-Q20 \t"+str(subs["CT-before-Q20"])+"\t\t"+str(subs["CT-after-Q20"]))
- print("\tCT-Q30 \t"+str(subs["CT-before-Q30"])+"\t\t"+str(subs["CT-after-Q30"]))
- print("\tCT-Q40 \t"+str(subs["CT-before-Q40"])+"\t\t"+str(subs["CT-after-Q40"]))
- print("\tGA-Q0 \t"+str(subs["GA-before-Q0"])+"\t\t"+str(subs["GA-after-Q0"]))
- print("\tGA-Q10 \t"+str(subs["GA-before-Q10"])+"\t\t"+str(subs["GA-after-Q10"]))
- print("\tGA-Q20 \t"+str(subs["GA-before-Q20"])+"\t\t"+str(subs["GA-after-Q20"]))
- print("\tGA-Q30 \t"+str(subs["GA-before-Q30"])+"\t\t"+str(subs["GA-after-Q30"]))
- print("\tGA-Q40 \t"+str(subs["GA-before-Q40"])+"\t\t"+str(subs["GA-after-Q40"]))
+ print(("\tCT-Q0 \t"+str(subs["CT-before-Q0"])+"\t\t"+str(subs["CT-after-Q0"])))
+ print(("\tCT-Q10 \t"+str(subs["CT-before-Q10"])+"\t\t"+str(subs["CT-after-Q10"])))
+ print(("\tCT-Q20 \t"+str(subs["CT-before-Q20"])+"\t\t"+str(subs["CT-after-Q20"])))
+ print(("\tCT-Q30 \t"+str(subs["CT-before-Q30"])+"\t\t"+str(subs["CT-after-Q30"])))
+ print(("\tCT-Q40 \t"+str(subs["CT-before-Q40"])+"\t\t"+str(subs["CT-after-Q40"])))
+ print(("\tGA-Q0 \t"+str(subs["GA-before-Q0"])+"\t\t"+str(subs["GA-after-Q0"])))
+ print(("\tGA-Q10 \t"+str(subs["GA-before-Q10"])+"\t\t"+str(subs["GA-after-Q10"])))
+ print(("\tGA-Q20 \t"+str(subs["GA-before-Q20"])+"\t\t"+str(subs["GA-after-Q20"])))
+ print(("\tGA-Q30 \t"+str(subs["GA-before-Q30"])+"\t\t"+str(subs["GA-after-Q30"])))
+ print(("\tGA-Q40 \t"+str(subs["GA-before-Q40"])+"\t\t"+str(subs["GA-after-Q40"])))
def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="both"):
@@ -237,7 +237,7 @@ def rescale_qual_read(bam, read, ref, corr_prob,subs, debug = False,direction="b
new_qual = [-100]*length_read
pos_on_read = 0
number_of_rescaled_bases = 0.0
- for (i, nt_seq, nt_ref, nt_qual) in itertools.izip(xrange(length_align), seq, refseq, qual):
+ for (i, nt_seq, nt_ref, nt_qual) in zip(range(length_align), seq, refseq, qual):
# rescale the quality according to the triplet position,
# pair of the reference and the sequence
if ((nt_seq == "T" and nt_ref =="C") or (nt_seq == "A" and nt_ref =="G")):
=====================================
mapdamage/rescale_test.py
=====================================
@@ -1,6 +1,6 @@
import unittest
import optparse
-import rescale
+from . import rescale
import pysam
import filecmp
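
The single change in rescale_test.py reflects that Python 3 dropped implicit relative imports: a test module inside the mapdamage package can no longer say "import rescale" to pick up its sibling. The two spellings that remain valid, assuming the file stays inside the package:

    # Explicit relative import, package-internal only:
    from . import rescale

    # Equivalent absolute form, also valid once the package is installed:
    # from mapdamage import rescale
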
=====================================
mapdamage/rscript.py
=====================================
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
import os
import subprocess
from subprocess import CalledProcessError, check_call
@@ -15,8 +13,8 @@ def construct_path(name,folder="Rscripts"):
def plot(opt):
"""
- Calls the R script to make the plots, takes in the arguments
- from parameter processing
+ Calls the R script to make the plots, takes in the arguments
+ from parameter processing
"""
# comp<-args[1]
# pdfout<-args[2]
@@ -32,33 +30,33 @@ def plot(opt):
fcomp = opt.folder+"/"+"dnacomp.txt"
title = opt.folder+"/"+"Fragmisincorporation_plot.pdf"
- script = construct_path("mapDamage.R")
+ script = construct_path("mapDamage.R")
call = ["Rscript", script, fcomp, title, opt.refplot, fmut, opt.readplot, \
opt.ymax, opt.folder, opt.title, __version__]
- code = subprocess.call(map(str, call))
+ code = subprocess.call(list(map(str, call)))
if code == 0:
if not opt.quiet:
- print("pdf %s generated" % title)
+ print(("pdf %s generated" % title))
return 0
else:
print("Error: plotting with R failed")
return 1
-
-def opt_plots(opt):
+
+def opt_plots(opt):
""" optional plots for length distribution
and cumulative C>T mutations, per strand """
-
+
fmut = opt.folder+"/"+"misincorporation.txt"
flength = opt.folder+"/"+"lgdistribution.txt"
output = opt.folder+"/"+"Length_plot.pdf"
- script = construct_path("lengths.R")
+ script = construct_path("lengths.R")
call = ["Rscript", script, flength, output, fmut, opt.length, \
opt.title, __version__, bool_to_int(opt.quiet)]
- code = subprocess.call(map(str, call))
+ code = subprocess.call(list(map(str, call)))
if code == 0:
return 0
else:
@@ -77,7 +75,7 @@ def check_R_one_lib(name):
def check_R_lib():
"""
- Checks if the necessary R libraries are here, signal
+ Checks if the necessary R libraries are here, signal
otherwise
"""
libs = ["inline", "ggplot2", "gam", "Rcpp", "RcppGSL"]
@@ -90,11 +88,11 @@ def check_R_lib():
if len(missing_lib) > 1:
# Grammar Nazi has arrived
last_ele = missing_lib.pop()
- print ("Missing the following R libraries '" + "', '".join(missing_lib) \
- + "' and '" + last_ele + "'")
+ print(("Missing the following R libraries '" + "', '".join(missing_lib) \
+ + "' and '" + last_ele + "'"))
return 1
elif len(missing_lib) == 1:
- print ("Missing the following R library "+missing_lib[0])
+ print(("Missing the following R library "+missing_lib[0]))
return 1
else :
# No missing libraries
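
Besides the print() parenthesisation, the rscript.py change that matters is that map() now returns a lazy iterator, so the converter wraps it in list() before the argument vector reaches subprocess. A sketch of the pattern, substituting echo for Rscript so it runs without R installed (POSIX only):

    import subprocess

    # Hypothetical stand-in for the Rscript invocation built in plot().
    call = ["echo", "mapDamage.R", "dnacomp.txt", 256, 0.3]

    # map() is lazy in Python 3; list() turns the argv into a real sequence.
    code = subprocess.call(list(map(str, call)))
    print("exit status:", code)
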
=====================================
mapdamage/seq.py
=====================================
@@ -1,14 +1,13 @@
-#!/usr/bin/env python
import sys
import string
# from Martin Kircher, to complement DNA
-TABLE = string.maketrans('TGCAMRWSYKVHDBtgcamrwsykvhdb', \
+TABLE = str.maketrans('TGCAMRWSYKVHDBtgcamrwsykvhdb', \
'ACGTKYWSRMBDHVacgtkywsrmbdhv')
LETTERS = ("A", "C", "G", "T")
MUTATIONS = ('G>A', 'C>T', 'A>G', 'T>C', 'A>C', 'A>T', 'C>G', 'C>A', 'T>G',
- 'T>A', 'G>C', 'G>T', 'A>-', 'T>-', 'C>-', 'G>-', '->A', '->T',
+ 'T>A', 'G>C', 'G>T', 'A>-', 'T>-', 'C>-', 'G>-', '->A', '->T',
'->C', '->G', 'S')
HEADER = LETTERS + ("Total", ) + MUTATIONS
@@ -33,11 +32,11 @@ def record_lg(read, coordinate, tab):
""" record global length distribution
don't record paired reads as they are normally not used for aDNA """
std = '-' if read.is_reverse else '+'
-
+
length = (max(coordinate) - min(coordinate))
if not read.is_paired:
tab[std][length] = tab[std][length] + 1
-
+
return tab
@@ -47,7 +46,7 @@ def read_fasta_index(filename):
sys.stderr.write("Error: %s\n" % msg)
sys.stderr.write(" Filename: %s\n" % filename)
sys.stderr.write(" Line: %s\n" % repr(line))
-
+
fai = {}
with open(filename, 'r') as handle:
for line in handle:
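
The seq.py hunk swaps the removed string.maketrans() for the str.maketrans() classmethod; the translation table itself is unchanged. A self-contained sketch (revcomp is a hypothetical helper, not a function from seq.py):

    # str.maketrans() replaces the removed string.maketrans().
    TABLE = str.maketrans('TGCAMRWSYKVHDBtgcamrwsykvhdb',
                          'ACGTKYWSRMBDHVacgtkywsrmbdhv')

    def revcomp(seq):
        """Reverse-complement a DNA string via the IUPAC table above."""
        return seq.translate(TABLE)[::-1]

    assert revcomp("ACCT") == "AGGT"
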
=====================================
mapdamage/tables.py
=====================================
@@ -1,5 +1,3 @@
-#!/usr/bin/env python
-
import os
import collections
@@ -7,7 +5,7 @@ import mapdamage
from mapdamage.version import __version__
-def initialize_mut(ref, length):
+def initialize_mut(ref, length):
tab = {}
for contig in ref:
tab_contig = tab[contig] = {}
@@ -16,8 +14,8 @@ def initialize_mut(ref, length):
for std in ('+','-'):
tab_std = tab_end[std] = {}
for mut in mapdamage.seq.HEADER:
- tab_std[mut] = dict.fromkeys(xrange(length), 0)
-
+ tab_std[mut] = dict.fromkeys(range(length), 0)
+
return tab
@@ -26,8 +24,8 @@ def print_mut(mut, opt, out):
def initialize_comp(ref, around, length):
- keys = {"3p" : range(-length, 0) + range(1, around + 1),
- "5p" : range(-around, 0) + range(1, length + 1)}
+ keys = {"3p" : list(range(-length, 0)) + list(range(1, around + 1)),
+ "5p" : list(range(-around, 0)) + list(range(1, length + 1))}
tab = {}
for contig in ref:
@@ -52,7 +50,7 @@ def initialize_lg():
for std in ('+','-'):
tab[std] = collections.defaultdict(int)
- return tab
+ return tab
def print_lg(tab, opt, out):
@@ -74,8 +72,8 @@ def check_table_and_warn_if_dmg_freq_is_low(folder):
total = 0.0
for filename in ("5pCtoT_freq.txt", "3pGtoA_freq.txt"):
if not os.path.exists(folder+"/"+filename):
- print("Error: Required table has not been created ('%s'), bayesian computation cannot be performed" \
- % filename)
+ print(("Error: Required table has not been created ('%s'), bayesian computation cannot be performed" \
+ % filename))
return True
with open(os.path.join(folder, filename)) as handle:
@@ -85,12 +83,12 @@ def check_table_and_warn_if_dmg_freq_is_low(folder):
total += float(freq[1])
break
else:
- print("Error: Could not find pos = 1 in table '%s', bayesian computation cannot be performed" \
- % filename)
+ print(("Error: Could not find pos = 1 in table '%s', bayesian computation cannot be performed" \
+ % filename))
return True
if total < 0.01:
- print("Warning: DNA damage levels are too low, the Bayesian computation should not be performed (%f < 0.01)\n" % total)
+ print(("Warning: DNA damage levels are too low, the Bayesian computation should not be performed (%f < 0.01)\n" % total))
return False
@@ -103,9 +101,9 @@ def _print_freq_table(table, columns, opt, out, offset = 0):
out.write("# Chr: reference from sam/bam header, End: from which termini of DNA sequences, Std: strand of reads\n")
out.write("Chr\tEnd\tStd\tPos\t%s\n" % ("\t".join(columns)))
- for (reference, ends) in sorted(table.iteritems()):
- for (end, strands) in sorted(ends.iteritems()):
- for (strand, subtable) in sorted(strands.iteritems()):
+ for (reference, ends) in sorted(table.items()):
+ for (end, strands) in sorted(ends.items()):
+ for (strand, subtable) in sorted(strands.items()):
subtable["Total"] = {}
for index in sorted(subtable[columns[0]]):
subtable["Total"][index] = sum(subtable[letter][index] for letter in mapdamage.seq.LETTERS)
=====================================
mapdamage/version.py
=====================================
@@ -1,5 +1,5 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
try:
from mapdamage._version import __version__
except ImportError:
- __version__ = "2.0.8"
+ __version__ = "2.1.0"
=====================================
setup.py
=====================================
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from distutils.core import setup
@@ -31,7 +31,7 @@ def setup_version():
with open(os.path.join("mapdamage", "_version.py"), "w") as handle:
handle.write("#!/usr/bin/env python\n")
handle.write("__version__ = %r\n" % (version.strip(),))
- except (subprocess.CalledProcessError, OSError), error:
+ except (subprocess.CalledProcessError, OSError) as error:
raise SystemExit("Could not determine mapDamage version: %s" % (error,))
@@ -46,7 +46,7 @@ class compileInstall(DistutilsInstall):
files = self.get_outputs()
for fi in files:
if fi.endswith("seqtk/seqtk"):
- os.chmod(fi,0755)
+ os.chmod(fi,0o755)
@@ -55,7 +55,7 @@ class compileInstall(DistutilsInstall):
setup(
cmdclass={'install': compileInstall},
name='mapdamage',
- version='2.0.8',
+ version='2.1.0',
author='Aurélien Ginolhac, Mikkel Schubert, Hákon Jónsson',
author_email='MSchubert at snm.ku.dk, jonsson.hakon at gmail.com',
packages=['mapdamage'],
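
Finally, setup.py picks up the Python 3-only "except ... as ..." syntax (the comma form no longer parses) alongside the same 0o octal fix seen earlier. A minimal sketch that loosely mirrors setup_version(); the fallback value is made up:

    import subprocess

    try:
        version = subprocess.check_output(["git", "describe", "--always"])
    except (subprocess.CalledProcessError, OSError) as error:
        # "except ..., error" is a SyntaxError in Python 3; "as" is required.
        version = b"unknown"
        print("Could not determine version: %s" % (error,))
    print(version.strip())
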
View it on GitLab: https://salsa.debian.org/med-team/mapdamage/compare/88621b984ce2d3ffd54b967e6e64c515fbf18fff...43e9ade9065122eb2e25f2be0a1f8f85f4ab1f28
You're receiving this email because of your account on salsa.debian.org.