[med-svn] [Git][med-team/nanoplot][master] 7 commits: routine-update: New upstream version
Steffen Möller (@moeller)
gitlab at salsa.debian.org
Mon May 24 00:54:21 BST 2021
Steffen Möller pushed to branch master at Debian Med / nanoplot
Commits:
2e93ada5 by Steffen Moeller at 2021-05-24T01:44:15+02:00
routine-update: New upstream version
- - - - -
54279d6c by Steffen Moeller at 2021-05-24T01:44:16+02:00
New upstream version 1.36.2
- - - - -
12fd94cf by Steffen Moeller at 2021-05-24T01:44:18+02:00
Update upstream source from tag 'upstream/1.36.2'
Update to upstream version '1.36.2'
with Debian dir d861929edaa084d0446ba73160e520bc475ab048
- - - - -
84e4e1c5 by Steffen Moeller at 2021-05-24T01:44:19+02:00
routine-update: Standards-Version: 4.5.1
- - - - -
f8537a31 by Steffen Moeller at 2021-05-24T01:44:26+02:00
Trim trailing whitespace.
Changes-By: lintian-brush
Fixes: lintian: trailing-whitespace
See-also: https://lintian.debian.org/tags/trailing-whitespace.html
- - - - -
0c1b0b1d by Steffen Moeller at 2021-05-24T01:44:29+02:00
Set upstream metadata fields: Bug-Database, Bug-Submit.
Changes-By: lintian-brush
Fixes: lintian: upstream-metadata-missing-bug-tracking
See-also: https://lintian.debian.org/tags/upstream-metadata-missing-bug-tracking.html
- - - - -
6a9fa598 by Steffen Moeller at 2021-05-24T01:51:47+02:00
New upstream version, state missing dependency
https://pypi.org/project/kaleido/ is not in debian
- - - - -
20 changed files:
- MANIFEST.in
- NanoPlot.egg-info/PKG-INFO
- PKG-INFO
- README.md
- README.rst
- debian/changelog
- debian/control
- debian/upstream/metadata
- − extra/color_options.txt
- + extra/color_options_hex.txt
- nanoplot/NanoPlot.py
- + nanoplot/report.py
- nanoplot/utils.py
- nanoplot/version.py
- nanoplotter/nanoplotter_main.py
- nanoplotter/plot.py
- nanoplotter/spatial_heatmap.py
- nanoplotter/timeplots.py
- scripts/test.sh
- setup.py
Changes:
=====================================
MANIFEST.in
=====================================
@@ -1,4 +1,4 @@
-include extra/color_options.txt
+include extra/color_options_hex.txt
include scripts/test.sh
include scripts/sequencing_speed_only.py
include README.md
=====================================
NanoPlot.egg-info/PKG-INFO
=====================================
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: NanoPlot
-Version: 1.30.1
+Version: 1.36.2
Summary: Plotting suite for Oxford Nanopore sequencing data and alignments
Home-page: https://github.com/wdecoster/NanoPlot
Author: Wouter De Coster
@@ -139,6 +139,7 @@ Description: # NanoPlot
## ACKNOWLEDGMENTS/CONTRIBUTORS
+ - [Ilias Bukraa](https://github.com/iliasbukraa) for tremendous improvements and maintenance of the code
- Andreas Sjödin for building and maintaining conda recipes
- Darrin Schultz [@conchoecia](https://github.com/conchoecia) for Pauvre code
- [@alexomics](https://github.com/alexomics) for fixing the indentation of the printed stats
@@ -182,7 +183,7 @@ Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
-Classifier: License :: OSI Approved :: MIT License
+Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
=====================================
PKG-INFO
=====================================
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: NanoPlot
-Version: 1.30.1
+Version: 1.36.2
Summary: Plotting suite for Oxford Nanopore sequencing data and alignments
Home-page: https://github.com/wdecoster/NanoPlot
Author: Wouter De Coster
@@ -139,6 +139,7 @@ Description: # NanoPlot
## ACKNOWLEDGMENTS/CONTRIBUTORS
+ - [Ilias Bukraa](https://github.com/iliasbukraa) for tremendous improvements and maintenance of the code
- Andreas Sjödin for building and maintaining conda recipes
- Darrin Schultz [@conchoecia](https://github.com/conchoecia) for Pauvre code
- [@alexomics](https://github.com/alexomics) for fixing the indentation of the printed stats
@@ -182,7 +183,7 @@ Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
-Classifier: License :: OSI Approved :: MIT License
+Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
=====================================
README.md
=====================================
@@ -131,6 +131,7 @@ This script now also provides read length vs mean quality plots in the '[pauvre]
## ACKNOWLEDGMENTS/CONTRIBUTORS
+- [Ilias Bukraa](https://github.com/iliasbukraa) for tremendous improvements and maintenance of the code
- Andreas Sjödin for building and maintaining conda recipes
- Darrin Schultz [@conchoecia](https://github.com/conchoecia) for Pauvre code
- [@alexomics](https://github.com/alexomics) for fixing the indentation of the printed stats
=====================================
README.rst
=====================================
@@ -159,6 +159,8 @@ This script now also provides read length vs mean quality plots in the
ACKNOWLEDGMENTS/CONTRIBUTORS
----------------------------
+- `Ilias Bukraa <https://github.com/iliasbukraa>`__ for tremendous
+ improvements and maintenance of the code
- Andreas Sjödin for building and maintaining conda recipes
- Darrin Schultz [@conchoecia](https://github.com/conchoecia) for
Pauvre code
=====================================
debian/changelog
=====================================
@@ -1,5 +1,13 @@
-nanoplot (1.30.1-1) UNRELEASED; urgency=medium
+nanoplot (1.36.2-1) UNRELEASED; urgency=medium
+ [ Andreas Tille ]
* Initial release (Closes: #964345)
+ [ Steffen Moeller, all via routine-update]
+ * Standards-Version: 4.5.1
+ * Trim trailing whitespace.
+ * Set upstream metadata fields: Bug-Database, Bug-Submit.
+
+ FIXME: needs kaleido (https://pypi.org/project/kaleido/) for testing
+
-- Andreas Tille <tille at debian.org> Wed, 22 Apr 2020 16:40:23 +0200
=====================================
debian/control
=====================================
@@ -12,7 +12,7 @@ Build-Depends: debhelper-compat (= 13),
python3-pauvre <!nocheck>,
python3-plotly <!nocheck>,
python3-seaborn <!nocheck>,
-Standards-Version: 4.5.0
+Standards-Version: 4.5.1
Vcs-Browser: https://salsa.debian.org/med-team/nanoplot
Vcs-Git: https://salsa.debian.org/med-team/nanoplot.git
Homepage: https://github.com/wdecoster/NanoPlot
@@ -49,4 +49,3 @@ Description: plotting scripts for long read sequencing data
MinKnow basecalling (optionally compressed)
* fasta files (optionally compressed)
* multiple files of the same type can be offered simultaneously
-
=====================================
debian/upstream/metadata
=====================================
@@ -1,3 +1,5 @@
+Bug-Database: https://github.com/wdecoster/NanoPlot/issues
+Bug-Submit: https://github.com/wdecoster/NanoPlot/issues/new
Reference:
Author: >
Wouter De Coster and Svenn D'Hert and Darrin T Schultz and Marc Cruts
@@ -16,3 +18,12 @@ Reference:
Registry:
- Name: conda:bioconda
Entry: nanoplot
+ - Name: bio.tools
+ Entry: NA
+ Checked: 2021-05-24
+ - Name: SciCrunch
+ Entry: NA
+ Checked: 2021-05-24
+ - Name: guix
+ Entry: NA
+ Checked: 2021-05-24
=====================================
extra/color_options.txt deleted
=====================================
@@ -1,148 +0,0 @@
-aliceblue
-antiquewhite
-aqua
-aquamarine
-azure
-beige
-bisque
-black
-blanchedalmond
-blue
-blueviolet
-brown
-burlywood
-cadetblue
-chartreuse
-chocolate
-coral
-cornflowerblue
-cornsilk
-crimson
-cyan
-darkblue
-darkcyan
-darkgoldenrod
-darkgray
-darkgreen
-darkgrey
-darkkhaki
-darkmagenta
-darkolivegreen
-darkorange
-darkorchid
-darkred
-darksalmon
-darkseagreen
-darkslateblue
-darkslategray
-darkslategrey
-darkturquoise
-darkviolet
-deeppink
-deepskyblue
-dimgray
-dimgrey
-dodgerblue
-firebrick
-floralwhite
-forestgreen
-fuchsia
-gainsboro
-ghostwhite
-gold
-goldenrod
-gray
-green
-greenyellow
-grey
-honeydew
-hotpink
-indianred
-indigo
-ivory
-khaki
-lavender
-lavenderblush
-lawngreen
-lemonchiffon
-lightblue
-lightcoral
-lightcyan
-lightgoldenrodyellow
-lightgray
-lightgreen
-lightgrey
-lightpink
-lightsalmon
-lightseagreen
-lightskyblue
-lightslategray
-lightslategrey
-lightsteelblue
-lightyellow
-lime
-limegreen
-linen
-magenta
-maroon
-mediumaquamarine
-mediumblue
-mediumorchid
-mediumpurple
-mediumseagreen
-mediumslateblue
-mediumspringgreen
-mediumturquoise
-mediumvioletred
-midnightblue
-mintcream
-mistyrose
-moccasin
-navajowhite
-navy
-oldlace
-olive
-olivedrab
-orange
-orangered
-orchid
-palegoldenrod
-palegreen
-paleturquoise
-palevioletred
-papayawhip
-peachpuff
-peru
-pink
-plum
-powderblue
-purple
-rebeccapurple
-red
-rosybrown
-royalblue
-saddlebrown
-salmon
-sandybrown
-seagreen
-seashell
-sienna
-silver
-skyblue
-slateblue
-slategray
-slategrey
-snow
-springgreen
-steelblue
-tan
-teal
-thistle
-tomato
-turquoise
-violet
-wheat
-white
-whitesmoke
-yellow
-yellowgreen
=====================================
extra/color_options_hex.txt
=====================================
@@ -0,0 +1,148 @@
+aliceblue,#F0F8FF
+antiquewhite,#FAEBD7
+aqua,#00FFFF
+aquamarine,#7FFFD4
+azure,#F0FFFF
+beige,#F5F5DC
+bisque,#FFE4C4
+black,#000000
+blanchedalmond,#FFEBCD
+blue,#0000FF
+blueviolet,#8A2BE2
+brown,#A52A2A
+burlywood,#DEB887
+cadetblue,#5F9EA0
+chartreuse,#7FFF00
+chocolate,#D2691E
+coral,#FF7F50
+cornflowerblue,#6495ED
+cornsilk,#FFF8DC
+crimson,#DC143C
+cyan,#00FFFF
+darkblue,#00008B
+darkcyan,#008B8B
+darkgoldenrod,#B8860B
+darkgray,#A9A9A9
+darkgreen,#006400
+darkgrey,#A9A9A9
+darkkhaki,#BDB76B
+darkmagenta,#8B008B
+darkolivegreen,#556B2F
+darkorange,#FF8C00
+darkorchid,#9932CC
+darkred,#8B0000
+darksalmon,#E9967A
+darkseagreen,#8FBC8F
+darkslateblue,#483D8B
+darkslategray,#2F4F4F
+darkslategrey,#2F4F4F
+darkturquoise,#00CED1
+darkviolet,#9400D3
+deeppink,#FF1493
+deepskyblue,#00BFFF
+dimgray,#696969
+dimgrey,#696969
+dodgerblue,#1E90FF
+firebrick,#B22222
+floralwhite,#FFFAF0
+forestgreen,#228B22
+fuchsia,#FF00FF
+gainsboro,#DCDCDC
+ghostwhite,#F8F8FF
+gold,#FFD700
+goldenrod,#DAA520
+gray,#808080
+green,#008000
+greenyellow,#ADFF2F
+grey,#808080
+honeydew,#F0FFF0
+hotpink,#FF69B4
+indianred,#CD5C5C
+indigo,#4B0082
+ivory,#FFFFF0
+khaki,#F0E68C
+lavender,#E6E6FA
+lavenderblush,#FFF0F5
+lawngreen,#7CFC00
+lemonchiffon,#FFFACD
+lightblue,#ADD8E6
+lightcoral,#F08080
+lightcyan,#E0FFFF
+lightgoldenrodyellow,#FAFAD2
+lightgray,#D3D3D3
+lightgreen,#90EE90
+lightgrey,#D3D3D3
+lightpink,#FFB6C1
+lightsalmon,#FFA07A
+lightseagreen,#20B2AA
+lightskyblue,#87CEFA
+lightslategray,#778899
+lightslategrey,#778899
+lightsteelblue,#B0C4DE
+lightyellow,#FFFFE0
+lime,#00FF00
+limegreen,#32CD32
+linen,#FAF0E6
+magenta,#FF00FF
+maroon,#800000
+mediumaquamarine,#66CDAA
+mediumblue,#0000CD
+mediumorchid,#BA55D3
+mediumpurple,#9370DB
+mediumseagreen,#3CB371
+mediumslateblue,#7B68EE
+mediumspringgreen,#00FA9A
+mediumturquoise,#48D1CC
+mediumvioletred,#C71585
+midnightblue,#191970
+mintcream,#F5FFFA
+mistyrose,#FFE4E1
+moccasin,#FFE4B5
+navajowhite,#FFDEAD
+navy,#000080
+oldlace,#FDF5E6
+olive,#808000
+olivedrab,#6B8E23
+orange,#FFA500
+orangered,#FF4500
+orchid,#DA70D6
+palegoldenrod,#EEE8AA
+palegreen,#98FB98
+paleturquoise,#AFEEEE
+palevioletred,#DB7093
+papayawhip,#FFEFD5
+peachpuff,#FFDAB9
+peru,#CD853F
+pink,#FFC0CB
+plum,#DDA0DD
+powderblue,#B0E0E6
+purple,#800080
+rebeccapurple,#663399
+red,#FF0000
+rosybrown,#BC8F8F
+royalblue,#4169E1
+saddlebrown,#8B4513
+salmon,#FA8072
+sandybrown,#F4A460
+seagreen,#2E8B57
+seashell,#FFF5EE
+sienna,#A0522D
+silver,#C0C0C0
+skyblue,#87CEEB
+slateblue,#6A5ACD
+slategray,#708090
+slategrey,#708090
+snow,#FFFAFA
+springgreen,#00FF7F
+steelblue,#4682B4
+tan,#D2B48C
+teal,#008080
+thistle,#D8BFD8
+tomato,#FF6347
+turquoise,#40E0D0
+violet,#EE82EE
+wheat,#F5DEB3
+white,#FFFFFF
+whitesmoke,#F5F5F5
+yellow,#FFFF00
+yellowgreen,#9ACD32
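The rename above replaces the plain list of CSS color names with `name,#RRGGBB` pairs. How nanoplotter reads the new file is not part of this diff. The following is a minimal sketch, assuming only the comma-separated format shown, of loading the table and resolving either a color name or a hex code. `load_color_table` and `to_hex` are hypothetical helpers, not NanoPlot functions.

```python
import io
import string

# Two entries copied from extra/color_options_hex.txt (the full file has 148).
SAMPLE = "aliceblue,#F0F8FF\nyellowgreen,#9ACD32\n"


def load_color_table(handle):
    """Parse 'name,#RRGGBB' lines into a {name: '#RRGGBB'} dict."""
    table = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. after a trailing newline
        name, hexcode = line.split(",", 1)
        table[name.lower()] = hexcode.upper()
    return table


def to_hex(color, table):
    """Resolve a color name or a '#RRGGBB' string to upper-case hex."""
    if (len(color) == 7 and color.startswith("#")
            and all(c in string.hexdigits for c in color[1:])):
        return color.upper()
    try:
        return table[color.lower()]
    except KeyError:
        raise ValueError(f"unknown color: {color!r}") from None


colors = load_color_table(io.StringIO(SAMPLE))
print(to_hex("AliceBlue", colors))  # #F0F8FF
print(to_hex("#9acd32", colors))    # #9ACD32
```

Keeping the hex code next to each name means a lookup yields a value that any plotting backend accepts, instead of relying on the backend knowing every CSS name.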
=====================================
nanoplot/NanoPlot.py
=====================================
@@ -11,13 +11,13 @@ Input data can be given as one or multiple of:
-a summary file generated by albacore
'''
-
from os import path
import logging
import nanomath
import numpy as np
from scipy import stats
import nanoplot.utils as utils
+import nanoplot.report as report
from nanoget import get_input
from nanoplot.filteroptions import filter_and_transform_data
from nanoplot.version import __version__
@@ -37,21 +37,24 @@ def main():
try:
utils.make_output_dir(args.outdir)
utils.init_logs(args)
- args.format = nanoplotter.check_valid_format(args.format)
- sources = {
- "fastq": args.fastq,
- "bam": args.bam,
- "cram": args.cram,
- "fastq_rich": args.fastq_rich,
- "fastq_minimal": args.fastq_minimal,
- "summary": args.summary,
- "fasta": args.fasta,
- "ubam": args.ubam,
- }
-
+ # args.format = nanoplotter.check_valid_format(args.format)
if args.pickle:
datadf = pickle.load(open(args.pickle, 'rb'))
+ elif args.feather:
+ from nanoget import combine_dfs
+ from pandas import read_feather
+ datadf = combine_dfs([read_feather(p) for p in args.feather], method="simple")
else:
+ sources = {
+ "fastq": args.fastq,
+ "bam": args.bam,
+ "cram": args.cram,
+ "fastq_rich": args.fastq_rich,
+ "fastq_minimal": args.fastq_minimal,
+ "summary": args.summary,
+ "fasta": args.fasta,
+ "ubam": args.ubam,
+ }
datadf = get_input(
source=[n for n, s in sources.items() if s][0],
files=[f for f in sources.values() if f][0],
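In the reworked input handling, the `sources` dict is built only when neither `--pickle` nor `--feather` is given. The first non-empty entry is then chosen with `[n for n, s in sources.items() if s][0]`, plus a matching comprehension for the files. If no entry were set, that expression would raise a bare `IndexError`. Below is a minimal sketch of the same selection that keeps the input type and its files paired and fails with a clearer message. `pick_source` is a hypothetical helper. Like the original, it relies on dicts preserving insertion order, which Python guarantees since 3.7.

```python
def pick_source(sources):
    """Return (input_type, files) for the first input type that was given.

    `sources` maps an input type such as "fastq" or "bam" to a list of files,
    or None when the corresponding option was not used.
    """
    given = [(name, files) for name, files in sources.items() if files]
    if not given:
        raise ValueError("no input files were specified")
    return given[0]


sources = {"fastq": None, "bam": ["reads.bam"], "summary": None}
print(pick_source(sources))  # ('bam', ['reads.bam'])
```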
@@ -60,7 +63,7 @@ def main():
combine="simple",
barcoded=args.barcoded,
huge=args.huge,
- keep_supp=not(args.no_supplementary))
+ keep_supp=not (args.no_supplementary))
if args.store:
pickle.dump(
obj=datadf,
@@ -71,28 +74,32 @@ def main():
index=False,
compression="gzip")
- settings["statsfile"] = [make_stats(datadf, settings, suffix="")]
+ settings["statsfile"] = [make_stats(datadf, settings, suffix="", tsv_stats=args.tsv_stats)]
datadf, settings = filter_and_transform_data(datadf, settings)
if settings["filtered"]: # Bool set when filter was applied in filter_and_transform_data()
settings["statsfile"].append(
- make_stats(datadf[datadf["length_filter"]], settings, suffix="_post_filtering")
+ make_stats(datadf[datadf["length_filter"]], settings,
+ suffix="_post_filtering", tsv_stats=args.tsv_stats)
)
if args.barcoded:
+ main_path = settings["path"]
barcodes = list(datadf["barcode"].unique())
plots = []
for barc in barcodes:
logging.info("Processing {}".format(barc))
- settings["path"] = path.join(args.outdir, args.prefix + barc + "_")
dfbarc = datadf[datadf["barcode"] == barc]
if len(dfbarc) > 5:
settings["title"] = barc
+ settings["path"] = path.join(args.outdir, args.prefix + barc + "_")
+ plots.append(report.BarcodeTitle(barc))
plots.extend(
make_plots(dfbarc, settings)
)
else:
sys.stderr.write("Found barcode {} less than 5x, ignoring...\n".format(barc))
logging.info("Found barcode {} less than 5 times, ignoring".format(barc))
+ settings["path"] = main_path
else:
plots = make_plots(datadf, settings)
make_report(plots, settings)
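The barcode hunk now saves `settings["path"]` before the loop and switches to a per-barcode prefix only for barcodes with more than 5 reads. It then restores the original path afterwards, so later steps write to the main output prefix again. The following is a minimal sketch of that pattern using plain `(barcode, length)` tuples instead of a pandas DataFrame. `process_barcodes` and its `threshold` parameter are hypothetical. Unlike the diff, the sketch restores the path in a `finally` block, so an exception inside the loop cannot leave the per-barcode prefix behind.

```python
import os
import sys


def process_barcodes(records, settings, outdir, prefix, threshold=5):
    """Return (barcode, path, n_reads) for barcodes with more than `threshold` reads.

    settings["path"] is switched to a per-barcode prefix inside the loop and
    restored afterwards, even if an exception is raised.
    """
    by_barcode = {}
    for barcode, length in records:
        by_barcode.setdefault(barcode, []).append(length)

    main_path = settings["path"]
    processed = []
    try:
        for barcode, lengths in by_barcode.items():
            if len(lengths) > threshold:
                settings["path"] = os.path.join(outdir, prefix + barcode + "_")
                # a real implementation would create the plots here
                processed.append((barcode, settings["path"], len(lengths)))
            else:
                sys.stderr.write(
                    f"Found barcode {barcode} with {len(lengths)} reads "
                    f"(<= {threshold}), ignoring...\n")
    finally:
        settings["path"] = main_path
    return processed


records = [("barcode01", 1200)] * 6 + [("barcode02", 800)] * 2
settings = {"path": "out/"}
print(process_barcodes(records, settings, "out", ""))
print(settings["path"])  # out/
```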
@@ -107,20 +114,22 @@ def main():
raise
-def make_stats(datadf, settings, suffix):
+def make_stats(datadf, settings, suffix, tsv_stats=True):
statsfile = settings["path"] + "NanoStats" + suffix + ".txt"
- nanomath.write_stats(
+ stats_df = nanomath.write_stats(
datadfs=[datadf],
- outputfile=statsfile)
+ outputfile=statsfile,
+ as_tsv=tsv_stats)
logging.info("Calculated statistics")
if settings["barcoded"]:
barcodes = list(datadf["barcode"].unique())
statsfile = settings["path"] + "NanoStats_barcoded.txt"
- nanomath.write_stats(
+ stats_df = nanomath.write_stats(
datadfs=[datadf[datadf["barcode"] == b] for b in barcodes],
outputfile=statsfile,
- names=barcodes)
- return statsfile
+ names=barcodes,
+ as_tsv=tsv_stats)
+ return stats_df if tsv_stats else statsfile
def make_plots(datadf, settings):
@@ -128,16 +137,27 @@ def make_plots(datadf, settings):
Call plotting functions from nanoplotter
settings["lengths_pointer"] is a column in the DataFrame specifying which lengths to use
'''
- plot_settings = dict(font_scale=settings["font_scale"])
- nanoplotter.plot_settings(plot_settings, dpi=settings["dpi"])
color = nanoplotter.check_valid_color(settings["color"])
colormap = nanoplotter.check_valid_colormap(settings["colormap"])
+
plotdict = {type: settings["plots"].count(type) for type in ["kde", "hex", "dot", 'pauvre']}
+ if "hex" in settings["plots"]:
+ print(
+ "WARNING: hex as part of --plots has been deprecated and will be ignored. To get the hex output, rerun with --legacy hex.")
+
+ if settings["legacy"] is None:
+ plotdict_legacy = {}
+ else:
+ plotdict_legacy = {plot: settings["legacy"].count(plot) for plot in ["kde", "hex", "dot"]}
+
plots = []
+
+ subdf = utils.subsample_datasets(datadf)
if settings["N50"]:
n50 = nanomath.get_N50(np.sort(datadf["lengths"]))
else:
n50 = None
+
plots.extend(
nanoplotter.length_plots(
array=datadf[datadf["length_filter"]]["lengths"].astype('uint64'),
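`make_plots()` now counts the requested plot types for `--plots` and, separately, for `--legacy`. It also prints a deprecation warning when `hex` is requested through `--plots`. Upstream leaves the `hex` count in `plotdict`; whether `nanoplotter.scatter` then skips it is not visible in this diff. The following sketch (a hypothetical `parse_plot_options`) mirrors the counting logic. As an assumption of the sketch, it zeroes the `hex` count so that "will be ignored" is explicit, and it writes the warning to stderr rather than stdout.

```python
import sys

PLOT_TYPES = ["kde", "hex", "dot", "pauvre"]
LEGACY_TYPES = ["kde", "hex", "dot"]


def parse_plot_options(plots, legacy=None):
    """Return (plotdict, legacy_dict) with a count per known plot type."""
    plotdict = {t: plots.count(t) for t in PLOT_TYPES}
    if plotdict["hex"]:
        print("WARNING: hex as part of --plots has been deprecated and will be "
              "ignored. To get the hex output, rerun with --legacy hex.",
              file=sys.stderr)
        plotdict["hex"] = 0  # sketch-only choice: make "ignored" explicit
    if legacy is None:
        legacy_dict = {}
    else:
        legacy_dict = {t: legacy.count(t) for t in LEGACY_TYPES}
    return plotdict, legacy_dict


plotdict, legacy_dict = parse_plot_options(["kde", "hex", "dot"], legacy=["hex"])
print(plotdict)     # {'kde': 1, 'hex': 0, 'dot': 1, 'pauvre': 0}
print(legacy_dict)  # {'kde': 0, 'hex': 1, 'dot': 0}
```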
@@ -145,7 +165,6 @@ def make_plots(datadf, settings):
path=settings["path"],
n50=n50,
color=color,
- figformat=settings["format"],
title=settings["title"])
)
logging.info("Created length plots")
@@ -154,27 +173,27 @@ def make_plots(datadf, settings):
nanoplotter.scatter(
x=datadf[datadf["length_filter"]][settings["lengths_pointer"].replace('log_', '')],
y=datadf[datadf["length_filter"]]["quals"],
+ legacy=plotdict_legacy,
names=['Read lengths', 'Average read quality'],
path=settings["path"] + "LengthvsQualityScatterPlot",
color=color,
- figformat=settings["format"],
+ colormap=colormap,
plots=plotdict,
- title=settings["title"],
- plot_settings=plot_settings)
+ title=settings["title"])
)
if settings["logBool"]:
plots.extend(
nanoplotter.scatter(
x=datadf[datadf["length_filter"]][settings["lengths_pointer"]],
y=datadf[datadf["length_filter"]]["quals"],
+ legacy=plotdict_legacy,
names=['Read lengths', 'Average read quality'],
path=settings["path"] + "LengthvsQualityScatterPlot",
color=color,
- figformat=settings["format"],
+ colormap=colormap,
plots=plotdict,
log=True,
- title=settings["title"],
- plot_settings=plot_settings)
+ title=settings["title"])
)
logging.info("Created LengthvsQual plot")
if "channelIDs" in datadf:
@@ -183,30 +202,27 @@ def make_plots(datadf, settings):
array=datadf["channelIDs"],
title=settings["title"],
path=settings["path"] + "ActivityMap_ReadsPerChannel",
- color=colormap,
- figformat=settings["format"])
+ colormap=colormap)
)
logging.info("Created spatialheatmap for succesfull basecalls.")
if "start_time" in datadf:
plots.extend(
nanoplotter.time_plots(
df=datadf,
+ subsampled_df=subdf,
path=settings["path"],
color=color,
- figformat=settings["format"],
- title=settings["title"],
- plot_settings=plot_settings)
+ title=settings["title"])
)
if settings["logBool"]:
plots.extend(
nanoplotter.time_plots(
df=datadf,
+ subsampled_df=subdf,
path=settings["path"],
color=color,
- figformat=settings["format"],
title=settings["title"],
- log_length=True,
- plot_settings=plot_settings)
+ log_length=True)
)
logging.info("Created timeplots.")
if "aligned_lengths" in datadf and "lengths" in datadf:
@@ -214,13 +230,13 @@ def make_plots(datadf, settings):
nanoplotter.scatter(
x=datadf[datadf["length_filter"]]["aligned_lengths"],
y=datadf[datadf["length_filter"]]["lengths"],
+ legacy=plotdict_legacy,
names=["Aligned read lengths", "Sequenced read length"],
path=settings["path"] + "AlignedReadlengthvsSequencedReadLength",
- figformat=settings["format"],
plots=plotdict,
color=color,
- title=settings["title"],
- plot_settings=plot_settings)
+ colormap=colormap,
+ title=settings["title"])
)
logging.info("Created AlignedLength vs Length plot.")
if "mapQ" in datadf and "quals" in datadf:
@@ -228,40 +244,40 @@ def make_plots(datadf, settings):
nanoplotter.scatter(
x=datadf["mapQ"],
y=datadf["quals"],
+ legacy=plotdict_legacy,
names=["Read mapping quality", "Average basecall quality"],
path=settings["path"] + "MappingQualityvsAverageBaseQuality",
color=color,
- figformat=settings["format"],
+ colormap=colormap,
plots=plotdict,
- title=settings["title"],
- plot_settings=plot_settings)
+ title=settings["title"])
)
logging.info("Created MapQvsBaseQ plot.")
plots.extend(
nanoplotter.scatter(
x=datadf[datadf["length_filter"]][settings["lengths_pointer"].replace('log_', '')],
y=datadf[datadf["length_filter"]]["mapQ"],
+ legacy=plotdict_legacy,
names=["Read length", "Read mapping quality"],
path=settings["path"] + "MappingQualityvsReadLength",
color=color,
- figformat=settings["format"],
+ colormap=colormap,
plots=plotdict,
- title=settings["title"],
- plot_settings=plot_settings)
+ title=settings["title"])
)
if settings["logBool"]:
plots.extend(
nanoplotter.scatter(
x=datadf[datadf["length_filter"]][settings["lengths_pointer"]],
y=datadf[datadf["length_filter"]]["mapQ"],
+ legacy=plotdict_legacy,
names=["Read length", "Read mapping quality"],
path=settings["path"] + "MappingQualityvsReadLength",
color=color,
- figformat=settings["format"],
+ colormap=colormap,
plots=plotdict,
log=True,
- title=settings["title"],
- plot_settings=plot_settings)
+ title=settings["title"])
)
logging.info("Created Mapping quality vs read length plot.")
if "percentIdentity" in datadf:
@@ -271,52 +287,52 @@ def make_plots(datadf, settings):
nanoplotter.scatter(
x=datadf["percentIdentity"],
y=datadf["aligned_quals"],
+ legacy=plotdict_legacy,
names=["Percent identity", "Average Base Quality"],
path=settings["path"] + "PercentIdentityvsAverageBaseQuality",
color=color,
- figformat=settings["format"],
+ colormap=colormap,
plots=plotdict,
stat=stats.pearsonr if not settings["hide_stats"] else None,
minvalx=minPID,
- title=settings["title"],
- plot_settings=plot_settings)
+ title=settings["title"])
)
logging.info("Created Percent ID vs Base quality plot.")
plots.extend(
nanoplotter.scatter(
x=datadf[datadf["length_filter"]][settings["lengths_pointer"].replace('log_', '')],
y=datadf[datadf["length_filter"]]["percentIdentity"],
+ legacy=plotdict_legacy,
names=["Aligned read length", "Percent identity"],
path=settings["path"] + "PercentIdentityvsAlignedReadLength",
color=color,
- figformat=settings["format"],
+ colormap=colormap,
plots=plotdict,
stat=stats.pearsonr if not settings["hide_stats"] else None,
minvaly=minPID,
- title=settings["title"],
- plot_settings=plot_settings)
+ title=settings["title"])
)
if settings["logBool"]:
plots.extend(
nanoplotter.scatter(
x=datadf[datadf["length_filter"]][settings["lengths_pointer"]],
y=datadf[datadf["length_filter"]]["percentIdentity"],
+ legacy=plotdict_legacy,
names=["Aligned read length", "Percent identity"],
path=settings["path"] + "PercentIdentityvsAlignedReadLength",
color=color,
- figformat=settings["format"],
+ colormap=colormap,
plots=plotdict,
stat=stats.pearsonr if not settings["hide_stats"] else None,
log=True,
minvaly=minPID,
- title=settings["title"],
- plot_settings=plot_settings)
+ title=settings["title"])
)
plots.append(nanoplotter.dynamic_histogram(array=datadf["percentIdentity"],
name="percent identity",
path=settings["path"]
- + "PercentIdentityHistogram",
+ + "PercentIdentityHistogram",
title=settings["title"],
color=color))
logging.info("Created Percent ID vs Length plot")
@@ -328,49 +344,19 @@ def make_report(plots, settings):
Creates a fat html report based on the previously created files
plots is a list of Plot objects defined by a path and title
statsfile is the file to which the stats have been saved,
- which is parsed to a table (rather dodgy)
+ which is parsed to a table (rather dodgy) or nicely if it's a pandas/tsv
'''
logging.info("Writing html report.")
- html_content = ['<body>']
-
- # Hyperlink Table of Contents panel
- html_content.append('<div class="panel panelC">')
- if settings["filtered"]:
- html_content.append(
- '<p><strong><a href="#stats0">Summary Statistics prior to filtering</a></strong></p>')
- html_content.append(
- '<p><strong><a href="#stats1">Summary Statistics after filtering</a></strong></p>')
- else:
- html_content.append(
- '<p><strong><a href="#stats0">Summary Statistics</a></strong></p>')
- html_content.append('<p><strong><a href="#plots">Plots</a></strong></p>')
- html_content.extend(['<p style="margin-left:20px"><a href="#'
- + p.title.replace(' ', '_') + '">' + p.title + '</a></p>' for p in plots])
- html_content.append('</div>')
-
- # The report itself: stats
- html_content.append('<div class="panel panelM"> <h1>NanoPlot report</h1>')
- if settings["filtered"]:
- html_content.append('<h2 id="stats0">Summary statistics prior to filtering</h2>')
- html_content.append(utils.stats2html(settings["statsfile"][0]))
- html_content.append('<h2 id="stats1">Summary statistics after filtering</h2>')
- html_content.append(utils.stats2html(settings["statsfile"][1]))
- else:
- html_content.append('<h2 id="stats0">Summary statistics</h2>')
- html_content.append(utils.stats2html(settings["statsfile"][0]))
- # The report itself: plots
- html_content.append('<h2 id="plots">Plots</h2>')
- for plot in plots:
- html_content.append('\n<h3 id="' + plot.title.replace(' ', '_') + '">'
- + plot.title + '</h3>\n' + plot.encode())
- html_content.append('\n<br>\n<br>\n<br>\n<br>')
- html_body = '\n'.join(html_content) + '</div></body></html>'
- html_str = utils.html_head + html_body
- htmlreport = settings["path"] + "NanoPlot-report.html"
- with open(htmlreport, "w") as html_file:
- html_file.write(html_str)
- return htmlreport
+ html_content = [
+ '<body class="grid">',
+ report.html_toc(plots, filtered=settings["filtered"]),
+ report.html_stats(settings),
+ report.html_plots(plots),
+ report.run_info(settings) if settings["info_in_report"] else '',
+ '</main></body></html>']
+ with open(settings["path"] + "NanoPlot-report.html", "w") as html_file:
+ html_file.write(report.html_head + '\n'.join(html_content))
if __name__ == "__main__":
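The new `nanoplot/report.py` that follows includes a `chunks(values, chunks)` helper, which `stats2html()` uses to split each row of a text stats file into table cells. `chunksize = int(len(values) / chunks)` becomes 0 whenever a row has fewer tokens than `chunks`, and `range(0, len(values), 0)` then raises `ValueError`. When the length is not divisible, more than `chunks` groups come back. Below is a minimal sketch of a hypothetical `split_into_groups` variant that always returns exactly `n_groups` cells and spreads the remainder over the first groups. It keeps the original's `" "` placeholder for empty cells.

```python
def split_into_groups(values, n_groups):
    """Split a list of tokens into exactly n_groups space-joined strings.

    The first len(values) % n_groups groups get one extra token; empty
    groups become " " so that no table cell is missing.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    base, extra = divmod(len(values), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)
        groups.append(" ".join(values[start:start + size]) or " ")
        start += size
    return groups


print(split_into_groups(["1", "2", "3"], 2))  # ['1 2', '3']
print(split_into_groups(["a"], 3))            # ['a', ' ', ' ']
```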
=====================================
nanoplot/report.py
=====================================
@@ -0,0 +1,294 @@
+import pandas as pd
+import numpy as np
+
+
+class BarcodeTitle(object):
+ """Bit of a dummy class to add barcode titles to the report"""
+
+ def __init__(self, title):
+ self.title = title.upper()
+
+ def encode(self):
+ return ""
+
+
+def chunks(values, chunks):
+ if values:
+ chunksize = int(len(values) / chunks)
+ return ([' '.join(values[i:i + chunksize]) for i in range(0, len(values), chunksize)])
+ else:
+ return [" "] * chunks
+
+
+def html_stats(settings):
+ statsfile = settings["statsfile"]
+ filtered = settings["filtered"]
+ as_tsv = settings['tsv_stats']
+
+ stats_html = []
+ stats_html.append('<main class="grid-main"><h2>NanoPlot reports</h2>')
+ if filtered:
+ stats_html.append('<h3 id="stats0">Summary statistics prior to filtering</h3>')
+ if as_tsv:
+ stats_html.append(statsfile[0].to_html())
+ stats_html.append('<h3 id="stats1">Summary statistics after filtering</h3>')
+ stats_html.append(statsfile[1].to_html())
+ else:
+ stats_html.append(stats2html(statsfile[0]))
+ stats_html.append('<h3 id="stats1">Summary statistics after filtering</h3>')
+ stats_html.append(stats2html(statsfile[1]))
+ else:
+ stats_html.append('<h3 id="stats0">Summary statistics</h3>')
+ if as_tsv:
+ stats_html.append(statsfile[0].to_html())
+ else:
+ stats_html.append(stats2html(statsfile[0]))
+ return '\n'.join(stats_html)
+
+
+def stats2html(statsf):
+ df = pd.read_csv(statsf, sep=':', header=None, names=['feature', 'value'])
+ values = df["value"].str.strip().str.replace('\t', ' ').str.split().replace(np.nan, '')
+ num = len(values[0]) or 1
+ v = [chunks(i, num) for i in values]
+ df = pd.DataFrame(v, index=df["feature"])
+ df.columns.name = None
+ df.index.name = None
+ return df.to_html(header=False)
+
+
+def html_toc(plots, filtered=False):
+ toc = []
+ toc.append('<h1 class="hiddentitle">NanoPlot statistics report</h1>')
+ toc.append('<header class="grid-header"><nav><h2 class="hiddentitle">Menu</h2><ul>')
+ if filtered:
+ toc.append(
+ '<li><a href="#stats0">Summary Statistics prior to filtering</a></li>')
+ toc.append(
+ '<li><a href="#stats1">Summary Statistics after filtering</a></li>')
+ else:
+ toc.append('<li><a href="#stats0">Summary Statistics</a></li>')
+
+ toc.append('<li class="submenu"><a href="#plots" class="submenubtn">Plots</a>')
+ toc.append('<ul class="submenu-items">')
+ toc.extend(['<li><a href="#'
+ + p.title.replace(' ', '_') + '">' + p.title + '</a></li>' for p in plots])
+ toc.append('</ul>')
+ toc.append('</li>')
+ toc.append(
+ '<li class="issue-btn"><a href="https://github.com/wdecoster/NanoPlot/issues" target="_blank" class="reporting">Report issue on Github</a></li>')
+ toc.append('</ul></nav></header>')
+ return '\n'.join(toc)
+
+
+def html_plots(plots):
+ html_plots = []
+ html_plots.append('<h3 id="plots">Plots</h3>')
+ for plot in plots:
+ html_plots.append('<button class="collapsible">' + plot.title + '</button>')
+ html_plots.append('<section class="collapsible-content"><h4 class="hiddentitle" id="' +
+ plot.title.replace(' ', '_') + '">' + plot.title + '</h4>')
+ html_plots.append(plot.encode())
+ html_plots.append('</section>')
+
+ html_plots.append(
+ '<script>var coll = document.getElementsByClassName("collapsible");var i;for (i = 0; i < coll.length; i++) {coll[i].addEventListener("click", function() {this.classList.toggle("active");var content = this.nextElementSibling;if (content.style.display === "none") {content.style.display = "block";} else {content.style.display = "none";}});}</script>')
+
+ return '\n'.join(html_plots)
+
+
+def run_info(settings):
+ html_info = []
+ html_info.append('<h5>Run Info</h5>\n')
+ html_info.append('<h6>Data source:</h6>\n')
+ for k in ["fastq", "fasta", "fastq_rich", "fastq_minimal", "summary",
+ "bam", "ubam", "cram", "pickle", "feather"]:
+ html_info.append(f"{k}:\t{settings[k]}<br>")
+ html_info.append('<h6>Filtering parameters:</h6>\n')
+ for k in ['maxlength', 'minlength', 'drop_outliers', 'downsample', 'loglength',
+ 'percentqual', 'alength', 'minqual', 'runtime_until', 'no_supplementary']:
+ html_info.append(f"{k}:\t{settings[k]}<br>")
+ # html_info.append('</p>')
+ return '\n'.join(html_info)
+
+
+html_head = """
+<!DOCTYPE html>
+<html>
+<head>
+<meta charset="UTF-8">
+<style>
+
+body {margin:0}
+
+.grid { /* grid definition for index page */
+ display: grid;
+ grid-template-areas: 'gheader'
+ 'gmain';
+ margin: 0;
+}
+
+.grid > .grid-header { /* definition of the header on index page and its position in the grid */
+ grid-area: gheader;
+}
+
+.grid > .grid-main { /* definition of the main content on index page and its position in the grid */
+ grid-area: gmain;
+}
+
+nav {
+ text-align: center;
+}
+
+ul {
+ border-bottom: 1px solid white;
+ font-family: "Trebuchet MS", sans-serif;
+ list-style-type: none; /* remove dot symbols from list */
+ margin: 0;
+ padding: 0;
+ overflow: hidden; /* contains the overflow of the element if it goes 'out of bounds' */
+ background-color: #001f3f;
+ font-size: 1.6em;
+}
+
+ul > li > ul {
+ font-size: 1em;
+}
+
+li {
+ float: left; /* floats the list items to the left side of the page */
+}
+
+li a, .submenubutton {
+ display: inline-block; /* display the list items inline block so the items are vertically displayed */
+ color: white;
+ text-align: center;
+ padding: 14px 16px;
+ text-decoration: none; /* removes the underline that comes with the a tag */
+}
+
+li a:hover, .submenu:hover .submenubutton { /* when you hover over a submenu item the bkgrnd color is gray */
+ background-color: #39CCCC;
+}
+
+.submenu {
+ display: inline-block; /* idem to above, list items are displayed underneath each other */
+}
+
+.submenu-items { /* hides the ul */
+ display: none;
+ position: absolute;
+ background-color: #f9f9f9;
+ min-width: 160px;
+ z-index: 1;
+}
+
+.submenu-items li {
+ display: block;
+ float: none;
+ overflow: hidden;
+}
+
+.submenu-items li a { /* styling of the links in the submenu */
+ color: black;
+ padding: 12px 16px;
+ text-decoration: none;
+ display: block;
+ text-align: left;
+}
+
+.submenu-items a:hover {
+ background-color: #f1f1f1;
+}
+
+.submenu:hover .submenu-items {
+ display: block;
+ overflow: hidden;
+}
+
+li {
+ border-right: 1px solid #bbb;
+}
+
+.issue-btn {
+ border-right: none;
+ float: right;
+}
+
+.hiddentitle { /* hides titles that are not necessary for content, but are for outline */
+ position: absolute;
+ width: 1px;
+ height: 1px;
+ overflow: hidden;
+ left: -10000px;
+}
+
+h2 { color: #111; font-family: 'Helvetica Neue', sans-serif; font-size: 60px; font-weight: bold; letter-spacing: -1px; line-height: 1; text-align: center; }
+
+h3 { color: #111; font-family: 'Open Sans', sans-serif; font-size: 25px; font-weight: 300; line-height: 32px; text-align: center; padding-bottom: 0;}
+
+h4 { color: #111; font-family: 'Helvetica Neue', sans-serif; font-size: 16px; font-weight: 150; margin: 0 0 0 0; text-align: left; padding:20px 0px 20px 0px;}
+
+table {
+ font-family: Arial, Helvetica, sans-serif;
+ border-collapse: collapse;
+ table-layout: auto;
+ width: 100%;
+}
+
+table td, table th {
+ border: 1px solid #ddd;
+ padding: 8px;
+}
+
+table tr:nth-child(even){background-color: #f2f2f2;}
+
+table tr:hover {background-color: #ddd;}
+
+/* Style the button that is used to open and close the collapsible content */
+.collapsible {
+ background-color: #39CCCC;
+ color: white;
+ cursor: pointer;
+ padding: 18px;
+ width: 100%;
+ border: none;
+ text-align: left;
+ outline: none;
+ font-size: 15px;
+}
+
+/* Add a background color to the button if it is clicked on (add the .active class with JS), and when you move the mouse over it (hover) */
+.active, .collapsible:hover {
+ color:white;
+ background-color: #001f3f;
+}
+
+/* Style the collapsible content. Note: shown by default (display: block) */
+.collapsible-content {
+ padding: 0 18px;
+ display: block;
+ overflow: hidden;
+ background-color: #FFFFFF;
+ text-align: center;
+}
+
+.collapsible:after {
+ content: '-';
+ font-size: 20px;
+ font-weight: bold;
+ float: right;
+ color:white;
+ margin-left: 5px;
+}
+
+.active:after {
+ content: '+'; /* "plus" sign (+) */
+ color: white;
+}
+</style>
+<title>NanoPlot Report</title>
+</head>
+"""
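The run-info block of the report (the loop at the top of this hunk) writes one `key:\tvalue<br>` line per selected setting. Below is a minimal sketch of that idea, assuming `settings` behaves like a plain dict. The function name `settings_to_html`, the shortened key list, the `None` fallback for missing keys and the HTML escaping are illustrative assumptions, not NanoPlot code.

```python
import html

# Hypothetical subset of keys; the report itself loops over a fixed list like this.
REPORTED_KEYS = ["maxlength", "minlength", "drop_outliers", "downsample"]


def settings_to_html(settings, keys=REPORTED_KEYS):
    """Render selected run settings as 'key:<TAB>value<br>' lines.

    Missing keys are shown as 'None' instead of raising KeyError
    (an assumption; the diff indexes settings[k] directly).
    Values are HTML-escaped so they cannot break the report markup.
    """
    lines = []
    for k in keys:
        value = html.escape(str(settings.get(k)))
        lines.append(f"{k}:\t{value}<br>")
    return "\n".join(lines)


print(settings_to_html({"maxlength": 50000, "minlength": 200, "drop_outliers": False}))
```

Escaping is a defensive choice here: setting values come from the command line and are pasted straight into HTML.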
=====================================
nanoplot/utils.py
=====================================
@@ -7,8 +7,6 @@ from nanoplot.version import __version__
from argparse import HelpFormatter, Action, ArgumentParser
import textwrap as _textwrap
import pandas as pd
-import numpy as np
-from matplotlib import cm
class CustomHelpFormatter(HelpFormatter):
@@ -41,7 +39,6 @@ class Action_Print_Colors(Action):
class Action_Print_Colormaps(Action):
-
def __init__(self, option_strings, dest="==SUPPRESS==", default="==SUPPRESS==", help=None):
super(Action_Print_Colormaps, self).__init__(
option_strings=option_strings,
@@ -97,6 +94,12 @@ def get_args():
help="Specify an optional prefix to be used for the output files.",
default="",
type=str)
+ general.add_argument("--tsv_stats",
+ help="Output the stats file as a properly formatted TSV.",
+ action='store_true')
+ general.add_argument("--info_in_report",
+ help="Add NanoPlot run info in the report.",
+ action='store_true')
filtering = parser.add_argument_group(
title='Options for filtering or transforming input prior to plotting')
filtering.add_argument("--maxlength",
@@ -151,18 +154,22 @@ def get_args():
visual.add_argument("-cm", "--colormap",
help="Specify a valid matplotlib colormap for the heatmap",
default="Greens")
- visual.add_argument("-f", "--format",
- help="Specify the output format of the plots.",
- default="png",
- type=str,
- choices=['eps', 'jpeg', 'jpg', 'pdf', 'pgf', 'png', 'ps',
- 'raw', 'rgba', 'svg', 'svgz', 'tif', 'tiff'])
+ # visual.add_argument("-f", "--format",
+ # help="Specify the output format of the plots.",
+ # default="png",
+ # type=str,
+ # choices=['eps', 'jpeg', 'jpg', 'pdf', 'pgf', 'png', 'ps',
+ # 'raw', 'rgba', 'svg', 'svgz', 'tif', 'tiff'])
visual.add_argument("--plots",
help="Specify which bivariate plots have to be made.",
default=['kde', 'dot'],
type=str,
nargs='*',
- choices=['kde', 'hex', 'dot', 'pauvre'])
+ choices=['kde', 'hex', 'dot'])
+ visual.add_argument("--legacy", help="Specify which bivariate plots have to be made (legacy mode).",
+ type=str,
+ nargs='*',
+ choices=['kde', 'dot', 'hex'])
visual.add_argument("--listcolors",
help="List the colors which are available for plotting and exit.",
action=Action_Print_Colors,
@@ -238,6 +245,10 @@ def get_args():
mtarget.add_argument("--pickle",
help="Data is a pickle file stored earlier.",
metavar="pickle")
+ mtarget.add_argument("--feather",
+ help="Data is in one or more feather file(s).",
+ nargs='+',
+ metavar="file")
args = parser.parse_args()
if args.listcolors:
list_colors()
@@ -259,13 +270,19 @@ def custom_formatter(prog):
def list_colors():
parent_directory = os.path.dirname(os.path.abspath(os.path.dirname(__file__)))
- colours = open(os.path.join(parent_directory, "extra/color_options.txt")).readlines()
- print("{}".format(", ".join([c.strip() for c in colours])))
+ colours = open(os.path.join(parent_directory, "extra/color_options_hex.txt"))
+ col_hex = {}
+
+ for line in colours:
+ key, value = line.split(",")
+ col_hex[key] = value.strip()
+ print("Valid colors:\n{}".format("\n".join([c.strip() for c in col_hex.keys()])))
sys.exit(0)
def list_colormaps():
- print("{}".format(", ".join([c.strip() for c in cm.cmap_d.keys()])))
+ print('Valid colormaps:\nGreys,YlGnBu,Greens,YlOrRd,Bluered,RdBu,Reds,Blues,Picnic,'
+ 'Rainbow,Portland,Jet,Hot,Blackbody,Earth,Electric,Viridis,Cividis')
sys.exit(0)
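Both `list_colors` and `colors_and_colormaps` read `extra/color_options_hex.txt` as one `name,hex` pair per line. The sketch below shows that parsing with a context manager so the file handle is closed. The function names and the sample lines are made up for illustration and are not copied from the shipped file; skipping blank lines is an added assumption.

```python
import io


def parse_color_file(handle):
    """Map color names to hex codes from 'name,#rrggbb' lines.

    Blank lines are skipped; whitespace around name and value is stripped.
    """
    col_hex = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        key, value = line.split(",", 1)
        col_hex[key.strip()] = value.strip()
    return col_hex


def read_color_file(path):
    # The 'with' block closes the file, unlike a bare open() call.
    with open(path) as handle:
        return parse_color_file(handle)


# Illustrative content only; not the real color_options_hex.txt.
sample = io.StringIO("aliceblue,#F0F8FF\nblack,#000000\n\n")
colors = parse_color_file(sample)
print("Valid colors:\n" + "\n".join(colors))
```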
@@ -293,52 +310,26 @@ def init_logs(args, tool="NanoPlot"):
return logname
-def chunks(values, chunks):
- if values:
- chunksize = int(len(values) / chunks)
- return([' '.join(values[i:i + chunksize]) for i in range(0, len(values), chunksize)])
+def subsample_datasets(df, minimal=10000):
+ if 'dataset' in df:
+ list_df = []
+
+ for d in df["dataset"].unique():
+ dataset = df.loc[df['dataset'] == d]
+
+ if len(dataset.index) < minimal:
+ list_df.append(dataset)
+
+ else:
+ list_df.append(dataset.sample(minimal))
+
+ subsampled_df = pd.concat(list_df, ignore_index=True)
+
else:
- return [" "] * chunks
-
-
-def stats2html(statsf):
- df = pd.read_csv(statsf, sep=':', header=None, names=['feature', 'value'])
- values = df["value"].str.strip().str.replace('\t', ' ').str.split().replace(np.nan, '')
- num = len(values[0]) or 1
- v = [chunks(i, num) for i in values]
- return pd.DataFrame(v, index=df["feature"]).to_html(header=False)
-
-
-html_head = """<!DOCTYPE html>
-<html>
- <head>
- <meta charset="UTF-8">
- <style>
- table, th, td {
- text-align: left;
- padding: 2px;
- /* border: 1px solid black;
- border-collapse: collapse; */
- }
- h2 {
- line-height: 0pt;
- }
- .panel {
- display: inline-block;
- background: #ffffff;
- min-height: 100px;
- box-shadow:0px 0px 5px 5px #C9C9C9;
- -webkit-box-shadow:2px 2px 5px 5x #C9C9C9;
- -moz-box-shadow:2px 2px 5px 5px #C9C9C9;
- margin: 10px;
- padding: 10px;
- }
- .panelC {
- float: left
- }
- .panelM {
- float: left
- }
- </style>
- <title>NanoPlot Report</title>
- </head>"""
+ if len(df.index) < minimal:
+ subsampled_df = df
+
+ else:
+ subsampled_df = df.sample(minimal)
+
+ return subsampled_df
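`subsample_datasets` caps every dataset at `minimal` reads: smaller datasets are kept whole, larger ones are randomly sampled, and a frame without a `dataset` column is capped as a whole. The following is a compact sketch of the same behaviour using `groupby`; the `random_state` parameter is an added assumption for reproducible sampling and is not part of the diff.

```python
import pandas as pd


def subsample_datasets(df, minimal=10000, random_state=None):
    """Keep at most `minimal` rows per dataset (or overall if no 'dataset' column)."""

    def cap(group):
        # Small groups are returned unchanged; large ones are sampled down.
        if len(group) < minimal:
            return group
        return group.sample(minimal, random_state=random_state)

    if "dataset" in df:  # membership test on a DataFrame checks column labels
        parts = [cap(group) for _, group in df.groupby("dataset", sort=False)]
        return pd.concat(parts, ignore_index=True)
    return cap(df)


reads = pd.DataFrame({"dataset": ["a"] * 5 + ["b"] * 2, "lengths": range(7)})
print(subsample_datasets(reads, minimal=3, random_state=0)["dataset"].value_counts())
```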
=====================================
nanoplot/version.py
=====================================
@@ -1 +1 @@
-__version__ = "1.30.1"
+__version__ = "1.36.2"
=====================================
nanoplotter/nanoplotter_main.py
=====================================
@@ -23,24 +23,19 @@ spatialHeatmap(array, title, path, color, format)
"""
-
+import plotly.graph_objs as go
+import plotly
import logging
import sys
+import os
import pandas as pd
import numpy as np
-from collections import namedtuple
from nanoplotter.plot import Plot
-import matplotlib as mpl
-mpl.use('Agg')
-import matplotlib.pyplot as plt
-from matplotlib import colors as mcolors
-import seaborn as sns
-from pauvre.marginplot import margin_plot
-from nanoplotter.timeplots import time_plots
+import plotly.express as px
+import plotly.figure_factory as ff
from nanoplotter.spatial_heatmap import spatial_heatmap
-from matplotlib import cm
-import plotly
-import plotly.graph_objs as go
+from nanoplotter.timeplots import time_plots
+import re
def check_valid_color(color):
@@ -48,9 +43,15 @@ def check_valid_color(color):
If color is invalid the default is returned.
"""
- if color in list(mcolors.CSS4_COLORS.keys()) + ["#4CB391"]:
+ colors, _ = colors_and_colormaps()
+ if color in colors:
+ logging.info("NanoPlot: Valid color {}.".format(color))
+ return colors.get(color)
+
+ elif re.search(r'^#(?:[0-9a-fA-F]{3}){1,2}$', color):
logging.info("NanoPlot: Valid color {}.".format(color))
return color
+
else:
logging.info("NanoPlot: Invalid color {}, using default.".format(color))
sys.stderr.write("Invalid color {}, using default.\n".format(color))
@@ -62,7 +63,8 @@ def check_valid_colormap(colormap):
If colormap is invalid the default is returned.
"""
- if colormap in list(cm.cmap_d.keys()):
+ _, colormaps = colors_and_colormaps()
+ if colormap in colormaps:
logging.info("NanoPlot: Valid colormap {}.".format(colormap))
return colormap
else:
@@ -71,30 +73,122 @@ def check_valid_colormap(colormap):
return "Greens"
-def check_valid_format(figformat):
- """Check if the specified figure format is valid.
-
- If format is invalid the default is returned.
- Probably installation-dependent
+def scatter(x, y, legacy, names, path, plots, color="#4CB391", colormap="Greens",
+ stat=None, log=False, minvalx=0, minvaly=0, title=None, xmax=None, ymax=None):
+ """Create marginalised scatter and KDE plots using plotly.
+
+ Plotly-based replacement for scatter_legacy (matplotlib/seaborn):
+ - scatterplot with histograms on both axes
+ - kernel density plot with histograms on both axes
+ - hexbin not implemented yet
+ - pauvre plot temporarily unavailable
+ Plots requested via `legacy` are still drawn by scatter_legacy.
"""
- fig = plt.figure()
- if figformat in list(fig.canvas.get_supported_filetypes().keys()):
- logging.info("NanoPlot: valid output format {}".format(figformat))
- return figformat
- else:
- logging.info("NanoPlot: invalid output format {}".format(figformat))
- sys.stderr.write("Invalid format {}, using default.\n".format(figformat))
- return "png"
+ logging.info("NanoPlot: Creating {} vs {} plots using statistics from {} reads.".format(
+ names[0], names[1], x.size))
+ if not contains_variance([x, y], names):
+ return []
+ plots_made = []
+ idx = np.random.choice(x.index, min(10000, len(x)), replace=False)
+ maxvalx = xmax or np.amax(x[idx])
+ maxvaly = ymax or np.amax(y[idx])
-def plot_settings(plot_settings, dpi):
- sns.set(**plot_settings)
- mpl.rcParams['savefig.dpi'] = dpi
+ if plots["dot"]:
+ if log:
+ dot_plot = Plot(
+ path=path + "_loglength_dot.html",
+ title=f"{names[0]} vs {names[1]} plot using dots "
+ "after log transformation of read lengths")
+ else:
+ dot_plot = Plot(
+ path=path + "_dot.html",
+ title=f"{names[0]} vs {names[1]} plot using dots")
+
+ fig = px.scatter(x=x[idx], y=y[idx], marginal_x="histogram", marginal_y="histogram",
+ range_x=[minvalx, maxvalx], range_y=[minvaly, maxvaly])
+ fig.update_traces(marker=dict(color=color))
+ fig.update_yaxes(rangemode="tozero")
+ fig.update_xaxes(rangemode="tozero")
+ fig.update_layout(xaxis_title=names[0],
+ yaxis_title=names[1],
+ title=title or dot_plot.title,
+ title_x=0.5)
+
+ if log:
+ ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
+ fig.update_layout(
+ xaxis=dict(
+ tickmode='array',
+ tickvals=np.log10(ticks),
+ ticktext=ticks,
+ tickangle=45
+ )
+ )
+
+ dot_plot.fig = fig
+ dot_plot.html = dot_plot.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ dot_plot.save()
+ plots_made.append(dot_plot)
-def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
- stat=None, log=False, minvalx=0, minvaly=0, title=None,
- plot_settings={}, xmax=None, ymax=None):
+ if plots["kde"]:
+ if log:
+ kde_plot = Plot(
+ path=path + "_loglength_kde.html",
+ title="{} vs {} plot using a kernel density estimation "
+ "after log transformation of read lengths".format(names[0], names[1]))
+ else:
+ kde_plot = Plot(
+ path=path + "_kde.html",
+ title="{} vs {} plot using a kernel density estimation".format(names[0], names[1]))
+
+ col = hex_to_rgb_scale_0_1(color)
+ fig = ff.create_2d_density(x[idx], y[idx], point_size=3,
+ hist_color=col,
+ point_color=col,
+ colorscale=colormap, width=1870)
+
+ fig.update_layout(xaxis_title=names[0],
+ yaxis_title=names[1],
+ title=title or kde_plot.title,
+ title_x=0.5,
+ xaxis=dict(tickangle=45))
+
+ if log:
+ ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
+ fig.update_layout(
+ xaxis=dict(
+ tickmode='array',
+ tickvals=np.log10(ticks),
+ ticktext=ticks,
+ tickangle=45
+ )
+ )
+
+ kde_plot.fig = fig
+ kde_plot.html = kde_plot.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ kde_plot.save()
+ plots_made.append(kde_plot)
+
+ if 1 in legacy.values():
+ plots_made += scatter_legacy(x=x[idx],
+ y=y[idx],
+ names=names,
+ path=path,
+ plots=legacy,
+ color=color,
+ figformat="png",
+ stat=stat,
+ log=log,
+ minvalx=minvalx,
+ minvaly=minvaly,
+ title=title)
+ return plots_made
+
+
+def scatter_legacy(x, y, names, path, plots, color="#4CB391", figformat="png",
+ stat=None, log=False, minvalx=0, minvaly=0, title=None,
+ xmax=None, ymax=None):
"""Create bivariate plots.
Create four types of bivariate plots of x vs y, containing marginal summaries
@@ -103,15 +197,25 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
-A kernel density plot with density curves on axes
-A pauvre-style plot using code from https://github.com/conchoecia/pauvre
"""
+ try:
+ import matplotlib as mpl
+ mpl.use('Agg')
+ import seaborn as sns
+ import matplotlib.pyplot as plt
+ except ImportError:
+ sys.stderr.write("matplotlib and seaborn are required when running with --legacy\n")
+ return []
+
logging.info("NanoPlot: Creating {} vs {} plots using statistics from {} reads.".format(
names[0], names[1], x.size))
if not contains_variance([x, y], names):
return []
- sns.set(style="ticks", **plot_settings)
+ sns.set(style="ticks")
maxvalx = xmax or np.amax(x)
maxvaly = ymax or np.amax(y)
plots_made = []
+ path = path + "_legacy"
if plots["hex"]:
if log:
@@ -135,7 +239,7 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
height=10)
plot.set_axis_labels(names[0], names[1])
if log:
- ticks = [10**i for i in range(10) if not 10**i > 10 * (10**maxvalx)]
+ ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
plot.ax_joint.set_xticks(np.log10(ticks))
plot.ax_marg_x.set_xticks(np.log10(ticks))
plot.ax_joint.set_xticklabels(ticks)
@@ -145,7 +249,7 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
hex_plot.save(format=figformat)
plots_made.append(hex_plot)
- sns.set(style="darkgrid", **plot_settings)
+ sns.set(style="darkgrid")
if plots["dot"]:
if log:
dot_plot = Plot(
@@ -169,7 +273,7 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
joint_kws={"s": 1})
plot.set_axis_labels(names[0], names[1])
if log:
- ticks = [10**i for i in range(10) if not 10**i > 10 * (10**maxvalx)]
+ ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
plot.ax_joint.set_xticks(np.log10(ticks))
plot.ax_marg_x.set_xticks(np.log10(ticks))
plot.ax_joint.set_xticklabels(ticks)
@@ -180,155 +284,179 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
plots_made.append(dot_plot)
if plots["kde"]:
- idx = np.random.choice(x.index, min(2000, len(x)), replace=False)
- if log:
- kde_plot = Plot(
- path=path + "_loglength_kde." + figformat,
- title="{} vs {} plot using a kernel density estimation "
- "after log transformation of read lengths".format(names[0], names[1]))
+ if len(x) > 2:
+ idx = np.random.choice(x.index, min(2000, len(x)), replace=False)
+ if log:
+ kde_plot = Plot(
+ path=path + "_loglength_kde." + figformat,
+ title="{} vs {} plot using a kernel density estimation "
+ "after log transformation of read lengths".format(names[0], names[1]))
+ else:
+ kde_plot = Plot(
+ path=path + "_kde." + figformat,
+ title=f"{names[0]} vs {names[1]} plot using a kernel density estimation")
+ plot = sns.jointplot(
+ x=x[idx],
+ y=y[idx],
+ kind="kde",
+ clip=((0, np.Inf), (0, np.Inf)),
+ xlim=(minvalx, maxvalx),
+ ylim=(minvaly, maxvaly),
+ space=0,
+ color=color,
+ stat_func=stat,
+ shade_lowest=False,
+ height=10)
+ plot.set_axis_labels(names[0], names[1])
+ if log:
+ ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
+ plot.ax_joint.set_xticks(np.log10(ticks))
+ plot.ax_marg_x.set_xticks(np.log10(ticks))
+ plot.ax_joint.set_xticklabels(ticks)
+ plt.subplots_adjust(top=0.90)
+ plot.fig.suptitle(title or "{} vs {} plot".format(names[0], names[1]), fontsize=25)
+ kde_plot.fig = plot
+ kde_plot.save(format=figformat)
+ plots_made.append(kde_plot)
else:
- kde_plot = Plot(
- path=path + "_kde." + figformat,
- title="{} vs {} plot using a kernel density estimation".format(names[0], names[1]))
- plot = sns.jointplot(
- x=x[idx],
- y=y[idx],
- kind="kde",
- clip=((0, np.Inf), (0, np.Inf)),
- xlim=(minvalx, maxvalx),
- ylim=(minvaly, maxvaly),
- space=0,
- color=color,
- stat_func=stat,
- shade_lowest=False,
- height=10)
- plot.set_axis_labels(names[0], names[1])
- if log:
- ticks = [10**i for i in range(10) if not 10**i > 10 * (10**maxvalx)]
- plot.ax_joint.set_xticks(np.log10(ticks))
- plot.ax_marg_x.set_xticks(np.log10(ticks))
- plot.ax_joint.set_xticklabels(ticks)
- plt.subplots_adjust(top=0.90)
- plot.fig.suptitle(title or "{} vs {} plot".format(names[0], names[1]), fontsize=25)
- kde_plot.fig = plot
- kde_plot.save(format=figformat)
- plots_made.append(kde_plot)
-
- if plots["pauvre"] and names == ['Read lengths', 'Average read quality'] and log is False:
- pauvre_plot = Plot(
- path=path + "_pauvre." + figformat,
- title="{} vs {} plot using pauvre-style @conchoecia".format(names[0], names[1]))
- sns.set(style="white", **plot_settings)
- margin_plot(df=pd.DataFrame({"length": x, "meanQual": y}),
- Y_AXES=False,
- title=title or "Length vs Quality in Pauvre-style",
- plot_maxlen=None,
- plot_minlen=0,
- plot_maxqual=None,
- plot_minqual=0,
- lengthbin=None,
- qualbin=None,
- BASENAME="whatever",
- path=pauvre_plot.path,
- fileform=[figformat],
- dpi=600,
- TRANSPARENT=True,
- QUIET=True)
- plots_made.append(pauvre_plot)
+ sys.stderr.write("Not enough observations (reads) to create a kde plot.\n")
+ logging.info("NanoPlot: Not enough observations (reads) to create a kde plot")
plt.close("all")
return plots_made
+# def pauvre_plot():
+# from pauvre.marginplot import margin_plot
+# if plots["pauvre"] and names == ['Read lengths', 'Average read quality'] and log is False:
+# pauvre_plot = Plot(
+# path=path + "_pauvre." + figformat,
+# title="{} vs {} plot using pauvre-style @conchoecia".format(names[0], names[1]))
+# sns.set(style="white")
+# margin_plot(df=pd.DataFrame({"length": x, "meanQual": y}),
+# Y_AXES=False,
+# title=title or "Length vs Quality in Pauvre-style",
+# plot_maxlen=None,
+# plot_minlen=0,
+# plot_maxqual=None,
+# plot_minqual=0,
+# lengthbin=None,
+# qualbin=None,
+# BASENAME="whatever",
+# path=pauvre_plot.path,
+# fileform=[figformat],
+# dpi=600,
+# TRANSPARENT=True,
+# QUIET=True)
+# plots_made.append(pauvre_plot)
+
+
def contains_variance(arrays, names):
"""
Make sure both arrays for bivariate ("scatter") plot have a stddev > 0
"""
for ar, name in zip(arrays, names):
if np.std(ar) == 0:
- sys.stderr.write(
- "No variation in '{}', skipping bivariate plots.\n".format(name.lower()))
- logging.info("NanoPlot: No variation in {}, skipping bivariate plot".format(name))
+ sys.stderr.write(f"No variation in '{name.lower()}', skipping bivariate plots.\n")
+ logging.info(f"NanoPlot: No variation in {name}, skipping bivariate plot")
return False
else:
return True
-def length_plots(array, name, path, title=None, n50=None, color="#4CB391", figformat="png"):
+def length_plots(array, name, path, title=None, n50=None, color="#4CB391"):
"""Create histogram of normal and log transformed read lengths."""
logging.info("NanoPlot: Creating length plots for {}.".format(name))
maxvalx = np.amax(array)
if n50:
- logging.info("NanoPlot: Using {} reads with read length N50 of {}bp and maximum of {}bp."
+ logging.info("NanoPlot: Using {} reads with read length N50 of {}bp and maximum of {}bp."
.format(array.size, n50, maxvalx))
else:
- logging.info("NanoPlot: Using {} reads maximum of {}bp.".format(array.size, maxvalx))
+ logging.info(f"NanoPlot: Using {array.size} reads with a maximum of {maxvalx}bp.")
plots = []
- HistType = namedtuple('HistType', 'weight name ylabel')
- for h_type in [HistType(None, "", "Number of reads"),
- HistType(array, "Weighted ", "Number of bases")]:
+
+ HistType = [{'weight': array, 'name': 'Weighted', 'ylabel': 'Number of bases'},
+ {'weight': None, 'name': 'Non weighted', 'ylabel': 'Number of reads'}]
+
+ for h_type in HistType:
histogram = Plot(
- path=path + h_type.name.replace(" ", "_") + "Histogram" +
- name.replace(' ', '') + "." + figformat,
- title=h_type.name + "Histogram of read lengths")
- ax = sns.distplot(
- a=array,
- kde=False,
- hist=True,
- bins=max(round(int(maxvalx) / 500), 10),
- color=color,
- hist_kws=dict(weights=h_type.weight,
- edgecolor=color,
- linewidth=0.2,
- alpha=0.8))
+ path=path + h_type["name"].replace(" ", "_") + "Histogram" +
+ name.replace(' ', '') + ".html",
+ title=f"{h_type['name']} histogram of read lengths")
+
+ hist, bin_edges = np.histogram(array,
+ bins=max(round(int(maxvalx) / 500), 10),
+ weights=h_type["weight"])
+
+ fig = go.Figure()
+
+ fig.add_trace(go.Bar(x=bin_edges[1:],
+ y=hist,
+ marker_color=color))
+
if n50:
- plt.axvline(n50)
- plt.annotate('N50', xy=(n50, np.amax([h.get_height() for h in ax.patches])), size=8)
- ax.set(
- xlabel='Read length',
- ylabel=h_type.ylabel,
- title=title or histogram.title)
- plt.ticklabel_format(style='plain', axis='y')
- histogram.fig = ax.get_figure()
- histogram.save(format=figformat)
- plt.close("all")
+ fig.add_vline(n50)
+ fig.add_annotation(text='N50', x=n50, y=0.95, yref='paper')
+ fig.update_annotations(font_size=8)
+
+ fig.update_layout(xaxis_title='Read length',
+ yaxis_title=h_type["ylabel"],
+ title=title or histogram.title,
+ title_x=0.5)
+
+ histogram.fig = fig
+ histogram.html = histogram.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ histogram.save()
log_histogram = Plot(
- path=path + h_type.name.replace(" ", "_") + "LogTransformed_Histogram" +
- name.replace(' ', '') + "." + figformat,
- title=h_type.name + "Histogram of read lengths after log transformation")
- ax = sns.distplot(
- a=np.log10(array),
- kde=False,
- hist=True,
- color=color,
- hist_kws=dict(weights=h_type.weight,
- edgecolor=color,
- linewidth=0.2,
- alpha=0.8))
- ticks = [10**i for i in range(10) if not 10**i > 10 * maxvalx]
- ax.set(
- xticks=np.log10(ticks),
- xticklabels=ticks,
- xlabel='Read length',
- ylabel=h_type.ylabel,
- title=title or log_histogram.title)
+ path=path + h_type["name"].replace(" ", "_") + "LogTransformed_Histogram" +
+ name.replace(' ', '') + ".html",
+ title=h_type["name"] + " histogram of read lengths after log transformation")
+
+ if h_type["weight"] is None:
+ hist_log, bin_edges_log = np.histogram(np.log10(array),
+ bins=max(round(int(maxvalx) / 500), 10),
+ weights=h_type["weight"])
+
+ else:
+ hist_log, bin_edges_log = np.histogram(np.log10(array),
+ bins=max(round(int(maxvalx) / 500), 10),
+ weights=np.log10(h_type["weight"]))
+
+ fig = go.Figure()
+ fig.add_trace(go.Bar(x=bin_edges_log[1:],
+ y=hist_log,
+ marker_color=color))
+
+ ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * maxvalx]
+
+ fig.update_layout(
+ xaxis=dict(
+ tickmode='array',
+ tickvals=np.log10(ticks),
+ ticktext=ticks),
+ xaxis_title='Read length',
+ yaxis_title=h_type["ylabel"],
+ title=title or log_histogram.title,
+ title_x=0.5)
+
if n50:
- plt.axvline(np.log10(n50))
- plt.annotate('N50', xy=(np.log10(n50), np.amax(
- [h.get_height() for h in ax.patches])), size=8)
- plt.ticklabel_format(style='plain', axis='y')
- log_histogram.fig = ax.get_figure()
- log_histogram.save(format=figformat)
- plt.close("all")
+ fig.add_vline(np.log10(n50))
+ fig.add_annotation(text='N50', x=np.log10(n50), y=0.95, yref='paper')
+ fig.update_annotations(font_size=8)
+
+ log_histogram.fig = fig
+ log_histogram.html = log_histogram.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ log_histogram.save()
+
plots.extend([histogram, log_histogram])
- plots.append(dynamic_histogram(array=array, name=name, path=path, title=title, color=color))
+
plots.append(yield_by_minimal_length_plot(array=array,
name=name,
path=path,
title=title,
- color=color,
- figformat=figformat))
+ color=color))
+
return plots
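`length_plots` now builds its bars with `np.histogram`: once unweighted (number of reads per bin) and once weighted by the read lengths themselves, so each bar sums the bases in that length range. A minimal sketch of the two histograms, reusing the diff's bin rule of roughly one bin per 500 bp with a minimum of 10; the function name `length_histograms` is hypothetical.

```python
import numpy as np


def length_histograms(lengths):
    """Return (edges, read_counts, base_counts) for a read-length histogram.

    Weighting each read by its own length turns "number of reads"
    into "number of bases" per bin.
    """
    lengths = np.asarray(lengths)
    n_bins = max(round(int(lengths.max()) / 500), 10)
    read_counts, edges = np.histogram(lengths, bins=n_bins)
    # Reuse the same edges so both histograms are directly comparable.
    base_counts, _ = np.histogram(lengths, bins=edges, weights=lengths)
    return edges, read_counts, base_counts


edges, reads, bases = length_histograms([100, 200, 5000, 5100])
print(len(edges) - 1, reads.sum(), bases.sum())
```

As in the diff, the bars can be placed at `edges[1:]` (the right edge of each bin).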
@@ -337,12 +465,13 @@ def dynamic_histogram(array, name, path, title=None, color="#4CB391"):
Use plotly to create a histogram
Return html code, but also save as png
"""
- dynhist = Plot(path=path + "Dynamic_Histogram_{}.html".format(name.replace(' ', '_')),
- title=title or "Dynamic histogram of {}".format(name))
+ dynhist = Plot(
+ path=path + f"Dynamic_Histogram_{name[0].lower() + name[1:].replace(' ', '_')}.html",
+ title="Dynamic histogram of {}".format(name[0].lower() + name[1:]))
ylabel = "Number of reads" if len(array) <= 10000 else "Downsampled number of reads"
dynhist.html, dynhist.fig = plotly_histogram(array=array.sample(min(len(array), 10000)),
color=color,
- title=dynhist.title,
+ title=title or dynhist.title,
xlabel=name,
ylabel=ylabel)
dynhist.save()
@@ -364,35 +493,57 @@ def plotly_histogram(array, color="#4CB391", title=None, xlabel=None, ylabel=Non
fig = go.Figure(
{"data": data,
"layout": go.Layout(barmode='overlay',
- title=title)})
+ title=title,
+ title_x=0.5)})
return html, fig
-def yield_by_minimal_length_plot(array, name, path,
- title=None, color="#4CB391", figformat="png"):
+def yield_by_minimal_length_plot(array, name, path, title=None, color="#4CB391"):
df = pd.DataFrame(data={"lengths": np.sort(array)[::-1]})
- df["cumyield_gb"] = df["lengths"].cumsum() / 10**9
+ df["cumyield_gb"] = df["lengths"].cumsum() / 10 ** 9
+ idx = np.random.choice(df.index, min(10000, len(df)), replace=False)
+
yield_by_length = Plot(
- path=path + "Yield_By_Length." + figformat,
+ path=path + "Yield_By_Length.html",
title="Yield by length")
- ax = sns.regplot(
- x='lengths',
- y="cumyield_gb",
- data=df,
- x_ci=None,
- fit_reg=False,
- color=color,
- scatter_kws={"s": 3})
- ax.set(
- xlabel='Read length',
- ylabel='Cumulative yield for minimal length',
- title=title or yield_by_length.title)
- yield_by_length.fig = ax.get_figure()
- yield_by_length.save(format=figformat)
- plt.close("all")
+
+ fig = px.scatter(x=df.reindex(idx)["lengths"], y=df.reindex(idx)["cumyield_gb"])
+ fig.update_traces(marker=dict(color=color))
+ fig.update_layout(xaxis_title='Read length',
+ yaxis_title='Cumulative yield for minimal length [Gb]',
+ title=title or yield_by_length.title,
+ title_x=0.5)
+
+ yield_by_length.fig = fig
+ yield_by_length.html = yield_by_length.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ yield_by_length.save()
+
return yield_by_length
+def colors_and_colormaps():
+ colormaps = ('Greys,YlGnBu,Greens,YlOrRd,Bluered,RdBu,Reds,Blues,Picnic,Rainbow,Portland,Jet,'
+ 'Hot,Blackbody,Earth,Electric,Viridis,Cividis').split(',')
+ parent_directory = os.path.dirname(os.path.abspath(os.path.dirname(__file__)))
+ colours = open(os.path.join(parent_directory, "extra/color_options_hex.txt"))
+ col_hex = {}
+
+ for line in colours:
+ key, value = line.split(",")
+ col_hex[key] = value.strip()
+
+ return col_hex, colormaps
+
+
+def hex_to_rgb_scale_0_1(hexcolor):
+ color = hexcolor.lstrip("#")
+ RGB_color = tuple(int(color[x:x + 2], 16) for x in (0, 2, 4))
+
+ RGB_color = [x / 255 for x in RGB_color]
+
+ return tuple(RGB_color)
+
+
def run_tests():
import pickle
df = pickle.load(open("nanotest/sequencing_summary.pickle", "rb"))
@@ -401,17 +552,15 @@ def run_tests():
y=df["quals"],
names=['Read lengths', 'Average read quality'],
path="LengthvsQualityScatterPlot",
- plots={'dot': 1, 'kde': 1, 'hex': 1, 'pauvre': 1},
- plot_settings=dict(font_scale=1))
+ plots={'dot': 1, 'kde': 1})
time_plots(
df=df,
- path=".",
- color="#4CB391",
- plot_settings=dict(font_scale=1))
+ path="./",
+ color="#4CB391")
length_plots(
array=df["lengths"],
name="lengths",
- path=".")
+ path="./")
spatial_heatmap(
array=df["channelIDs"],
title="Number of reads generated per channel",
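`check_valid_color` accepts either a named color (looked up in the name-to-hex table) or a literal `#rgb`/`#rrggbb` code matched by a regular expression, and `hex_to_rgb_scale_0_1` converts the hex code into a 0 to 1 RGB tuple for `ff.create_2d_density`. Note that the diff's converter reads exactly six hex digits, so a three-digit code accepted by the regex would not convert as intended. The sketch below is a hypothetical variant that also expands the shorthand form; the function names and the use of `re.fullmatch` are assumptions, not NanoPlot's API.

```python
import re

# Three or six hex digits after '#'; fullmatch() rejects any trailing characters.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}){1,2}")


def validate_color(color, named, default="#4CB391"):
    """Return a hex code for `color`, falling back to `default`.

    `named` maps color names to hex codes (e.g. parsed from a name,hex file).
    """
    if color in named:
        return named[color]
    if HEX_COLOR.fullmatch(color):
        return color
    return default


def hex_to_rgb_0_1(hexcolor):
    """Convert '#rrggbb' or '#rgb' to an (r, g, b) tuple scaled to 0..1."""
    digits = hexcolor.lstrip("#")
    if len(digits) == 3:  # expand shorthand '#abc' -> 'aabbcc'
        digits = "".join(ch * 2 for ch in digits)
    return tuple(int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))


print(hex_to_rgb_0_1(validate_color("#4CB391", {})))
```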
=====================================
nanoplotter/plot.py
=====================================
@@ -1,8 +1,10 @@
-import plotly.io as pio
+import os
from base64 import b64encode
from io import BytesIO
+from pathlib import Path
from urllib.parse import quote as urlquote
import sys
+from kaleido.scopes.plotly import PlotlyScope
import logging
@@ -40,7 +42,16 @@ class Plot(object):
if self.html:
with open(self.path, 'w') as html_out:
html_out.write(self.html)
- self.save_static()
+ try:
+ self.save_static()
+ except (AttributeError, ValueError) as e:
+ p = os.path.splitext(self.path)[0]+".png"
+ if os.path.exists(p):
+ os.remove(p)
+
+ logging.warning("Static (PNG) copy of the plot was not saved because kaleido failed:")
+ logging.warning(e)
+
elif self.fig:
self.fig.savefig(
fname=self.path,
@@ -56,9 +67,6 @@ class Plot(object):
sys.stderr.write(".show not implemented for Plot instance without fig attribute!")
def save_static(self):
- try:
- pio.write_image(self.fig, self.path.replace('html', 'png'))
- except ValueError as e:
- logging.warning("Nanoplotter: orca not found, not creating static image of html. "
- "See https://github.com/plotly/orca")
- logging.warning(e, exc_info=True)
+ scope = PlotlyScope()
+ with open(os.path.splitext(self.path)[0] + ".png", "wb") as f:
+ f.write(scope.transform(self.fig, format="png"))
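The static PNG is written next to the HTML plot. Deriving its name by swapping only the final extension, as the fallback branch of `save()` does with `os.path.splitext`, avoids rewriting an "html" that appears elsewhere in the path, which a plain `str.replace('html', 'png')` would do. A small sketch; `static_png_path` is a hypothetical helper name.

```python
import os


def static_png_path(html_path):
    """Swap only the final extension: 'out/html_run/plot.html' -> 'out/html_run/plot.png'."""
    root, _ext = os.path.splitext(html_path)
    return root + ".png"


path = "out/html_run/plot.html"
print(static_png_path(path))         # only the extension changes
print(path.replace("html", "png"))   # also rewrites the directory name
```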
=====================================
nanoplotter/spatial_heatmap.py
=====================================
@@ -1,9 +1,8 @@
import numpy as np
import logging
from nanoplotter.plot import Plot
-import matplotlib.pyplot as plt
-import seaborn as sns
import pandas as pd
+import plotly.graph_objects as go
class Layout(object):
@@ -64,28 +63,30 @@ def make_layout(maxval):
flowcell='PromethION')
-def spatial_heatmap(array, path, title=None, color="Greens", figformat="png"):
+def spatial_heatmap(array, path, colormap, title=None):
"""Taking channel information and creating post run channel activity plots."""
logging.info("Nanoplotter: Creating heatmap of reads per channel using {} reads."
.format(array.size))
+
activity_map = Plot(
- path=path + "." + figformat,
+ path=path + ".html",
title="Number of reads generated per channel")
+
layout = make_layout(maxval=np.amax(array))
valueCounts = pd.value_counts(pd.Series(array))
+
for entry in valueCounts.keys():
layout.template[np.where(layout.structure == entry)] = valueCounts[entry]
- plt.figure()
- ax = sns.heatmap(
- data=pd.DataFrame(layout.template, index=layout.yticks, columns=layout.xticks),
- xticklabels="auto",
- yticklabels="auto",
- square=True,
- cbar_kws={"orientation": "horizontal"},
- cmap=color,
- linewidths=0.20)
- ax.set_title(title or activity_map.title)
- activity_map.fig = ax.get_figure()
- activity_map.save(format=figformat)
- plt.close("all")
+
+ data = pd.DataFrame(layout.template, index=layout.yticks, columns=layout.xticks)
+
+ fig = go.Figure(data=go.Heatmap(z=data.values.tolist(), colorscale=colormap))
+ fig.update_layout(xaxis_title='Channel',
+ yaxis_title='Number of reads',
+ title=title or activity_map.title,
+ title_x=0.5)
+
+ activity_map.fig = fig
+ activity_map.html = activity_map.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ activity_map.save()
return [activity_map]
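`spatial_heatmap` counts reads per channel and writes each count into every grid cell of the flowcell layout that holds that channel number, leaving channels without reads at zero. The sketch below shows that step on a made-up 2x4 layout (real MinION/PromethION layouts come from `make_layout` and differ); `fill_channel_counts` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def fill_channel_counts(channels, structure):
    """Place per-channel read counts into a grid shaped like `structure`.

    `structure` holds a channel number in each cell; cells whose channel
    produced no reads stay 0.
    """
    template = np.zeros(structure.shape, dtype=int)
    counts = pd.Series(channels).value_counts()
    for channel, n in counts.items():
        template[structure == channel] = n  # boolean-mask assignment
    return template


# Hypothetical layout for illustration only.
structure = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
grid = fill_channel_counts([1, 1, 2, 8, 8, 8], structure)
print(grid)
```

The resulting grid can then be passed to `go.Heatmap(z=..., colorscale=...)`, as the diff does with `layout.template`.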
=====================================
nanoplotter/timeplots.py
=====================================
@@ -2,14 +2,14 @@ import sys
import logging
from nanoplotter.plot import Plot
from datetime import timedelta
-import seaborn as sns
-import matplotlib.pyplot as plt
from math import ceil
import pandas as pd
import numpy as np
+import plotly.graph_objs as go
+import plotly.express as px
-def check_valid_time_and_sort(df, timescol, days=5, warning=True):
+def check_valid_time_and_sort(df, timescol="start_time", days=5, warning=True):
"""Check if the data contains reads created within the same `days` timeframe.
if not, print warning and only return part of the data which is within `days` days
@@ -34,212 +34,246 @@ def check_valid_time_and_sort(df, timescol, days=5, warning=True):
.reset_index()
-def time_plots(df, path, title=None, color="#4CB391", figformat="png",
- log_length=False, plot_settings=None):
+def time_plots(df, subsampled_df, path, title=None, color="#4CB391", log_length=False):
"""Making plots of time vs read length, time vs quality and cumulative yield."""
- dfs = check_valid_time_and_sort(df, "start_time")
- logging.info("Nanoplotter: Creating timeplots using {} reads.".format(len(dfs)))
+
+ logging.info(f"Nanoplotter: Creating timeplots using {len(df)} (full) or "
+ f"{len(subsampled_df)} (subsampled dataset) reads.")
+ dfs = check_valid_time_and_sort(df)
cumyields = cumulative_yield(dfs=dfs.set_index("start_time"),
path=path,
- figformat=figformat,
title=title,
color=color)
reads_pores_over_time = plot_over_time(dfs=dfs.set_index("start_time"),
path=path,
- figformat=figformat,
title=title,
color=color)
- violins = violin_plots_over_time(dfs=dfs,
+ violins = violin_plots_over_time(dfs=check_valid_time_and_sort(subsampled_df),
path=path,
- figformat=figformat,
title=title,
log_length=log_length,
- plot_settings=plot_settings)
+ color=color)
return cumyields + reads_pores_over_time + violins
-def violin_plots_over_time(dfs, path, figformat, title,
- log_length=False, plot_settings=None):
+def violin_plots_over_time(dfs, path, title, log_length=False, color="#4CB391"):
+
dfs['timebin'] = add_time_bins(dfs)
plots = []
+
+ dfs.sort_values("timebin")
+
plots.append(length_over_time(dfs=dfs,
path=path,
- figformat=figformat,
title=title,
log_length=log_length,
- plot_settings=plot_settings))
+ color=color))
if "quals" in dfs:
plots.append(quality_over_time(dfs=dfs,
path=path,
- figformat=figformat,
title=title,
- plot_settings=plot_settings))
+ color=color))
if "duration" in dfs:
plots.append(sequencing_speed_over_time(dfs=dfs,
path=path,
- figformat=figformat,
title=title,
- plot_settings=plot_settings))
+ color=color))
return plots
-def length_over_time(dfs, path, figformat, title, log_length=False, plot_settings={}):
+def length_over_time(dfs, path, title, log_length=False, color="#4CB391"):
if log_length:
- time_length = Plot(path=path + "TimeLogLengthViolinPlot." + figformat,
+ time_length = Plot(path=path + "TimeLogLengthViolinPlot.html",
title="Violin plot of log read lengths over time")
else:
- time_length = Plot(path=path + "TimeLengthViolinPlot." + figformat,
+ time_length = Plot(path=path + "TimeLengthViolinPlot.html",
title="Violin plot of read lengths over time")
- sns.set(style="white", **plot_settings)
- if log_length:
- length_column = "log_lengths"
- else:
- length_column = "lengths"
+
+ length_column = "log_lengths" if log_length else "lengths"
if "length_filter" in dfs: # produced by NanoPlot filtering of too long reads
temp_dfs = dfs[dfs["length_filter"]]
else:
temp_dfs = dfs
- ax = sns.violinplot(x="timebin",
- y=length_column,
- data=temp_dfs,
- inner=None,
- cut=0,
- linewidth=0)
- ax.set(xlabel='Interval (hours)',
- ylabel="Read length",
- title=title or time_length.title)
+ fig = go.Figure()
+
+ fig.add_trace(go.Violin(y=temp_dfs[length_column],
+ x=temp_dfs["timebin"],
+ points=False, spanmode="hard",
+ line_color='black', line_width=1.5,
+ fillcolor=color, opacity=0.8))
+ fig.update_layout(xaxis_title='Interval (hours)',
+ yaxis_title='Read length',
+ title=title or time_length.title,
+ title_x=0.5)
+
if log_length:
ticks = [10**i for i in range(10) if not 10**i > 10 * np.amax(dfs["lengths"])]
- ax.set(yticks=np.log10(ticks),
- yticklabels=ticks)
- plt.xticks(rotation=45, ha='center', fontsize=8)
- time_length.fig = ax.get_figure()
- time_length.save(format=figformat)
- plt.close("all")
+ fig.update_layout(
+ yaxis=dict(
+ tickmode='array',
+ tickvals=np.log10(ticks),
+ ticktext=ticks
+ )
+ )
+
+ fig.update_yaxes(tickangle=45)
+
+ time_length.fig = fig
+ time_length.html = time_length.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ time_length.save()
+
return time_length
-def quality_over_time(dfs, path, figformat, title, plot_settings={}):
- time_qual = Plot(path=path + "TimeQualityViolinPlot." + figformat,
+def quality_over_time(dfs, path, title=None, color="#4CB391"):
+ time_qual = Plot(path=path + "TimeQualityViolinPlot.html",
title="Violin plot of quality over time")
- sns.set(style="white", **plot_settings)
- ax = sns.violinplot(x="timebin",
- y="quals",
- data=dfs,
- inner=None,
- cut=0,
- linewidth=0)
- ax.set(xlabel='Interval (hours)',
- ylabel="Basecall quality",
- title=title or time_qual.title)
- plt.xticks(rotation=45, ha='center', fontsize=8)
- time_qual.fig = ax.get_figure()
- time_qual.save(format=figformat)
- plt.close("all")
+
+ fig = go.Figure()
+
+ fig.add_trace(go.Violin(y=dfs["quals"],
+ x=dfs["timebin"],
+ points=False, spanmode="hard",
+ line_color='black', line_width=1.5,
+ fillcolor=color, opacity=0.8))
+
+ fig.update_layout(xaxis_title='Interval (hours)',
+ yaxis_title='Basecall quality',
+ title=title or time_qual.title,
+ title_x=0.5)
+
+ fig.update_xaxes(tickangle=45)
+
+ time_qual.fig = fig
+ time_qual.html = time_qual.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ time_qual.save()
+
return time_qual
-def sequencing_speed_over_time(dfs, path, figformat, title, plot_settings={}):
- time_duration = Plot(path=path + "TimeSequencingSpeed_ViolinPlot." + figformat,
+def sequencing_speed_over_time(dfs, path, title, color="#4CB391"):
+ time_duration = Plot(path=path + "TimeSequencingSpeed_ViolinPlot.html",
title="Violin plot of sequencing speed over time")
- sns.set(style="white", **plot_settings)
- if "timebin" not in dfs:
- dfs['timebin'] = add_time_bins(dfs)
+
mask = dfs['duration'] != 0
- ax = sns.violinplot(x=dfs.loc[mask, "timebin"],
- y=dfs.loc[mask, "lengths"] / dfs.loc[mask, "duration"],
- inner=None,
- cut=0,
- linewidth=0)
- ax.set(xlabel='Interval (hours)',
- ylabel="Sequencing speed (nucleotides/second)",
- title=title or time_duration.title)
- plt.xticks(rotation=45, ha='center', fontsize=8)
- time_duration.fig = ax.get_figure()
- time_duration.save(format=figformat)
- plt.close("all")
+
+ fig = go.Figure()
+
+ fig.add_trace(
+ go.Violin(x=dfs.loc[mask, "timebin"],
+ y=dfs.loc[mask, "lengths"] / dfs.loc[mask, "duration"],
+ points=False, spanmode="hard",
+ line_color='black', line_width=1.5,
+ fillcolor=color, opacity=0.8))
+
+ fig.update_layout(xaxis_title='Interval (hours)',
+ yaxis_title='Sequencing speed (nucleotides/second)',
+ title=title or time_duration.title,
+ title_x=0.5)
+
+ fig.update_xaxes(tickangle=45)
+
+ time_duration.fig = fig
+ time_duration.html = time_duration.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ time_duration.save()
+
return time_duration
def add_time_bins(dfs, bin_length=3):
maxtime = dfs["start_time"].max().total_seconds()
labels = [str(i) + "-" + str(i + bin_length)
- for i in range(0, 168, bin_length) if not i > (maxtime / 3600)]
+ for i in range(0, 168, bin_length) if not i >= (maxtime / 3600)]
return pd.cut(x=dfs["start_time"],
bins=ceil((maxtime / 3600) / bin_length),
labels=labels)
-def plot_over_time(dfs, path, figformat, title, color):
- num_reads = Plot(path=path + "NumberOfReads_Over_Time." + figformat,
+def plot_over_time(dfs, path, title, color="#4CB391"):
+ num_reads = Plot(path=path + "NumberOfReads_Over_Time.html",
title="Number of reads over time")
s = dfs.loc[:, "lengths"].resample('10T').count()
- ax = sns.regplot(x=s.index.total_seconds() / 3600,
- y=s,
- x_ci=None,
- fit_reg=False,
- color=color,
- scatter_kws={"s": 3})
- ax.set(xlabel='Run time (hours)',
- ylabel='Number of reads per 10 minutes',
- title=title or num_reads.title)
- num_reads.fig = ax.get_figure()
- num_reads.save(format=figformat)
- plt.close("all")
+
+ fig = px.scatter(
+ data_frame=None,
+ x=s.index.total_seconds() / 3600,
+ y=s)
+ fig.update_traces(marker=dict(color=color))
+
+ fig.update_layout(xaxis_title='Run time (hours)',
+ yaxis_title='Number of reads per 10 minutes',
+ title=title or num_reads.title,
+ title_x=0.5)
+
+ num_reads.fig = fig
+ num_reads.html = num_reads.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ num_reads.save()
+
plots = [num_reads]
if "channelIDs" in dfs:
- pores_over_time = Plot(path=path + "ActivePores_Over_Time." + figformat,
+ pores_over_time = Plot(path=path + "ActivePores_Over_Time.html",
title="Number of active pores over time")
s = dfs.loc[:, "channelIDs"].resample('10T').nunique()
- ax = sns.regplot(x=s.index.total_seconds() / 3600,
- y=s,
- x_ci=None,
- fit_reg=False,
- color=color,
- scatter_kws={"s": 3})
- ax.set(xlabel='Run time (hours)',
- ylabel='Active pores per 10 minutes',
- title=title or pores_over_time.title)
- pores_over_time.fig = ax.get_figure()
- pores_over_time.save(format=figformat)
- plt.close("all")
+
+ fig = px.scatter(
+ data_frame=None,
+ x=s.index.total_seconds() / 3600,
+ y=s)
+ fig.update_traces(marker=dict(color=color))
+
+ fig.update_layout(xaxis_title='Run time (hours)',
+ yaxis_title='Active pores per 10 minutes',
+ title=title or pores_over_time.title,
+ title_x=0.5)
+
+ pores_over_time.fig = fig
+ pores_over_time.html = pores_over_time.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ pores_over_time.save()
+
plots.append(pores_over_time)
return plots
-def cumulative_yield(dfs, path, figformat, title, color):
- cum_yield_gb = Plot(path=path + "CumulativeYieldPlot_Gigabases." + figformat,
+def cumulative_yield(dfs, path, title, color):
+ cum_yield_gb = Plot(path=path + "CumulativeYieldPlot_Gigabases.html",
title="Cumulative yield")
- s = dfs.loc[:, "lengths"].cumsum().resample('1T').max() / 1e9
- ax = sns.regplot(x=s.index.total_seconds() / 3600,
- y=s,
- x_ci=None,
- fit_reg=False,
- color=color,
- scatter_kws={"s": 3})
- ax.set(xlabel='Run time (hours)',
- ylabel='Cumulative yield in gigabase',
- title=title or cum_yield_gb.title)
- cum_yield_gb.fig = ax.get_figure()
- cum_yield_gb.save(format=figformat)
- plt.close("all")
-
- cum_yield_reads = Plot(path=path + "CumulativeYieldPlot_NumberOfReads." + figformat,
+
+ s = dfs.loc[:, "lengths"].cumsum().resample('10T').max() / 1e9
+
+ fig = px.scatter(
+ x=s.index.total_seconds() / 3600,
+ y=s)
+ fig.update_traces(marker=dict(color=color))
+
+ fig.update_layout(xaxis_title='Run time (hours)',
+ yaxis_title='Cumulative yield in gigabase',
+ title=title or cum_yield_gb.title,
+ title_x=0.5)
+
+ cum_yield_gb.fig = fig
+ cum_yield_gb.html = cum_yield_gb.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ cum_yield_gb.save()
+
+ cum_yield_reads = Plot(path=path + "CumulativeYieldPlot_NumberOfReads.html",
title="Cumulative yield")
+
s = dfs.loc[:, "lengths"].resample('10T').count().cumsum()
- ax = sns.regplot(x=s.index.total_seconds() / 3600,
- y=s,
- x_ci=None,
- fit_reg=False,
- color=color,
- scatter_kws={"s": 3})
- ax.set(xlabel='Run time (hours)',
- ylabel='Cumulative yield in number of reads',
- title=title or cum_yield_reads.title)
- cum_yield_reads.fig = ax.get_figure()
- cum_yield_reads.save(format=figformat)
- plt.close("all")
+
+ fig = px.scatter(
+ x=s.index.total_seconds() / 3600,
+ y=s)
+ fig.update_traces(marker=dict(color=color))
+
+ fig.update_layout(xaxis_title='Run time (hours)',
+ yaxis_title='Cumulative yield in number of reads',
+ title=title or cum_yield_gb.title,
+ title_x=0.5)
+
+ cum_yield_reads.fig = fig
+ cum_yield_reads.html = cum_yield_reads.fig.to_html(full_html=False, include_plotlyjs='cdn')
+ cum_yield_reads.save()
+
return [cum_yield_gb, cum_yield_reads]
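The only functional change in `add_time_bins` is that the label filter `not i > (maxtime / 3600)` becomes `not i >= (maxtime / 3600)`. `pd.cut` requires exactly one label per bin, and the number of bins is `ceil(hours / bin_length)`. Under the old test, a run lasting an exact multiple of `bin_length` hours got one label too many. The sketch below reproduces only the label counting so the mismatch can be checked without pandas. Both function names are hypothetical, not part of NanoPlot.

```python
from math import ceil


def time_bin_labels(max_hours, bin_length=3, inclusive_end=False):
    """Build bin labels the way add_time_bins does (hypothetical re-implementation).

    inclusive_end=False mirrors the new test `not i >= max_hours`;
    inclusive_end=True mirrors the old test `not i > max_hours`.
    """
    labels = []
    for start in range(0, 168, bin_length):
        beyond_end = start > max_hours if inclusive_end else start >= max_hours
        if not beyond_end:
            labels.append(f"{start}-{start + bin_length}")
    return labels


def number_of_bins(max_hours, bin_length=3):
    """Number of equal-width bins requested from pd.cut in add_time_bins."""
    return ceil(max_hours / bin_length)


for hours in (6.0, 7.5):
    new = len(time_bin_labels(hours))
    old = len(time_bin_labels(hours, inclusive_end=True))
    print(hours, number_of_bins(hours), new, old)
# 6.0 2 2 3   <- old rule yields 3 labels for 2 bins
# 7.5 3 3 3
```

For a 6-hour run, the old rule produced labels `0-3`, `3-6` and `6-9` for only two bins. The new rule drops the last label, so the label count matches the bin count in both cases.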
=====================================
scripts/test.sh
=====================================
@@ -1,41 +1,50 @@
set -ev
-git clone https://github.com/wdecoster/nanotest.git
+if [ -d "nanotest" ]; then
+ echo "nanotest already cloned"
+else
+ git clone https://github.com/wdecoster/nanotest.git
+fi
NanoPlot -h
-NanoPlot --listcolors
+# NanoPlot --listcolors
echo ""
echo ""
echo ""
echo "testing bam:"
-NanoPlot --bam nanotest/alignment.bam --verbose
+NanoPlot --bam nanotest/alignment.bam --verbose -o tests
echo ""
echo ""
echo ""
echo "testing bam without supplementary alignments:"
-NanoPlot --bam nanotest/alignment.bam --verbose --no_supplementary
+NanoPlot --bam nanotest/alignment.bam --verbose --no_supplementary -o tests
echo ""
echo ""
echo ""
echo "testing summary:"
-NanoPlot --summary nanotest/sequencing_summary.txt --loglength --verbose
+NanoPlot --summary nanotest/sequencing_summary.txt --loglength --verbose -o tests
echo ""
echo ""
echo ""
echo "testing fastq rich:"
-NanoPlot --fastq_rich nanotest/reads.fastq.gz --verbose --downsample 800
+NanoPlot --fastq_rich nanotest/reads.fastq.gz --verbose --downsample 800 -o tests
echo ""
echo ""
echo ""
echo "testing fastq minimal:"
-NanoPlot --fastq_minimal nanotest/reads.fastq.gz --store --verbose --plot dot
+NanoPlot --fastq_minimal nanotest/reads.fastq.gz --store --verbose --plots dot -o tests
echo ""
echo ""
echo ""
echo "testing fastq plain:"
-NanoPlot --fastq nanotest/reads.fastq.gz --verbose --minqual 4 --color red
+NanoPlot --fastq nanotest/reads.fastq.gz --verbose --minqual 4 --color red -o tests
echo ""
echo ""
echo ""
echo "testing fasta:"
-NanoPlot --fasta nanotest/reads.fa.gz --verbose --maxlength 35000
+NanoPlot --fasta nanotest/reads.fa.gz --verbose --maxlength 35000 -o tests
+echo ""
+echo ""
+echo ""
+# echo "testing feather:"
+# NanoPlot --feather nanotest/summary1.feather --verbose --outdir plots
=====================================
setup.py
=====================================
@@ -21,7 +21,7 @@ setup(
'Development Status :: 4 - Beta',
'Intended Audience :: Science/Research',
'Topic :: Scientific/Engineering :: Bio-Informatics',
- 'License :: OSI Approved :: MIT License',
+ 'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.3',
'Programming Language :: Python :: 3.4',
@@ -32,16 +32,15 @@ setup(
python_requires='>=3',
install_requires=['biopython',
'pysam>0.10.0.0',
- 'pandas>=0.22.0',
- 'numpy',
+ 'pandas>=1.1.0',
+ 'numpy>=1.16.5',
'scipy',
'python-dateutil',
- 'seaborn>=0.10.1',
- 'matplotlib>=3.1.3',
- 'nanoget>=1.13.0',
- 'nanomath>=0.23.1',
- "pauvre==0.2.0",
+ 'nanoget>=1.14.0',
+ 'nanomath>=1.0.0',
'plotly>=4.1.0',
+ 'pyarrow',
+ 'kaleido'
],
package_data={'NanoPlot': []},
package_dir={'nanoplot': 'nanoplot'},
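The new `install_requires` raises several lower bounds (`pandas>=1.1.0`, `numpy>=1.16.5`, `nanoget>=1.14.0`, `nanomath>=1.0.0`) and adds `pyarrow` and `kaleido`. According to the last commit message, kaleido is not packaged in Debian. The following sketch shows one way to check an environment against these bounds. The helper names are hypothetical, and the naive parser only understands plain dotted integers, so versions with pre-release or local tags should be compared with `packaging.version` instead.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Turn '1.16.5' into (1, 16, 5); naive, numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """True if `installed` >= `minimum`, padding the shorter tuple with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


# Lower bounds copied from the new install_requires in setup.py
MINIMUMS = {"pandas": "1.1.0", "numpy": "1.16.5",
            "nanoget": "1.14.0", "nanomath": "1.0.0"}


def report(minimums=MINIMUMS):
    """Print installed vs required version for each distribution."""
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: not installed (needs >={minimum})")
            continue
        try:
            ok = meets_minimum(installed, minimum)
        except ValueError:  # e.g. '2.0.0rc1' is not plain dotted integers
            print(f"{name}: {installed} (cannot parse, needs >={minimum})")
            continue
        print(f"{name}: {installed} {'ok' if ok else 'too old'} (needs >={minimum})")
```

Calling `report()` prints one line per package. The comparison is numeric per component, so `1.16.10` correctly counts as newer than `1.16.5`.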
View it on GitLab: https://salsa.debian.org/med-team/nanoplot/-/compare/a72e8e822e0673c8e89db24bc48e3a38e45d172f...6a9fa5980258accf2dd733150b2ac446233c6631