[med-svn] [Git][med-team/nanoplot][master] 7 commits: routine-update: New upstream version

Mon May 24 00:54:21 BST 2021


Steffen Möller pushed to branch master at Debian Med / nanoplot


Commits:
2e93ada5 by Steffen Moeller at 2021-05-24T01:44:15+02:00
routine-update: New upstream version

- - - - -
54279d6c by Steffen Moeller at 2021-05-24T01:44:16+02:00
New upstream version 1.36.2
- - - - -
12fd94cf by Steffen Moeller at 2021-05-24T01:44:18+02:00
Update upstream source from tag 'upstream/1.36.2'

Update to upstream version '1.36.2'
with Debian dir d861929edaa084d0446ba73160e520bc475ab048
- - - - -
84e4e1c5 by Steffen Moeller at 2021-05-24T01:44:19+02:00
routine-update: Standards-Version: 4.5.1

- - - - -
f8537a31 by Steffen Moeller at 2021-05-24T01:44:26+02:00
Trim trailing whitespace.

Changes-By: lintian-brush
Fixes: lintian: trailing-whitespace
See-also: https://lintian.debian.org/tags/trailing-whitespace.html

- - - - -
0c1b0b1d by Steffen Moeller at 2021-05-24T01:44:29+02:00
Set upstream metadata fields: Bug-Database, Bug-Submit.

Changes-By: lintian-brush
Fixes: lintian: upstream-metadata-missing-bug-tracking
See-also: https://lintian.debian.org/tags/upstream-metadata-missing-bug-tracking.html

- - - - -
6a9fa598 by Steffen Moeller at 2021-05-24T01:51:47+02:00
New upstream version, state missing dependency

https://pypi.org/project/kaleido/ is not in debian

- - - - -


20 changed files:

- MANIFEST.in
- NanoPlot.egg-info/PKG-INFO
- PKG-INFO
- README.md
- README.rst
- debian/changelog
- debian/control
- debian/upstream/metadata
- − extra/color_options.txt
- + extra/color_options_hex.txt
- nanoplot/NanoPlot.py
- + nanoplot/report.py
- nanoplot/utils.py
- nanoplot/version.py
- nanoplotter/nanoplotter_main.py
- nanoplotter/plot.py
- nanoplotter/spatial_heatmap.py
- nanoplotter/timeplots.py
- scripts/test.sh
- setup.py


Changes:

=====================================
MANIFEST.in
=====================================
@@ -1,4 +1,4 @@
-include extra/color_options.txt
+include extra/color_options_hex.txt
 include scripts/test.sh
 include scripts/sequencing_speed_only.py
 include README.md


=====================================
NanoPlot.egg-info/PKG-INFO
=====================================
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: NanoPlot
-Version: 1.30.1
+Version: 1.36.2
 Summary: Plotting suite for Oxford Nanopore sequencing data and alignments
 Home-page: https://github.com/wdecoster/NanoPlot
 Author: Wouter De Coster
@@ -139,6 +139,7 @@ Description: # NanoPlot
         
         
         ## ACKNOWLEDGMENTS/CONTRIBUTORS
+        - [Ilias Bukraa](https://github.com/iliasbukraa) for tremendous improvements and maintenance of the code
         - Andreas Sjödin for building and maintaining conda recipes
         - Darrin Schultz [@conchoecia](https://github.com/conchoecia) for Pauvre code
         - [@alexomics](https://github.com/alexomics) for fixing the indentation of the printed stats
@@ -182,7 +183,7 @@ Platform: UNKNOWN
 Classifier: Development Status :: 4 - Beta
 Classifier: Intended Audience :: Science/Research
 Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
-Classifier: License :: OSI Approved :: MIT License
+Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.3
 Classifier: Programming Language :: Python :: 3.4


=====================================
PKG-INFO
=====================================
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: NanoPlot
-Version: 1.30.1
+Version: 1.36.2
 Summary: Plotting suite for Oxford Nanopore sequencing data and alignments
 Home-page: https://github.com/wdecoster/NanoPlot
 Author: Wouter De Coster
@@ -139,6 +139,7 @@ Description: # NanoPlot
         
         
         ## ACKNOWLEDGMENTS/CONTRIBUTORS
+        - [Ilias Bukraa](https://github.com/iliasbukraa) for tremendous improvements and maintenance of the code
         - Andreas Sjödin for building and maintaining conda recipes
         - Darrin Schultz [@conchoecia](https://github.com/conchoecia) for Pauvre code
         - [@alexomics](https://github.com/alexomics) for fixing the indentation of the printed stats
@@ -182,7 +183,7 @@ Platform: UNKNOWN
 Classifier: Development Status :: 4 - Beta
 Classifier: Intended Audience :: Science/Research
 Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
-Classifier: License :: OSI Approved :: MIT License
+Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3.3
 Classifier: Programming Language :: Python :: 3.4


=====================================
README.md
=====================================
@@ -131,6 +131,7 @@ This script now also provides read length vs mean quality plots in the '[pauvre]
 
 
 ## ACKNOWLEDGMENTS/CONTRIBUTORS
+- [Ilias Bukraa](https://github.com/iliasbukraa) for tremendous improvements and maintenance of the code
 - Andreas Sjödin for building and maintaining conda recipes
 - Darrin Schultz [@conchoecia](https://github.com/conchoecia) for Pauvre code
 - [@alexomics](https://github.com/alexomics) for fixing the indentation of the printed stats


=====================================
README.rst
=====================================
@@ -159,6 +159,8 @@ This script now also provides read length vs mean quality plots in the
 ACKNOWLEDGMENTS/CONTRIBUTORS
 ----------------------------
 
+-  `Ilias Bukraa <https://github.com/iliasbukraa>`__ for tremendous
+   improvements and maintenance of the code
 -  Andreas Sjödin for building and maintaining conda recipes
 -  Darrin Schultz [@conchoecia](https://github.com/conchoecia) for
    Pauvre code


=====================================
debian/changelog
=====================================
@@ -1,5 +1,13 @@
-nanoplot (1.30.1-1) UNRELEASED; urgency=medium
+nanoplot (1.36.2-1) UNRELEASED; urgency=medium
 
+  [ Andreas Tille ]
   * Initial release (Closes: #964345)
 
+  [ Steffen Moeller, all via routine-update]
+  * Standards-Version: 4.5.1
+  * Trim trailing whitespace.
+  * Set upstream metadata fields: Bug-Database, Bug-Submit.
+
+  FIXME: needs kaleido (https://pypi.org/project/kaleido/) for testing
+
  -- Andreas Tille <tille at debian.org>  Wed, 22 Apr 2020 16:40:23 +0200


=====================================
debian/control
=====================================
@@ -12,7 +12,7 @@ Build-Depends: debhelper-compat (= 13),
                python3-pauvre <!nocheck>,
                python3-plotly <!nocheck>,
                python3-seaborn <!nocheck>,
-Standards-Version: 4.5.0
+Standards-Version: 4.5.1
 Vcs-Browser: https://salsa.debian.org/med-team/nanoplot
 Vcs-Git: https://salsa.debian.org/med-team/nanoplot.git
 Homepage: https://github.com/wdecoster/NanoPlot
@@ -49,4 +49,3 @@ Description: plotting scripts for long read sequencing data
     MinKnow basecalling (optionally compressed)
   * fasta files (optionally compressed)
   * multiple files of the same type can be offered simultaneously
-


=====================================
debian/upstream/metadata
=====================================
@@ -1,3 +1,5 @@
+Bug-Database: https://github.com/wdecoster/NanoPlot/issues
+Bug-Submit: https://github.com/wdecoster/NanoPlot/issues/new
 Reference:
   Author: >
     Wouter De Coster and Svenn D'Hert and Darrin T Schultz and Marc Cruts
@@ -16,3 +18,12 @@ Reference:
 Registry:
  - Name: conda:bioconda
    Entry: nanoplot
+ - Name: bio.tools
+   Entry: NA
+   Checked: 2021-05-24
+ - Name: SciCrunch
+   Entry: NA
+   Checked: 2021-05-24
+ - Name: guix
+   Entry: NA
+   Checked: 2021-05-24


=====================================
extra/color_options.txt deleted
=====================================
@@ -1,148 +0,0 @@
-aliceblue
-antiquewhite
-aqua
-aquamarine
-azure
-beige
-bisque
-black
-blanchedalmond
-blue
-blueviolet
-brown
-burlywood
-cadetblue
-chartreuse
-chocolate
-coral
-cornflowerblue
-cornsilk
-crimson
-cyan
-darkblue
-darkcyan
-darkgoldenrod
-darkgray
-darkgreen
-darkgrey
-darkkhaki
-darkmagenta
-darkolivegreen
-darkorange
-darkorchid
-darkred
-darksalmon
-darkseagreen
-darkslateblue
-darkslategray
-darkslategrey
-darkturquoise
-darkviolet
-deeppink
-deepskyblue
-dimgray
-dimgrey
-dodgerblue
-firebrick
-floralwhite
-forestgreen
-fuchsia
-gainsboro
-ghostwhite
-gold
-goldenrod
-gray
-green
-greenyellow
-grey
-honeydew
-hotpink
-indianred
-indigo
-ivory
-khaki
-lavender
-lavenderblush
-lawngreen
-lemonchiffon
-lightblue
-lightcoral
-lightcyan
-lightgoldenrodyellow
-lightgray
-lightgreen
-lightgrey
-lightpink
-lightsalmon
-lightseagreen
-lightskyblue
-lightslategray
-lightslategrey
-lightsteelblue
-lightyellow
-lime
-limegreen
-linen
-magenta
-maroon
-mediumaquamarine
-mediumblue
-mediumorchid
-mediumpurple
-mediumseagreen
-mediumslateblue
-mediumspringgreen
-mediumturquoise
-mediumvioletred
-midnightblue
-mintcream
-mistyrose
-moccasin
-navajowhite
-navy
-oldlace
-olive
-olivedrab
-orange
-orangered
-orchid
-palegoldenrod
-palegreen
-paleturquoise
-palevioletred
-papayawhip
-peachpuff
-peru
-pink
-plum
-powderblue
-purple
-rebeccapurple
-red
-rosybrown
-royalblue
-saddlebrown
-salmon
-sandybrown
-seagreen
-seashell
-sienna
-silver
-skyblue
-slateblue
-slategray
-slategrey
-snow
-springgreen
-steelblue
-tan
-teal
-thistle
-tomato
-turquoise
-violet
-wheat
-white
-whitesmoke
-yellow
-yellowgreen


=====================================
extra/color_options_hex.txt
=====================================
@@ -0,0 +1,148 @@
+aliceblue,#F0F8FF
+antiquewhite,#FAEBD7
+aqua,#00FFFF
+aquamarine,#7FFFD4
+azure,#F0FFFF
+beige,#F5F5DC
+bisque,#FFE4C4
+black,#000000
+blanchedalmond,#FFEBCD
+blue,#0000FF
+blueviolet,#8A2BE2
+brown,#A52A2A
+burlywood,#DEB887
+cadetblue,#5F9EA0
+chartreuse,#7FFF00
+chocolate,#D2691E
+coral,#FF7F50
+cornflowerblue,#6495ED
+cornsilk,#FFF8DC
+crimson,#DC143C
+cyan,#00FFFF
+darkblue,#00008B
+darkcyan,#008B8B
+darkgoldenrod,#B8860B
+darkgray,#A9A9A9
+darkgreen,#006400
+darkgrey,#A9A9A9
+darkkhaki,#BDB76B
+darkmagenta,#8B008B
+darkolivegreen,#556B2F
+darkorange,#FF8C00
+darkorchid,#9932CC
+darkred,#8B0000
+darksalmon,#E9967A
+darkseagreen,#8FBC8F
+darkslateblue,#483D8B
+darkslategray,#2F4F4F
+darkslategrey,#2F4F4F
+darkturquoise,#00CED1
+darkviolet,#9400D3
+deeppink,#FF1493
+deepskyblue,#00BFFF
+dimgray,#696969
+dimgrey,#696969
+dodgerblue,#1E90FF
+firebrick,#B22222
+floralwhite,#FFFAF0
+forestgreen,#228B22
+fuchsia,#FF00FF
+gainsboro,#DCDCDC
+ghostwhite,#F8F8FF
+gold,#FFD700
+goldenrod,#DAA520
+gray,#808080
+green,#008000
+greenyellow,#ADFF2F
+grey,#808080
+honeydew,#F0FFF0
+hotpink,#FF69B4
+indianred,#CD5C5C
+indigo,#4B0082
+ivory,#FFFFF0
+khaki,#F0E68C
+lavender,#E6E6FA
+lavenderblush,#FFF0F5
+lawngreen,#7CFC00
+lemonchiffon,#FFFACD
+lightblue,#ADD8E6
+lightcoral,#F08080
+lightcyan,#E0FFFF
+lightgoldenrodyellow,#FAFAD2
+lightgray,#D3D3D3
+lightgreen,#90EE90
+lightgrey,#D3D3D3
+lightpink,#FFB6C1
+lightsalmon,#FFA07A
+lightseagreen,#20B2AA
+lightskyblue,#87CEFA
+lightslategray,#778899
+lightslategrey,#778899
+lightsteelblue,#B0C4DE
+lightyellow,#FFFFE0
+lime,#00FF00
+limegreen,#32CD32
+linen,#FAF0E6
+magenta,#FF00FF
+maroon,#800000
+mediumaquamarine,#66CDAA
+mediumblue,#0000CD
+mediumorchid,#BA55D3
+mediumpurple,#9370DB
+mediumseagreen,#3CB371
+mediumslateblue,#7B68EE
+mediumspringgreen,#00FA9A
+mediumturquoise,#48D1CC
+mediumvioletred,#C71585
+midnightblue,#191970
+mintcream,#F5FFFA
+mistyrose,#FFE4E1
+moccasin,#FFE4B5
+navajowhite,#FFDEAD
+navy,#000080
+oldlace,#FDF5E6
+olive,#808000
+olivedrab,#6B8E23
+orange,#FFA500
+orangered,#FF4500
+orchid,#DA70D6
+palegoldenrod,#EEE8AA
+palegreen,#98FB98
+paleturquoise,#AFEEEE
+palevioletred,#DB7093
+papayawhip,#FFEFD5
+peachpuff,#FFDAB9
+peru,#CD853F
+pink,#FFC0CB
+plum,#DDA0DD
+powderblue,#B0E0E6
+purple,#800080
+rebeccapurple,#663399
+red,#FF0000
+rosybrown,#BC8F8F
+royalblue,#4169E1
+saddlebrown,#8B4513
+salmon,#FA8072
+sandybrown,#F4A460
+seagreen,#2E8B57
+seashell,#FFF5EE
+sienna,#A0522D
+silver,#C0C0C0
+skyblue,#87CEEB
+slateblue,#6A5ACD
+slategray,#708090
+slategrey,#708090
+snow,#FFFAFA
+springgreen,#00FF7F
+steelblue,#4682B4
+tan,#D2B48C
+teal,#008080
+thistle,#D8BFD8
+tomato,#FF6347
+turquoise,#40E0D0
+violet,#EE82EE
+wheat,#F5DEB3
+white,#FFFFFF
+whitesmoke,#F5F5F5
+yellow,#FFFF00
+yellowgreen,#9ACD32
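
Note on the new file format: extra/color_options_hex.txt stores one "name,#RRGGBB" pair per line, where the deleted color_options.txt held names only. The code that reads this file is not part of the hunks shown here, so the following is only a minimal sketch of how such a table could be loaded and queried. The names load_color_table and resolve_color, and the black fallback value, are hypothetical and not taken from NanoPlot.

```python
import csv
import io

# Two rows copied from extra/color_options_hex.txt; a real caller would open
# the packaged file instead of this in-memory sample.
SAMPLE = "aliceblue,#F0F8FF\nyellowgreen,#9ACD32\n"


def load_color_table(handle):
    """Return a dict mapping lower-case colour name -> '#RRGGBB' string."""
    table = {}
    for row in csv.reader(handle):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        name, hex_code = row
        table[name.strip().lower()] = hex_code.strip().upper()
    return table


def resolve_color(name, table, fallback="#000000"):
    """Look up a colour name case-insensitively.

    The fallback is an assumption made for this sketch only.
    """
    return table.get(name.strip().lower(), fallback)


if __name__ == "__main__":
    colors = load_color_table(io.StringIO(SAMPLE))
    print(resolve_color("YellowGreen", colors))
```

Keeping the hex code next to the name means a caller can pass either form to a plotting backend without a second lookup table.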


=====================================
nanoplot/NanoPlot.py
=====================================
@@ -11,13 +11,13 @@ Input data can be given as one or multiple of:
 -a summary file generated by albacore
 '''
 
-
 from os import path
 import logging
 import nanomath
 import numpy as np
 from scipy import stats
 import nanoplot.utils as utils
+import nanoplot.report as report
 from nanoget import get_input
 from nanoplot.filteroptions import filter_and_transform_data
 from nanoplot.version import __version__
@@ -37,21 +37,24 @@ def main():
     try:
         utils.make_output_dir(args.outdir)
         utils.init_logs(args)
-        args.format = nanoplotter.check_valid_format(args.format)
-        sources = {
-            "fastq": args.fastq,
-            "bam": args.bam,
-            "cram": args.cram,
-            "fastq_rich": args.fastq_rich,
-            "fastq_minimal": args.fastq_minimal,
-            "summary": args.summary,
-            "fasta": args.fasta,
-            "ubam": args.ubam,
-        }
-
+        # args.format = nanoplotter.check_valid_format(args.format)
         if args.pickle:
             datadf = pickle.load(open(args.pickle, 'rb'))
+        elif args.feather:
+            from nanoget import combine_dfs
+            from pandas import read_feather
+            datadf = combine_dfs([read_feather(p) for p in args.feather], method="simple")
         else:
+            sources = {
+                "fastq": args.fastq,
+                "bam": args.bam,
+                "cram": args.cram,
+                "fastq_rich": args.fastq_rich,
+                "fastq_minimal": args.fastq_minimal,
+                "summary": args.summary,
+                "fasta": args.fasta,
+                "ubam": args.ubam,
+            }
             datadf = get_input(
                 source=[n for n, s in sources.items() if s][0],
                 files=[f for f in sources.values() if f][0],
@@ -60,7 +63,7 @@ def main():
                 combine="simple",
                 barcoded=args.barcoded,
                 huge=args.huge,
-                keep_supp=not(args.no_supplementary))
+                keep_supp=not (args.no_supplementary))
         if args.store:
             pickle.dump(
                 obj=datadf,
@@ -71,28 +74,32 @@ def main():
                           index=False,
                           compression="gzip")
 
-        settings["statsfile"] = [make_stats(datadf, settings, suffix="")]
+        settings["statsfile"] = [make_stats(datadf, settings, suffix="", tsv_stats=args.tsv_stats)]
         datadf, settings = filter_and_transform_data(datadf, settings)
         if settings["filtered"]:  # Bool set when filter was applied in filter_and_transform_data()
             settings["statsfile"].append(
-                make_stats(datadf[datadf["length_filter"]], settings, suffix="_post_filtering")
+                make_stats(datadf[datadf["length_filter"]], settings,
+                           suffix="_post_filtering", tsv_stats=args.tsv_stats)
             )
 
         if args.barcoded:
+            main_path = settings["path"]
             barcodes = list(datadf["barcode"].unique())
             plots = []
             for barc in barcodes:
                 logging.info("Processing {}".format(barc))
-                settings["path"] = path.join(args.outdir, args.prefix + barc + "_")
                 dfbarc = datadf[datadf["barcode"] == barc]
                 if len(dfbarc) > 5:
                     settings["title"] = barc
+                    settings["path"] = path.join(args.outdir, args.prefix + barc + "_")
+                    plots.append(report.BarcodeTitle(barc))
                     plots.extend(
                         make_plots(dfbarc, settings)
                     )
                 else:
                     sys.stderr.write("Found barcode {} less than 5x, ignoring...\n".format(barc))
                     logging.info("Found barcode {} less than 5 times, ignoring".format(barc))
+            settings["path"] = main_path
         else:
             plots = make_plots(datadf, settings)
         make_report(plots, settings)
@@ -107,20 +114,22 @@ def main():
         raise
 
 
-def make_stats(datadf, settings, suffix):
+def make_stats(datadf, settings, suffix, tsv_stats=True):
     statsfile = settings["path"] + "NanoStats" + suffix + ".txt"
-    nanomath.write_stats(
+    stats_df = nanomath.write_stats(
         datadfs=[datadf],
-        outputfile=statsfile)
+        outputfile=statsfile,
+        as_tsv=tsv_stats)
     logging.info("Calculated statistics")
     if settings["barcoded"]:
         barcodes = list(datadf["barcode"].unique())
         statsfile = settings["path"] + "NanoStats_barcoded.txt"
-        nanomath.write_stats(
+        stats_df = nanomath.write_stats(
             datadfs=[datadf[datadf["barcode"] == b] for b in barcodes],
             outputfile=statsfile,
-            names=barcodes)
-    return statsfile
+            names=barcodes,
+            as_tsv=tsv_stats)
+    return stats_df if tsv_stats else statsfile
 
 
 def make_plots(datadf, settings):
@@ -128,16 +137,27 @@ def make_plots(datadf, settings):
     Call plotting functions from nanoplotter
     settings["lengths_pointer"] is a column in the DataFrame specifying which lengths to use
     '''
-    plot_settings = dict(font_scale=settings["font_scale"])
-    nanoplotter.plot_settings(plot_settings, dpi=settings["dpi"])
     color = nanoplotter.check_valid_color(settings["color"])
     colormap = nanoplotter.check_valid_colormap(settings["colormap"])
+
     plotdict = {type: settings["plots"].count(type) for type in ["kde", "hex", "dot", 'pauvre']}
+    if "hex" in settings["plots"]:
+        print(
+            "WARNING: hex as part of --plots has been deprecated and will be ignored. To get the hex output, rerun with --legacy hex.")
+
+    if settings["legacy"] is None:
+        plotdict_legacy = {}
+    else:
+        plotdict_legacy = {plot: settings["legacy"].count(plot) for plot in ["kde", "hex", "dot"]}
+
     plots = []
+
+    subdf = utils.subsample_datasets(datadf)
     if settings["N50"]:
         n50 = nanomath.get_N50(np.sort(datadf["lengths"]))
     else:
         n50 = None
+
     plots.extend(
         nanoplotter.length_plots(
             array=datadf[datadf["length_filter"]]["lengths"].astype('uint64'),
@@ -145,7 +165,6 @@ def make_plots(datadf, settings):
             path=settings["path"],
             n50=n50,
             color=color,
-            figformat=settings["format"],
             title=settings["title"])
     )
     logging.info("Created length plots")
@@ -154,27 +173,27 @@ def make_plots(datadf, settings):
             nanoplotter.scatter(
                 x=datadf[datadf["length_filter"]][settings["lengths_pointer"].replace('log_', '')],
                 y=datadf[datadf["length_filter"]]["quals"],
+                legacy=plotdict_legacy,
                 names=['Read lengths', 'Average read quality'],
                 path=settings["path"] + "LengthvsQualityScatterPlot",
                 color=color,
-                figformat=settings["format"],
+                colormap=colormap,
                 plots=plotdict,
-                title=settings["title"],
-                plot_settings=plot_settings)
+                title=settings["title"])
         )
         if settings["logBool"]:
             plots.extend(
                 nanoplotter.scatter(
                     x=datadf[datadf["length_filter"]][settings["lengths_pointer"]],
                     y=datadf[datadf["length_filter"]]["quals"],
+                    legacy=plotdict_legacy,
                     names=['Read lengths', 'Average read quality'],
                     path=settings["path"] + "LengthvsQualityScatterPlot",
                     color=color,
-                    figformat=settings["format"],
+                    colormap=colormap,
                     plots=plotdict,
                     log=True,
-                    title=settings["title"],
-                    plot_settings=plot_settings)
+                    title=settings["title"])
             )
         logging.info("Created LengthvsQual plot")
     if "channelIDs" in datadf:
@@ -183,30 +202,27 @@ def make_plots(datadf, settings):
                 array=datadf["channelIDs"],
                 title=settings["title"],
                 path=settings["path"] + "ActivityMap_ReadsPerChannel",
-                color=colormap,
-                figformat=settings["format"])
+                colormap=colormap)
         )
         logging.info("Created spatialheatmap for succesfull basecalls.")
     if "start_time" in datadf:
         plots.extend(
             nanoplotter.time_plots(
                 df=datadf,
+                subsampled_df=subdf,
                 path=settings["path"],
                 color=color,
-                figformat=settings["format"],
-                title=settings["title"],
-                plot_settings=plot_settings)
+                title=settings["title"])
         )
         if settings["logBool"]:
             plots.extend(
                 nanoplotter.time_plots(
                     df=datadf,
+                    subsampled_df=subdf,
                     path=settings["path"],
                     color=color,
-                    figformat=settings["format"],
                     title=settings["title"],
-                    log_length=True,
-                    plot_settings=plot_settings)
+                    log_length=True)
             )
         logging.info("Created timeplots.")
     if "aligned_lengths" in datadf and "lengths" in datadf:
@@ -214,13 +230,13 @@ def make_plots(datadf, settings):
             nanoplotter.scatter(
                 x=datadf[datadf["length_filter"]]["aligned_lengths"],
                 y=datadf[datadf["length_filter"]]["lengths"],
+                legacy=plotdict_legacy,
                 names=["Aligned read lengths", "Sequenced read length"],
                 path=settings["path"] + "AlignedReadlengthvsSequencedReadLength",
-                figformat=settings["format"],
                 plots=plotdict,
                 color=color,
-                title=settings["title"],
-                plot_settings=plot_settings)
+                colormap=colormap,
+                title=settings["title"])
         )
         logging.info("Created AlignedLength vs Length plot.")
     if "mapQ" in datadf and "quals" in datadf:
@@ -228,40 +244,40 @@ def make_plots(datadf, settings):
             nanoplotter.scatter(
                 x=datadf["mapQ"],
                 y=datadf["quals"],
+                legacy=plotdict_legacy,
                 names=["Read mapping quality", "Average basecall quality"],
                 path=settings["path"] + "MappingQualityvsAverageBaseQuality",
                 color=color,
-                figformat=settings["format"],
+                colormap=colormap,
                 plots=plotdict,
-                title=settings["title"],
-                plot_settings=plot_settings)
+                title=settings["title"])
         )
         logging.info("Created MapQvsBaseQ plot.")
         plots.extend(
             nanoplotter.scatter(
                 x=datadf[datadf["length_filter"]][settings["lengths_pointer"].replace('log_', '')],
                 y=datadf[datadf["length_filter"]]["mapQ"],
+                legacy=plotdict_legacy,
                 names=["Read length", "Read mapping quality"],
                 path=settings["path"] + "MappingQualityvsReadLength",
                 color=color,
-                figformat=settings["format"],
+                colormap=colormap,
                 plots=plotdict,
-                title=settings["title"],
-                plot_settings=plot_settings)
+                title=settings["title"])
         )
         if settings["logBool"]:
             plots.extend(
                 nanoplotter.scatter(
                     x=datadf[datadf["length_filter"]][settings["lengths_pointer"]],
                     y=datadf[datadf["length_filter"]]["mapQ"],
+                    legacy=plotdict_legacy,
                     names=["Read length", "Read mapping quality"],
                     path=settings["path"] + "MappingQualityvsReadLength",
                     color=color,
-                    figformat=settings["format"],
+                    colormap=colormap,
                     plots=plotdict,
                     log=True,
-                    title=settings["title"],
-                    plot_settings=plot_settings)
+                    title=settings["title"])
             )
         logging.info("Created Mapping quality vs read length plot.")
     if "percentIdentity" in datadf:
@@ -271,52 +287,52 @@ def make_plots(datadf, settings):
                 nanoplotter.scatter(
                     x=datadf["percentIdentity"],
                     y=datadf["aligned_quals"],
+                    legacy=plotdict_legacy,
                     names=["Percent identity", "Average Base Quality"],
                     path=settings["path"] + "PercentIdentityvsAverageBaseQuality",
                     color=color,
-                    figformat=settings["format"],
+                    colormap=colormap,
                     plots=plotdict,
                     stat=stats.pearsonr if not settings["hide_stats"] else None,
                     minvalx=minPID,
-                    title=settings["title"],
-                    plot_settings=plot_settings)
+                    title=settings["title"])
             )
             logging.info("Created Percent ID vs Base quality plot.")
         plots.extend(
             nanoplotter.scatter(
                 x=datadf[datadf["length_filter"]][settings["lengths_pointer"].replace('log_', '')],
                 y=datadf[datadf["length_filter"]]["percentIdentity"],
+                legacy=plotdict_legacy,
                 names=["Aligned read length", "Percent identity"],
                 path=settings["path"] + "PercentIdentityvsAlignedReadLength",
                 color=color,
-                figformat=settings["format"],
+                colormap=colormap,
                 plots=plotdict,
                 stat=stats.pearsonr if not settings["hide_stats"] else None,
                 minvaly=minPID,
-                title=settings["title"],
-                plot_settings=plot_settings)
+                title=settings["title"])
         )
         if settings["logBool"]:
             plots.extend(
                 nanoplotter.scatter(
                     x=datadf[datadf["length_filter"]][settings["lengths_pointer"]],
                     y=datadf[datadf["length_filter"]]["percentIdentity"],
+                    legacy=plotdict_legacy,
                     names=["Aligned read length", "Percent identity"],
                     path=settings["path"] + "PercentIdentityvsAlignedReadLength",
                     color=color,
-                    figformat=settings["format"],
+                    colormap=colormap,
                     plots=plotdict,
                     stat=stats.pearsonr if not settings["hide_stats"] else None,
                     log=True,
                     minvaly=minPID,
-                    title=settings["title"],
-                    plot_settings=plot_settings)
+                    title=settings["title"])
             )
 
         plots.append(nanoplotter.dynamic_histogram(array=datadf["percentIdentity"],
                                                    name="percent identity",
                                                    path=settings["path"]
-                                                   + "PercentIdentityHistogram",
+                                                        + "PercentIdentityHistogram",
                                                    title=settings["title"],
                                                    color=color))
         logging.info("Created Percent ID vs Length plot")
@@ -328,49 +344,19 @@ def make_report(plots, settings):
     Creates a fat html report based on the previously created files
     plots is a list of Plot objects defined by a path and title
     statsfile is the file to which the stats have been saved,
-    which is parsed to a table (rather dodgy)
+    which is parsed to a table (rather dodgy) or nicely if it's a pandas/tsv
     '''
     logging.info("Writing html report.")
-    html_content = ['<body>']
-
-    # Hyperlink Table of Contents panel
-    html_content.append('<div class="panel panelC">')
-    if settings["filtered"]:
-        html_content.append(
-            '<p><strong><a href="#stats0">Summary Statistics prior to filtering</a></strong></p>')
-        html_content.append(
-            '<p><strong><a href="#stats1">Summary Statistics after filtering</a></strong></p>')
-    else:
-        html_content.append(
-            '<p><strong><a href="#stats0">Summary Statistics</a></strong></p>')
-    html_content.append('<p><strong><a href="#plots">Plots</a></strong></p>')
-    html_content.extend(['<p style="margin-left:20px"><a href="#'
-                         + p.title.replace(' ', '_') + '">' + p.title + '</a></p>' for p in plots])
-    html_content.append('</div>')
-
-    # The report itself: stats
-    html_content.append('<div class="panel panelM"> <h1>NanoPlot report</h1>')
-    if settings["filtered"]:
-        html_content.append('<h2 id="stats0">Summary statistics prior to filtering</h2>')
-        html_content.append(utils.stats2html(settings["statsfile"][0]))
-        html_content.append('<h2 id="stats1">Summary statistics after filtering</h2>')
-        html_content.append(utils.stats2html(settings["statsfile"][1]))
-    else:
-        html_content.append('<h2 id="stats0">Summary statistics</h2>')
-        html_content.append(utils.stats2html(settings["statsfile"][0]))
 
-    # The report itself: plots
-    html_content.append('<h2 id="plots">Plots</h2>')
-    for plot in plots:
-        html_content.append('\n<h3 id="' + plot.title.replace(' ', '_') + '">'
-                            + plot.title + '</h3>\n' + plot.encode())
-        html_content.append('\n<br>\n<br>\n<br>\n<br>')
-    html_body = '\n'.join(html_content) + '</div></body></html>'
-    html_str = utils.html_head + html_body
-    htmlreport = settings["path"] + "NanoPlot-report.html"
-    with open(htmlreport, "w") as html_file:
-        html_file.write(html_str)
-    return htmlreport
+    html_content = [
+        '<body class="grid">',
+        report.html_toc(plots, filtered=settings["filtered"]),
+        report.html_stats(settings),
+        report.html_plots(plots),
+        report.run_info(settings) if settings["info_in_report"] else '',
+        '</main></body></html>']
+    with open(settings["path"] + "NanoPlot-report.html", "w") as html_file:
+        html_file.write(report.html_head + '\n'.join(html_content))
 
 
 if __name__ == "__main__":
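
Note on the input handling changed in main() above: the new code checks args.pickle first, then args.feather, and only then builds the sources dict and takes its first non-empty entry. The sketch below restates that precedence as a standalone function, assuming plain Python values in place of the argparse namespace. choose_input is a hypothetical helper written for illustration and does not exist in NanoPlot; unlike the [0] indexing in the diff, it raises a descriptive error when nothing was supplied.

```python
def choose_input(pickle_path=None, feather_paths=None, sources=None):
    """Return (kind, files) following the order used in main().

    Order: a pickle file wins, then feather files, then the first
    source type in `sources` that has a non-empty value.
    """
    if pickle_path:
        return ("pickle", [pickle_path])
    if feather_paths:
        return ("feather", list(feather_paths))
    # dicts preserve insertion order, so "first" means first listed
    for name, files in (sources or {}).items():
        if files:
            return (name, list(files))
    raise ValueError("no input files were given")


if __name__ == "__main__":
    # Only the bam entry is populated, so it is selected.
    print(choose_input(sources={"fastq": None, "bam": ["reads.bam"]}))
```

With this ordering a stored pickle or feather file bypasses the per-format readers entirely, which matches the branch structure introduced in the diff.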


=====================================
nanoplot/report.py
=====================================
@@ -0,0 +1,294 @@
+import pandas as pd
+import numpy as np
+
+
+class BarcodeTitle(object):
+    """Bit of a dummy class to add barcode titles to the report"""
+
+    def __init__(self, title):
+        self.title = title.upper()
+
+    def encode(self):
+        return ""
+
+
+def chunks(values, chunks):
+    if values:
+        chunksize = int(len(values) / chunks)
+        return ([' '.join(values[i:i + chunksize]) for i in range(0, len(values), chunksize)])
+    else:
+        return [" "] * chunks
+
+
+def html_stats(settings):
+    statsfile = settings["statsfile"]
+    filtered = settings["filtered"]
+    as_tsv = settings['tsv_stats']
+
+    stats_html = []
+    stats_html.append('<main class="grid-main"><h2>NanoPlot reports</h2>')
+    if filtered:
+        stats_html.append('<h3 id="stats0">Summary statistics prior to filtering</h3>')
+        if as_tsv:
+            stats_html.append(statsfile[0].to_html())
+            stats_html.append('<h3 id="stats1">Summary statistics after filtering</h3>')
+            stats_html.append(statsfile[1].to_html())
+        else:
+            stats_html.append(stats2html(statsfile[0]))
+            stats_html.append('<h3 id="stats1">Summary statistics after filtering</h3>')
+            stats_html.append(stats2html(statsfile[1]))
+    else:
+        stats_html.append('<h3 id="stats0">Summary statistics</h3>')
+        if as_tsv:
+            stats_html.append(statsfile[0].to_html())
+        else:
+            stats_html.append(stats2html(statsfile[0]))
+    return '\n'.join(stats_html)
+
+
+def stats2html(statsf):
+    df = pd.read_csv(statsf, sep=':', header=None, names=['feature', 'value'])
+    values = df["value"].str.strip().str.replace('\t', ' ').str.split().replace(np.nan, '')
+    num = len(values[0]) or 1
+    v = [chunks(i, num) for i in values]
+    df = pd.DataFrame(v, index=df["feature"])
+    df.columns.name = None
+    df.index.name = None
+    return df.to_html(header=False)
+
+
+def html_toc(plots, filtered=False):
+    toc = []
+    toc.append('<h1 class="hiddentitle">NanoPlot statistics report</h1>')
+    toc.append('<header class="grid-header"><nav><h2 class="hiddentitle">Menu</h2><ul>')
+    if filtered:
+        toc.append(
+            '<li><a href="#stats0">Summary Statistics prior to filtering</a></li>')
+        toc.append(
+            '<li><a href="#stats1">Summary Statistics after filtering</a></li>')
+    else:
+        toc.append('<li><a href="#stats0">Summary Statistics</a></li>')
+
+    toc.append('<li class="submenu"><a href="#plots" class="submenubtn">Plots</a>')
+    toc.append('<ul class="submenu-items">')
+    toc.extend(['<li><a href="#'
+                + p.title.replace(' ', '_') + '">' + p.title + '</a></li>' for p in plots])
+    toc.append('</ul>')
+    toc.append('</li>')
+    toc.append(
+        '<li class="issue-btn"><a href="https://github.com/wdecoster/NanoPlot/issues" target="_blank" class="reporting">Report issue on GitHub</a></li>')
+    toc.append('</ul></nav></header>')
+    return '\n'.join(toc)
+
+
+def html_plots(plots):
+    html_plots = []
+    html_plots.append('<h3 id="plots">Plots</h3>')
+    for plot in plots:
+        html_plots.append('<button class="collapsible">' + plot.title + '</button>')
+        html_plots.append('<section class="collapsible-content"><h4 class="hiddentitle" id="' +
+                          plot.title.replace(' ', '_') + '">' + plot.title + '</h4>')
+        html_plots.append(plot.encode())
+        html_plots.append('</section>')
+
+    html_plots.append(
+        '<script>var coll = document.getElementsByClassName("collapsible");var i;for (i = 0; i < coll.length; i++) {coll[i].addEventListener("click", function() {this.classList.toggle("active");var content = this.nextElementSibling;if (content.style.display === "none") {content.style.display = "block";} else {content.style.display = "none";}});}</script>')
+
+    return '\n'.join(html_plots)
+
+
+def run_info(settings):
+    html_info = []
+    html_info.append('<h5>Run Info</h5>\n')
+    html_info.append('<h6>Data source:</h6>\n')
+    for k in ["fastq", "fasta", "fastq_rich", "fastq_minimal", "summary",
+              "bam", "ubam", "cram", "pickle", "feather"]:
+        html_info.append(f"{k}:\t{settings[k]}<br>")
+    html_info.append('<h6>Filtering parameters:</h6>\n')
+    for k in ['maxlength', 'minlength', 'drop_outliers', 'downsample', 'loglength',
+              'percentqual', 'alength', 'minqual', 'runtime_until', 'no_supplementary']:
+        html_info.append(f"{k}:\t{settings[k]}<br>")
+    # html_info.append('</p>')
+    return '\n'.join(html_info)
+
+
+html_head = """
+<!DOCTYPE html>
+<html>
+<head>
+<meta charset="UTF-8">
+<style>
+
+body {margin:0}
+
+.grid { /* grid definition for index page */
+    display: grid;
+    grid-template-areas:    'gheader'
+                            'gmain';
+    margin: 0;
+}
+
+.grid > .grid-header { /* definition of the header on index page and its position in the grid */
+    grid-area: gheader;
+}
+
+.grid > .grid-main { /* definition of the main content on index page and its position in the grid */
+    grid-area: gmain;
+}
+
+nav {
+    text-align: center;
+}
+
+ul {
+    border-bottom: 1px solid white;
+    font-family: "Trebuchet MS", sans-serif;
+    list-style-type: none; /* remove dot symbols from list */
+    margin: 0;
+    padding: 0;
+    overflow: hidden; /* contains the overflow of the element if it goes 'out of bounds' */
+    background-color: #001f3f;
+    font-size: 1.6em;
+}
+
+ul > li > ul {
+    font-size: 1em;
+}
+
+li {
+    float: left; /* floats the list items to the left side of the page */
+}
+
+li a, .submenubtn {
+    display: inline-block; /* display the list items inline block so the items are vertically displayed */
+    color: white;
+    text-align: center;
+    padding: 14px 16px;
+    text-decoration: none; /* removes the underline that comes with the a tag */
+}
+
+li a:hover, .submenu:hover .submenubtn { /* hovering over a menu item changes its background color */
+    background-color: #39CCCC;
+}
+
+.submenu {
+    display: inline-block; /* idem to above, list items are displayed underneath each other */
+}
+
+.submenu-items { /* hides the ul */
+    display: none;
+    position: absolute;
+    background-color: #f9f9f9;
+    min-width: 160px;
+    z-index: 1;
+}
+
+.submenu-items li {
+    display: block;
+    float: none;
+    overflow: hidden;
+}
+
+.submenu-items li a { /* styling of the links in the submenu */
+    color: black;
+    padding: 12px 16px;
+    text-decoration: none;
+    display: block;
+    text-align: left;
+}
+
+.submenu-items a:hover {
+    background-color: #f1f1f1;
+}
+
+.submenu:hover .submenu-items {
+    display: block;
+    float: bottom;
+    overflow: hidden;
+}
+
+li {
+  border-right: 1px solid #bbb;
+}
+
+.issue-btn {
+  border-right: none;
+  float: right;
+}
+
+.hiddentitle { /* hides titles that are not necessary for content, but are for outline */
+  position: absolute;
+  width: 1px;
+  height: 1px;
+  overflow: hidden;
+  left: -10000px;
+}
+
+h2 { color: #111; font-family: 'Helvetica Neue', sans-serif; font-size: 60px; font-weight: bold; letter-spacing: -1px; line-height: 1; text-align: center; }
+
+h3 { color: #111; font-family: 'Open Sans', sans-serif; font-size: 25px; font-weight: 300; line-height: 32px; text-align: center; padding-bottom: 0;}
+
+h4 { color: #111; font-family: 'Helvetica Neue', sans-serif; font-size: 16px; font-weight: 150; margin: 0 0 0 0; text-align: left; padding:20px 0px 20px 0px;}
+
+table {
+  font-family: Arial, Helvetica, sans-serif;
+  border-collapse: collapse;
+  table-layout: auto;
+  border-collapse: collapse;
+  width: 100%;
+}
+
+table td, table th {
+  border: 1px solid #ddd;
+  padding: 8px;
+}
+
+table tr:nth-child(even){background-color: #f2f2f2;}
+
+table tr:hover {background-color: #ddd;}
+
+/* Style the button that is used to open and close the collapsible content */
+.collapsible {
+  background-color: #39CCCC;
+  color: white;
+  cursor: pointer;
+  padding: 18px;
+  width: 100%;
+  border: none;
+  text-align: left;
+  outline: none;
+  font-size: 15px;
+}
+
+/* Add a background color to the button if it is clicked on (add the .active class with JS), and when you move the mouse over it (hover) */
+.active, .collapsible:hover {
+    color:white;
+  background-color: #001f3f;
+}
+
+/* Style the collapsible content. Note: shown by default */
+.collapsible-content {
+  padding: 0 18px;
+  display: block;
+  overflow: hidden;
+  background-color: #FFFFFF;
+  text-align: center;
+}
+
+.collapsible:after {
+  content: '-';
+  font-size: 20px;
+    font-weight: bold;
+  float: right;
+    color:white;
+  margin-left: 5px;
+}
+
+.active:after {
+  content: '+'; /* plus sign, shown while the section is collapsed */
+      color: white;
+}
+</style>
+<title>NanoPlot Report</title>
+</head>
+"""

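The `chunks` helper in `nanoplot/report.py` above splits the whitespace-separated tokens of one stats value into a fixed number of table cells. As written, `int(len(values) / chunks)` evaluates to 0 when a row has fewer tokens than the requested number of cells, and `range()` rejects a step of 0. Below is a minimal sketch of a guarded variant, assuming a lower bound of 1 on the chunk size is acceptable. The name `chunk_values` is hypothetical and not part of NanoPlot.

```python
def chunk_values(values, n_chunks):
    """Join tokens into roughly n_chunks space-separated groups."""
    if not values:
        # Mirror the original helper: one blank cell per expected column.
        return [" "] * n_chunks
    # Assumption: clamp to 1 so range() never receives a step of 0.
    chunksize = max(1, len(values) // n_chunks)
    return [" ".join(values[i:i + chunksize])
            for i in range(0, len(values), chunksize)]


print(chunk_values(["1000", "(Q12.3)", "2000", "(Q11.8)"], 2))
print(chunk_values(["42"], 2))
```

Like the original, this can return more groups than requested when the token count is not a multiple of `n_chunks`.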

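In the `nanoplot/report.py` diff, `html_toc` and `html_plots` each derive a plot's anchor independently with `title.replace(' ', '_')`, so the menu links only resolve while both expressions stay identical. One possible way to keep them in sync is a shared helper. The sketch below is only an illustration: `anchor_id`, `toc_entry` and `plot_heading` are hypothetical names, and escaping the visible title is an added assumption that does not appear in the diff.

```python
from html import escape


def anchor_id(title):
    # Same rule as in the diff: spaces become underscores.
    return title.replace(" ", "_")


def toc_entry(title):
    # Assumption: escape the visible text; the diff inserts it unescaped.
    return '<li><a href="#{}">{}</a></li>'.format(anchor_id(title), escape(title))


def plot_heading(title):
    return '<h4 class="hiddentitle" id="{}">{}</h4>'.format(
        anchor_id(title), escape(title))


title = "Read lengths vs Average read quality plot using dots"
print(toc_entry(title))
print(plot_heading(title))
```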
=====================================
nanoplot/utils.py
=====================================
@@ -7,8 +7,6 @@ from nanoplot.version import __version__
 from argparse import HelpFormatter, Action, ArgumentParser
 import textwrap as _textwrap
 import pandas as pd
-import numpy as np
-from matplotlib import cm
 
 
 class CustomHelpFormatter(HelpFormatter):
@@ -41,7 +39,6 @@ class Action_Print_Colors(Action):
 
 
 class Action_Print_Colormaps(Action):
-
     def __init__(self, option_strings, dest="==SUPPRESS==", default="==SUPPRESS==", help=None):
         super(Action_Print_Colormaps, self).__init__(
             option_strings=option_strings,
@@ -97,6 +94,12 @@ def get_args():
                          help="Specify an optional prefix to be used for the output files.",
                          default="",
                          type=str)
+    general.add_argument("--tsv_stats",
+                         help="Output the stats file as a properly formatted TSV.",
+                         action='store_true')
+    general.add_argument("--info_in_report",
+                         help="Add NanoPlot run info in the report.",
+                         action='store_true')
     filtering = parser.add_argument_group(
         title='Options for filtering or transforming input prior to plotting')
     filtering.add_argument("--maxlength",
@@ -151,18 +154,22 @@ def get_args():
     visual.add_argument("-cm", "--colormap",
                         help="Specify a valid matplotlib colormap for the heatmap",
                         default="Greens")
-    visual.add_argument("-f", "--format",
-                        help="Specify the output format of the plots.",
-                        default="png",
-                        type=str,
-                        choices=['eps', 'jpeg', 'jpg', 'pdf', 'pgf', 'png', 'ps',
-                                 'raw', 'rgba', 'svg', 'svgz', 'tif', 'tiff'])
+    # visual.add_argument("-f", "--format",
+    #                     help="Specify the output format of the plots.",
+    #                     default="png",
+    #                     type=str,
+    #                     choices=['eps', 'jpeg', 'jpg', 'pdf', 'pgf', 'png', 'ps',
+    #                              'raw', 'rgba', 'svg', 'svgz', 'tif', 'tiff'])
     visual.add_argument("--plots",
                         help="Specify which bivariate plots have to be made.",
                         default=['kde', 'dot'],
                         type=str,
                         nargs='*',
-                        choices=['kde', 'hex', 'dot', 'pauvre'])
+                        choices=['kde', 'hex', 'dot'])
+    visual.add_argument("--legacy", help="Specify which bivariate plots have to be made (legacy mode).",
+                        type=str,
+                        nargs='*',
+                        choices=['kde', 'dot', 'hex'])
     visual.add_argument("--listcolors",
                         help="List the colors which are available for plotting and exit.",
                         action=Action_Print_Colors,
@@ -238,6 +245,10 @@ def get_args():
     mtarget.add_argument("--pickle",
                          help="Data is a pickle file stored earlier.",
                          metavar="pickle")
+    mtarget.add_argument("--feather",
+                         help="Data is in one or more feather file(s).",
+                         nargs='+',
+                         metavar="file")
     args = parser.parse_args()
     if args.listcolors:
         list_colors()
@@ -259,13 +270,19 @@ def custom_formatter(prog):
 
 def list_colors():
     parent_directory = os.path.dirname(os.path.abspath(os.path.dirname(__file__)))
-    colours = open(os.path.join(parent_directory, "extra/color_options.txt")).readlines()
-    print("{}".format(", ".join([c.strip() for c in colours])))
+    colours = open(os.path.join(parent_directory, "extra/color_options_hex.txt"))
+    col_hex = {}
+
+    for line in colours:
+        key, value = line.split(",")
+        col_hex[key] = value.strip()
+    print("Valid colors: {}".format("\n".join([c.strip() for c in list(col_hex.keys())])))
     sys.exit(0)
 
 
 def list_colormaps():
-    print("{}".format(", ".join([c.strip() for c in cm.cmap_d.keys()])))
+    print('Valid colormaps:\nGreys,YlGnBu,Greens,YlOrRd,Bluered,RdBu,Reds,Blues,Picnic,'
+          'Rainbow,Portland,Jet,Hot,Blackbody,Earth,Electric,Viridis,Cividis')
     sys.exit(0)
 
 
@@ -293,52 +310,26 @@ def init_logs(args, tool="NanoPlot"):
     return logname
 
 
-def chunks(values, chunks):
-    if values:
-        chunksize = int(len(values) / chunks)
-        return([' '.join(values[i:i + chunksize]) for i in range(0, len(values), chunksize)])
+def subsample_datasets(df, minimal=10000):
+    if 'dataset' in df:
+        list_df = []
+
+        for d in df["dataset"].unique():
+            dataset = df.loc[df['dataset'] == d]
+
+            if len(dataset.index) < minimal:
+                list_df.append(dataset)
+
+            else:
+                list_df.append(dataset.sample(minimal))
+
+        subsampled_df = pd.concat(list_df, ignore_index=True)
+
     else:
-        return [" "] * chunks
-
-
-def stats2html(statsf):
-    df = pd.read_csv(statsf, sep=':', header=None, names=['feature', 'value'])
-    values = df["value"].str.strip().str.replace('\t', ' ').str.split().replace(np.nan, '')
-    num = len(values[0]) or 1
-    v = [chunks(i, num) for i in values]
-    return pd.DataFrame(v, index=df["feature"]).to_html(header=False)
-
-
-html_head = """<!DOCTYPE html>
-<html>
-    <head>
-    <meta charset="UTF-8">
-        <style>
-        table, th, td {
-            text-align: left;
-            padding: 2px;
-            /* border: 1px solid black;
-            border-collapse: collapse; */
-        }
-        h2 {
-            line-height: 0pt;
-        }
-        .panel {
-            display: inline-block;
-            background: #ffffff;
-            min-height: 100px;
-            box-shadow:0px 0px 5px 5px #C9C9C9;
-            -webkit-box-shadow:2px 2px 5px 5x #C9C9C9;
-            -moz-box-shadow:2px 2px 5px 5px #C9C9C9;
-            margin: 10px;
-            padding: 10px;
-        }
-        .panelC {
-            float: left
-        }
-        .panelM {
-            float: left
-        }
-        </style>
-        <title>NanoPlot Report</title>
-    </head>"""
+        if len(df.index) < minimal:
+            subsampled_df = df
+
+        else:
+            subsampled_df = df.sample(minimal)
+
+    return subsampled_df

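The new `list_colors` in `nanoplot/utils.py` above reads `extra/color_options_hex.txt` as one `name,value` pair per line. A minimal sketch of that parsing step follows. The sample lines and the function name `parse_color_table` are made up for illustration, since the real file contents are not shown in this diff.

```python
import io


def parse_color_table(handle):
    """Build a {name: hex} dict from 'name,hex' lines."""
    col_hex = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # Assumption: tolerate blank lines, unlike the diff.
        key, value = line.split(",", 1)
        col_hex[key.strip()] = value.strip()
    return col_hex


# Hypothetical sample content, not the actual file.
sample = io.StringIO("aliceblue,#F0F8FF\nteal,#008080\n\n")
colors = parse_color_table(sample)
print("Valid colors: {}".format("\n".join(colors)))
```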

=====================================
nanoplot/version.py
=====================================
@@ -1 +1 @@
-__version__ = "1.30.1"
+__version__ = "1.36.2"


=====================================
nanoplotter/nanoplotter_main.py
=====================================
@@ -23,24 +23,19 @@ spatialHeatmap(array, title, path, color, format)
 
 """
 
-
+import plotly.graph_objs as go
+import plotly
 import logging
 import sys
+import os
 import pandas as pd
 import numpy as np
-from collections import namedtuple
 from nanoplotter.plot import Plot
-import matplotlib as mpl
-mpl.use('Agg')
-import matplotlib.pyplot as plt
-from matplotlib import colors as mcolors
-import seaborn as sns
-from pauvre.marginplot import margin_plot
-from nanoplotter.timeplots import time_plots
+import plotly.express as px
+import plotly.figure_factory as ff
 from nanoplotter.spatial_heatmap import spatial_heatmap
-from matplotlib import cm
-import plotly
-import plotly.graph_objs as go
+from nanoplotter.timeplots import time_plots
+import re
 
 
 def check_valid_color(color):
@@ -48,9 +43,15 @@ def check_valid_color(color):
 
     If color is invalid the default is returned.
     """
-    if color in list(mcolors.CSS4_COLORS.keys()) + ["#4CB391"]:
+    colors, _ = colors_and_colormaps()
+    if color in colors:
+        logging.info("NanoPlot:  Valid color {}.".format(color))
+        return colors.get(color)
+
+    elif re.search(r'^#(?:[0-9a-fA-F]{3}){1,2}$', color):
         logging.info("NanoPlot:  Valid color {}.".format(color))
         return color
+
     else:
         logging.info("NanoPlot:  Invalid color {}, using default.".format(color))
         sys.stderr.write("Invalid color {}, using default.\n".format(color))
@@ -62,7 +63,8 @@ def check_valid_colormap(colormap):
 
     If colormap is invalid the default is returned.
     """
-    if colormap in list(cm.cmap_d.keys()):
+    _, colormaps = colors_and_colormaps()
+    if colormap in colormaps:
         logging.info("NanoPlot:  Valid colormap {}.".format(colormap))
         return colormap
     else:
@@ -71,30 +73,122 @@ def check_valid_colormap(colormap):
         return "Greens"
 
 
-def check_valid_format(figformat):
-    """Check if the specified figure format is valid.
-
-    If format is invalid the default is returned.
-    Probably installation-dependent
+def scatter(x, y, legacy, names, path, plots, color="#4CB391", colormap="Greens",
+            stat=None, log=False, minvalx=0, minvaly=0, title=None, xmax=None, ymax=None):
+    """Create bivariate plots with plotly.
+
+    Plotly-based successor of the scatter_legacy function, producing:
+    - a scatter plot with marginal histograms on both axes
+    - a kernel density plot with marginal histograms on both axes
+    - hexbin not implemented yet
+    - pauvre plot temporarily not available
     """
-    fig = plt.figure()
-    if figformat in list(fig.canvas.get_supported_filetypes().keys()):
-        logging.info("NanoPlot:  valid output format {}".format(figformat))
-        return figformat
-    else:
-        logging.info("NanoPlot:  invalid output format {}".format(figformat))
-        sys.stderr.write("Invalid format {}, using default.\n".format(figformat))
-        return "png"
+    logging.info("NanoPlot:  Creating {} vs {} plots using statistics from {} reads.".format(
+        names[0], names[1], x.size))
+    if not contains_variance([x, y], names):
+        return []
 
+    plots_made = []
+    idx = np.random.choice(x.index, min(10000, len(x)), replace=False)
+    maxvalx = xmax or np.amax(x[idx])
+    maxvaly = ymax or np.amax(y[idx])
 
-def plot_settings(plot_settings, dpi):
-    sns.set(**plot_settings)
-    mpl.rcParams['savefig.dpi'] = dpi
+    if plots["dot"]:
+        if log:
+            dot_plot = Plot(
+                path=path + "_loglength_dot.html",
+                title=f"{names[0]} vs {names[1]} plot using dots "
+                      "after log transformation of read lengths")
+        else:
+            dot_plot = Plot(
+                path=path + "_dot.html",
+                title=f"{names[0]} vs {names[1]} plot using dots")
+
+        fig = px.scatter(x=x[idx], y=y[idx], marginal_x="histogram", marginal_y="histogram",
+                         range_x=[minvalx, maxvalx], range_y=[minvaly, maxvaly])
+        fig.update_traces(marker=dict(color=color))
+        fig.update_yaxes(rangemode="tozero")
+        fig.update_xaxes(rangemode="tozero")
 
+        fig.update_layout(xaxis_title=names[0],
+                          yaxis_title=names[1],
+                          title=title or dot_plot.title,
+                          title_x=0.5)
+
+        if log:
+            ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
+            fig.update_layout(
+                xaxis=dict(
+                    tickmode='array',
+                    tickvals=np.log10(ticks),
+                    ticktext=ticks,
+                    tickangle=45
+                )
+            )
+
+        dot_plot.fig = fig
+        dot_plot.html = dot_plot.fig.to_html(full_html=False, include_plotlyjs='cdn')
+        dot_plot.save()
+        plots_made.append(dot_plot)
 
-def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
-            stat=None, log=False, minvalx=0, minvaly=0, title=None,
-            plot_settings={}, xmax=None, ymax=None):
+    if plots["kde"]:
+        if log:
+            kde_plot = Plot(
+                path=path + "_loglength_kde.html",
+                title="{} vs {} plot using a kernel density estimation "
+                      "after log transformation of read lengths".format(names[0], names[1]))
+        else:
+            kde_plot = Plot(
+                path=path + "_kde.html",
+                title="{} vs {} plot using a kernel density estimation".format(names[0], names[1]))
+
+        col = hex_to_rgb_scale_0_1(color)
+        fig = ff.create_2d_density(x[idx], y[idx], point_size=3,
+                                   hist_color=col,
+                                   point_color=col,
+                                   colorscale=colormap, width=1870)
+
+        fig.update_layout(xaxis_title=names[0],
+                          yaxis_title=names[1],
+                          title=title or kde_plot.title,
+                          title_x=0.5,
+                          xaxis=dict(tickangle=45))
+
+        if log:
+            ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
+            fig.update_layout(
+                xaxis=dict(
+                    tickmode='array',
+                    tickvals=np.log10(ticks),
+                    ticktext=ticks,
+                    tickangle=45
+                )
+            )
+
+        kde_plot.fig = fig
+        kde_plot.html = kde_plot.fig.to_html(full_html=False, include_plotlyjs='cdn')
+        kde_plot.save()
+        plots_made.append(kde_plot)
+
+    if 1 in legacy.values():
+        plots_made += scatter_legacy(x=x[idx],
+                                     y=y[idx],
+                                     names=names,
+                                     path=path,
+                                     plots=legacy,
+                                     color=color,
+                                     figformat="png",
+                                     stat=stat,
+                                     log=log,
+                                     minvalx=minvalx,
+                                     minvaly=minvaly,
+                                     title=title)
+    return plots_made
+
+
+def scatter_legacy(x, y, names, path, plots, color="#4CB391", figformat="png",
+                   stat=None, log=False, minvalx=0, minvaly=0, title=None,
+                   xmax=None, ymax=None):
     """Create bivariate plots.
 
     Create four types of bivariate plots of x vs y, containing marginal summaries
@@ -103,15 +197,25 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
     -A kernel density plot with density curves on axes
     -A pauvre-style plot using code from https://github.com/conchoecia/pauvre
     """
+    try:
+        import matplotlib as mpl
+        mpl.use('Agg')
+        import seaborn as sns
+        import matplotlib.pyplot as plt
+    except ImportError:
+        sys.stderr.write("need additional modules when running with --legacy\n")
+        return []
+
     logging.info("NanoPlot:  Creating {} vs {} plots using statistics from {} reads.".format(
         names[0], names[1], x.size))
     if not contains_variance([x, y], names):
         return []
-    sns.set(style="ticks", **plot_settings)
+    sns.set(style="ticks")
     maxvalx = xmax or np.amax(x)
     maxvaly = ymax or np.amax(y)
 
     plots_made = []
+    path = path + "_legacy"
 
     if plots["hex"]:
         if log:
@@ -135,7 +239,7 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
             height=10)
         plot.set_axis_labels(names[0], names[1])
         if log:
-            ticks = [10**i for i in range(10) if not 10**i > 10 * (10**maxvalx)]
+            ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
             plot.ax_joint.set_xticks(np.log10(ticks))
             plot.ax_marg_x.set_xticks(np.log10(ticks))
             plot.ax_joint.set_xticklabels(ticks)
@@ -145,7 +249,7 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
         hex_plot.save(format=figformat)
         plots_made.append(hex_plot)
 
-    sns.set(style="darkgrid", **plot_settings)
+    sns.set(style="darkgrid")
     if plots["dot"]:
         if log:
             dot_plot = Plot(
@@ -169,7 +273,7 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
             joint_kws={"s": 1})
         plot.set_axis_labels(names[0], names[1])
         if log:
-            ticks = [10**i for i in range(10) if not 10**i > 10 * (10**maxvalx)]
+            ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
             plot.ax_joint.set_xticks(np.log10(ticks))
             plot.ax_marg_x.set_xticks(np.log10(ticks))
             plot.ax_joint.set_xticklabels(ticks)
@@ -180,155 +284,179 @@ def scatter(x, y, names, path, plots, color="#4CB391", figformat="png",
         plots_made.append(dot_plot)
 
     if plots["kde"]:
-        idx = np.random.choice(x.index, min(2000, len(x)), replace=False)
-        if log:
-            kde_plot = Plot(
-                path=path + "_loglength_kde." + figformat,
-                title="{} vs {} plot using a kernel density estimation "
-                      "after log transformation of read lengths".format(names[0], names[1]))
+        if len(x) > 2:
+            idx = np.random.choice(x.index, min(2000, len(x)), replace=False)
+            if log:
+                kde_plot = Plot(
+                    path=path + "_loglength_kde." + figformat,
+                    title="{} vs {} plot using a kernel density estimation "
+                          "after log transformation of read lengths".format(names[0], names[1]))
+            else:
+                kde_plot = Plot(
+                    path=path + "_kde." + figformat,
+                    title=f"{names[0]} vs {names[1]} plot using a kernel density estimation")
+            plot = sns.jointplot(
+                x=x[idx],
+                y=y[idx],
+                kind="kde",
+                clip=((0, np.Inf), (0, np.Inf)),
+                xlim=(minvalx, maxvalx),
+                ylim=(minvaly, maxvaly),
+                space=0,
+                color=color,
+                stat_func=stat,
+                shade_lowest=False,
+                height=10)
+            plot.set_axis_labels(names[0], names[1])
+            if log:
+                ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * (10 ** maxvalx)]
+                plot.ax_joint.set_xticks(np.log10(ticks))
+                plot.ax_marg_x.set_xticks(np.log10(ticks))
+                plot.ax_joint.set_xticklabels(ticks)
+            plt.subplots_adjust(top=0.90)
+            plot.fig.suptitle(title or "{} vs {} plot".format(names[0], names[1]), fontsize=25)
+            kde_plot.fig = plot
+            kde_plot.save(format=figformat)
+            plots_made.append(kde_plot)
         else:
-            kde_plot = Plot(
-                path=path + "_kde." + figformat,
-                title="{} vs {} plot using a kernel density estimation".format(names[0], names[1]))
-        plot = sns.jointplot(
-            x=x[idx],
-            y=y[idx],
-            kind="kde",
-            clip=((0, np.Inf), (0, np.Inf)),
-            xlim=(minvalx, maxvalx),
-            ylim=(minvaly, maxvaly),
-            space=0,
-            color=color,
-            stat_func=stat,
-            shade_lowest=False,
-            height=10)
-        plot.set_axis_labels(names[0], names[1])
-        if log:
-            ticks = [10**i for i in range(10) if not 10**i > 10 * (10**maxvalx)]
-            plot.ax_joint.set_xticks(np.log10(ticks))
-            plot.ax_marg_x.set_xticks(np.log10(ticks))
-            plot.ax_joint.set_xticklabels(ticks)
-        plt.subplots_adjust(top=0.90)
-        plot.fig.suptitle(title or "{} vs {} plot".format(names[0], names[1]), fontsize=25)
-        kde_plot.fig = plot
-        kde_plot.save(format=figformat)
-        plots_made.append(kde_plot)
-
-    if plots["pauvre"] and names == ['Read lengths', 'Average read quality'] and log is False:
-        pauvre_plot = Plot(
-            path=path + "_pauvre." + figformat,
-            title="{} vs {} plot using pauvre-style @conchoecia".format(names[0], names[1]))
-        sns.set(style="white", **plot_settings)
-        margin_plot(df=pd.DataFrame({"length": x, "meanQual": y}),
-                    Y_AXES=False,
-                    title=title or "Length vs Quality in Pauvre-style",
-                    plot_maxlen=None,
-                    plot_minlen=0,
-                    plot_maxqual=None,
-                    plot_minqual=0,
-                    lengthbin=None,
-                    qualbin=None,
-                    BASENAME="whatever",
-                    path=pauvre_plot.path,
-                    fileform=[figformat],
-                    dpi=600,
-                    TRANSPARENT=True,
-                    QUIET=True)
-        plots_made.append(pauvre_plot)
+            sys.stderr.write("Not enough observations (reads) to create a kde plot.\n")
+            logging.info("NanoPlot: Not enough observations (reads) to create a kde plot")
     plt.close("all")
     return plots_made
 
 
+# def pauvre_plot():
+#     from pauvre.marginplot import margin_plot
+#     if plots["pauvre"] and names == ['Read lengths', 'Average read quality'] and log is False:
+#         pauvre_plot = Plot(
+#             path=path + "_pauvre." + figformat,
+#             title="{} vs {} plot using pauvre-style @conchoecia".format(names[0], names[1]))
+#         sns.set(style="white")
+#         margin_plot(df=pd.DataFrame({"length": x, "meanQual": y}),
+#                     Y_AXES=False,
+#                     title=title or "Length vs Quality in Pauvre-style",
+#                     plot_maxlen=None,
+#                     plot_minlen=0,
+#                     plot_maxqual=None,
+#                     plot_minqual=0,
+#                     lengthbin=None,
+#                     qualbin=None,
+#                     BASENAME="whatever",
+#                     path=pauvre_plot.path,
+#                     fileform=[figformat],
+#                     dpi=600,
+#                     TRANSPARENT=True,
+#                     QUIET=True)
+#         plots_made.append(pauvre_plot)
+
+
 def contains_variance(arrays, names):
     """
     Make sure both arrays for bivariate ("scatter") plot have a stddev > 0
     """
     for ar, name in zip(arrays, names):
         if np.std(ar) == 0:
-            sys.stderr.write(
-                "No variation in '{}', skipping bivariate plots.\n".format(name.lower()))
-            logging.info("NanoPlot:  No variation in {}, skipping bivariate plot".format(name))
+            sys.stderr.write(f"No variation in '{name.lower()}', skipping bivariate plots.\n")
+            logging.info(f"NanoPlot: No variation in {name}, skipping bivariate plot")
             return False
     else:
         return True
 
 
-def length_plots(array, name, path, title=None, n50=None, color="#4CB391", figformat="png"):
+def length_plots(array, name, path, title=None, n50=None, color="#4CB391"):
     """Create histogram of normal and log transformed read lengths."""
     logging.info("NanoPlot:  Creating length plots for {}.".format(name))
     maxvalx = np.amax(array)
     if n50:
-        logging.info("NanoPlot:  Using {} reads with read length N50 of {}bp and maximum of {}bp."
+        logging.info("NanoPlot: Using {} reads with read length N50 of {}bp and maximum of {}bp."
                      .format(array.size, n50, maxvalx))
     else:
-        logging.info("NanoPlot:  Using {} reads maximum of {}bp.".format(array.size, maxvalx))
+        logging.info(f"NanoPlot:  Using {array.size} reads maximum of {maxvalx}bp.")
 
     plots = []
-    HistType = namedtuple('HistType', 'weight name ylabel')
-    for h_type in [HistType(None, "", "Number of reads"),
-                   HistType(array, "Weighted ", "Number of bases")]:
+
+    HistType = [{'weight': array, 'name': 'Weighted', 'ylabel': 'Number of reads'},
+                {'weight': None, 'name': 'Non weighted', 'ylabel': 'Number of reads'}]
+
+    for h_type in HistType:
         histogram = Plot(
-            path=path + h_type.name.replace(" ", "_") + "Histogram" +
-            name.replace(' ', '') + "." + figformat,
-            title=h_type.name + "Histogram of read lengths")
-        ax = sns.distplot(
-            a=array,
-            kde=False,
-            hist=True,
-            bins=max(round(int(maxvalx) / 500), 10),
-            color=color,
-            hist_kws=dict(weights=h_type.weight,
-                          edgecolor=color,
-                          linewidth=0.2,
-                          alpha=0.8))
+            path=path + h_type["name"].replace(" ", "_") + "Histogram" +
+                 name.replace(' ', '') + ".html",
+            title=f"{h_type['name']} histogram of read lengths")
+
+        hist, bin_edges = np.histogram(array,
+                                       bins=max(round(int(maxvalx) / 500), 10),
+                                       weights=h_type["weight"])
+
+        fig = go.Figure()
+
+        fig.add_trace(go.Bar(x=bin_edges[1:],
+                             y=hist,
+                             marker_color=color))
+
         if n50:
-            plt.axvline(n50)
-            plt.annotate('N50', xy=(n50, np.amax([h.get_height() for h in ax.patches])), size=8)
-        ax.set(
-            xlabel='Read length',
-            ylabel=h_type.ylabel,
-            title=title or histogram.title)
-        plt.ticklabel_format(style='plain', axis='y')
-        histogram.fig = ax.get_figure()
-        histogram.save(format=figformat)
-        plt.close("all")
+            fig.add_vline(n50)
+            fig.add_annotation(text='N50', x=n50, y=0.95)
+            fig.update_annotations(font_size=8)
+
+        fig.update_layout(xaxis_title='Read length',
+                          yaxis_title=h_type["ylabel"],
+                          title=title or histogram.title,
+                          title_x=0.5)
+
+        histogram.fig = fig
+        histogram.html = histogram.fig.to_html(full_html=False, include_plotlyjs='cdn')
+        histogram.save()
 
         log_histogram = Plot(
-            path=path + h_type.name.replace(" ", "_") + "LogTransformed_Histogram" +
-            name.replace(' ', '') + "." + figformat,
-            title=h_type.name + "Histogram of read lengths after log transformation")
-        ax = sns.distplot(
-            a=np.log10(array),
-            kde=False,
-            hist=True,
-            color=color,
-            hist_kws=dict(weights=h_type.weight,
-                          edgecolor=color,
-                          linewidth=0.2,
-                          alpha=0.8))
-        ticks = [10**i for i in range(10) if not 10**i > 10 * maxvalx]
-        ax.set(
-            xticks=np.log10(ticks),
-            xticklabels=ticks,
-            xlabel='Read length',
-            ylabel=h_type.ylabel,
-            title=title or log_histogram.title)
+            path=path + h_type["name"].replace(" ", "_") + "LogTransformed_Histogram" +
+                 name.replace(' ', '') + ".html",
+            title=h_type["name"] + " histogram of read lengths after log transformation")
+
+        if h_type["weight"] is None:
+            hist_log, bin_edges_log = np.histogram(np.log10(array),
+                                                   bins=max(round(int(maxvalx) / 500), 10),
+                                                   weights=h_type["weight"])
+
+        else:
+            hist_log, bin_edges_log = np.histogram(np.log10(array),
+                                                   bins=max(round(int(maxvalx) / 500), 10),
+                                                   weights=np.log10(h_type["weight"]))
+
+        fig = go.Figure()
+        fig.add_trace(go.Bar(x=bin_edges_log[1:],
+                             y=hist_log,
+                             marker_color=color))
+
+        ticks = [10 ** i for i in range(10) if not 10 ** i > 10 * maxvalx]
+
+        fig.update_layout(
+            xaxis=dict(
+                tickmode='array',
+                tickvals=np.log10(ticks),
+                ticktext=ticks),
+            xaxis_title='Read length',
+            yaxis_title=h_type["ylabel"],
+            title=title or log_histogram.title,
+            title_x=0.5)
+
         if n50:
-            plt.axvline(np.log10(n50))
-            plt.annotate('N50', xy=(np.log10(n50), np.amax(
-                [h.get_height() for h in ax.patches])), size=8)
-        plt.ticklabel_format(style='plain', axis='y')
-        log_histogram.fig = ax.get_figure()
-        log_histogram.save(format=figformat)
-        plt.close("all")
+            fig.add_vline(np.log10(n50))
+            fig.add_annotation(text='N50', x=np.log10(n50), y=0.95)
+            fig.update_annotations(font_size=8)
+
+        log_histogram.fig = fig
+        log_histogram.html = log_histogram.fig.to_html(full_html=False, include_plotlyjs='cdn')
+        log_histogram.save()
+
         plots.extend([histogram, log_histogram])
-    plots.append(dynamic_histogram(array=array, name=name, path=path, title=title, color=color))
+
     plots.append(yield_by_minimal_length_plot(array=array,
                                               name=name,
                                               path=path,
                                               title=title,
-                                              color=color,
-                                              figformat=figformat))
+                                              color=color))
+
     return plots
 
 
@@ -337,12 +465,13 @@ def dynamic_histogram(array, name, path, title=None, color="#4CB391"):
     Use plotly to a histogram
     Return html code, but also save as png
     """
-    dynhist = Plot(path=path + "Dynamic_Histogram_{}.html".format(name.replace(' ', '_')),
-                   title=title or "Dynamic histogram of {}".format(name))
+    dynhist = Plot(
+        path=path + f"Dynamic_Histogram_{name[0].lower() + name[1:].replace(' ', '_')}.html",
+        title="Dynamic histogram of {}".format(name[0].lower() + name[1:]))
     ylabel = "Number of reads" if len(array) <= 10000 else "Downsampled number of reads"
     dynhist.html, dynhist.fig = plotly_histogram(array=array.sample(min(len(array), 10000)),
                                                  color=color,
-                                                 title=dynhist.title,
+                                                 title=title or dynhist.title,
                                                  xlabel=name,
                                                  ylabel=ylabel)
     dynhist.save()
@@ -364,35 +493,57 @@ def plotly_histogram(array, color="#4CB391", title=None, xlabel=None, ylabel=Non
     fig = go.Figure(
         {"data": data,
          "layout": go.Layout(barmode='overlay',
-                             title=title)})
+                             title=title,
+                             title_x=0.5)})
     return html, fig
 
 
-def yield_by_minimal_length_plot(array, name, path,
-                                 title=None, color="#4CB391", figformat="png"):
+def yield_by_minimal_length_plot(array, name, path, title=None, color="#4CB391"):
     df = pd.DataFrame(data={"lengths": np.sort(array)[::-1]})
-    df["cumyield_gb"] = df["lengths"].cumsum() / 10**9
+    df["cumyield_gb"] = df["lengths"].cumsum() / 10 ** 9
+    idx = np.random.choice(array.index, min(10000, len(array)), replace=False)
+
     yield_by_length = Plot(
-        path=path + "Yield_By_Length." + figformat,
+        path=path + "Yield_By_Length.html",
         title="Yield by length")
-    ax = sns.regplot(
-        x='lengths',
-        y="cumyield_gb",
-        data=df,
-        x_ci=None,
-        fit_reg=False,
-        color=color,
-        scatter_kws={"s": 3})
-    ax.set(
-        xlabel='Read length',
-        ylabel='Cumulative yield for minimal length',
-        title=title or yield_by_length.title)
-    yield_by_length.fig = ax.get_figure()
-    yield_by_length.save(format=figformat)
-    plt.close("all")
+
+    fig = px.scatter(x=df.reindex(idx)["lengths"], y=df.reindex(idx)["cumyield_gb"])
+    fig.update_traces(marker=dict(color=color))
+    fig.update_layout(xaxis_title='Read length',
+                      yaxis_title='Cumulative yield for minimal length [Gb]',
+                      title=title or yield_by_length.title,
+                      title_x=0.5)
+
+    yield_by_length.fig = fig
+    yield_by_length.html = yield_by_length.fig.to_html(full_html=False, include_plotlyjs='cdn')
+    yield_by_length.save()
+
     return yield_by_length
 
 
+def colors_and_colormaps():
+    colormaps = ('Greys,YlGnBu,Greens,YlOrRd,Bluered,RdBu,Reds,Blues,Picnic,Rainbow,Portland,Jet,'
+                 'Hot,Blackbody,Earth,Electric,Viridis,Cividis').split(',')
+    parent_directory = os.path.dirname(os.path.abspath(os.path.dirname(__file__)))
+    colours = open(os.path.join(parent_directory, "extra/color_options_hex.txt"))
+    col_hex = {}
+
+    for line in colours:
+        key, value = line.split(",")
+        col_hex[key] = value.strip()
+
+    return col_hex, colormaps
+
+
+def hex_to_rgb_scale_0_1(hexcolor):
+    color = hexcolor.lstrip("#")
+    RGB_color = tuple(int(color[x:x + 2], 16) for x in (0, 2, 4))
+
+    RGB_color = [x / 255 for x in RGB_color]
+
+    return tuple(RGB_color)
+
+
 def run_tests():
     import pickle
     df = pickle.load(open("nanotest/sequencing_summary.pickle", "rb"))
@@ -401,17 +552,15 @@ def run_tests():
         y=df["quals"],
         names=['Read lengths', 'Average read quality'],
         path="LengthvsQualityScatterPlot",
-        plots={'dot': 1, 'kde': 1, 'hex': 1, 'pauvre': 1},
-        plot_settings=dict(font_scale=1))
+        plots={'dot': 1, 'kde': 1})
     time_plots(
         df=df,
-        path=".",
-        color="#4CB391",
-        plot_settings=dict(font_scale=1))
+        path="./",
+        color="#4CB391")
     length_plots(
         array=df["lengths"],
         name="lengths",
-        path=".")
+        path="./")
     spatial_heatmap(
         array=df["channelIDs"],
         title="Number of reads generated per channel",
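
Note on length_plots() in the nanoplotter/nanoplotter_main.py diff above: the
rewritten function computes bar heights with np.histogram and passes the read
lengths as weights for the weighted variant, so each bar sums bases instead of
counting reads. The sketch below reproduces that binning without NumPy, to
make the difference visible. binned_totals is a hypothetical helper written
for illustration and is not part of NanoPlot; it returns bin centres, whereas
the committed code plots bars at the right bin edges (bin_edges[1:]).

```python
def binned_totals(values, bins, weights=None):
    """Equal-width histogram between min(values) and max(values).

    Hypothetical illustration helper (not NanoPlot code). With weights=None
    every value counts as 1; otherwise every value contributes its weight.
    Returns (totals, centres).
    """
    if weights is None:
        weights = [1] * len(values)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values identical: fall back to unit-width bins (an assumption).
        width = 1.0
    totals = [0] * bins
    for value, weight in zip(values, weights):
        # The maximum sits exactly on the last edge, so clamp it into the last bin.
        index = min(int((value - lo) / width), bins - 1)
        totals[index] += weight
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    return totals, centres


lengths = [100, 200, 300, 1000]
reads_per_bin, centres = binned_totals(lengths, bins=3)
bases_per_bin, _ = binned_totals(lengths, bins=3, weights=lengths)
print(reads_per_bin)  # [3, 0, 1]
print(bases_per_bin)  # [600, 0, 1000]
print(centres)        # [250.0, 550.0, 850.0]
```

With the lengths as weights, the single 1000 bp read outweighs the three short
reads, which is the effect the weighted histogram is meant to show.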


=====================================
nanoplotter/plot.py
=====================================
@@ -1,8 +1,10 @@
-import plotly.io as pio
+import os
 from base64 import b64encode
 from io import BytesIO
+from pathlib import Path
 from urllib.parse import quote as urlquote
 import sys
+from kaleido.scopes.plotly import PlotlyScope
 import logging
 
 
@@ -40,7 +42,16 @@ class Plot(object):
         if self.html:
             with open(self.path, 'w') as html_out:
                 html_out.write(self.html)
-            self.save_static()
+            try:
+                self.save_static()
+            except (AttributeError, ValueError) as e:
+                p = os.path.splitext(self.path)[0]+".png"
+                if os.path.exists(p):
+                    os.remove(p)
+
+                logging.warning("No static plots are saved due to some kaleido problem:")
+                logging.warning(e)
+
         elif self.fig:
             self.fig.savefig(
                 fname=self.path,
@@ -56,9 +67,6 @@ class Plot(object):
             sys.stderr.write(".show not implemented for Plot instance without fig attribute!")
 
     def save_static(self):
-        try:
-            pio.write_image(self.fig, self.path.replace('html', 'png'))
-        except ValueError as e:
-            logging.warning("Nanoplotter: orca not found, not creating static image of html. "
-                            "See https://github.com/plotly/orca")
-            logging.warning(e, exc_info=True)
+        scope = PlotlyScope()
+        with open(self.path.replace('html', 'png'), "wb") as f:
+            f.write(scope.transform(self.fig, format="png"))
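
Note on the nanoplotter/plot.py diff above: save_static() derives the PNG name
with self.path.replace('html', 'png'), which rewrites every occurrence of
"html" in the path, while the new except branch in save() uses
os.path.splitext() to find the file it removes. When the output directory
itself contains "html", the two expressions name different files. A minimal
sketch of the difference follows; static_image_path is a hypothetical helper
and the example path is made up.

```python
import os


def static_image_path(html_path):
    """Swap only the file extension for .png (hypothetical helper)."""
    return os.path.splitext(html_path)[0] + ".png"


path = "html_reports/LengthHistogram.html"

# str.replace touches the directory name as well as the extension.
print(path.replace("html", "png"))  # png_reports/LengthHistogram.png

# splitext only changes the trailing extension.
print(static_image_path(path))      # html_reports/LengthHistogram.png
```

Using one extension-based helper in both places would keep the written file
and the cleaned-up file in agreement.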


=====================================
nanoplotter/spatial_heatmap.py
=====================================
@@ -1,9 +1,8 @@
 import numpy as np
 import logging
 from nanoplotter.plot import Plot
-import matplotlib.pyplot as plt
-import seaborn as sns
 import pandas as pd
+import plotly.graph_objects as go
 
 
 class Layout(object):
@@ -64,28 +63,30 @@ def make_layout(maxval):
             flowcell='PromethION')
 
 
-def spatial_heatmap(array, path, title=None, color="Greens", figformat="png"):
+def spatial_heatmap(array, path, colormap, title=None):
     """Taking channel information and creating post run channel activity plots."""
     logging.info("Nanoplotter: Creating heatmap of reads per channel using {} reads."
                  .format(array.size))
+
     activity_map = Plot(
-        path=path + "." + figformat,
+        path=path + ".html",
         title="Number of reads generated per channel")
+
     layout = make_layout(maxval=np.amax(array))
     valueCounts = pd.value_counts(pd.Series(array))
+
     for entry in valueCounts.keys():
         layout.template[np.where(layout.structure == entry)] = valueCounts[entry]
-    plt.figure()
-    ax = sns.heatmap(
-        data=pd.DataFrame(layout.template, index=layout.yticks, columns=layout.xticks),
-        xticklabels="auto",
-        yticklabels="auto",
-        square=True,
-        cbar_kws={"orientation": "horizontal"},
-        cmap=color,
-        linewidths=0.20)
-    ax.set_title(title or activity_map.title)
-    activity_map.fig = ax.get_figure()
-    activity_map.save(format=figformat)
-    plt.close("all")
+
+    data = pd.DataFrame(layout.template, index=layout.yticks, columns=layout.xticks)
+
+    fig = go.Figure(data=go.Heatmap(z=data.values.tolist(), colorscale=colormap))
+    fig.update_layout(xaxis_title='Channel',
+                      yaxis_title='Number of reads',
+                      title=title or activity_map.title,
+                      title_x=0.5)
+
+    activity_map.fig = fig
+    activity_map.html = activity_map.fig.to_html(full_html=False, include_plotlyjs='cdn')
+    activity_map.save()
     return [activity_map]
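
Note on spatial_heatmap() above: the function counts reads per channel and
writes each count into the flowcell grid at the position of that channel,
before handing the grid to go.Heatmap. The sketch below shows the same mapping
with plain lists. The 2x3 structure is a made-up layout, not a real flowcell,
reads_per_position is a hypothetical helper, and positions without reads are
assumed to stay at 0.

```python
from collections import Counter


def reads_per_position(channel_ids, structure):
    """Replace every channel number in `structure` by its read count.

    Hypothetical helper; `structure` is a list of rows of channel numbers.
    """
    counts = Counter(channel_ids)
    return [[counts.get(channel, 0) for channel in row] for row in structure]


structure = [[1, 2, 3],
             [4, 5, 6]]               # made-up layout for illustration
channel_ids = [1, 1, 3, 6, 6, 6]      # one entry per read

grid = reads_per_position(channel_ids, structure)
print(grid)  # [[2, 0, 1], [0, 0, 3]]
```

The resulting nested list has the shape that the z argument of go.Heatmap
receives in the diff (data.values.tolist()).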


=====================================
nanoplotter/timeplots.py
=====================================
@@ -2,14 +2,14 @@ import sys
 import logging
 from nanoplotter.plot import Plot
 from datetime import timedelta
-import seaborn as sns
-import matplotlib.pyplot as plt
 from math import ceil
 import pandas as pd
 import numpy as np
+import plotly.graph_objs as go
+import plotly.express as px
 
 
-def check_valid_time_and_sort(df, timescol, days=5, warning=True):
+def check_valid_time_and_sort(df, timescol="start_time", days=5, warning=True):
     """Check if the data contains reads created within the same `days` timeframe.
 
     if not, print warning and only return part of the data which is within `days` days
@@ -34,212 +34,246 @@ def check_valid_time_and_sort(df, timescol, days=5, warning=True):
             .reset_index()
 
 
-def time_plots(df, path, title=None, color="#4CB391", figformat="png",
-               log_length=False, plot_settings=None):
+def time_plots(df, subsampled_df, path, title=None, color="#4CB391", log_length=False):
     """Making plots of time vs read length, time vs quality and cumulative yield."""
-    dfs = check_valid_time_and_sort(df, "start_time")
-    logging.info("Nanoplotter: Creating timeplots using {} reads.".format(len(dfs)))
+
+    logging.info(f"Nanoplotter: Creating timeplots using {len(df)} (full) or "
+                 f"{len(subsampled_df)} (subsampled dataset) reads.")
+    dfs = check_valid_time_and_sort(df)
     cumyields = cumulative_yield(dfs=dfs.set_index("start_time"),
                                  path=path,
-                                 figformat=figformat,
                                  title=title,
                                  color=color)
     reads_pores_over_time = plot_over_time(dfs=dfs.set_index("start_time"),
                                            path=path,
-                                           figformat=figformat,
                                            title=title,
                                            color=color)
-    violins = violin_plots_over_time(dfs=dfs,
+    violins = violin_plots_over_time(dfs=check_valid_time_and_sort(subsampled_df),
                                      path=path,
-                                     figformat=figformat,
                                      title=title,
                                      log_length=log_length,
-                                     plot_settings=plot_settings)
+                                     color=color)
     return cumyields + reads_pores_over_time + violins
 
 
-def violin_plots_over_time(dfs, path, figformat, title,
-                           log_length=False, plot_settings=None):
+def violin_plots_over_time(dfs, path, title, log_length=False, color="#4CB391"):
+
     dfs['timebin'] = add_time_bins(dfs)
     plots = []
+
+    dfs.sort_values("timebin")
+
     plots.append(length_over_time(dfs=dfs,
                                   path=path,
-                                  figformat=figformat,
                                   title=title,
                                   log_length=log_length,
-                                  plot_settings=plot_settings))
+                                  color=color))
     if "quals" in dfs:
         plots.append(quality_over_time(dfs=dfs,
                                        path=path,
-                                       figformat=figformat,
                                        title=title,
-                                       plot_settings=plot_settings))
+                                       color=color))
     if "duration" in dfs:
         plots.append(sequencing_speed_over_time(dfs=dfs,
                                                 path=path,
-                                                figformat=figformat,
                                                 title=title,
-                                                plot_settings=plot_settings))
+                                                color=color))
     return plots
 
 
-def length_over_time(dfs, path, figformat, title, log_length=False, plot_settings={}):
+def length_over_time(dfs, path, title, log_length=False, color="#4CB391"):
     if log_length:
-        time_length = Plot(path=path + "TimeLogLengthViolinPlot." + figformat,
+        time_length = Plot(path=path + "TimeLogLengthViolinPlot.html",
                            title="Violin plot of log read lengths over time")
     else:
-        time_length = Plot(path=path + "TimeLengthViolinPlot." + figformat,
+        time_length = Plot(path=path + "TimeLengthViolinPlot.html",
                            title="Violin plot of read lengths over time")
-    sns.set(style="white", **plot_settings)
-    if log_length:
-        length_column = "log_lengths"
-    else:
-        length_column = "lengths"
+
+    length_column = "log_lengths" if log_length else "lengths"
 
     if "length_filter" in dfs:  # produced by NanoPlot filtering of too long reads
         temp_dfs = dfs[dfs["length_filter"]]
     else:
         temp_dfs = dfs
 
-    ax = sns.violinplot(x="timebin",
-                        y=length_column,
-                        data=temp_dfs,
-                        inner=None,
-                        cut=0,
-                        linewidth=0)
-    ax.set(xlabel='Interval (hours)',
-           ylabel="Read length",
-           title=title or time_length.title)
+    fig = go.Figure()
+
+    fig.add_trace(go.Violin(y=temp_dfs[length_column],
+                            x=temp_dfs["timebin"],
+                            points=False, spanmode="hard",
+                            line_color='black', line_width=1.5,
+                            fillcolor=color, opacity=0.8))
+    fig.update_layout(xaxis_title='Interval (hours)',
+                      yaxis_title='Read length',
+                      title=title or time_length.title,
+                      title_x=0.5)
+
     if log_length:
         ticks = [10**i for i in range(10) if not 10**i > 10 * np.amax(dfs["lengths"])]
-        ax.set(yticks=np.log10(ticks),
-               yticklabels=ticks)
-    plt.xticks(rotation=45, ha='center', fontsize=8)
-    time_length.fig = ax.get_figure()
-    time_length.save(format=figformat)
-    plt.close("all")
+        fig.update_layout(
+            yaxis=dict(
+                tickmode='array',
+                tickvals=np.log10(ticks),
+                ticktext=ticks
+            )
+        )
+
+    fig.update_yaxes(tickangle=45)
+
+    time_length.fig = fig
+    time_length.html = time_length.fig.to_html(full_html=False, include_plotlyjs='cdn')
+    time_length.save()
+
     return time_length
 
 
-def quality_over_time(dfs, path, figformat, title, plot_settings={}):
-    time_qual = Plot(path=path + "TimeQualityViolinPlot." + figformat,
+def quality_over_time(dfs, path, title=None, color="#4CB391"):
+    time_qual = Plot(path=path + "TimeQualityViolinPlot.html",
                      title="Violin plot of quality over time")
-    sns.set(style="white", **plot_settings)
-    ax = sns.violinplot(x="timebin",
-                        y="quals",
-                        data=dfs,
-                        inner=None,
-                        cut=0,
-                        linewidth=0)
-    ax.set(xlabel='Interval (hours)',
-           ylabel="Basecall quality",
-           title=title or time_qual.title)
-    plt.xticks(rotation=45, ha='center', fontsize=8)
-    time_qual.fig = ax.get_figure()
-    time_qual.save(format=figformat)
-    plt.close("all")
+
+    fig = go.Figure()
+
+    fig.add_trace(go.Violin(y=dfs["quals"],
+                            x=dfs["timebin"],
+                            points=False, spanmode="hard",
+                            line_color='black', line_width=1.5,
+                            fillcolor=color, opacity=0.8))
+
+    fig.update_layout(xaxis_title='Interval (hours)',
+                      yaxis_title='Basecall quality',
+                      title=title or time_qual.title,
+                      title_x=0.5)
+
+    fig.update_xaxes(tickangle=45)
+
+    time_qual.fig = fig
+    time_qual.html = time_qual.fig.to_html(full_html=False, include_plotlyjs='cdn')
+    time_qual.save()
+
     return time_qual
 
 
-def sequencing_speed_over_time(dfs, path, figformat, title, plot_settings={}):
-    time_duration = Plot(path=path + "TimeSequencingSpeed_ViolinPlot." + figformat,
+def sequencing_speed_over_time(dfs, path, title, color="#4CB391"):
+    time_duration = Plot(path=path + "TimeSequencingSpeed_ViolinPlot.html",
                          title="Violin plot of sequencing speed over time")
-    sns.set(style="white", **plot_settings)
-    if "timebin" not in dfs:
-        dfs['timebin'] = add_time_bins(dfs)
+
     mask = dfs['duration'] != 0
-    ax = sns.violinplot(x=dfs.loc[mask, "timebin"],
-                        y=dfs.loc[mask, "lengths"] / dfs.loc[mask, "duration"],
-                        inner=None,
-                        cut=0,
-                        linewidth=0)
-    ax.set(xlabel='Interval (hours)',
-           ylabel="Sequencing speed (nucleotides/second)",
-           title=title or time_duration.title)
-    plt.xticks(rotation=45, ha='center', fontsize=8)
-    time_duration.fig = ax.get_figure()
-    time_duration.save(format=figformat)
-    plt.close("all")
+
+    fig = go.Figure()
+
+    fig.add_trace(
+        go.Violin(x=dfs.loc[mask, "timebin"],
+                  y=dfs.loc[mask, "lengths"] / dfs.loc[mask, "duration"],
+                  points=False, spanmode="hard",
+                  line_color='black', line_width=1.5,
+                  fillcolor=color, opacity=0.8))
+
+    fig.update_layout(xaxis_title='Interval (hours)',
+                      yaxis_title='Sequencing speed (nucleotides/second)',
+                      title=title or time_duration.title,
+                      title_x=0.5)
+
+    fig.update_xaxes(tickangle=45)
+
+    time_duration.fig = fig
+    time_duration.html = time_duration.fig.to_html(full_html=False, include_plotlyjs='cdn')
+    time_duration.save()
+
     return time_duration
 
 
 def add_time_bins(dfs, bin_length=3):
     maxtime = dfs["start_time"].max().total_seconds()
     labels = [str(i) + "-" + str(i + bin_length)
-              for i in range(0, 168, bin_length) if not i > (maxtime / 3600)]
+              for i in range(0, 168, bin_length) if not i >= (maxtime / 3600)]
     return pd.cut(x=dfs["start_time"],
                   bins=ceil((maxtime / 3600) / bin_length),
                   labels=labels)
 
 
-def plot_over_time(dfs, path, figformat, title, color):
-    num_reads = Plot(path=path + "NumberOfReads_Over_Time." + figformat,
+def plot_over_time(dfs, path, title, color="#4CB391"):
+    num_reads = Plot(path=path + "NumberOfReads_Over_Time.html",
                      title="Number of reads over time")
     s = dfs.loc[:, "lengths"].resample('10T').count()
-    ax = sns.regplot(x=s.index.total_seconds() / 3600,
-                     y=s,
-                     x_ci=None,
-                     fit_reg=False,
-                     color=color,
-                     scatter_kws={"s": 3})
-    ax.set(xlabel='Run time (hours)',
-           ylabel='Number of reads per 10 minutes',
-           title=title or num_reads.title)
-    num_reads.fig = ax.get_figure()
-    num_reads.save(format=figformat)
-    plt.close("all")
+
+    fig = px.scatter(
+        data_frame=None,
+        x=s.index.total_seconds() / 3600,
+        y=s)
+    fig.update_traces(marker=dict(color=color))
+
+    fig.update_layout(xaxis_title='Run time (hours)',
+                      yaxis_title='Number of reads per 10 minutes',
+                      title=title or num_reads.title,
+                      title_x=0.5)
+
+    num_reads.fig = fig
+    num_reads.html = num_reads.fig.to_html(full_html=False, include_plotlyjs='cdn')
+    num_reads.save()
+
     plots = [num_reads]
 
     if "channelIDs" in dfs:
-        pores_over_time = Plot(path=path + "ActivePores_Over_Time." + figformat,
+        pores_over_time = Plot(path=path + "ActivePores_Over_Time.html",
                                title="Number of active pores over time")
         s = dfs.loc[:, "channelIDs"].resample('10T').nunique()
-        ax = sns.regplot(x=s.index.total_seconds() / 3600,
-                         y=s,
-                         x_ci=None,
-                         fit_reg=False,
-                         color=color,
-                         scatter_kws={"s": 3})
-        ax.set(xlabel='Run time (hours)',
-               ylabel='Active pores per 10 minutes',
-               title=title or pores_over_time.title)
-        pores_over_time.fig = ax.get_figure()
-        pores_over_time.save(format=figformat)
-        plt.close("all")
+
+        fig = px.scatter(
+            data_frame=None,
+            x=s.index.total_seconds() / 3600,
+            y=s)
+        fig.update_traces(marker=dict(color=color))
+
+        fig.update_layout(xaxis_title='Run time (hours)',
+                          yaxis_title='Active pores per 10 minutes',
+                          title=title or pores_over_time.title,
+                          title_x=0.5)
+
+        pores_over_time.fig = fig
+        pores_over_time.html = pores_over_time.fig.to_html(full_html=False, include_plotlyjs='cdn')
+        pores_over_time.save()
+
         plots.append(pores_over_time)
     return plots
 
 
-def cumulative_yield(dfs, path, figformat, title, color):
-    cum_yield_gb = Plot(path=path + "CumulativeYieldPlot_Gigabases." + figformat,
+def cumulative_yield(dfs, path, title, color):
+    cum_yield_gb = Plot(path=path + "CumulativeYieldPlot_Gigabases.html",
                         title="Cumulative yield")
-    s = dfs.loc[:, "lengths"].cumsum().resample('1T').max() / 1e9
-    ax = sns.regplot(x=s.index.total_seconds() / 3600,
-                     y=s,
-                     x_ci=None,
-                     fit_reg=False,
-                     color=color,
-                     scatter_kws={"s": 3})
-    ax.set(xlabel='Run time (hours)',
-           ylabel='Cumulative yield in gigabase',
-           title=title or cum_yield_gb.title)
-    cum_yield_gb.fig = ax.get_figure()
-    cum_yield_gb.save(format=figformat)
-    plt.close("all")
-
-    cum_yield_reads = Plot(path=path + "CumulativeYieldPlot_NumberOfReads." + figformat,
+
+    s = dfs.loc[:, "lengths"].cumsum().resample('10T').max() / 1e9
+
+    fig = px.scatter(
+        x=s.index.total_seconds() / 3600,
+        y=s)
+    fig.update_traces(marker=dict(color=color))
+
+    fig.update_layout(xaxis_title='Run time (hours)',
+                      yaxis_title='Cumulative yield in gigabase',
+                      title=title or cum_yield_gb.title,
+                      title_x=0.5)
+
+    cum_yield_gb.fig = fig
+    cum_yield_gb.html = cum_yield_gb.fig.to_html(full_html=False, include_plotlyjs='cdn')
+    cum_yield_gb.save()
+
+    cum_yield_reads = Plot(path=path + "CumulativeYieldPlot_NumberOfReads.html",
                            title="Cumulative yield")
+
     s = dfs.loc[:, "lengths"].resample('10T').count().cumsum()
-    ax = sns.regplot(x=s.index.total_seconds() / 3600,
-                     y=s,
-                     x_ci=None,
-                     fit_reg=False,
-                     color=color,
-                     scatter_kws={"s": 3})
-    ax.set(xlabel='Run time (hours)',
-           ylabel='Cumulative yield in number of reads',
-           title=title or cum_yield_reads.title)
-    cum_yield_reads.fig = ax.get_figure()
-    cum_yield_reads.save(format=figformat)
-    plt.close("all")
+
+    fig = px.scatter(
+        x=s.index.total_seconds() / 3600,
+        y=s)
+    fig.update_traces(marker=dict(color=color))
+
+    fig.update_layout(xaxis_title='Run time (hours)',
+                      yaxis_title='Cumulative yield in number of reads',
+                      title=title or cum_yield_gb.title,
+                      title_x=0.5)
+
+    cum_yield_reads.fig = fig
+    cum_yield_reads.html = cum_yield_reads.fig.to_html(full_html=False, include_plotlyjs='cdn')
+    cum_yield_reads.save()
+
     return [cum_yield_gb, cum_yield_reads]
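
Note on add_time_bins() in the nanoplotter/timeplots.py diff above: pd.cut()
needs exactly one label per bin, and the number of bins is
ceil(hours / bin_length). The change from "i > ..." to "i >= ..." matters when
the run length is an exact multiple of bin_length, because the old test then
produced one label too many. The sketch below isolates the label logic;
time_bin_labels and its strict flag are hypothetical and exist only to compare
the two conditions.

```python
from math import ceil


def time_bin_labels(max_hours, bin_length=3, strict=True):
    """Build labels such as '0-3', '3-6' (hypothetical helper).

    strict=True mirrors the new test (skip when i >= max_hours),
    strict=False mirrors the old one (skip when i > max_hours).
    """
    def skip(start):
        return start >= max_hours if strict else start > max_hours

    return [f"{start}-{start + bin_length}"
            for start in range(0, 168, bin_length)
            if not skip(start)]


max_hours = 6
n_bins = ceil(max_hours / 3)

print(n_bins)                                    # 2
print(time_bin_labels(max_hours, strict=True))   # ['0-3', '3-6']
print(time_bin_labels(max_hours, strict=False))  # ['0-3', '3-6', '6-9']
```

For a run of 7 hours both variants give three labels for three bins, so the
mismatch only appears at exact multiples of the bin length.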


=====================================
scripts/test.sh
=====================================
@@ -1,41 +1,50 @@
 set -ev
 
-git clone https://github.com/wdecoster/nanotest.git
+if [ -d "nanotest" ]; then
+    echo "nanotest already cloned"
+else
+    git clone https://github.com/wdecoster/nanotest.git
+fi
 
 NanoPlot -h
-NanoPlot --listcolors
+# NanoPlot --listcolors
 echo ""
 echo ""
 echo ""
 echo "testing bam:"
-NanoPlot --bam nanotest/alignment.bam --verbose
+NanoPlot --bam nanotest/alignment.bam --verbose -o tests
 echo ""
 echo ""
 echo ""
 echo "testing bam without supplementary alignments:"
-NanoPlot --bam nanotest/alignment.bam --verbose --no_supplementary
+NanoPlot --bam nanotest/alignment.bam --verbose --no_supplementary -o tests
 echo ""
 echo ""
 echo ""
 echo "testing summary:"
-NanoPlot --summary nanotest/sequencing_summary.txt --loglength --verbose
+NanoPlot --summary nanotest/sequencing_summary.txt --loglength --verbose -o tests
 echo ""
 echo ""
 echo ""
 echo "testing fastq rich:"
-NanoPlot --fastq_rich nanotest/reads.fastq.gz --verbose --downsample 800
+NanoPlot --fastq_rich nanotest/reads.fastq.gz --verbose --downsample 800 -o tests
 echo ""
 echo ""
 echo ""
 echo "testing fastq minimal:"
-NanoPlot --fastq_minimal nanotest/reads.fastq.gz --store --verbose --plot dot
+NanoPlot --fastq_minimal nanotest/reads.fastq.gz --store --verbose --plots dot -o tests
 echo ""
 echo ""
 echo ""
 echo "testing fastq plain:"
-NanoPlot --fastq nanotest/reads.fastq.gz --verbose --minqual 4 --color red
+NanoPlot --fastq nanotest/reads.fastq.gz --verbose --minqual 4 --color red -o tests
 echo ""
 echo ""
 echo ""
 echo "testing fasta:"
-NanoPlot --fasta nanotest/reads.fa.gz --verbose --maxlength 35000
+NanoPlot --fasta nanotest/reads.fa.gz --verbose --maxlength 35000 -o tests
+echo ""
+echo ""
+echo ""
+# echo "testing feather:"
+# NanoPlot --feather nanotest/summary1.feather --verbose --outdir plots


=====================================
setup.py
=====================================
@@ -21,7 +21,7 @@ setup(
         'Development Status :: 4 - Beta',
         'Intended Audience :: Science/Research',
         'Topic :: Scientific/Engineering :: Bio-Informatics',
-        'License :: OSI Approved :: MIT License',
+        'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
         'Programming Language :: Python :: 3',
         'Programming Language :: Python :: 3.3',
         'Programming Language :: Python :: 3.4',
@@ -32,16 +32,15 @@ setup(
     python_requires='>=3',
     install_requires=['biopython',
                       'pysam>0.10.0.0',
-                      'pandas>=0.22.0',
-                      'numpy',
+                      'pandas>=1.1.0',
+                      'numpy>=1.16.5',
                       'scipy',
                       'python-dateutil',
-                      'seaborn>=0.10.1',
-                      'matplotlib>=3.1.3',
-                      'nanoget>=1.13.0',
-                      'nanomath>=0.23.1',
-                      "pauvre==0.2.0",
+                      'nanoget>=1.14.0',
+                      'nanomath>=1.0.0',
                       'plotly>=4.1.0',
+                      'pyarrow',
+                      'kaleido'
                       ],
     package_data={'NanoPlot': []},
     package_dir={'nanoplot': 'nanoplot'},



View it on GitLab: https://salsa.debian.org/med-team/nanoplot/-/compare/a72e8e822e0673c8e89db24bc48e3a38e45d172f...6a9fa5980258accf2dd733150b2ac446233c6631


