[med-svn] [Git][med-team/snakemake][master] 4 commits: Dockerfile: fix remote root-in-container security hole.
Rebecca N. Palmer
gitlab@salsa.debian.org
Mon May 4 12:17:47 BST 2020
Rebecca N. Palmer pushed to branch master at Debian Med / snakemake
Commits:
2705e3b8 by Rebecca N. Palmer at 2020-05-04T11:24:04+01:00
Dockerfile: fix remote root-in-container security hole.
- - - - -
34852dea by Rebecca N. Palmer at 2020-05-04T11:27:44+01:00
New upstream version 5.16.0
- - - - -
27426106 by Rebecca N. Palmer at 2020-05-04T11:28:02+01:00
Update upstream source from tag 'upstream/5.16.0'
Update to upstream version '5.16.0'
with Debian dir ccb919adeba020055e4d3c1e5dc74fc65bc8004e
- - - - -
44380419 by Rebecca N. Palmer at 2020-05-04T12:16:30+01:00
Skip a new conda-using test.
- - - - -
22 changed files:
- CHANGELOG.rst
- debian/changelog
- + debian/patches/security.patch
- debian/patches/series
- debian/rules
- debian/tests/run-unit-test
- + docs/snakefiles/images/snakemake-notebook-demo.gif
- docs/snakefiles/rules.rst
- snakemake/__init__.py
- snakemake/_version.py
- snakemake/dag.py
- snakemake/executors.py
- snakemake/jobs.py
- snakemake/notebook.py
- snakemake/parser.py
- snakemake/rules.py
- snakemake/scheduler.py
- snakemake/script.py
- snakemake/workflow.py
- + tests/test_jupyter_notebook/expected-results/result_final.txt
- + tests/test_jupyter_notebook/expected-results/result_intermediate.txt
- tests/tests.py
Changes:
=====================================
CHANGELOG.rst
=====================================
@@ -1,3 +1,13 @@
+[5.16.0] - 2020-04-29
+=====================
+Added
+-----
+- Interactive jupyter notebook editing. Notebooks defined by rules can be interactively drafted and updated using snakemake --edit-notebook (see docs).
+Changed
+-------
+- Fixed group resource usage to occupy one cluster/cloud node.
+- Minor bug fixes.
+
[5.15.0] - 2020-04-21
=====================
Changed
=====================================
debian/changelog
=====================================
@@ -1,10 +1,12 @@
-snakemake (5.15.0-1) UNRELEASED; urgency=medium
+snakemake (5.16.0-1) unstable; urgency=medium
* New upstream release. Drop / refresh patches.
* Stop using pytest-xdist: it sometimes starts tests from
a non-main thread, which breaks snakemake.
+ * Dockerfile: fix remote root-in-container security hole.
+ * Skip a new conda-using test.
- -- Rebecca N. Palmer <rebecca_palmer@zoho.com> Mon, 27 Apr 2020 12:05:59 +0100
+ -- Rebecca N. Palmer <rebecca_palmer@zoho.com> Mon, 04 May 2020 12:16:03 +0100
snakemake (5.14.0-1) unstable; urgency=medium
=====================================
debian/patches/security.patch
=====================================
@@ -0,0 +1,30 @@
+Description: Fix remote root-in-container security hole, and a link
+
+Repository signatures are only secure if you have the right key
+to check them with
+
+Author: Rebecca N. Palmer <rebecca_palmer@zoho.com>
+Forwarded: https://github.com/snakemake/snakemake/pull/371
+
+diff --git a/Dockerfile b/Dockerfile
+index 095b5335..a3afe66b 100644
+--- a/Dockerfile
++++ b/Dockerfile
+@@ -6,8 +6,8 @@ ENV PATH /opt/conda/bin:${PATH}
+ ENV LANG C.UTF-8
+ ENV SHELL /bin/bash
+ RUN /bin/bash -c "install_packages wget bzip2 ca-certificates gnupg2 squashfs-tools git && \
+- wget -O- http://neuro.debian.net/lists/xenial.us-ca.full > /etc/apt/sources.list.d/neurodebian.sources.list && \
+- wget -O- http://neuro.debian.net/_static/neuro.debian.net.asc | apt-key add - && \
++ wget -O- https://neuro.debian.net/lists/xenial.us-ca.full > /etc/apt/sources.list.d/neurodebian.sources.list && \
++ wget -O- https://neuro.debian.net/_static/neuro.debian.net.asc | apt-key add - && \
+ install_packages singularity-container && \
+ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+ bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda && \
+diff --git a/examples/c/README.txt b/examples/c/README.txt
+index 995d30fe..d2eaceec 100644
+--- a/examples/c/README.txt
++++ b/examples/c/README.txt
+@@ -1 +1 @@
+-http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
++https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
=====================================
debian/patches/series
=====================================
@@ -7,3 +7,4 @@ local_javascript.patch
python3.patch
workaround_sphinx_issue.patch
remove_ccbysa_snippets.patch
+security.patch
=====================================
debian/rules
=====================================
@@ -7,7 +7,7 @@ export HOME=$(CURDIR)/fakehome
export PYBUILD_NAME=snakemake
export PYBUILD_DESTDIR_python3=debian/snakemake
export PYBUILD_BEFORE_TEST_python3=chmod +x {dir}/bin/snakemake; cp -r {dir}/bin {dir}/tests {build_dir}
-export PYBUILD_TEST_ARGS=python{version} -m pytest -v tests/test*.py -k 'not report and not ancient and not test_script and not default_remote and not issue635 and not convert_to_cwl and not issue1083 and not issue1092 and not issue1093 and not test_remote and not test_default_resources and not test_tibanna and not test_github_issue78 and not test_output_file_cache_remote and not test_env_modules and not test_archive and not test_container'
+export PYBUILD_TEST_ARGS=python{version} -m pytest -v tests/test*.py -k 'not report and not ancient and not test_script and not default_remote and not issue635 and not convert_to_cwl and not issue1083 and not issue1092 and not issue1093 and not test_remote and not test_default_resources and not test_tibanna and not test_github_issue78 and not test_output_file_cache_remote and not test_env_modules and not test_archive and not test_container and not test_jupyter_notebook'
# test_report
# test_ancient
@@ -19,7 +19,7 @@ export PYBUILD_TEST_ARGS=python{version} -m pytest -v tests/test*.py -k 'not rep
# test_issue1093 fails due to conda usage; commenting that out and installing bwa produces a different ordering than desired
# test_default_resources and test_remote needs moto to be packaged https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777089
# test_env_modules relies on "module load" which is not packaged for Debian
-# test_archive uses conda
+# test_archive, test_jupyter_notebook use conda
# test_container uses docker://bash
export PYBUILD_AFTER_TEST_python3=rm -fr {build_dir}/bin {build_dir}/tests {dir}/tests/test_filegraph/.snakemake/ {dir}/tests/linting/*/.snakemake/
=====================================
debian/tests/run-unit-test
=====================================
@@ -17,5 +17,5 @@ cd "${AUTOPKGTEST_TMP}"
export HOME="${AUTOPKGTEST_TMP}"
-python3 -m pytest -v ${ROOT}/tests/test*.py -k 'not report and not ancient and not test_script and not default_remote and not issue635 and not convert_to_cwl and not issue1083 and not issue1092 and not issue1093 and not test_remote and not test_default_resources and not test_singularity and not test_singularity_conda and not test_cwl_singularity and not test_cwl and not test_url_include and not test_tibanna and not test_github_issue78 and not test_output_file_cache_remote and not test_env_modules and not test_archive and not test_container'
+python3 -m pytest -v ${ROOT}/tests/test*.py -k 'not report and not ancient and not test_script and not default_remote and not issue635 and not convert_to_cwl and not issue1083 and not issue1092 and not issue1093 and not test_remote and not test_default_resources and not test_singularity and not test_singularity_conda and not test_cwl_singularity and not test_cwl and not test_url_include and not test_tibanna and not test_github_issue78 and not test_output_file_cache_remote and not test_env_modules and not test_archive and not test_container and not test_jupyter_notebook'
=====================================
docs/snakefiles/images/snakemake-notebook-demo.gif
=====================================
Binary files /dev/null and b/docs/snakefiles/images/snakemake-notebook-demo.gif differ
=====================================
docs/snakefiles/rules.rst
=====================================
@@ -608,18 +608,14 @@ Integration works as follows (note the use of `notebook:` instead of `script:`):
.. code-block:: python
- rule NAME:
- input:
- "path/to/inputfile",
- "path/to/other/inputfile"
+ rule hello:
output:
- "path/to/outputfile",
- "path/to/another/outputfile"
+ "test.txt"
log:
# optional path to the processed notebook
- notebook = "logs/notebooks/processed_notebook.ipynb"
+ notebook="logs/notebooks/processed_notebook.ipynb"
notebook:
- "notebooks/notebook.ipynb"
+ "hello.py.ipynb"
.. note:
@@ -627,15 +623,41 @@ Integration works as follows (note the use of `notebook:` instead of `script:`):
A modular, readable workflow definition with Snakemake, and the ability to quickly explore and plot data with Jupyter.
The benefit will be maximal when integrating many small notebooks that each do a particular job, hence allowing you to move away from large, monolithic, and therefore unreadable notebooks.
+It is recommended to prefix the ``.ipynb`` suffix with either ``.py`` or ``.r`` to indicate the notebook language.
In the notebook, a snakemake object is available, which can be accessed in the same way as with :ref:`script integration <snakefiles_external-scripts>`.
In other words, you have access to input files via ``snakemake.input`` (in the Python case) and ``snakemake@input`` (in the R case), etc.
-Hence, integrating a new notebook works by first writing it from scratch in the usual interactive way.
-Then, you replace all hardcoded variables with references to properties of the rule where it shall be integrated, e.g. replacing the path to an input file with ``snakemake.input[0]``.
-Once having moved the notebook to the right place in the pipeline (ideally a subfolder ``notebooks``) and referring to it from the rule, Snakemake will be able to re-execute it while inserting the desired variable values from the rule properties.
-
Optionally it is possible to automatically store the processed notebook.
This can be achieved by adding a named logfile ``notebook=...`` to the ``log`` directive.
+In order to simplify the coding of notebooks given the automatically inserted ``snakemake`` object, Snakemake provides an interactive edit mode for notebook rules.
+Let us assume you have written the above rule, but the notebook does not yet exist.
+By running
+
+.. code-block:: console
+
+ snakemake --cores 1 --edit-notebook test.txt
+
+you instruct Snakemake to allow interactive editing of the notebook needed to create the file ``test.txt``.
+Snakemake will run all dependencies of the notebook rule, such that all input files are present.
+Then, it will start a jupyter notebook server with an empty draft of the notebook, in which you can interactively program everything needed for this particular step.
+Once done, you should save the notebook from the jupyter web interface, go to the jupyter dashboard and hit the ``Quit`` button on the top right in order to shut down the jupyter server.
+Snakemake will detect that the server is closed and automatically store the drafted notebook into the path given in the rule (here ``hello.py.ipynb``).
+If the notebook already exists, the above procedure can be used to easily modify it.
+Note that Snakemake requires local execution for the notebook edit mode.
+On a cluster or the cloud, you can generate all dependencies of the notebook rule via
+
+.. code-block:: console
+
+ snakemake --cluster ... --jobs 100 --until test.txt
+
+Then, the notebook rule can easily be executed locally.
+A demo of the entire interactive editing process can be viewed by clicking below:
+
+.. image:: images/snakemake-notebook-demo.gif
+ :scale: 20%
+ :alt: Notebook integration demo
+ :align: center
+
Protected and Temporary Files
-----------------------------
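
For a concrete picture of what such a notebook contains, here is a minimal illustrative cell body for ``hello.py.ipynb`` (not part of this diff), assuming the ``hello`` rule above; the ``snakemake`` object is provided by the automatically inserted preamble cell:

    # Illustrative notebook cell: `snakemake` is injected by the preamble
    # cell that Snakemake inserts (and strips again when saving edits).
    with open(snakemake.output[0], "w") as f:
        f.write("hello from the notebook\n")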
=====================================
snakemake/__init__.py
=====================================
@@ -153,6 +153,7 @@ def snakemake(
show_failed_logs=False,
keep_incomplete=False,
messaging=None,
+ edit_notebook=None,
):
"""Run snakemake on a given snakefile.
@@ -262,7 +263,8 @@ def snakemake(
cluster_status (str): status command for cluster execution. If None, Snakemake will rely on flag files. Otherwise, it expects the command to return "success", "failure" or "running" when executing with a cluster jobid as single argument.
export_cwl (str): Compile workflow to CWL and save to given file
log_handler (function): redirect snakemake output to this custom log handler, a function that takes a log message dictionary (see below) as its only argument (default None). The log message dictionary for the log handler has the following entries:
- keep_incomplete (bool): keep incomplete output files of failed jobs
+ keep_incomplete (bool): keep incomplete output files of failed jobs
+ edit_notebook (object): "notebook.Listen" object for configuring the notebook server used for interactive editing of a rule notebook. If None, do not edit.
log_handler (list): redirect snakemake output to this list of custom log handlers, each a function that takes a log message dictionary (see below) as its only argument (default []). The log message dictionary for the log handler has the following entries:
:level:
@@ -334,10 +336,10 @@ def snakemake(
if updated_files is None:
updated_files = list()
- if cluster or cluster_sync or drmaa or tibanna:
- cores = sys.maxsize
+ if cluster or cluster_sync or drmaa or tibanna or kubernetes:
+ cores = None
else:
- nodes = sys.maxsize
+ nodes = None
if isinstance(cluster_config, str):
# Loading configuration from one file is still supported for
@@ -355,9 +357,15 @@ def snakemake(
cluster_config_content = dict()
run_local = not (cluster or cluster_sync or drmaa or kubernetes or tibanna)
- if run_local and not dryrun:
- # clean up all previously recorded jobids.
- shell.cleanup()
+ if run_local:
+ if not dryrun:
+ # clean up all previously recorded jobids.
+ shell.cleanup()
+ else:
+ if edit_notebook:
+ raise WorkflowError(
+ "Notebook edit mode is only allowed with local execution."
+ )
# force thread use for any kind of cluster
use_threads = (
@@ -491,6 +499,7 @@ def snakemake(
cores=cores,
nodes=nodes,
resources=resources,
+ edit_notebook=edit_notebook,
)
success = True
workflow.include(
@@ -1093,9 +1102,9 @@ def get_argument_parser(profile=None):
),
)
- group_utils = parser.add_argument_group("UTILITIES")
+ group_report = parser.add_argument_group("REPORTS")
- group_utils.add_argument(
+ group_report.add_argument(
"--report",
nargs="?",
const="report.html",
@@ -1106,12 +1115,33 @@ def get_argument_parser(profile=None):
"In the latter case, results are stored along with a file report.html in the zip archive. "
"If no filename is given, an embedded report.html is the default.",
)
- group_utils.add_argument(
+ group_report.add_argument(
"--report-stylesheet",
metavar="CSSFILE",
help="Custom stylesheet to use for report. In particular, this can be used for "
"branding the report with e.g. a custom logo, see docs.",
)
+
+ group_notebooks = parser.add_argument_group("NOTEBOOKS")
+
+ group_notebooks.add_argument(
+ "--edit-notebook",
+ metavar="TARGET",
+ help="Interactively edit the notebook associated with the rule used to generate the given target file. "
+ "This will start a local jupyter notebook server. "
+ "Any changes to the notebook should be saved, and the server has to be stopped by "
+ "closing the notebook and hitting the 'Quit' button on the jupyter dashboard. "
+ "Afterwards, the updated notebook will be automatically stored in the path defined in the rule. "
+ "If the notebook is not yet present, this will create an empty draft. ",
+ )
+ group_notebooks.add_argument(
+ "--notebook-listen",
+ metavar="IP:PORT",
+ default="localhost:8888",
+ help="The IP address and PORT the notebook server used for editing the notebook (--edit-notebook) will listen on.",
+ )
+
+ group_utils = parser.add_argument_group("UTILITIES")
group_utils.add_argument(
"--lint",
nargs="?",
@@ -1121,6 +1151,7 @@ def get_argument_parser(profile=None):
"specific suggestions to improve code quality (work in progress, more lints "
"to be added in the future). If no argument is provided, plain text output is used.",
)
+
group_utils.add_argument(
"--export-cwl",
action="store",
@@ -2102,6 +2133,13 @@ def main(argv=None):
slack_logger = logging.SlackLogger()
log_handler.append(slack_logger.log_handler)
+ if args.edit_notebook:
+ from snakemake import notebook
+
+ args.target = [args.edit_notebook]
+ args.force = True
+ args.edit_notebook = notebook.Listen(args.notebook_listen)
+
success = snakemake(
args.snakefile,
batch=batch,
@@ -2211,6 +2249,7 @@ def main(argv=None):
export_cwl=args.export_cwl,
show_failed_logs=args.show_failed_logs,
keep_incomplete=args.keep_incomplete,
+ edit_notebook=args.edit_notebook,
log_handler=log_handler,
)
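
Taken together, the CLI additions reduce to the new API parameter above; a minimal sketch of the equivalent Python-level call, assuming ``targets``/``forcetargets`` are the existing API parameters that ``main()`` populates from ``args.target``/``args.force``:

    from snakemake import snakemake, notebook

    # Rough equivalent of `snakemake --cores 1 --edit-notebook test.txt`,
    # under the assumptions stated above.
    snakemake(
        "Snakefile",
        cores=1,
        targets=["test.txt"],
        forcetargets=True,
        edit_notebook=notebook.Listen("localhost:8888"),
    )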
=====================================
snakemake/_version.py
=====================================
@@ -22,9 +22,9 @@ def get_keywords():
# setup.py/versioneer.py will grep for the variable names, so they must
# each be defined on a line of their own. _version.py will just call
# get_keywords().
- git_refnames = " (HEAD -> master, tag: v5.15.0)"
- git_full = "6d3aa424cfffaf33f36ca4b6da870ce786ddcb30"
- git_date = "2020-04-21 10:55:47 +0200"
+ git_refnames = " (tag: v5.16.0)"
+ git_full = "7ec74f79067b3392607479c6f3bec4a2e90cbc05"
+ git_date = "2020-04-29 16:01:58 +0200"
keywords = {"refnames": git_refnames, "full": git_full, "date": git_date}
return keywords
=====================================
snakemake/dag.py
=====================================
@@ -319,6 +319,9 @@ class DAG:
self.update_dynamic(job)
self.postprocess()
+ def is_edit_notebook_job(self, job):
+ return self.workflow.edit_notebook and job.targetfile in self.targetfiles
+
@property
def dynamic_output_jobs(self):
"""Iterate over all jobs with dynamic output files."""
=====================================
snakemake/executors.py
=====================================
@@ -203,15 +203,16 @@ class RealExecutor(AbstractExecutor):
handle_touch=True,
ignore_missing_output=False,
):
- job.postprocess(
- upload_remote=upload_remote,
- handle_log=handle_log,
- handle_touch=handle_touch,
- ignore_missing_output=ignore_missing_output,
- latency_wait=self.latency_wait,
- assume_shared_fs=self.assume_shared_fs,
- )
- self.stats.report_job_end(job)
+ if not self.dag.is_edit_notebook_job(job):
+ job.postprocess(
+ upload_remote=upload_remote,
+ handle_log=handle_log,
+ handle_touch=handle_touch,
+ ignore_missing_output=ignore_missing_output,
+ latency_wait=self.latency_wait,
+ assume_shared_fs=self.assume_shared_fs,
+ )
+ self.stats.report_job_end(job)
def handle_job_error(self, job, upload_remote=True):
job.postprocess(
@@ -422,6 +423,7 @@ class CPUExecutor(RealExecutor):
self.workflow.cleanup_scripts,
job.shadow_dir,
job.jobid,
+ self.workflow.edit_notebook,
)
def run_single_job(self, job):
@@ -2025,6 +2027,7 @@ def run_wrapper(
cleanup_scripts,
shadow_dir,
jobid,
+ edit_notebook,
):
"""
Wrapper around the run method that handles exceptions and benchmarking.
@@ -2102,6 +2105,7 @@ def run_wrapper(
bench_iteration,
cleanup_scripts,
passed_shadow_dir,
+ edit_notebook,
)
else:
# The benchmarking is started here as we have a run section
@@ -2129,6 +2133,7 @@ def run_wrapper(
bench_iteration,
cleanup_scripts,
passed_shadow_dir,
+ edit_notebook,
)
# Store benchmark record for this iteration
bench_records.append(bench_record)
@@ -2154,6 +2159,7 @@ def run_wrapper(
None,
cleanup_scripts,
passed_shadow_dir,
+ edit_notebook,
)
except (KeyboardInterrupt, SystemExit) as e:
# Re-raise the keyboard interrupt in order to record an error in the
=====================================
snakemake/jobs.py
=====================================
@@ -1173,6 +1173,7 @@ class GroupJob(AbstractJob):
def resources(self):
if self._resources is None:
self._resources = defaultdict(int)
+ self._resources["_nodes"] = 1
pipe_group = any([job.is_pipe for job in self.jobs])
# iterate over siblings that can be executed in parallel
for siblings in self.toposorted:
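
This one-line change implements the changelog entry "Fixed group resource usage to occupy one cluster/cloud node": seeding the ``defaultdict(int)`` guarantees that a group job claims one ``_nodes`` unit rather than the unseeded default of 0. A toy illustration:

    from collections import defaultdict

    resources = defaultdict(int)
    assert resources["_nodes"] == 0   # unseeded: group would claim no node
    resources["_nodes"] = 1           # the fix: one cluster/cloud node per group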
=====================================
snakemake/notebook.py
=====================================
@@ -1,34 +1,111 @@
-import os
+import os, sys
+from urllib.error import URLError
+from urllib.parse import urlparse
+import tempfile
+import re
+import shutil
from snakemake.exceptions import WorkflowError
from snakemake.shell import shell
from snakemake.script import get_source, ScriptBase, PythonScript, RScript
+from snakemake.logging import logger
+
+KERNEL_STARTED_RE = re.compile(r"Kernel started: (?P<kernel_id>\S+)")
+KERNEL_SHUTDOWN_RE = re.compile(r"Kernel shutdown: (?P<kernel_id>\S+)")
+
+
+class Listen:
+ def __init__(self, arg):
+ self.ip, self.port = arg.split(":")
class JupyterNotebook(ScriptBase):
+
+ editable = True
+
+ def draft(self, listen):
+ import nbformat
+
+ preamble = self.get_preamble()
+ nb = nbformat.v4.new_notebook()
+ self.insert_preamble_cell(preamble, nb)
+
+ nb["cells"].append(nbformat.v4.new_code_cell("# start coding here"))
+
+ with open(self.local_path, "wb") as out:
+ out.write(nbformat.writes(nb).encode())
+
+ self.source = open(self.local_path).read()
+
+ self.evaluate(edit=listen)
+
def write_script(self, preamble, fd):
import nbformat
nb = nbformat.reads(self.source, as_version=4) # nbformat.NO_CONVERT
- preamble_cell = nbformat.v4.new_code_cell(preamble)
- nb["cells"].insert(0, preamble_cell)
+ self.remove_preamble_cell(nb)
+ self.insert_preamble_cell(preamble, nb)
fd.write(nbformat.writes(nb).encode())
- def execute_script(self, fname):
+ def execute_script(self, fname, edit=None):
+ import nbformat
+
fname_out = self.log.get("notebook", None)
- if fname_out is None:
+ if fname_out is None or edit:
output_parameter = ""
else:
fname_out = os.path.join(os.getcwd(), fname_out)
output_parameter = "--output {fname_out:q}"
- cmd_tmp = "jupyter-nbconvert --execute {output_parameter} --to notebook --ExecutePreprocessor.timeout=-1 {{fname:q}}".format(
- output_parameter=output_parameter
- )
+ if edit is not None:
+ logger.info("Opening notebook for editing.")
+ cmd = (
+ "jupyter notebook --log-level ERROR --ip {edit.ip} --port {edit.port} "
+ "--no-browser --NotebookApp.quit_button=True {{fname:q}}".format(
+ edit=edit
+ )
+ )
+ else:
+ cmd = (
+ "jupyter-nbconvert --log-level ERROR --execute {output_parameter} "
+ "--to notebook --ExecutePreprocessor.timeout=-1 {{fname:q}}".format(
+ output_parameter=output_parameter
+ )
+ )
- self._execute_cmd(cmd_tmp, fname_out=fname_out, fname=fname)
+ self._execute_cmd(cmd, fname_out=fname_out, fname=fname)
+
+ if edit:
+ logger.info("Saving modified notebook.")
+ nb = nbformat.read(fname, as_version=4)
+ self.remove_preamble_cell(nb)
+
+ nbformat.write(nb, self.local_path)
+
+ def insert_preamble_cell(self, preamble, notebook):
+ import nbformat
+
+ preamble_cell = nbformat.v4.new_code_cell(preamble)
+ preamble_cell["metadata"]["tags"] = ["snakemake-job-properties"]
+ notebook["cells"].insert(0, preamble_cell)
+
+ def remove_preamble_cell(self, notebook):
+ preambles = [
+ i
+ for i, cell in enumerate(notebook["cells"])
+ if "snakemake-job-properties" in cell["metadata"].get("tags", [])
+ ]
+ if len(preambles) > 1:
+ raise WorkflowError(
+ "More than one snakemake preamble cell found in notebook. "
+ "Please clean up the notebook first, by removing all or all but one of them."
+ )
+ elif len(preambles) == 1:
+ preamble = preambles[0]
+ # remove old preamble
+ del notebook["cells"][preamble]
class PythonJupyterNotebook(JupyterNotebook):
@@ -91,6 +168,16 @@ class RJupyterNotebook(JupyterNotebook):
)
+def get_exec_class(language):
+ exec_class = {
+ "jupyter_python": PythonJupyterNotebook,
+ "jupyter_r": RJupyterNotebook,
+ }.get(language, None)
+ if exec_class is None:
+ raise ValueError("Unsupported notebook: Expecting Jupyter Notebook (.ipynb).")
+ return exec_class
+
+
def notebook(
path,
basedir,
@@ -112,20 +199,42 @@ def notebook(
bench_iteration,
cleanup_scripts,
shadow_dir,
+ edit=None,
):
"""
Load a script from the given basedir + path and execute it.
"""
- path, source, language = get_source(path, basedir)
+ draft = False
+ if edit is not None:
+ if urlparse(path).scheme == "":
+ if not os.path.exists(path):
+ # draft the notebook, it does not exist yet
+ language = None
+ draft = True
+ path = "file://{}".format(os.path.abspath(path))
+ if path.endswith(".py.ipynb"):
+ language = "jupyter_python"
+ elif path.endswith(".r.ipynb"):
+ language = "jupyter_r"
+ else:
+ raise WorkflowError(
+ "Notebook to edit has to end on .py.ipynb or .r.ipynb in order "
+ "to decide which programming language shall be used."
+ )
+ else:
+ raise WorkflowError(
+ "Notebook {} is not local, but edit mode is only allowed for "
+ "local notebooks.".format(path)
+ )
- ExecClass = {
- "jupyter_python": PythonJupyterNotebook,
- "jupyter_r": RJupyterNotebook,
- }.get(language, None)
- if ExecClass is None:
- raise ValueError("Unsupported notebook: Expecting Jupyter Notebook (.ipynb).")
+ if not draft:
+ path, source, language = get_source(path, basedir)
+ else:
+ source = None
- executor = ExecClass(
+ exec_class = get_exec_class(language)
+
+ executor = exec_class(
path,
source,
basedir,
@@ -148,4 +257,8 @@ def notebook(
cleanup_scripts,
shadow_dir,
)
- executor.evaluate()
+
+ if draft:
+ executor.draft(listen=edit)
+ else:
+ executor.evaluate(edit=edit)
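
The tagged-preamble mechanism can be exercised outside Snakemake; a minimal sketch with nbformat (the file name is hypothetical), mirroring what ``draft()`` writes and what ``remove_preamble_cell()`` later looks for:

    import nbformat

    nb = nbformat.v4.new_notebook()
    preamble = nbformat.v4.new_code_cell("# job properties would go here")
    # The tag is how remove_preamble_cell() finds this cell again on save.
    preamble["metadata"]["tags"] = ["snakemake-job-properties"]
    nb["cells"].insert(0, preamble)
    nb["cells"].append(nbformat.v4.new_code_cell("# start coding here"))

    with open("draft.py.ipynb", "w") as out:
        out.write(nbformat.writes(nb))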
=====================================
snakemake/parser.py
=====================================
@@ -470,7 +470,7 @@ class Run(RuleKeywordState):
"def __rule_{rulename}(input, output, params, wildcards, threads, "
"resources, log, version, rule, conda_env, container_img, "
"singularity_args, use_singularity, env_modules, bench_record, jobid, "
- "is_shell, bench_iteration, cleanup_scripts, shadow_dir):".format(
+ "is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook):".format(
rulename=self.rulename
if self.rulename is not None
else self.snakefile.rulecount
@@ -582,6 +582,17 @@ class Notebook(Script):
start_func = "@workflow.notebook"
end_func = "notebook"
+ def args(self):
+ # basedir
+ yield ", {!r}".format(os.path.abspath(os.path.dirname(self.snakefile.path)))
+ # other args
+ yield (
+ ", input, output, params, wildcards, threads, resources, log, "
+ "config, rule, conda_env, container_img, singularity_args, env_modules, "
+ "bench_record, jobid, bench_iteration, cleanup_scripts, shadow_dir, "
+ "edit_notebook"
+ )
+
class Wrapper(Script):
start_func = "@workflow.wrapper"
@@ -679,7 +690,9 @@ class Rule(GlobalKeywordState):
for t in self.start():
yield t, token
else:
- self.error("Expected name or colon after rule keyword.", token)
+ self.error(
+ "Expected name or colon after " "rule or checkpoint keyword.", token
+ )
def block_content(self, token):
if is_name(token):
=====================================
snakemake/rules.py
=====================================
@@ -964,7 +964,9 @@ class Rule:
"Resources function did not return int or str.", rule=self
)
if isinstance(res, int):
- res = min(self.workflow.global_resources.get(name, res), res)
+ global_res = self.workflow.global_resources.get(name, res)
+ if global_res is not None:
+ res = min(global_res, res)
return res
threads = apply("_cores", self.resources["_cores"])
=====================================
snakemake/scheduler.py
=====================================
@@ -98,7 +98,11 @@ class JobScheduler:
self.max_jobs_per_second = max_jobs_per_second
self.keepincomplete = keepincomplete
- self.resources = dict(self.workflow.global_resources)
+ self.global_resources = {
+ name: (sys.maxsize if res is None else res)
+ for name, res in workflow.global_resources.items()
+ }
+ self.resources = dict(self.global_resources)
use_threads = (
force_use_threads
@@ -507,7 +511,7 @@ Problem", Akcay, Li, Xu, Annals of Operations Research, 2012
return [c_j * y_j for c_j, y_j in zip(c, y)]
b = [
- self.resources[name] for name in self.workflow.global_resources
+ self.resources[name] for name in self.global_resources
] # resource capacities
while True:
@@ -548,12 +552,12 @@ Problem", Akcay, Li, Xu, Annals of Operations Research, 2012
solution = [job for job, sel in zip(jobs, x) if sel]
# update resources
- for name, b_i in zip(self.workflow.global_resources, b):
+ for name, b_i in zip(self.global_resources, b):
self.resources[name] = b_i
return solution
def calc_resource(self, name, value):
- gres = self.workflow.global_resources[name]
+ gres = self.global_resources[name]
if value > gres:
if name == "_cores":
name = "threads"
@@ -569,15 +573,13 @@ Problem", Akcay, Li, Xu, Annals of Operations Research, 2012
def rule_weight(self, rule):
res = rule.resources
return [
- self.calc_resource(name, res.get(name, 0))
- for name in self.workflow.global_resources
+ self.calc_resource(name, res.get(name, 0)) for name in self.global_resources
]
def job_weight(self, job):
res = job.resources
return [
- self.calc_resource(name, res.get(name, 0))
- for name in self.workflow.global_resources
+ self.calc_resource(name, res.get(name, 0)) for name in self.global_resources
]
def job_reward(self, job):
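
Because cores/nodes may now be ``None`` (meaning unlimited, e.g. in cluster mode), the scheduler maps ``None`` to ``sys.maxsize`` so the knapsack arithmetic stays integer-only; the normalization in isolation, with illustrative values:

    import sys

    workflow_resources = {"_cores": None, "mem_mb": 64000}  # illustrative
    global_resources = {
        name: (sys.maxsize if res is None else res)
        for name, res in workflow_resources.items()
    }
    assert global_resources["_cores"] == sys.maxsize  # None -> "unlimited"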
=====================================
snakemake/script.py
=====================================
@@ -244,6 +244,8 @@ class JuliaEncoder:
class ScriptBase(ABC):
+ editable = False
+
def __init__(
self,
path,
@@ -291,7 +293,9 @@ class ScriptBase(ABC):
self.cleanup_scripts = cleanup_scripts
self.shadow_dir = shadow_dir
- def evaluate(self):
+ def evaluate(self, edit=False):
+ assert not edit or self.editable
+
fd = None
try:
# generate preamble
@@ -307,7 +311,7 @@ class ScriptBase(ABC):
self.write_script(preamble, fd)
# execute script
- self.execute_script(fd.name)
+ self.execute_script(fd.name, edit=edit)
except URLError as e:
raise WorkflowError(e)
finally:
@@ -320,6 +324,10 @@ class ScriptBase(ABC):
# nothing to clean up (TODO: ??)
pass
+ @property
+ def local_path(self):
+ return self.path[7:]
+
@abstractmethod
def get_preamble(self):
...
@@ -329,7 +337,7 @@ class ScriptBase(ABC):
...
@abstractmethod
- def execute_script(self, fname):
+ def execute_script(self, fname, edit=False):
...
def _execute_cmd(self, cmd, **kwargs):
@@ -398,9 +406,9 @@ class PythonScript(ScriptBase):
return textwrap.dedent(
"""
- ######## Snakemake header ########
+ ######## snakemake preamble start (automatically inserted, do not edit) ########
import sys; sys.path.extend([{searchpath}]); import pickle; snakemake = pickle.loads({snakemake}); from snakemake.logging import logger; logger.printshellcmds = {printshellcmds}; {preamble_addendum}
- ######## Original script #########
+ ######## snakemake preamble end #########
"""
).format(
searchpath=searchpath,
@@ -444,7 +452,7 @@ class PythonScript(ScriptBase):
fd.write(preamble.encode())
fd.write(self.source)
- def execute_script(self, fname):
+ def execute_script(self, fname, edit=False):
py_exec = sys.executable
if self.conda_env is not None:
py = os.path.join(self.conda_env, "bin", "python")
@@ -505,7 +513,7 @@ class RScript(ScriptBase):
):
return textwrap.dedent(
"""
- ######## Snakemake header ########
+ ######## snakemake preamble start (automatically inserted, do not edit) ########
library(methods)
Snakemake <- setClass(
"Snakemake",
@@ -545,7 +553,7 @@ class RScript(ScriptBase):
)
{preamble_addendum}
- ######## Original script #########
+ ######## snakemake preamble end #########
"""
).format(
REncoder.encode_namedlist(input_),
@@ -601,7 +609,7 @@ class RScript(ScriptBase):
fd.write(preamble.encode())
fd.write(self.source)
- def execute_script(self, fname):
+ def execute_script(self, fname, edit=False):
if self.conda_env is not None and "R_LIBS" in os.environ:
logger.warning(
"R script job uses conda environment but "
@@ -619,7 +627,7 @@ class RMarkdown(ScriptBase):
def get_preamble(self):
return textwrap.dedent(
"""
- ######## Snakemake header ########
+ ######## snakemake preamble start (automatically inserted, do not edit) ########
library(methods)
Snakemake <- setClass(
"Snakemake",
@@ -658,7 +666,7 @@ class RMarkdown(ScriptBase):
}}
)
- ######## Original script #########
+ ######## snakemake preamble end #########
"""
).format(
REncoder.encode_namedlist(self.input),
@@ -700,7 +708,7 @@ class RMarkdown(ScriptBase):
fd.write(preamble.encode())
fd.write(str.encode(code[pos:]))
- def execute_script(self, fname):
+ def execute_script(self, fname, edit=False):
if len(self.output) != 1:
raise WorkflowError(
"RMarkdown scripts (.Rmd) may only have a single output file."
@@ -718,7 +726,7 @@ class JuliaScript(ScriptBase):
def get_preamble(self):
return textwrap.dedent(
"""
- ######## Snakemake header ########
+ ######## snakemake preamble start (automatically inserted, do not edit) ########
struct Snakemake
input::Dict
output::Dict
@@ -747,7 +755,7 @@ class JuliaScript(ScriptBase):
{}, #scriptdir::String
#, #source::Any
)
- ######## Original script #########
+ ######## snakemake preamble end #########
""".format(
JuliaEncoder.encode_namedlist(self.input),
JuliaEncoder.encode_namedlist(self.output),
@@ -779,13 +787,11 @@ class JuliaScript(ScriptBase):
fd.write(preamble.encode())
fd.write(self.source)
- def execute_script(self, fname):
+ def execute_script(self, fname, edit=False):
self._execute_cmd("julia {fname:q}", fname=fname)
def get_source(path, basedir="."):
- import nbformat
-
source = None
if not path.startswith("http") and not path.startswith("git+file"):
if path.startswith("file://"):
@@ -795,6 +801,7 @@ def get_source(path, basedir="."):
if not os.path.isabs(path):
path = os.path.abspath(os.path.join(basedir, path))
path = "file://" + path
+ # TODO this should probably be removed again. It does not work for report and hash!
path = format(path, stepout=1)
if path.startswith("file://"):
sourceurl = "file:" + pathname2url(path[7:])
@@ -809,6 +816,14 @@ def get_source(path, basedir="."):
with urlopen(sourceurl) as source:
source = source.read()
+ language = get_language(path, source)
+
+ return path, source, language
+
+
+def get_language(path, source):
+ import nbformat
+
language = None
if path.endswith(".py"):
language = "python"
@@ -828,7 +843,7 @@ def get_source(path, basedir="."):
language += "_" + kernel_language.lower()
- return path, source, language
+ return language
def script(
@@ -858,18 +873,18 @@ def script(
"""
path, source, language = get_source(path, basedir)
- ExecClass = {
+ exec_class = {
"python": PythonScript,
"r": RScript,
"rmarkdown": RMarkdown,
"julia": JuliaScript,
}.get(language, None)
- if ExecClass is None:
+ if exec_class is None:
raise ValueError(
"Unsupported script: Expecting either Python (.py), R (.R), RMarkdown (.Rmd) or Julia (.jl) script."
)
- executor = ExecClass(
+ executor = exec_class(
path,
source,
basedir,
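
The new ``local_path`` property simply strips the ``file://`` prefix that ``get_source()`` prepends to local paths; the slice in isolation (hypothetical path):

    path = "file:///home/user/workflow/hello.py.ipynb"
    local_path = path[7:]  # drop the 7-character "file://" prefix
    assert local_path == "/home/user/workflow/hello.py.ipynb"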
=====================================
snakemake/workflow.py
=====================================
@@ -103,6 +103,7 @@ class Workflow:
cores=1,
resources=None,
conda_cleanup_pkgs=None,
+ edit_notebook=False,
):
"""
Create the controller.
@@ -162,6 +163,7 @@ class Workflow:
self.run_local = run_local
self.report_text = None
self.conda_cleanup_pkgs = conda_cleanup_pkgs
+ self.edit_notebook = edit_notebook
# environment variables to pass to jobs
# These are defined via the "envvars:" syntax in the Snakefile itself
self.envvars = set()
@@ -851,16 +853,21 @@ class Workflow:
logger.resources_info(
"Provided cluster nodes: {}".format(self.nodes)
)
+ elif kubernetes or tibanna:
+ logger.resources_info("Provided cloud nodes: {}".format(self.nodes))
else:
- warning = (
- "" if self.cores > 1 else " (use --cores to define parallelism)"
- )
- logger.resources_info(
- "Provided cores: {}{}".format(self.cores, warning)
- )
- logger.resources_info(
- "Rules claiming more threads " "will be scaled down."
- )
+ if self.cores is not None:
+ warning = (
+ ""
+ if self.cores > 1
+ else " (use --cores to define parallelism)"
+ )
+ logger.resources_info(
+ "Provided cores: {}{}".format(self.cores, warning)
+ )
+ logger.resources_info(
+ "Rules claiming more threads " "will be scaled down."
+ )
provided_resources = format_resources(self.global_resources)
if provided_resources:
=====================================
tests/test_jupyter_notebook/expected-results/result_final.txt
=====================================
@@ -0,0 +1 @@
+result of serious computation!!!!!!
\ No newline at end of file
=====================================
tests/test_jupyter_notebook/expected-results/result_intermediate.txt
=====================================
@@ -0,0 +1 @@
+result of serious computation!!!
\ No newline at end of file
=====================================
tests/tests.py
=====================================
@@ -1014,3 +1014,7 @@ def test_string_resources():
default_resources=DefaultResources(["gpu_model='nvidia-tesla-1000'"]),
cluster="./qsub.py",
)
+
+
+def test_jupyter_notebook():
+ run(dpath("test_jupyter_notebook"), use_conda=True)
View it on GitLab: https://salsa.debian.org/med-team/snakemake/-/compare/9667f24ac33e6204cd6801693ca496db43c21384...443804191d0ee873fc54e14ad9392386f6c69d13