[med-svn] [Git][med-team/snakemake][master] 4 commits: Dockerfile: fix remote root-in-container security hole.

Rebecca N. Palmer gitlab at salsa.debian.org
Mon May 4 12:17:47 BST 2020



Rebecca N. Palmer pushed to branch master at Debian Med / snakemake


Commits:
2705e3b8 by Rebecca N. Palmer at 2020-05-04T11:24:04+01:00
Dockerfile: fix remote root-in-container security hole.

- - - - -
34852dea by Rebecca N. Palmer at 2020-05-04T11:27:44+01:00
New upstream version 5.16.0
- - - - -
27426106 by Rebecca N. Palmer at 2020-05-04T11:28:02+01:00
Update upstream source from tag 'upstream/5.16.0'

Update to upstream version '5.16.0'
with Debian dir ccb919adeba020055e4d3c1e5dc74fc65bc8004e
- - - - -
44380419 by Rebecca N. Palmer at 2020-05-04T12:16:30+01:00
Skip a new conda-using test.

- - - - -


22 changed files:

- CHANGELOG.rst
- debian/changelog
- + debian/patches/security.patch
- debian/patches/series
- debian/rules
- debian/tests/run-unit-test
- + docs/snakefiles/images/snakemake-notebook-demo.gif
- docs/snakefiles/rules.rst
- snakemake/__init__.py
- snakemake/_version.py
- snakemake/dag.py
- snakemake/executors.py
- snakemake/jobs.py
- snakemake/notebook.py
- snakemake/parser.py
- snakemake/rules.py
- snakemake/scheduler.py
- snakemake/script.py
- snakemake/workflow.py
- + tests/test_jupyter_notebook/expected-results/result_final.txt
- + tests/test_jupyter_notebook/expected-results/result_intermediate.txt
- tests/tests.py


Changes:

=====================================
CHANGELOG.rst
=====================================
@@ -1,3 +1,13 @@
+[5.16.0] - 2020-04-29
+=====================
+Added
+-----
+- Interactive jupyter notebook editing. Notebooks defined by rules can be interactively drafted and updated using snakemake --edit-notebook (see docs).
+Changed
+-------
+- Fixed group resource usage to occupy one cluster/cloud node.
+- Minor bug fixes.
+
 [5.15.0] - 2020-04-21
 =====================
 Changed


=====================================
debian/changelog
=====================================
@@ -1,10 +1,12 @@
-snakemake (5.15.0-1) UNRELEASED; urgency=medium
+snakemake (5.16.0-1) unstable; urgency=medium
 
   * New upstream release.  Drop / refresh patches.
   * Stop using pytest-xdist: it sometimes starts tests from
     a non-main thread, which breaks snakemake.
+  * Dockerfile: fix remote root-in-container security hole.
+  * Skip a new conda-using test.
 
- -- Rebecca N. Palmer <rebecca_palmer at zoho.com>  Mon, 27 Apr 2020 12:05:59 +0100
+ -- Rebecca N. Palmer <rebecca_palmer at zoho.com>  Mon, 04 May 2020 12:16:03 +0100
 
 snakemake (5.14.0-1) unstable; urgency=medium
 


=====================================
debian/patches/security.patch
=====================================
@@ -0,0 +1,30 @@
+Description: Fix remote root-in-container security hole, and a link
+
+Repository signatures are only secure if you have the right key
+to check them with
+
+Author: Rebecca N. Palmer <rebecca_palmer at zoho.com>
+Forwarded: https://github.com/snakemake/snakemake/pull/371
+
+diff --git a/Dockerfile b/Dockerfile
+index 095b5335..a3afe66b 100644
+--- a/Dockerfile
++++ b/Dockerfile
+@@ -6,8 +6,8 @@ ENV PATH /opt/conda/bin:${PATH}
+ ENV LANG C.UTF-8
+ ENV SHELL /bin/bash
+ RUN /bin/bash -c "install_packages wget bzip2 ca-certificates gnupg2 squashfs-tools git && \
+-    wget -O- http://neuro.debian.net/lists/xenial.us-ca.full > /etc/apt/sources.list.d/neurodebian.sources.list && \
+-    wget -O- http://neuro.debian.net/_static/neuro.debian.net.asc | apt-key add - && \
++    wget -O- https://neuro.debian.net/lists/xenial.us-ca.full > /etc/apt/sources.list.d/neurodebian.sources.list && \
++    wget -O- https://neuro.debian.net/_static/neuro.debian.net.asc | apt-key add - && \
+     install_packages singularity-container && \
+     wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+     bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda && \
+diff --git a/examples/c/README.txt b/examples/c/README.txt
+index 995d30fe..d2eaceec 100644
+--- a/examples/c/README.txt
++++ b/examples/c/README.txt
+@@ -1 +1 @@
+-http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
++https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/


=====================================
debian/patches/series
=====================================
@@ -7,3 +7,4 @@ local_javascript.patch
 python3.patch
 workaround_sphinx_issue.patch
 remove_ccbysa_snippets.patch
+security.patch


=====================================
debian/rules
=====================================
@@ -7,7 +7,7 @@ export HOME=$(CURDIR)/fakehome
 export PYBUILD_NAME=snakemake
 export PYBUILD_DESTDIR_python3=debian/snakemake
 export PYBUILD_BEFORE_TEST_python3=chmod +x {dir}/bin/snakemake; cp -r {dir}/bin {dir}/tests {build_dir}
-export PYBUILD_TEST_ARGS=python{version} -m pytest -v tests/test*.py -k 'not report and not ancient and not test_script and not default_remote and not issue635 and not convert_to_cwl and not issue1083 and not issue1092 and not issue1093 and not test_remote and not test_default_resources and not test_tibanna and not test_github_issue78 and not test_output_file_cache_remote and not test_env_modules and not test_archive and not test_container'
+export PYBUILD_TEST_ARGS=python{version} -m pytest -v tests/test*.py -k 'not report and not ancient and not test_script and not default_remote and not issue635 and not convert_to_cwl and not issue1083 and not issue1092 and not issue1093 and not test_remote and not test_default_resources and not test_tibanna and not test_github_issue78 and not test_output_file_cache_remote and not test_env_modules and not test_archive and not test_container and not test_jupyter_notebook'
 
 # test_report
 # test_ancient
@@ -19,7 +19,7 @@ export PYBUILD_TEST_ARGS=python{version} -m pytest -v tests/test*.py -k 'not rep
 # test_issue1093 fails due to conda usage; commenting that out and installing bwa produces a different ordering than desired
 # test_default_resources and test_remote needs moto to be packaged https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=777089
 # test_env_modules relies on "module load" which is not packaged for Debian
-# test_archive uses conda
+# test_archive, test_jupyter_notebook use conda
 # test_container uses docker://bash
 
 export PYBUILD_AFTER_TEST_python3=rm -fr {build_dir}/bin {build_dir}/tests {dir}/tests/test_filegraph/.snakemake/ {dir}/tests/linting/*/.snakemake/


=====================================
debian/tests/run-unit-test
=====================================
@@ -17,5 +17,5 @@ cd "${AUTOPKGTEST_TMP}"
 
 export HOME="${AUTOPKGTEST_TMP}"
 
-python3 -m pytest -v ${ROOT}/tests/test*.py -k 'not report and not ancient and not test_script and not default_remote and not issue635 and not convert_to_cwl and not issue1083 and not issue1092 and not issue1093 and not test_remote and not test_default_resources and not test_singularity and not test_singularity_conda and not test_cwl_singularity and not test_cwl and not test_url_include and not test_tibanna and not test_github_issue78 and not test_output_file_cache_remote and not test_env_modules and not test_archive and not test_container'
+python3 -m pytest -v ${ROOT}/tests/test*.py -k 'not report and not ancient and not test_script and not default_remote and not issue635 and not convert_to_cwl and not issue1083 and not issue1092 and not issue1093 and not test_remote and not test_default_resources and not test_singularity and not test_singularity_conda and not test_cwl_singularity and not test_cwl and not test_url_include and not test_tibanna and not test_github_issue78 and not test_output_file_cache_remote and not test_env_modules and not test_archive and not test_container and not test_jupyter_notebook'
 


=====================================
docs/snakefiles/images/snakemake-notebook-demo.gif
=====================================
Binary files /dev/null and b/docs/snakefiles/images/snakemake-notebook-demo.gif differ


=====================================
docs/snakefiles/rules.rst
=====================================
@@ -608,18 +608,14 @@ Integration works as follows (note the use of `notebook:` instead of `script:`):
 
 .. code-block:: python
 
-    rule NAME:
-        input:
-            "path/to/inputfile",
-            "path/to/other/inputfile"
+    rule hello:
         output:
-            "path/to/outputfile",
-            "path/to/another/outputfile"
+            "test.txt"
         log:
             # optional path to the processed notebook
-            notebook = "logs/notebooks/processed_notebook.ipynb"
+            notebook="logs/notebooks/processed_notebook.ipynb"
         notebook:
-            "notebooks/notebook.ipynb"
+            "hello.py.ipynb"
 
 .. note:
 
@@ -627,15 +623,41 @@ Integration works as follows (note the use of `notebook:` instead of `script:`):
     A modular, readable workflow definition with Snakemake, and the ability to quickly explore and plot data with Jupyter.
     The benefit will be maximal when integrating many small notebooks that each do a particular job, hence allowing to get away from large monolithic, and therefore unreadable notebooks.
 
+It is recommended to prefix the ``.ipynb`` suffix with either ``.py`` or ``.r`` to indicate the notebook language.
 In the notebook, a snakemake object is available, which can be accessed in the same way as with :ref:`script integration <snakefiles_external-scripts>`.
 In other words, you have access to input files via ``snakemake.input`` (in the Python case) and ``snakemake@input`` (in the R case) etc.
-Hence, integrating a new notebook works by first writing it from scratch in the usual interactive way.
-Then, you replace all hardcoded variables with references to properties of the rule where it shall be integrated, e.g. replacing the path to an input file with ``snakemake.input[0]``.
-Once having moved the notebook to the right place in the pipeline (ideally a subfolder ``notebooks``) and referring to it from the rule, Snakemake will be able to re-execute it while inserting the desired variable values from the rule properties.
-
 Optionally it is possible to automatically store the processed notebook.
 This can be achieved by adding a named logfile ``notebook=...`` to the ``log`` directive.
 
+In order to simplify the coding of notebooks given the automatically inserted ``snakemake`` object, Snakemake provides an interactive edit mode for notebook rules.
+Let us assume you have written the above rule, but the notebook does not yet exist.
+By running
+
+.. code-block:: console
+
+    snakemake --cores 1 --edit-notebook test.txt
+
+you instruct Snakemake to allow interactive editing of the notebook needed to create the file ``test.txt``.
+Snakemake will run all dependencies of the notebook rule, such that all input files are present.
+Then, it will start a jupyter notebook server with an empty draft of the notebook, in which you can interactively program everything needed for this particular step.
+Once done, you should save the notebook from the jupyter web interface, go to the jupyter dashboard and hit the ``Quit`` button on the top right in order to shut down the jupyter server.
+Snakemake will detect that the server is closed and automatically store the drafted notebook into the path given in the rule (here ``hello.py.ipynb``).
+If the notebook already exists, the above procedure can be used to easily modify it.
+Note that Snakemake requires local execution for the notebook edit mode.
+On a cluster or the cloud, you can generate all dependencies of the notebook rule via
+
+.. code-block:: console
+
+    snakemake --cluster ... --jobs 100 --until test.txt
+
+Then, the notebook rule can easily be executed locally.
+A demo of the entire interactive editing process can be found by clicking below:
+
+.. image:: images/snakemake-notebook-demo.gif
+    :scale: 20%
+    :alt: Notebook integration demo
+    :align: center
+
 
 Protected and Temporary Files
 -----------------------------


=====================================
snakemake/__init__.py
=====================================
@@ -153,6 +153,7 @@ def snakemake(
     show_failed_logs=False,
     keep_incomplete=False,
     messaging=None,
+    edit_notebook=None,
 ):
     """Run snakemake on a given snakefile.
 
@@ -262,7 +263,8 @@ def snakemake(
         cluster_status (str):       status command for cluster execution. If None, Snakemake will rely on flag files. Otherwise, it expects the command to return "success", "failure" or "running" when executing with a cluster jobid as single argument.
         export_cwl (str):           Compile workflow to CWL and save to given file
         log_handler (function):     redirect snakemake output to this custom log handler, a function that takes a log message dictionary (see below) as its only argument (default None). The log message dictionary for the log handler has to following entries:
-        keep_incomplete (bool):      keep incomplete output files of failed jobs
+        keep_incomplete (bool):     keep incomplete output files of failed jobs
+        edit_notebook (object):     "notebook.Listen" object for configuring the notebook server for interactive editing of a rule notebook. If None, do not edit.
         log_handler (list):         redirect snakemake output to this list of custom log handler, each a function that takes a log message dictionary (see below) as its only argument (default []). The log message dictionary for the log handler has to following entries:
 
             :level:
@@ -334,10 +336,10 @@ def snakemake(
     if updated_files is None:
         updated_files = list()
 
-    if cluster or cluster_sync or drmaa or tibanna:
-        cores = sys.maxsize
+    if cluster or cluster_sync or drmaa or tibanna or kubernetes:
+        cores = None
     else:
-        nodes = sys.maxsize
+        nodes = None
 
     if isinstance(cluster_config, str):
         # Loading configuration from one file is still supported for
@@ -355,9 +357,15 @@ def snakemake(
         cluster_config_content = dict()
 
     run_local = not (cluster or cluster_sync or drmaa or kubernetes or tibanna)
-    if run_local and not dryrun:
-        # clean up all previously recorded jobids.
-        shell.cleanup()
+    if run_local:
+        if not dryrun:
+            # clean up all previously recorded jobids.
+            shell.cleanup()
+    else:
+        if edit_notebook:
+            raise WorkflowError(
+                "Notebook edit mode is only allowed with local execution."
+            )
 
     # force thread use for any kind of cluster
     use_threads = (
@@ -491,6 +499,7 @@ def snakemake(
             cores=cores,
             nodes=nodes,
             resources=resources,
+            edit_notebook=edit_notebook,
         )
         success = True
         workflow.include(
@@ -1093,9 +1102,9 @@ def get_argument_parser(profile=None):
         ),
     )
 
-    group_utils = parser.add_argument_group("UTILITIES")
+    group_report = parser.add_argument_group("REPORTS")
 
-    group_utils.add_argument(
+    group_report.add_argument(
         "--report",
         nargs="?",
         const="report.html",
@@ -1106,12 +1115,33 @@ def get_argument_parser(profile=None):
         "In the latter case, results are stored along with a file report.html in the zip archive. "
         "If no filename is given, an embedded report.html is the default.",
     )
-    group_utils.add_argument(
+    group_report.add_argument(
         "--report-stylesheet",
         metavar="CSSFILE",
         help="Custom stylesheet to use for report. In particular, this can be used for "
         "branding the report with e.g. a custom logo, see docs.",
     )
+
+    group_notebooks = parser.add_argument_group("NOTEBOOKS")
+
+    group_notebooks.add_argument(
+        "--edit-notebook",
+        metavar="TARGET",
+        help="Interactively edit the notebook associated with the rule used to generate the given target file. "
+        "This will start a local jupyter notebook server. "
+        "Any changes to the notebook should be saved, and the server has to be stopped by "
+        "closing the notebook and hitting the 'Quit' button on the jupyter dashboard. "
+        "Afterwards, the updated notebook will be automatically stored in the path defined in the rule. "
+        "If the notebook is not yet present, this will create an empty draft. ",
+    )
+    group_notebooks.add_argument(
+        "--notebook-listen",
+        metavar="IP:PORT",
+        default="localhost:8888",
+        help="The IP address and PORT the notebook server used for editing the notebook (--edit-notebook) will listen on.",
+    )
+
+    group_utils = parser.add_argument_group("UTILITIES")
     group_utils.add_argument(
         "--lint",
         nargs="?",
@@ -1121,6 +1151,7 @@ def get_argument_parser(profile=None):
         "specific suggestions to improve code quality (work in progress, more lints "
         "to be added in the future). If no argument is provided, plain text output is used.",
     )
+
     group_utils.add_argument(
         "--export-cwl",
         action="store",
@@ -2102,6 +2133,13 @@ def main(argv=None):
             slack_logger = logging.SlackLogger()
             log_handler.append(slack_logger.log_handler)
 
+        if args.edit_notebook:
+            from snakemake import notebook
+
+            args.target = [args.edit_notebook]
+            args.force = True
+            args.edit_notebook = notebook.Listen(args.notebook_listen)
+
         success = snakemake(
             args.snakefile,
             batch=batch,
@@ -2211,6 +2249,7 @@ def main(argv=None):
             export_cwl=args.export_cwl,
             show_failed_logs=args.show_failed_logs,
             keep_incomplete=args.keep_incomplete,
+            edit_notebook=args.edit_notebook,
             log_handler=log_handler,
         )
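
The new options can be exercised in isolation; a minimal sketch in which only the option names, metavars, and the default come from the diff above, while the surrounding parser setup is simplified:

```python
import argparse

# The Listen helper mirrors snakemake/notebook.py: it splits the
# "IP:PORT" string passed via --notebook-listen.
class Listen:
    def __init__(self, arg):
        self.ip, self.port = arg.split(":")

parser = argparse.ArgumentParser()
group_notebooks = parser.add_argument_group("NOTEBOOKS")
group_notebooks.add_argument("--edit-notebook", metavar="TARGET")
group_notebooks.add_argument(
    "--notebook-listen", metavar="IP:PORT", default="localhost:8888"
)

args = parser.parse_args(["--edit-notebook", "test.txt"])
listen = Listen(args.notebook_listen)
# listen.ip == "localhost", listen.port == "8888"
```

This matches the wiring in main(): when --edit-notebook is given, the target list is set to the notebook rule's output and the raw listen string is wrapped in a Listen object before being passed down.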
 


=====================================
snakemake/_version.py
=====================================
@@ -22,9 +22,9 @@ def get_keywords():
     # setup.py/versioneer.py will grep for the variable names, so they must
     # each be defined on a line of their own. _version.py will just call
     # get_keywords().
-    git_refnames = " (HEAD -> master, tag: v5.15.0)"
-    git_full = "6d3aa424cfffaf33f36ca4b6da870ce786ddcb30"
-    git_date = "2020-04-21 10:55:47 +0200"
+    git_refnames = " (tag: v5.16.0)"
+    git_full = "7ec74f79067b3392607479c6f3bec4a2e90cbc05"
+    git_date = "2020-04-29 16:01:58 +0200"
     keywords = {"refnames": git_refnames, "full": git_full, "date": git_date}
     return keywords
 


=====================================
snakemake/dag.py
=====================================
@@ -319,6 +319,9 @@ class DAG:
             self.update_dynamic(job)
         self.postprocess()
 
+    def is_edit_notebook_job(self, job):
+        return self.workflow.edit_notebook and job.targetfile in self.targetfiles
+
     @property
     def dynamic_output_jobs(self):
         """Iterate over all jobs with dynamic output files."""


=====================================
snakemake/executors.py
=====================================
@@ -203,15 +203,16 @@ class RealExecutor(AbstractExecutor):
         handle_touch=True,
         ignore_missing_output=False,
     ):
-        job.postprocess(
-            upload_remote=upload_remote,
-            handle_log=handle_log,
-            handle_touch=handle_touch,
-            ignore_missing_output=ignore_missing_output,
-            latency_wait=self.latency_wait,
-            assume_shared_fs=self.assume_shared_fs,
-        )
-        self.stats.report_job_end(job)
+        if not self.dag.is_edit_notebook_job(job):
+            job.postprocess(
+                upload_remote=upload_remote,
+                handle_log=handle_log,
+                handle_touch=handle_touch,
+                ignore_missing_output=ignore_missing_output,
+                latency_wait=self.latency_wait,
+                assume_shared_fs=self.assume_shared_fs,
+            )
+            self.stats.report_job_end(job)
 
     def handle_job_error(self, job, upload_remote=True):
         job.postprocess(
@@ -422,6 +423,7 @@ class CPUExecutor(RealExecutor):
             self.workflow.cleanup_scripts,
             job.shadow_dir,
             job.jobid,
+            self.workflow.edit_notebook,
         )
 
     def run_single_job(self, job):
@@ -2025,6 +2027,7 @@ def run_wrapper(
     cleanup_scripts,
     shadow_dir,
     jobid,
+    edit_notebook,
 ):
     """
     Wrapper around the run method that handles exceptions and benchmarking.
@@ -2102,6 +2105,7 @@ def run_wrapper(
                             bench_iteration,
                             cleanup_scripts,
                             passed_shadow_dir,
+                            edit_notebook,
                         )
                     else:
                         # The benchmarking is started here as we have a run section
@@ -2129,6 +2133,7 @@ def run_wrapper(
                                 bench_iteration,
                                 cleanup_scripts,
                                 passed_shadow_dir,
+                                edit_notebook,
                             )
                     # Store benchmark record for this iteration
                     bench_records.append(bench_record)
@@ -2154,6 +2159,7 @@ def run_wrapper(
                     None,
                     cleanup_scripts,
                     passed_shadow_dir,
+                    edit_notebook,
                 )
     except (KeyboardInterrupt, SystemExit) as e:
         # Re-raise the keyboard interrupt in order to record an error in the


=====================================
snakemake/jobs.py
=====================================
@@ -1173,6 +1173,7 @@ class GroupJob(AbstractJob):
     def resources(self):
         if self._resources is None:
             self._resources = defaultdict(int)
+            self._resources["_nodes"] = 1
             pipe_group = any([job.is_pipe for job in self.jobs])
             # iterate over siblings that can be executed in parallel
             for siblings in self.toposorted:
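
The one-line change above is what makes a whole group occupy a single cluster/cloud node. A simplified sketch of the effect (assumption: plain dicts stand in for jobs, and a flat sum replaces the real toposorted-sibling aggregation):

```python
from collections import defaultdict

def group_resources(job_resources):
    # Aggregated group resources start from defaultdict(int) with "_nodes"
    # pinned to 1, so the group as a whole is scheduled onto one node.
    # Summing everything else is a simplification of the real logic.
    resources = defaultdict(int)
    resources["_nodes"] = 1
    for res in job_resources:
        for name, value in res.items():
            if name != "_nodes":
                resources[name] += value
    return resources

grouped = group_resources([{"_cores": 2, "_nodes": 1}, {"_cores": 4, "_nodes": 1}])
# grouped["_nodes"] == 1 regardless of how many jobs are in the group
```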


=====================================
snakemake/notebook.py
=====================================
@@ -1,34 +1,111 @@
-import os
+import os, sys
+from urllib.error import URLError
+from urllib.parse import urlparse
+import tempfile
+import re
+import shutil
 
 from snakemake.exceptions import WorkflowError
 from snakemake.shell import shell
 from snakemake.script import get_source, ScriptBase, PythonScript, RScript
+from snakemake.logging import logger
+
+KERNEL_STARTED_RE = re.compile("Kernel started: (?P<kernel_id>\S+)")
+KERNEL_SHUTDOWN_RE = re.compile("Kernel shutdown: (?P<kernel_id>\S+)")
+
+
+class Listen:
+    def __init__(self, arg):
+        self.ip, self.port = arg.split(":")
 
 
 class JupyterNotebook(ScriptBase):
+
+    editable = True
+
+    def draft(self, listen):
+        import nbformat
+
+        preamble = self.get_preamble()
+        nb = nbformat.v4.new_notebook()
+        self.insert_preamble_cell(preamble, nb)
+
+        nb["cells"].append(nbformat.v4.new_code_cell("# start coding here"))
+
+        with open(self.local_path, "wb") as out:
+            out.write(nbformat.writes(nb).encode())
+
+        self.source = open(self.local_path).read()
+
+        self.evaluate(edit=listen)
+
     def write_script(self, preamble, fd):
         import nbformat
 
         nb = nbformat.reads(self.source, as_version=4)  # nbformat.NO_CONVERT
 
-        preamble_cell = nbformat.v4.new_code_cell(preamble)
-        nb["cells"].insert(0, preamble_cell)
+        self.remove_preamble_cell(nb)
+        self.insert_preamble_cell(preamble, nb)
 
         fd.write(nbformat.writes(nb).encode())
 
-    def execute_script(self, fname):
+    def execute_script(self, fname, edit=None):
+        import nbformat
+
         fname_out = self.log.get("notebook", None)
-        if fname_out is None:
+        if fname_out is None or edit:
             output_parameter = ""
         else:
             fname_out = os.path.join(os.getcwd(), fname_out)
             output_parameter = "--output {fname_out:q}"
 
-        cmd_tmp = "jupyter-nbconvert --execute {output_parameter} --to notebook --ExecutePreprocessor.timeout=-1 {{fname:q}}".format(
-            output_parameter=output_parameter
-        )
+        if edit is not None:
+            logger.info("Opening notebook for editing.")
+            cmd = (
+                "jupyter notebook --log-level ERROR --ip {edit.ip} --port {edit.port} "
+                "--no-browser --NotebookApp.quit_button=True {{fname:q}}".format(
+                    edit=edit
+                )
+            )
+        else:
+            cmd = (
+                "jupyter-nbconvert --log-level ERROR --execute {output_parameter} "
+                "--to notebook --ExecutePreprocessor.timeout=-1 {{fname:q}}".format(
+                    output_parameter=output_parameter
+                )
+            )
 
-        self._execute_cmd(cmd_tmp, fname_out=fname_out, fname=fname)
+        self._execute_cmd(cmd, fname_out=fname_out, fname=fname)
+
+        if edit:
+            logger.info("Saving modified notebook.")
+            nb = nbformat.read(fname, as_version=4)
+            self.remove_preamble_cell(nb)
+
+            nbformat.write(nb, self.local_path)
+
+    def insert_preamble_cell(self, preamble, notebook):
+        import nbformat
+
+        preamble_cell = nbformat.v4.new_code_cell(preamble)
+        preamble_cell["metadata"]["tags"] = ["snakemake-job-properties"]
+        notebook["cells"].insert(0, preamble_cell)
+
+    def remove_preamble_cell(self, notebook):
+        preambles = [
+            i
+            for i, cell in enumerate(notebook["cells"])
+            if "snakemake-job-properties" in cell["metadata"].get("tags", [])
+        ]
+        if len(preambles) > 1:
+            raise WorkflowError(
+                "More than one snakemake preamble cell found in notebook. "
+                "Please clean up the notebook first, by removing all or all but one of them."
+            )
+        elif len(preambles) == 1:
+            preamble = preambles[0]
+            # remove old preamble
+            del notebook["cells"][preamble]
 
 
 class PythonJupyterNotebook(JupyterNotebook):
@@ -91,6 +168,16 @@ class RJupyterNotebook(JupyterNotebook):
         )
 
 
+def get_exec_class(language):
+    exec_class = {
+        "jupyter_python": PythonJupyterNotebook,
+        "jupyter_r": RJupyterNotebook,
+    }.get(language, None)
+    if exec_class is None:
+        raise ValueError("Unsupported notebook: Expecting Jupyter Notebook (.ipynb).")
+    return exec_class
+
+
 def notebook(
     path,
     basedir,
@@ -112,20 +199,42 @@ def notebook(
     bench_iteration,
     cleanup_scripts,
     shadow_dir,
+    edit=None,
 ):
     """
     Load a script from the given basedir + path and execute it.
     """
-    path, source, language = get_source(path, basedir)
+    draft = False
+    if edit is not None:
+        if urlparse(path).scheme == "":
+            if not os.path.exists(path):
+                # draft the notebook, it does not exist yet
+                language = None
+                draft = True
+                path = "file://{}".format(os.path.abspath(path))
+                if path.endswith(".py.ipynb"):
+                    language = "jupyter_python"
+                elif path.endswith(".r.ipynb"):
+                    language = "jupyter_r"
+                else:
+                    raise WorkflowError(
+                        "Notebook to edit has to end on .py.ipynb or .r.ipynb in order "
+                        "to decide which programming language shall be used."
+                    )
+        else:
+            raise WorkflowError(
+                "Notebook {} is not local, but edit mode is only allowed for "
+                "local notebooks.".format(path)
+            )
 
-    ExecClass = {
-        "jupyter_python": PythonJupyterNotebook,
-        "jupyter_r": RJupyterNotebook,
-    }.get(language, None)
-    if ExecClass is None:
-        raise ValueError("Unsupported notebook: Expecting Jupyter Notebook (.ipynb).")
+    if not draft:
+        path, source, language = get_source(path, basedir)
+    else:
+        source = None
 
-    executor = ExecClass(
+    exec_class = get_exec_class(language)
+
+    executor = exec_class(
         path,
         source,
         basedir,
@@ -148,4 +257,8 @@ def notebook(
         cleanup_scripts,
         shadow_dir,
     )
-    executor.evaluate()
+
+    if draft:
+        executor.draft(listen=edit)
+    else:
+        executor.evaluate(edit=edit)
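
The tagged-preamble mechanism above (insert_preamble_cell / remove_preamble_cell) can be illustrated standalone. As an assumption, plain dicts stand in for nbformat cell objects, and ValueError stands in for WorkflowError:

```python
PREAMBLE_TAG = "snakemake-job-properties"

def insert_preamble_cell(preamble, notebook):
    # Mirrors notebook.py: the generated cell is tagged so it can be
    # found and stripped again when the edited notebook is saved back.
    cell = {
        "cell_type": "code",
        "source": preamble,
        "metadata": {"tags": [PREAMBLE_TAG]},
    }
    notebook["cells"].insert(0, cell)

def remove_preamble_cell(notebook):
    preambles = [
        i
        for i, cell in enumerate(notebook["cells"])
        if PREAMBLE_TAG in cell["metadata"].get("tags", [])
    ]
    if len(preambles) > 1:
        raise ValueError("More than one snakemake preamble cell found in notebook.")
    if preambles:
        del notebook["cells"][preambles[0]]

nb = {"cells": [{"cell_type": "code", "source": "x = 1", "metadata": {}}]}
insert_preamble_cell("snakemake = ...", nb)
remove_preamble_cell(nb)
# nb is back to its single user cell
```

Tagging, rather than assuming the preamble is always cell 0, is what lets edit mode strip the injected cell from a notebook the user has reorganized.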


=====================================
snakemake/parser.py
=====================================
@@ -470,7 +470,7 @@ class Run(RuleKeywordState):
             "def __rule_{rulename}(input, output, params, wildcards, threads, "
             "resources, log, version, rule, conda_env, container_img, "
             "singularity_args, use_singularity, env_modules, bench_record, jobid, "
-            "is_shell, bench_iteration, cleanup_scripts, shadow_dir):".format(
+            "is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook):".format(
                 rulename=self.rulename
                 if self.rulename is not None
                 else self.snakefile.rulecount
@@ -582,6 +582,17 @@ class Notebook(Script):
     start_func = "@workflow.notebook"
     end_func = "notebook"
 
+    def args(self):
+        # basedir
+        yield ", {!r}".format(os.path.abspath(os.path.dirname(self.snakefile.path)))
+        # other args
+        yield (
+            ", input, output, params, wildcards, threads, resources, log, "
+            "config, rule, conda_env, container_img, singularity_args, env_modules, "
+            "bench_record, jobid, bench_iteration, cleanup_scripts, shadow_dir, "
+            "edit_notebook"
+        )
+
 
 class Wrapper(Script):
     start_func = "@workflow.wrapper"
@@ -679,7 +690,9 @@ class Rule(GlobalKeywordState):
             for t in self.start():
                 yield t, token
         else:
-            self.error("Expected name or colon after rule keyword.", token)
+            self.error(
+                "Expected name or colon after " "rule or checkpoint keyword.", token
+            )
 
     def block_content(self, token):
         if is_name(token):


=====================================
snakemake/rules.py
=====================================
@@ -964,7 +964,9 @@ class Rule:
                         "Resources function did not return int or str.", rule=self
                     )
             if isinstance(res, int):
-                res = min(self.workflow.global_resources.get(name, res), res)
+                global_res = self.workflow.global_resources.get(name, res)
+                if global_res is not None:
+                    res = min(global_res, res)
             return res
 
         threads = apply("_cores", self.resources["_cores"])
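
The guard added above changes resource capping so that an unset global limit (now None rather than a huge integer) disables the min(). In isolation (the function name is an editorial choice, the body matches the diff):

```python
def cap_resource(res, global_resources, name):
    # With cluster/cloud execution, a global resource may now be None
    # ("unlimited"); only cap against a concrete limit.
    global_res = global_resources.get(name, res)
    if global_res is not None:
        res = min(global_res, res)
    return res

cap_resource(8, {"mem_mb": 4}, "mem_mb")     # capped to the global limit, 4
cap_resource(8, {"mem_mb": None}, "mem_mb")  # unlimited: stays 8
```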


=====================================
snakemake/scheduler.py
=====================================
@@ -98,7 +98,11 @@ class JobScheduler:
         self.max_jobs_per_second = max_jobs_per_second
         self.keepincomplete = keepincomplete
 
-        self.resources = dict(self.workflow.global_resources)
+        self.global_resources = {
+            name: (sys.maxsize if res is None else res)
+            for name, res in workflow.global_resources.items()
+        }
+        self.resources = dict(self.global_resources)
 
         use_threads = (
             force_use_threads
@@ -507,7 +511,7 @@ Problem", Akcay, Li, Xu, Annals of Operations Research, 2012
                 return [c_j * y_j for c_j, y_j in zip(c, y)]
 
             b = [
-                self.resources[name] for name in self.workflow.global_resources
+                self.resources[name] for name in self.global_resources
             ]  # resource capacities
 
             while True:
@@ -548,12 +552,12 @@ Problem", Akcay, Li, Xu, Annals of Operations Research, 2012
 
             solution = [job for job, sel in zip(jobs, x) if sel]
             # update resources
-            for name, b_i in zip(self.workflow.global_resources, b):
+            for name, b_i in zip(self.global_resources, b):
                 self.resources[name] = b_i
             return solution
 
     def calc_resource(self, name, value):
-        gres = self.workflow.global_resources[name]
+        gres = self.global_resources[name]
         if value > gres:
             if name == "_cores":
                 name = "threads"
@@ -569,15 +573,13 @@ Problem", Akcay, Li, Xu, Annals of Operations Research, 2012
     def rule_weight(self, rule):
         res = rule.resources
         return [
-            self.calc_resource(name, res.get(name, 0))
-            for name in self.workflow.global_resources
+            self.calc_resource(name, res.get(name, 0)) for name in self.global_resources
         ]
 
     def job_weight(self, job):
         res = job.resources
         return [
-            self.calc_resource(name, res.get(name, 0))
-            for name in self.workflow.global_resources
+            self.calc_resource(name, res.get(name, 0)) for name in self.global_resources
         ]
 
     def job_reward(self, job):
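The scheduler.py changes above replace direct reads of `workflow.global_resources` with a normalized copy in which `None` capacities become `sys.maxsize`, so every capacity can be compared as a plain integer. A rough sketch of that normalization step (helper name hypothetical):

```python
import sys

def normalize_capacities(global_resources):
    """Replace 'unlimited' (None) capacities with sys.maxsize so the
    scheduler can treat every resource bound as an ordinary integer,
    as done in the JobScheduler constructor above."""
    return {
        name: (sys.maxsize if cap is None else cap)
        for name, cap in global_resources.items()
    }

caps = normalize_capacities({"_cores": 8, "mem_mb": None})
print(caps["_cores"])                 # 8
print(caps["mem_mb"] == sys.maxsize)  # True
```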


=====================================
snakemake/script.py
=====================================
@@ -244,6 +244,8 @@ class JuliaEncoder:
 
 
 class ScriptBase(ABC):
+    editable = False
+
     def __init__(
         self,
         path,
@@ -291,7 +293,9 @@ class ScriptBase(ABC):
         self.cleanup_scripts = cleanup_scripts
         self.shadow_dir = shadow_dir
 
-    def evaluate(self):
+    def evaluate(self, edit=False):
+        assert not edit or self.editable
+
         fd = None
         try:
             # generate preamble
@@ -307,7 +311,7 @@ class ScriptBase(ABC):
                 self.write_script(preamble, fd)
 
             # execute script
-            self.execute_script(fd.name)
+            self.execute_script(fd.name, edit=edit)
         except URLError as e:
             raise WorkflowError(e)
         finally:
@@ -320,6 +324,10 @@ class ScriptBase(ABC):
                     # nothing to clean up (TODO: ??)
                     pass
 
+    @property
+    def local_path(self):
+        return self.path[7:]
+
     @abstractmethod
     def get_preamble(self):
         ...
@@ -329,7 +337,7 @@ class ScriptBase(ABC):
         ...
 
     @abstractmethod
-    def execute_script(self, fname):
+    def execute_script(self, fname, edit=False):
         ...
 
     def _execute_cmd(self, cmd, **kwargs):
@@ -398,9 +406,9 @@ class PythonScript(ScriptBase):
 
         return textwrap.dedent(
             """
-        ######## Snakemake header ########
+        ######## snakemake preamble start (automatically inserted, do not edit) ########
         import sys; sys.path.extend([{searchpath}]); import pickle; snakemake = pickle.loads({snakemake}); from snakemake.logging import logger; logger.printshellcmds = {printshellcmds}; {preamble_addendum}
-        ######## Original script #########
+        ######## snakemake preamble end #########
         """
         ).format(
             searchpath=searchpath,
@@ -444,7 +452,7 @@ class PythonScript(ScriptBase):
         fd.write(preamble.encode())
         fd.write(self.source)
 
-    def execute_script(self, fname):
+    def execute_script(self, fname, edit=False):
         py_exec = sys.executable
         if self.conda_env is not None:
             py = os.path.join(self.conda_env, "bin", "python")
@@ -505,7 +513,7 @@ class RScript(ScriptBase):
     ):
         return textwrap.dedent(
             """
-        ######## Snakemake header ########
+        ######## snakemake preamble start (automatically inserted, do not edit) ########
         library(methods)
         Snakemake <- setClass(
             "Snakemake",
@@ -545,7 +553,7 @@ class RScript(ScriptBase):
         )
         {preamble_addendum}
 
-        ######## Original script #########
+        ######## snakemake preamble end #########
         """
         ).format(
             REncoder.encode_namedlist(input_),
@@ -601,7 +609,7 @@ class RScript(ScriptBase):
         fd.write(preamble.encode())
         fd.write(self.source)
 
-    def execute_script(self, fname):
+    def execute_script(self, fname, edit=False):
         if self.conda_env is not None and "R_LIBS" in os.environ:
             logger.warning(
                 "R script job uses conda environment but "
@@ -619,7 +627,7 @@ class RMarkdown(ScriptBase):
     def get_preamble(self):
         return textwrap.dedent(
             """
-        ######## Snakemake header ########
+        ######## snakemake preamble start (automatically inserted, do not edit) ########
         library(methods)
         Snakemake <- setClass(
             "Snakemake",
@@ -658,7 +666,7 @@ class RMarkdown(ScriptBase):
             }}
         )
 
-        ######## Original script #########
+        ######## snakemake preamble end #########
         """
         ).format(
             REncoder.encode_namedlist(self.input),
@@ -700,7 +708,7 @@ class RMarkdown(ScriptBase):
         fd.write(preamble.encode())
         fd.write(str.encode(code[pos:]))
 
-    def execute_script(self, fname):
+    def execute_script(self, fname, edit=False):
         if len(self.output) != 1:
             raise WorkflowError(
                 "RMarkdown scripts (.Rmd) may only have a single output file."
@@ -718,7 +726,7 @@ class JuliaScript(ScriptBase):
     def get_preamble(self):
         return textwrap.dedent(
             """
-                ######## Snakemake header ########
+                ######## snakemake preamble start (automatically inserted, do not edit) ########
                 struct Snakemake
                     input::Dict
                     output::Dict
@@ -747,7 +755,7 @@ class JuliaScript(ScriptBase):
                     {}, #scriptdir::String
                     #, #source::Any
                 )
-                ######## Original script #########
+                ######## snakemake preamble end #########
                 """.format(
                 JuliaEncoder.encode_namedlist(self.input),
                 JuliaEncoder.encode_namedlist(self.output),
@@ -779,13 +787,11 @@ class JuliaScript(ScriptBase):
         fd.write(preamble.encode())
         fd.write(self.source)
 
-    def execute_script(self, fname):
+    def execute_script(self, fname, edit=False):
         self._execute_cmd("julia {fname:q}", fname=fname)
 
 
 def get_source(path, basedir="."):
-    import nbformat
-
     source = None
     if not path.startswith("http") and not path.startswith("git+file"):
         if path.startswith("file://"):
@@ -795,6 +801,7 @@ def get_source(path, basedir="."):
         if not os.path.isabs(path):
             path = os.path.abspath(os.path.join(basedir, path))
         path = "file://" + path
+    # TODO this should probably be removed again. It does not work for report and hash!
     path = format(path, stepout=1)
     if path.startswith("file://"):
         sourceurl = "file:" + pathname2url(path[7:])
@@ -809,6 +816,14 @@ def get_source(path, basedir="."):
         with urlopen(sourceurl) as source:
             source = source.read()
 
+    language = get_language(path, source)
+
+    return path, source, language
+
+
+def get_language(path, source):
+    import nbformat
+
     language = None
     if path.endswith(".py"):
         language = "python"
@@ -828,7 +843,7 @@ def get_source(path, basedir="."):
 
         language += "_" + kernel_language.lower()
 
-    return path, source, language
+    return language
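The refactor above splits language detection out of `get_source()` into a separate `get_language()` so it can be reused. A simplified sketch of the extension-based part of that detection (omitting the Jupyter notebook kernel lookup that the real function also performs; names hypothetical):

```python
def detect_language(path):
    """Map a script path to its language by file extension.

    Simplified: the real get_language() additionally inspects the
    kernel metadata of .ipynb notebooks via nbformat.
    """
    suffixes = {
        ".py": "python",
        ".R": "r",
        ".Rmd": "rmarkdown",
        ".jl": "julia",
    }
    for suffix, language in suffixes.items():
        if path.endswith(suffix):
            return language
    return None

print(detect_language("workflow/scripts/plot.py"))  # python
```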
 
 
 def script(
@@ -858,18 +873,18 @@ def script(
     """
     path, source, language = get_source(path, basedir)
 
-    ExecClass = {
+    exec_class = {
         "python": PythonScript,
         "r": RScript,
         "rmarkdown": RMarkdown,
         "julia": JuliaScript,
     }.get(language, None)
-    if ExecClass is None:
+    if exec_class is None:
         raise ValueError(
             "Unsupported script: Expecting either Python (.py), R (.R), RMarkdown (.Rmd) or Julia (.jl) script."
         )
 
-    executor = ExecClass(
+    executor = exec_class(
         path,
         source,
         basedir,


=====================================
snakemake/workflow.py
=====================================
@@ -103,6 +103,7 @@ class Workflow:
         cores=1,
         resources=None,
         conda_cleanup_pkgs=None,
+        edit_notebook=False,
     ):
         """
         Create the controller.
@@ -162,6 +163,7 @@ class Workflow:
         self.run_local = run_local
         self.report_text = None
         self.conda_cleanup_pkgs = conda_cleanup_pkgs
+        self.edit_notebook = edit_notebook
         # environment variables to pass to jobs
         # These are defined via the "envvars:" syntax in the Snakefile itself
         self.envvars = set()
@@ -851,16 +853,21 @@ class Workflow:
                     logger.resources_info(
                         "Provided cluster nodes: {}".format(self.nodes)
                     )
+                elif kubernetes or tibanna:
+                    logger.resources_info("Provided cloud nodes: {}".format(self.nodes))
                 else:
-                    warning = (
-                        "" if self.cores > 1 else " (use --cores to define parallelism)"
-                    )
-                    logger.resources_info(
-                        "Provided cores: {}{}".format(self.cores, warning)
-                    )
-                    logger.resources_info(
-                        "Rules claiming more threads " "will be scaled down."
-                    )
+                    if self.cores is not None:
+                        warning = (
+                            ""
+                            if self.cores > 1
+                            else " (use --cores to define parallelism)"
+                        )
+                        logger.resources_info(
+                            "Provided cores: {}{}".format(self.cores, warning)
+                        )
+                        logger.resources_info(
+                            "Rules claiming more threads " "will be scaled down."
+                        )
 
                 provided_resources = format_resources(self.global_resources)
                 if provided_resources:
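The workflow.py hunk above adds a dedicated branch for cloud executors and guards the cores message against `self.cores` being `None`. The branching can be sketched as a small standalone function (name and return-value convention hypothetical; the real code logs instead of returning):

```python
def resources_message(nodes, cluster=False, cloud=False, cores=None):
    """Pick the resources info line: cluster nodes, cloud nodes, or
    local cores, skipping the cores line entirely when cores is None
    (the guard added in the hunk above)."""
    if cluster:
        return "Provided cluster nodes: {}".format(nodes)
    if cloud:
        return "Provided cloud nodes: {}".format(nodes)
    if cores is not None:
        hint = "" if cores > 1 else " (use --cores to define parallelism)"
        return "Provided cores: {}{}".format(cores, hint)
    return None

print(resources_message(None, cores=1))
```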


=====================================
tests/test_jupyter_notebook/expected-results/result_final.txt
=====================================
@@ -0,0 +1 @@
+result of serious computation!!!!!!
\ No newline at end of file


=====================================
tests/test_jupyter_notebook/expected-results/result_intermediate.txt
=====================================
@@ -0,0 +1 @@
+result of serious computation!!!
\ No newline at end of file


=====================================
tests/tests.py
=====================================
@@ -1014,3 +1014,7 @@ def test_string_resources():
         default_resources=DefaultResources(["gpu_model='nvidia-tesla-1000'"]),
         cluster="./qsub.py",
     )
+
+
+def test_jupyter_notebook():
+    run(dpath("test_jupyter_notebook"), use_conda=True)



View it on GitLab: https://salsa.debian.org/med-team/snakemake/-/compare/9667f24ac33e6204cd6801693ca496db43c21384...443804191d0ee873fc54e14ad9392386f6c69d13

-- 
You're receiving this email because of your account on salsa.debian.org.




More information about the debian-med-commit mailing list