[med-svn] [Git][med-team/snakemake][upstream] New upstream version 5.17.0

Steffen Möller gitlab at salsa.debian.org
Sat May 9 16:45:14 BST 2020



Steffen Möller pushed to branch upstream at Debian Med / snakemake


Commits:
4cdfa957 by Steffen Moeller at 2020-05-09T17:26:55+02:00
New upstream version 5.17.0
- - - - -


15 changed files:

- CHANGELOG.rst
- Dockerfile
- docs/getting_started/installation.rst
- examples/c/README.txt
- snakemake/__init__.py
- snakemake/_version.py
- snakemake/caching/local.py
- snakemake/executors.py
- snakemake/notebook.py
- snakemake/remote/XRootD.py
- snakemake/report/report.html
- snakemake/workflow.py
- tests/test_envvars/Snakefile
- + tests/test_envvars/expected-results/test2.out
- tests/tests.py


Changes:

=====================================
CHANGELOG.rst
=====================================
@@ -1,3 +1,15 @@
+[5.17.0] - 2020-05-07
+=====================
+Added
+-----
+- --envvars flag for passing secrets to cloud executors.
+Changed
+-------
+- Wider thumbnail dialogs in report.
+- Updated installation instructions.
+- Various small kubernetes bug fixes.
+- Bug fix for iRods remote files.
+
 [5.16.0] - 2020-04-29
 =====================
 Added


=====================================
Dockerfile
=====================================
@@ -6,8 +6,8 @@ ENV PATH /opt/conda/bin:${PATH}
 ENV LANG C.UTF-8
 ENV SHELL /bin/bash
 RUN /bin/bash -c "install_packages wget bzip2 ca-certificates gnupg2 squashfs-tools git && \
-    wget -O- http://neuro.debian.net/lists/xenial.us-ca.full > /etc/apt/sources.list.d/neurodebian.sources.list && \
-    wget -O- http://neuro.debian.net/_static/neuro.debian.net.asc | apt-key add - && \
+    wget -O- https://neuro.debian.net/lists/xenial.us-ca.full > /etc/apt/sources.list.d/neurodebian.sources.list && \
+    wget -O- https://neuro.debian.net/_static/neuro.debian.net.asc | apt-key add - && \
     install_packages singularity-container && \
     wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
     bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda && \


=====================================
docs/getting_started/installation.rst
=====================================
@@ -23,82 +23,42 @@ Make sure to ...
 * Install the **Python 3** version of Miniconda.
 * Answer yes to the question whether conda shall be put into your PATH.
 
-Then, you can install Snakemake with
+The default conda solver is a bit slow and sometimes has issues with `selecting the latest package releases <https://github.com/conda/conda/issues/9905>`_. Therefore, we recommend installing `Mamba <https://github.com/QuantStack/mamba>`_ as a drop-in replacement via
 
 .. code-block:: console
 
-    $ conda install -c conda-forge -c bioconda snakemake
+    $ conda install -c conda-forge mamba
 
-from the `Bioconda <https://bioconda.github.io>`_ channel.
-Alternatively, Snakemake can be installed into an isolated software environment with
+Then, you can install Snakemake with
 
 .. code-block:: console
 
-    $ conda create -c conda-forge -c bioconda -n snakemake snakemake
+    $ mamba create -c conda-forge -c bioconda -n snakemake snakemake
 
-The software environment has to be activated before using Snakemake:
+from the `Bioconda <https://bioconda.github.io>`_ channel.
+This will install Snakemake into an isolated software environment that has to be activated with
 
 .. code-block:: console
 
     $ conda activate snakemake
     $ snakemake --help
 
+Installing into isolated environments is best practice in order to avoid side effects with other packages.
 A minimal version of Snakemake which only depends on the bare necessities can be installed with
 
 .. code-block:: console
 
-    $ conda install -c bioconda -c conda-forge snakemake-minimal
+    $ mamba create -c bioconda -c conda-forge -n snakemake snakemake-minimal
 
 Note that Snakemake is available via Bioconda for historical, reproducibility, and continuity reasons.
 However, it is easy to combine Snakemake installation with other channels, e.g., by prefixing the package name with ``bioconda::``, i.e.,
 
 .. code-block:: console
 
-    $ conda install -c conda-forge bioconda::snakemake bioconda::snakemake-minimal
-
-Global Installation
-===================
-
-With a working Python ``>=3.5`` setup, installation of Snakemake can be performed by issuing
-
-.. code-block:: console
-
-    $ easy_install3 snakemake
-
-or
-
-.. code-block:: console
-
-    $ pip3 install snakemake
-
-in your terminal.
-
-
-Installing in Virtualenv
-========================
-
-To create an installation in a virtual environment, use the following commands:
-
-.. code-block:: console
-
-    $ virtualenv -p python3 .venv
-    $ source .venv/bin/activate
-    $ pip install snakemake
-
-
-Installing from Source
-======================
-
-We recommend installing Snakemake into a virtualenv or a conda environment instead of globally.
-Use the following commands to create a virtualenv and install Snakemake.
-Note that this will install the development version and as you are installing from the source code, we trust that you know what you are doing and how to checkout individual versions/tags.
-
-.. code-block:: console
+    $ mamba create -n some-env -c conda-forge bioconda::snakemake bioconda::snakemake-minimal ...
 
-    $ git clone https://github.com/snakemake/snakemake.git
-    $ cd snakemake
-    $ virtualenv -p python3 .venv
-    $ source .venv/bin/activate
-    $ python setup.py install
+Installation via pip
+====================
 
-You can also use ``python setup.py develop`` to create a "development installation" in which no files are copied but a link is created and changes in the source code are immediately visible in your ``snakemake`` commands.
+Instead of conda, Snakemake can also be installed with pip.
+However, note that Snakemake has non-Python dependencies, so a pip-based installation offers limited functionality unless those dependencies are also installed manually.


=====================================
examples/c/README.txt
=====================================
@@ -1 +1 @@
-http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/
+https://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/


=====================================
snakemake/__init__.py
=====================================
@@ -154,6 +154,7 @@ def snakemake(
     keep_incomplete=False,
     messaging=None,
     edit_notebook=None,
+    envvars=None,
 ):
     """Run snakemake on a given snakefile.
 
@@ -500,6 +501,7 @@ def snakemake(
             nodes=nodes,
             resources=resources,
             edit_notebook=edit_notebook,
+            envvars=envvars,
         )
         success = True
         workflow.include(
@@ -976,6 +978,12 @@ def get_argument_parser(profile=None):
             "the given order."
         ),
     )
+    group_exec.add_argument(
+        "--envvars",
+        nargs="+",
+        metavar="VARNAME",
+        help="Environment variables to pass to cloud jobs.",
+    )
     group_exec.add_argument(
         "--directory",
         "-d",
@@ -2250,6 +2258,7 @@ def main(argv=None):
             show_failed_logs=args.show_failed_logs,
             keep_incomplete=args.keep_incomplete,
             edit_notebook=args.edit_notebook,
+            envvars=args.envvars,
             log_handler=log_handler,
         )
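
The new flag can be exercised in isolation. Below is a minimal, stdlib-only sketch of the argparse behaviour added above; the variable names are purely illustrative:

```python
import argparse

# Sketch of the new --envvars flag from snakemake/__init__.py:
# nargs="+" collects one or more variable names that are later
# forwarded to cloud jobs. When the flag is absent, the attribute
# defaults to None, which Workflow treats as "no extra variables".
parser = argparse.ArgumentParser(prog="snakemake")
parser.add_argument(
    "--envvars",
    nargs="+",
    metavar="VARNAME",
    help="Environment variables to pass to cloud jobs.",
)

args = parser.parse_args(["--envvars", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"])
print(args.envvars)  # ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY']

args = parser.parse_args([])
print(args.envvars)  # None
```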
 


=====================================
snakemake/_version.py
=====================================
@@ -22,9 +22,9 @@ def get_keywords():
     # setup.py/versioneer.py will grep for the variable names, so they must
     # each be defined on a line of their own. _version.py will just call
     # get_keywords().
-    git_refnames = " (tag: v5.16.0)"
-    git_full = "7ec74f79067b3392607479c6f3bec4a2e90cbc05"
-    git_date = "2020-04-29 16:01:58 +0200"
+    git_refnames = " (tag: v5.17.0)"
+    git_full = "0570e1a7c450e13891c108176e4836377de134e4"
+    git_date = "2020-05-07 14:26:43 +0200"
     keywords = {"refnames": git_refnames, "full": git_full, "date": git_date}
     return keywords
 


=====================================
snakemake/caching/local.py
=====================================
@@ -26,6 +26,19 @@ class OutputFileCache(AbstractOutputFileCache):
     def __init__(self):
         super().__init__()
         self.path = Path(self.cache_location)
+        # make readable/writeable for all
+        self.file_permissions = (
+            stat.S_IRUSR
+            | stat.S_IWUSR
+            | stat.S_IRGRP
+            | stat.S_IWGRP
+            | stat.S_IROTH
+            | stat.S_IWOTH
+        )
+        # directories need to have exec permission as well (for opening)
+        self.dir_permissions = (
+            self.file_permissions | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
+        )
 
     def check_writeable(self, cachefile):
         if not (os.access(cachefile.parent, os.W_OK) or os.access(cachefile, os.W_OK)):
@@ -63,16 +76,8 @@ class OutputFileCache(AbstractOutputFileCache):
                 # does not lead to concurrent writes to the same file.
                 # We can use the plain copy method of shutil, because we do not care about the metadata.
                 shutil.move(outputfile, tmp, copy_function=shutil.copy)
-                # make readable/writeable for all
-                os.chmod(
-                    tmp,
-                    stat.S_IRUSR
-                    | stat.S_IWUSR
-                    | stat.S_IRGRP
-                    | stat.S_IWGRP
-                    | stat.S_IROTH
-                    | stat.S_IWOTH,
-                )
+
+                self.set_permissions(tmp)
 
                 # Move to the actual path (now we are on the same FS, hence move is atomic).
                 # Here we use the default copy function, also copying metadata (which is important here).
@@ -83,7 +88,7 @@ class OutputFileCache(AbstractOutputFileCache):
 
     def fetch(self, job: Job):
         """
-        Retrieve cached output file and copy to the place where the job expects it's output.
+        Retrieve cached output file and symlink it to the place where the job expects its output.
         """
         for outputfile, cachefile in self.get_outputfiles_and_cachefiles(job):
 
@@ -91,8 +96,15 @@ class OutputFileCache(AbstractOutputFileCache):
                 self.raise_cache_miss_exception(job)
 
             self.check_readable(cachefile)
-
-            self.symlink(cachefile, outputfile)
+            if cachefile.is_dir():
+                # For directories, create a new one and symlink each entry.
+                # Then, the .snakemake_timestamp of the new dir is touched
+                # by the executor.
+                outputfile.mkdir(parents=True, exist_ok=True)
+                for f in cachefile.iterdir():
+                    self.symlink(f, outputfile / f.name)
+            else:
+                self.symlink(cachefile, outputfile)
 
     def exists(self, job: Job):
         """
@@ -111,7 +123,7 @@ class OutputFileCache(AbstractOutputFileCache):
         base_path = self.path / provenance_hash
 
         return (
-            (outputfile, base_path.with_suffix(ext))
+            (Path(outputfile), base_path.with_suffix(ext))
             for outputfile, ext in self.get_outputfiles(job)
         )
 
@@ -128,3 +140,17 @@ class OutputFileCache(AbstractOutputFileCache):
                 )
             )
             shutil.copyfile(path, outputfile)
+
+    def set_permissions(self, entry):
+        # make readable/writeable for all
+        if entry.is_dir():
+            # recursively apply permissions for all contained files
+            for root, dirs, files in os.walk(entry):
+                root = Path(root)
+                for d in dirs:
+                    os.chmod(root / d, self.dir_permissions)
+                for f in files:
+                    os.chmod(root / f, self.file_permissions)
+            os.chmod(entry, self.dir_permissions)
+        else:
+            os.chmod(entry, self.file_permissions)
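
The refactored permission logic can be demonstrated standalone. A sketch of the behaviour introduced above (file names here are illustrative): files in the cache become readable and writeable for everyone, and directories additionally get the exec bit so they can be traversed:

```python
import os
import stat
import tempfile
from pathlib import Path

# Permission masks mirroring the new attributes in OutputFileCache:
# rw for user/group/other on files, plus x on directories.
FILE_PERMS = (
    stat.S_IRUSR | stat.S_IWUSR
    | stat.S_IRGRP | stat.S_IWGRP
    | stat.S_IROTH | stat.S_IWOTH
)
DIR_PERMS = FILE_PERMS | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def set_permissions(entry: Path) -> None:
    """Apply cache permissions recursively, as the new set_permissions does."""
    if entry.is_dir():
        # Recursively apply permissions to all contained entries first,
        # then to the directory itself.
        for root, dirs, files in os.walk(entry):
            root = Path(root)
            for d in dirs:
                os.chmod(root / d, DIR_PERMS)
            for f in files:
                os.chmod(root / f, FILE_PERMS)
        os.chmod(entry, DIR_PERMS)
    else:
        os.chmod(entry, FILE_PERMS)


with tempfile.TemporaryDirectory() as tmp:
    cached = Path(tmp) / "result"
    cached.mkdir()
    (cached / "data.txt").write_text("cached output")
    set_permissions(cached)
    print(oct(cached.stat().st_mode & 0o777))                 # 0o777
    print(oct((cached / "data.txt").stat().st_mode & 0o777))  # 0o666
```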


=====================================
snakemake/executors.py
=====================================
@@ -1413,6 +1413,20 @@ class KubernetesExecutor(ClusterExecutor):
                     "to the working directory are allowed.".format(f)
                 )
                 continue
+
+            # The kubernetes API can't create secret files larger than 1MB.
+            source_file_size = os.path.getsize(f)
+            max_file_size = 1000000
+            if source_file_size > max_file_size:
+                logger.warning(
+                    "Skipping the source file {f}. Its size {source_file_size} exceeds "
+                    "the maximum file size (1MB) that can be passed "
+                    "from host to kubernetes.".format(
+                        f=f, source_file_size=source_file_size
+                    )
+                )
+                continue
+
             with open(f, "br") as content:
                 key = "f{}".format(i)
                 self.secret_files[key] = f
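
The size guard above can be sketched without any Kubernetes dependency. This stdlib-only illustration (function and file names are hypothetical) filters out files too large to be stored as a secret, which is what the executor now does before uploading:

```python
import os
import tempfile

# The Kubernetes API cannot store secrets larger than 1MB, so the
# executor skips oversized source files instead of failing the job.
MAX_SECRET_SIZE = 1000000  # bytes, matching the upstream constant


def files_small_enough_for_secret(paths):
    """Yield only the files that fit into a Kubernetes secret."""
    for path in paths:
        size = os.path.getsize(path)
        if size > MAX_SECRET_SIZE:
            # Upstream logs a warning here via logger.warning.
            print("Skipping {}: {} bytes exceeds the 1MB limit.".format(path, size))
            continue
        yield path


with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "small.txt")
    big = os.path.join(tmp, "big.bin")
    with open(small, "w") as fh:
        fh.write("ok")
    with open(big, "wb") as fh:
        fh.write(b"\0" * (MAX_SECRET_SIZE + 1))
    kept = list(files_small_enough_for_secret([small, big]))
    print([os.path.basename(p) for p in kept])  # ['small.txt']
```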


=====================================
snakemake/notebook.py
=====================================
@@ -62,10 +62,8 @@ class JupyterNotebook(ScriptBase):
         if edit is not None:
             logger.info("Opening notebook for editing.")
             cmd = (
-                "jupyter notebook --log-level ERROR --ip {edit.ip} --port {edit.port} "
-                "--no-browser --NotebookApp.quit_button=True {{fname:q}}".format(
-                    edit=edit
-                )
+                "jupyter notebook --no-browser --log-level ERROR --ip {edit.ip} --port {edit.port} "
+                "--NotebookApp.quit_button=True {{fname:q}}".format(edit=edit)
             )
         else:
             cmd = (


=====================================
snakemake/remote/XRootD.py
=====================================
@@ -157,7 +157,25 @@ class XRootDHelper(object):
         matches = [
             f for f in self.list_directory(domain, dirname) if f.name == filename
         ]
-        assert len(matches) == 1
+
+        assert len(matches) > 0
+        if len(matches) > 1:
+            # -- check matches for consistency
+            # There is a transient effect in XRootD
+            # where a file may match more than once.
+            # This is okay as long as the statinfo
+            # is the same for all of them.
+            relevant_properties = [  # we only need to check front-facing attributes
+                x
+                for x in dir(matches[0].statinfo)
+                if not (x[:1] == "_" or x[-2:] == "__")
+            ]
+            assert all(
+                getattr(m.statinfo, p) == getattr(matches[0].statinfo, p)
+                for m in matches[1:]
+                for p in relevant_properties
+            )
+
         return matches[0].statinfo
 
     def file_last_modified(self, filename):
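
The duplicate-match tolerance introduced above can be sketched in isolation. `FakeStatInfo` below is a stand-in for the real XRootD statinfo object, and the attribute-comparison logic mirrors the upstream check: duplicates are accepted as long as every match reports identical stat info:

```python
class FakeStatInfo:
    """Stand-in for an XRootD statinfo object (illustrative only)."""

    def __init__(self, size, modtime):
        self.size = size
        self.modtime = modtime


def consistent_statinfo(statinfos):
    """Return the stat info shared by all (possibly duplicated) matches."""
    assert len(statinfos) > 0
    first = statinfos[0]
    # Compare only public, front-facing attributes, as the upstream check does.
    relevant = [a for a in dir(first) if not (a[:1] == "_" or a[-2:] == "__")]
    assert all(
        getattr(s, a) == getattr(first, a)
        for s in statinfos[1:]
        for a in relevant
    )
    return first


info = consistent_statinfo([FakeStatInfo(42, 1588854403), FakeStatInfo(42, 1588854403)])
print(info.size)  # 42
```

A mismatch between duplicates still raises an `AssertionError`, so genuinely inconsistent listings are caught rather than silently picking one entry.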


=====================================
snakemake/report/report.html
=====================================
@@ -286,6 +286,10 @@
         max-width: 20vw;
         max-height: 10vh;
       }
+
+      .modal-lg {
+        max-width: 95%;
+      }
     </style>
 
     {% if custom_stylesheet is not none %}


=====================================
snakemake/workflow.py
=====================================
@@ -104,6 +104,7 @@ class Workflow:
         resources=None,
         conda_cleanup_pkgs=None,
         edit_notebook=False,
+        envvars=None,
     ):
         """
         Create the controller.
@@ -201,6 +202,9 @@ class Workflow:
         global checkpoints
         checkpoints = Checkpoints()
 
+        if envvars is not None:
+            self.register_envvars(*envvars)
+
     def lint(self, json=False):
         from snakemake.linting.rules import RuleLinter
         from snakemake.linting.snakefiles import SnakefileLinter
@@ -273,7 +277,7 @@ class Workflow:
         # TODO allow a manifest file as alternative
         try:
             out = subprocess.check_output(
-                ["git", "ls-files", "."], stderr=subprocess.PIPE
+                ["git", "ls-files", "--recurse-submodules", "."], stderr=subprocess.PIPE
             )
             for f in out.decode().split("\n"):
                 if f:
@@ -936,7 +940,7 @@ class Workflow:
         Register environment variables that shall be passed to jobs.
         If used multiple times, union is taken.
         """
-        undefined = [var for var in envvars if var not in os.environ]
+        undefined = set(var for var in envvars if var not in os.environ)
         if undefined:
             raise WorkflowError(
                 "The following environment variables are requested by the workflow but undefined. "


=====================================
tests/test_envvars/Snakefile
=====================================
@@ -1,8 +1,20 @@
 envvars:
     "TEST_ENV_VAR"
 
+rule all:
+    input:
+        "test.out",
+        "test2.out"
+
 rule a:
     output:
         "test.out"
     shell:
         "echo $TEST_ENV_VAR > {output}"
+
+
+rule b:
+    output:
+        "test2.out"
+    shell:
+        "echo $TEST_ENV_VAR > {output}"


=====================================
tests/test_envvars/expected-results/test2.out
=====================================
@@ -0,0 +1 @@
+test


=====================================
tests/tests.py
=====================================
@@ -958,7 +958,8 @@ def test_github_issue78():
 def test_envvars():
     run(dpath("test_envvars"), shouldfail=True)
     os.environ["TEST_ENV_VAR"] = "test"
-    run(dpath("test_envvars"))
+    os.environ["TEST_ENV_VAR2"] = "test"
+    run(dpath("test_envvars"), envvars=["TEST_ENV_VAR2"])
 
 
 def test_github_issue105():



View it on GitLab: https://salsa.debian.org/med-team/snakemake/-/commit/4cdfa957b4d8ec7fa902f342594fa1353c93f56c
