[med-svn] [Git][med-team/snakemake][upstream] New upstream version 6.15.1
Andreas Tille (@tille)
gitlab at salsa.debian.org
Thu Feb 3 10:28:35 GMT 2022
Andreas Tille pushed to branch upstream at Debian Med / snakemake
Commits:
d5ce6eff by Andreas Tille at 2022-02-03T10:38:21+01:00
New upstream version 6.15.1
- - - - -
29 changed files:
- CHANGELOG.md
- docs/executor_tutorial/tutorial.rst
- docs/snakefiles/configuration.rst
- docs/snakefiles/deployment.rst
- docs/snakefiles/rules.rst
- snakemake/__init__.py
- snakemake/_version.py
- snakemake/caching/__init__.py
- snakemake/caching/local.py
- snakemake/dag.py
- snakemake/deployment/conda.py
- snakemake/modules.py
- snakemake/notebook.py
- snakemake/parser.py
- snakemake/report/__init__.py
- snakemake/ruleinfo.py
- snakemake/workflow.py
- + tests/test_cache_multioutput/Snakefile
- + tests/test_cache_multioutput/expected-results/.gitkeep
- + tests/test_default_target/Snakefile
- + tests/test_default_target/expected-results/1.txt
- + tests/test_default_target/expected-results/2.txt
- + tests/test_deploy_hashing/Snakefile
- + tests/test_deploy_hashing/a.post-deploy.sh
- + tests/test_deploy_hashing/a.yaml
- + tests/test_deploy_hashing/b.yaml
- + tests/test_deploy_hashing/expected-results/a.txt
- + tests/test_deploy_hashing/expected-results/b.txt
- tests/tests.py
Changes:
=====================================
CHANGELOG.md
=====================================
@@ -1,5 +1,31 @@
# Changelog
+### [6.15.1](https://www.github.com/snakemake/snakemake/compare/v6.15.0...v6.15.1) (2022-01-31)
+
+
+### Bug Fixes
+
+* consider post-deploy script for env hashing ([#1363](https://www.github.com/snakemake/snakemake/issues/1363)) ([d50efd9](https://www.github.com/snakemake/snakemake/commit/d50efd9d16d029fb0e5b14b182882c71a20552bb))
+
+## [6.15.0](https://www.github.com/snakemake/snakemake/compare/v6.14.0...v6.15.0) (2022-01-29)
+
+
+### Features
+
+* adding default_target directive for declaring default target rules that are not the first rule in the workflow. ([#1358](https://www.github.com/snakemake/snakemake/issues/1358)) ([638ec1a](https://www.github.com/snakemake/snakemake/commit/638ec1a983741cd7ba8faaf1a9dc76ae43d012e5))
+
+
+### Bug Fixes
+
+* Draft notebook filename with wildcards and params. ([#1352](https://www.github.com/snakemake/snakemake/issues/1352)) ([11d4dc8](https://www.github.com/snakemake/snakemake/commit/11d4dc88598ffb901450bd4e076b91f4e27d37b0))
+* proper error message when defining cache eligibility for rules with multiple output files and no multiext declaration. ([#1357](https://www.github.com/snakemake/snakemake/issues/1357)) ([47b5096](https://www.github.com/snakemake/snakemake/commit/47b5096ebbdd3d94a9c99b443064b1b0de389c64))
+
+
+### Documentation
+
+* Command line arguments for configuration files ([#1343](https://www.github.com/snakemake/snakemake/issues/1343)) ([ad8aaa4](https://www.github.com/snakemake/snakemake/commit/ad8aaa4853a150211513baecc474956575d326eb))
+* fix broken link in executor_tutorial/tutorial.rst ([#1360](https://www.github.com/snakemake/snakemake/issues/1360)) ([c9be764](https://www.github.com/snakemake/snakemake/commit/c9be76482d05577c4b1528b0e52ba15fc17a1dd5))
+
## [6.14.0](https://www.github.com/snakemake/snakemake/compare/v6.13.1...v6.14.0) (2022-01-26)
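
The headline feature of 6.15.0, the ``default_target`` directive, can be sketched as follows (illustrative rule and file names, based on the docs and test changes in this same commit):

```python
# Snakefile sketch: rule "all" is the default target even though it is not
# the first rule in the file.
rule make_file:
    output:
        "{sample}.txt"
    shell:
        "echo test > {output}"


rule all:
    input:
        expand("{sample}.txt", sample=[1, 2])
    default_target: True
```

Running ``snakemake`` with no explicit targets now builds ``1.txt`` and ``2.txt`` via rule ``all``, regardless of rule order.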
=====================================
docs/executor_tutorial/tutorial.rst
=====================================
@@ -4,7 +4,7 @@
Snakemake Executor Tutorials
============================
-.. _cloud executors: https://snakemake.readthedocs.io/en/stable/executing/cluster-cloud.html
+.. _cloud executors: https://snakemake.readthedocs.io/en/stable/executing/cloud.html
.. _tutorial: https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html
This set of tutorials are intended to introduce you to executing `cloud executors`_.
=====================================
docs/snakefiles/configuration.rst
=====================================
@@ -37,10 +37,10 @@ In addition to the `configfile` statement, config values can be overwritten via
$ snakemake --config yourparam=1.5
Further, you can manually alter the config dictionary using any Python code **outside** of your rules. Changes made from within a rule won't be seen from other rules.
-Finally, you can use the `--configfile` command line argument to overwrite values from the `configfile` statement.
-Note that any values parsed into the `config` dictionary with any of above mechanisms are merged, i.e., all keys defined via a `configfile`
-statement, or the `--configfile` and `--config` command line arguments will end up in the final `config` dictionary, but if two methods define the same key, command line
-overwrites the `configfile` statement.
+Finally, you can use the ``--configfile`` command line argument to overwrite values from the `configfile` statement.
+Note that any values parsed into the ``config`` dictionary with any of above mechanisms are merged, i.e., all keys defined via a ``configfile``
+statement, or the ``--configfile`` and ``--config`` command line arguments will end up in the final `config` dictionary, but if two methods define the same key, command line
+overwrites the ``configfile`` statement.
For adding config placeholders into a shell command, Python string formatting syntax requires you to leave out the quotes around the key name, like so:
=====================================
docs/snakefiles/deployment.rst
=====================================
@@ -103,7 +103,7 @@ For example, we can easily add another rule to extend the given workflow:
github("snakemake-workflows/dna-seq-gatk-variant-calling", path="workflow/Snakefile", tag="v2.0.1")
config: config
- use rule * from dna_seq
+ use rule * from dna_seq as dna_seq_*
# easily extend the workflow
rule plot_vafs:
@@ -114,7 +114,19 @@ For example, we can easily add another rule to extend the given workflow:
notebook:
"notebooks/plot-vafs.py.ipynb"
-Moreover, it is possible to further extend the workflow with other modules, thereby generating an integrative analysis.
+ # Define a new default target that collects both the targets from the dna_seq module as well as
+ # the new plot.
+ rule all:
+ input:
+ rules.dna_seq_all.input,
+ "results/plots/vafs.svg",
+ default_target: True
+
+Above, we have added a prefix to all rule names of the dna_seq module, such that there is no name clash with the added rules (``as dna_seq_*`` in the ``use rule`` statement).
+In addition, we have added a new rule ``all``, defining the default target in case the workflow is executed (as usually) without any specific target files or rule.
+The new target rule collects both all input files of the rule ``all`` from the dna_seq workflow, as well as additionally collecting the new plot.
+
+It is possible to further extend the workflow with other modules, thereby generating an integrative analysis.
Here, let us assume that we want to conduct another kind of analysis, say RNA-seq, using a different external workflow.
We can extend above example in the following way:
@@ -149,10 +161,20 @@ We can extend above example in the following way:
use rule * from rna_seq as rna_seq_*
-Above, several things have changed. First, we have added another module ``rna_seq``.
-Second, we have added a prefix to all rule names of both modules (``dna_seq_*`` and ``rna_seq_*`` in the ``use rule`` statements) in order to avoid rule name clashes.
-Third, we have added a prefix to all non-absolute input and output file names of both modules (``prefix: "dna-seq"`` and ``prefix: "rna-seq"``) in order to avoid file name clashes.
-Finally, we provide the config of the two modules via two separate sections in the common config file (``config["dna-seq"]`` and ``config["rna-seq"]``).
+
+ # Define a new default target that collects all the targets from the dna_seq and rna_seq module.
+ rule all:
+ input:
+ rules.dna_seq_all.input,
+ rules.rna_seq_all.input,
+ default_target: True
+
+Above, several things have changed.
+
+* First, we have added another module ``rna_seq``.
+* Second, we have added a prefix to all non-absolute input and output file names of both modules (``prefix: "dna-seq"`` and ``prefix: "rna-seq"``) in order to avoid file name clashes.
+* Third, we have added a default target rule that collects both the default targets from the module ``dna_seq`` as well as the module ``rna_seq``.
+* Finally, we provide the config of the two modules via two separate sections in the common config file (``config["dna-seq"]`` and ``config["rna-seq"]``).
----------------------------------
Uploading workflows to WorkflowHub
=====================================
docs/snakefiles/rules.rst
=====================================
@@ -246,11 +246,25 @@ By default snakemake executes the first rule in the snakefile. This gives rise t
.. code-block:: python
rule all:
- input: ["{dataset}/file.A.txt".format(dataset=dataset) for dataset in DATASETS]
+ input:
+ expand("{dataset}/file.A.txt", dataset=DATASETS)
+
+
+Here, for each dataset in a python list ``DATASETS`` defined before, the file ``{dataset}/file.A.txt`` is requested.
+In this example, Snakemake recognizes automatically that these can be created by multiple applications of the rule ``complex_conversion`` shown above.
+It is possible to overwrite this behavior to use the first rule as a default target, by explicitly marking a rule as being the default target via the ``default_target`` directive:
-Here, for each dataset in a python list ``DATASETS`` defined before, the file ``{dataset}/file.A.txt`` is requested. In this example, Snakemake recognizes automatically that these can be created by multiple applications of the rule ``complex_conversion`` shown above.
+.. code-block:: python
+
+ rule xy:
+ input:
+ expand("{dataset}/file.A.txt", dataset=DATASETS)
+ default_target: True
+Regardless of where this rule appears in the Snakefile, it will be the default target.
+Usually, it is still recommended to keep the default target rule (and in fact all other rules that could act as optional targets) at the top of the file, such that it can be easily found.
+The ``default_target`` directive becomes particularly useful when :ref:`combining several pre-existing workflows <use_with_modules>`.
.. _snakefiles-threads:
=====================================
snakemake/__init__.py
=====================================
@@ -591,7 +591,9 @@ def snakemake(
success = True
workflow.include(
- snakefile, overwrite_first_rule=True, print_compilation=print_compilation
+ snakefile,
+ overwrite_default_target=True,
+ print_compilation=print_compilation,
)
workflow.check()
=====================================
snakemake/_version.py
=====================================
@@ -22,9 +22,9 @@ def get_keywords():
# setup.py/versioneer.py will grep for the variable names, so they must
# each be defined on a line of their own. _version.py will just call
# get_keywords().
- git_refnames = " (HEAD -> main, tag: v6.14.0)"
- git_full = "6c505c290facddbc44da69d715e90b7680ad9ceb"
- git_date = "2022-01-26 11:32:12 +0100"
+ git_refnames = " (HEAD -> main, tag: v6.15.1)"
+ git_full = "8c5c5e8261ae12fc40c3c293045dea6af7df0e01"
+ git_date = "2022-01-31 12:10:44 +0100"
keywords = {"refnames": git_refnames, "full": git_full, "date": git_date}
return keywords
=====================================
snakemake/caching/__init__.py
=====================================
@@ -47,6 +47,9 @@ class AbstractOutputFileCache:
)
yield from ((f, f[prefix_len:]) for f in job.output)
else:
+ assert (
+ len(job.output) == 1
+ ), "bug: multiple output files in cacheable job but multiext not used for declaring them"
yield (job.output[0], "")
def raise_write_error(self, entry, exception=None):
=====================================
snakemake/caching/local.py
=====================================
@@ -95,6 +95,12 @@ class OutputFileCache(AbstractOutputFileCache):
if not cachefile.exists():
self.raise_cache_miss_exception(job)
+ logger.debug(
+ "Output file {} exists as {} in the cache.".format(
+ outputfile, cachefile
+ )
+ )
+
self.check_readable(cachefile)
if cachefile.is_dir():
# For directories, create a new one and symlink each entry.
=====================================
snakemake/dag.py
=====================================
@@ -622,7 +622,7 @@ class DAG:
and not job.is_checkpoint
and (
job not in self.targetjobs
- or job.rule.name == self.workflow.first_rule
+ or job.rule.name == self.workflow.default_target
)
):
tempfiles = (
=====================================
snakemake/deployment/conda.py
=====================================
@@ -62,8 +62,14 @@ class Env:
):
self.file = None
self.name = None
+ self.post_deploy_file = None
if env_file is not None:
self.file = infer_source_file(env_file)
+ deploy_file = Path(self.file.get_path_or_uri()).with_suffix(
+ ".post-deploy.sh"
+ )
+ if deploy_file.exists():
+ self.post_deploy_file = infer_source_file(deploy_file)
if env_name is not None:
assert env_file is None, "bug: both env_file and env_name specified"
self.name = env_name
@@ -101,8 +107,9 @@ class Env:
def _get_content_deploy(self):
self.check_is_file_based()
- deploy_file = Path(self.file).with_suffix(".post-deploy.sh")
- return self.workflow.sourcecache.open(deploy_file, "rb").read()
+ if self.post_deploy_file:
+ return self.workflow.sourcecache.open(self.post_deploy_file, "rb").read()
+ return None
@property
def _env_archive_dir(self):
@@ -139,6 +146,9 @@ class Env:
md5hash.update(env_dir.encode())
if self._container_img:
md5hash.update(self._container_img.url.encode())
+ content_deploy = self.content_deploy
+ if content_deploy:
+ md5hash.update(content_deploy)
md5hash.update(self.content)
self._hash = md5hash.hexdigest()
return self._hash
@@ -148,6 +158,9 @@ class Env:
if self._content_hash is None:
md5hash = hashlib.md5()
md5hash.update(self.content)
+ content_deploy = self.content_deploy
+ if content_deploy:
+ md5hash.update(content_deploy)
self._content_hash = md5hash.hexdigest()
return self._content_hash
@@ -317,11 +330,7 @@ class Env:
tmp.write(self.content)
env_file = tmp.name
tmp_env_file = tmp.name
- if (
- Path(self.file.get_path_or_uri())
- .with_suffix(".post-deploy.sh")
- .exists()
- ):
+ if self.post_deploy_file:
with tempfile.NamedTemporaryFile(
delete=False, suffix=".post-deploy.sh"
) as tmp:
@@ -331,8 +340,7 @@ class Env:
tmp_deploy_file = tmp.name
else:
env_file = env_file.get_path_or_uri()
- if Path(env_file).with_suffix(".post-deploy.sh").exists():
- deploy_file = Path(env_file).with_suffix(".post-deploy.sh")
+ deploy_file = self.post_deploy_file
env_path = self.address
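
The effect of the hashing fix can be illustrated outside Snakemake with a hypothetical helper (not the ``Env`` class API above): once the post-deploy script bytes are folded into the MD5 digest, two environments with identical YAML but different post-deploy scripts get distinct hashes, and therefore distinct prefix directories.

```python
import hashlib
from typing import Optional


def env_content_hash(env_yaml: bytes, post_deploy: Optional[bytes] = None) -> str:
    """Sketch of the fixed content hash: the optional post-deploy script
    content is mixed into the digest alongside the env YAML content."""
    md5 = hashlib.md5()
    md5.update(env_yaml)
    if post_deploy:
        md5.update(post_deploy)
    return md5.hexdigest()


yaml = b"channels:\n  - conda-forge\ndependencies:\n  - python =3.10\n"
plain = env_content_hash(yaml)
with_deploy = env_content_hash(yaml, b'echo "test" > $CONDA_PREFIX/a.txt\n')
# Before the fix, both environments hashed identically and collided in
# the conda prefix cache; now the hashes differ.
assert plain != with_deploy
```

This is exactly the property the new ``test_deploy_hashing`` test checks below: two env specs with identical YAML must yield two separate conda directories.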
=====================================
snakemake/modules.py
=====================================
@@ -86,7 +86,7 @@ class ModuleInfo:
prefix=self.prefix,
replace_wrapper_tag=self.get_wrapper_tag(),
):
- self.workflow.include(snakefile, overwrite_first_rule=True)
+ self.workflow.include(snakefile, overwrite_default_target=True)
def get_snakefile(self):
if self.meta_wrapper:
=====================================
snakemake/notebook.py
=====================================
@@ -13,6 +13,7 @@ from snakemake.logging import logger
from snakemake.common import is_local_file
from snakemake.common import ON_WINDOWS
from snakemake.sourcecache import SourceCache, infer_source_file
+from snakemake.utils import format
KERNEL_STARTED_RE = re.compile(r"Kernel started: (?P<kernel_id>\S+)")
KERNEL_SHUTDOWN_RE = re.compile(r"Kernel shutdown: (?P<kernel_id>\S+)")
@@ -262,6 +263,7 @@ def notebook(
Load a script from the given basedir + path and execute it.
"""
draft = False
+ path = format(path, wildcards=wildcards, params=params)
if edit is not None:
if is_local_file(path):
if not os.path.isabs(path):
=====================================
snakemake/parser.py
=====================================
@@ -484,6 +484,12 @@ class Cache(RuleKeywordState):
return "cache_rule"
+class DefaultTarget(RuleKeywordState):
+ @property
+ def keyword(self):
+ return "default_target_rule"
+
+
class Handover(RuleKeywordState):
pass
@@ -673,6 +679,7 @@ rule_property_subautomata = dict(
group=Group,
cache=Cache,
handover=Handover,
+ default_target=DefaultTarget,
)
=====================================
snakemake/report/__init__.py
=====================================
@@ -281,7 +281,11 @@ class RuleRecord:
self._rule.notebook
):
_, source, language, _ = script.get_source(
- self._rule.notebook, self._rule.workflow.sourcecache, self._rule.basedir
+ self._rule.notebook,
+ self._rule.workflow.sourcecache,
+ self._rule.basedir,
+ wildcards=self.wildcards,
+ params=self.params,
)
language = language.split("_")[1]
sources = notebook.get_cell_sources(source)
=====================================
snakemake/ruleinfo.py
=====================================
@@ -37,6 +37,7 @@ class RuleInfo:
self.cache = False
self.path_modifier = None
self.handover = False
+ self.default_target = False
def apply_modifier(
self, modifier, prefix_replacables={"input", "output", "log", "benchmark"}
=====================================
snakemake/workflow.py
=====================================
@@ -155,7 +155,7 @@ class Workflow:
self.global_resources["_nodes"] = nodes
self._rules = OrderedDict()
- self.first_rule = None
+ self.default_target = None
self._workdir = None
self.overwrite_workdir = overwrite_workdir
self.workdir_init = os.path.abspath(os.curdir)
@@ -466,8 +466,8 @@ class Workflow:
self._rules[rule.name] = rule
if not is_overwrite:
self.rule_count += 1
- if not self.first_rule:
- self.first_rule = rule.name
+ if not self.default_target:
+ self.default_target = rule.name
return name
def is_rule(self, name):
@@ -644,7 +644,9 @@ class Workflow:
return map(relpath, filterfalse(self.is_rule, items))
if not targets:
- targets = [self.first_rule] if self.first_rule is not None else list()
+ targets = (
+ [self.default_target] if self.default_target is not None else list()
+ )
if prioritytargets is None:
prioritytargets = list()
@@ -1148,7 +1150,7 @@ class Workflow:
def include(
self,
snakefile,
- overwrite_first_rule=False,
+ overwrite_default_target=False,
print_compilation=False,
overwrite_shellcmd=None,
):
@@ -1164,7 +1166,7 @@ class Workflow:
self.included.append(snakefile)
self.included_stack.append(snakefile)
- first_rule = self.first_rule
+ default_target = self.default_target
code, linemap, rulecount = parse(
snakefile,
self,
@@ -1185,8 +1187,8 @@ class Workflow:
exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
- if not overwrite_first_rule:
- self.first_rule = first_rule
+ if not overwrite_default_target:
+ self.default_target = default_target
self.included_stack.pop()
def onstart(self, func):
@@ -1541,6 +1543,14 @@ class Workflow:
rule.is_handover = True
if ruleinfo.cache is True:
+ if len(rule.output) > 1:
+ if not rule.output[0].is_multiext:
+ raise WorkflowError(
+ "Rule is marked for between workflow caching but has multiple output files. "
+ "This is only allowed if multiext() is used to declare them (see docs on between "
+ "workflow caching).",
+ rule=rule,
+ )
if not self.enable_cache:
logger.warning(
"Workflow defines that rule {} is eligible for caching between workflows "
@@ -1550,11 +1560,20 @@ class Workflow:
self.cache_rules.add(rule.name)
elif not (ruleinfo.cache is False):
raise WorkflowError(
- "Invalid argument for 'cache:' directive. Only true allowed. "
+ "Invalid argument for 'cache:' directive. Only True allowed. "
"To deactivate caching, remove directive.",
rule=rule,
)
+ if ruleinfo.default_target is True:
+ self.default_target = rule.name
+ elif not (ruleinfo.default_target is False):
+ raise WorkflowError(
+ "Invalid argument for 'default_target:' directive. Only True allowed. "
+ "Do not use the directive for rules that shall not be the default target. ",
+ rule=rule,
+ )
+
ruleinfo.func.__name__ = "__{}".format(rule.name)
self.globals[ruleinfo.func.__name__] = ruleinfo.func
@@ -1615,6 +1634,13 @@ class Workflow:
return decorate
+ def default_target_rule(self, value):
+ def decorate(ruleinfo):
+ ruleinfo.default_target = value
+ return ruleinfo
+
+ return decorate
+
def message(self, message):
def decorate(ruleinfo):
ruleinfo.message = message
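
The new ``cache:`` validation rejects rules with several plain output files; per the error message above, declaring the outputs via ``multiext()`` keeps such a rule cache-eligible (illustrative sketch, not taken from the diff):

```python
# Accepted: multiple outputs declared via multiext() share one base name,
# so the between-workflow cache can key all of them consistently.
rule build_index:
    output:
        multiext("genome", ".idx", ".dict")
    cache: True
    shell:
        "touch {output}"
```

A rule listing ``"1.txt", "2.txt"`` directly under ``output:`` with ``cache: True`` now fails with the ``WorkflowError`` above, as exercised by the new ``test_cache_multioutput`` test.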
=====================================
tests/test_cache_multioutput/Snakefile
=====================================
@@ -0,0 +1,7 @@
+rule a:
+ output:
+ "1.txt",
+ "2.txt",
+ cache: True
+ shell:
+ "touch {output}"
\ No newline at end of file
=====================================
tests/test_cache_multioutput/expected-results/.gitkeep
=====================================
=====================================
tests/test_default_target/Snakefile
=====================================
@@ -0,0 +1,11 @@
+rule a:
+ output:
+ "{sample}.txt"
+ shell:
+ "echo test > {output}"
+
+
+rule b:
+ input:
+ expand("{sample}.txt", sample=[1, 2])
+ default_target: True
\ No newline at end of file
=====================================
tests/test_default_target/expected-results/1.txt
=====================================
@@ -0,0 +1 @@
+test
=====================================
tests/test_default_target/expected-results/2.txt
=====================================
@@ -0,0 +1 @@
+test
=====================================
tests/test_deploy_hashing/Snakefile
=====================================
@@ -0,0 +1,19 @@
+rule all:
+ input:
+ expand("{s}.txt", s=["a", "b"])
+
+rule a:
+ output:
+ "a.txt"
+ conda:
+ "a.yaml"
+ shell:
+ "touch {output}"
+
+rule b:
+ output:
+ "b.txt"
+ conda:
+ "b.yaml"
+ shell:
+ "touch {output}"
\ No newline at end of file
=====================================
tests/test_deploy_hashing/a.post-deploy.sh
=====================================
@@ -0,0 +1,3 @@
+#!/bin/bash
+
+echo "test" > $CONDA_PREFIX/a.txt
\ No newline at end of file
=====================================
tests/test_deploy_hashing/a.yaml
=====================================
@@ -0,0 +1,5 @@
+channels:
+ - bioconda
+ - conda-forge
+dependencies:
+ - python =3.10
=====================================
tests/test_deploy_hashing/b.yaml
=====================================
@@ -0,0 +1,5 @@
+channels:
+ - bioconda
+ - conda-forge
+dependencies:
+ - python =3.10
=====================================
tests/test_deploy_hashing/expected-results/a.txt
=====================================
=====================================
tests/test_deploy_hashing/expected-results/b.txt
=====================================
=====================================
tests/tests.py
=====================================
@@ -434,6 +434,12 @@ def test_deploy_script():
run(dpath("test_deploy_script"), use_conda=True)
+@skip_on_windows
+def test_deploy_hashing():
+ tmpdir = run(dpath("test_deploy_hashing"), use_conda=True, cleanup=False)
+ assert len(next(os.walk(os.path.join(tmpdir, ".snakemake/conda")))[1]) == 2
+
+
def test_conda_custom_prefix():
run(
dpath("test_conda_custom_prefix"),
@@ -1397,3 +1403,12 @@ def test_modules_ruledeps_inheritance():
@skip_on_windows
def test_conda_named():
run(dpath("test_conda_named"), use_conda=True)
+
+
+@skip_on_windows
+def test_default_target():
+ run(dpath("test_default_target"))
+
+
+def test_cache_multioutput():
+ run(dpath("test_cache_multioutput"), shouldfail=True)
View it on GitLab: https://salsa.debian.org/med-team/snakemake/-/commit/d5ce6effe383044853259316c374a35244766f06