Snakemake is indeed a very useful tool for workflow development and management. However, in certain scenarios, it can also bring some issues. Coincidentally, I discovered a rather unconventional way of using it: only use Snakemake’s dependency handling and task management, while generating scripts separately.
The reason was that when writing the workflow, I had to place an actual executable script at a specific location so that users could read these scripts and modify specific steps before resubmitting for execution.
The intention was good: let users who are not familiar with Snakemake easily modify analysis parameters for specific samples. The problem is that Snakemake does not support exporting the actual executable scripts as files; it can only print the commands it runs to the log via the `-p` parameter. Moreover, before finding a solution, I was unaware that wildcards could be used in cluster submission commands… So, after several twists and turns, I arrived at this rather unconventional solution.
- Divide the entire workflow into several parts and write programs to generate all required scripts.
- When submitting to SGE, use the `-sync y` parameter with `qsub` so that the submission does not return immediately but waits for the submitted job to complete or fail. If the job fails, `qsub` returns a non-zero exit code.
- Use Snakemake to build workflow dependencies as usual, specifying `input` and `output` normally. Add the execution script to `input`, and write only the submission command in `shell:`.
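The first step can be sketched roughly as follows. This is only a minimal illustration, not the generator used in the original workflow: the template contents, the function name `write_scripts`, the `command` placeholder, and the `<sample>.sh` / `<sample>.txt` naming scheme are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical job template. The "#$" lines are SGE directives embedded in the
# script; "-sync y" makes qsub wait for the job and report its exit status.
SCRIPT_TEMPLATE = """\
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -sync y
set -euo pipefail
# Edit the command below to change parameters for this sample only.
{command} {sample} > {output}
"""


def write_scripts(samples, script_dir, result_dir, command="echo"):
    """Write one job script per sample and return {sample: (script, output)}."""
    script_dir = Path(script_dir)
    script_dir.mkdir(parents=True, exist_ok=True)
    jobs = {}
    for sample in samples:
        # The output name is decided here, once, and written into the script.
        output = Path(result_dir) / f"{sample}.txt"
        script = script_dir / f"{sample}.sh"
        script.write_text(
            SCRIPT_TEMPLATE.format(command=command, sample=sample, output=output)
        )
        jobs[sample] = (script, output)
    return jobs
```

With scripts generated this way, the matching Snakemake rule would list the script under `input`, declare the same result path under `output`, and call `qsub` on the script in `shell:`.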
This works around Snakemake's inability to export scripts: we generate the scripts ourselves and let Snakemake submit them.
At the same time, we still benefit from Snakemake's dependency resolution and task management for monitoring progress and resuming interrupted runs.
However, this approach also creates a major pitfall: what each script does and which files it writes are not specified within Snakemake. If they do not match what Snakemake expects, the workflow will fail, and there is no good way to locate such issues; it relies entirely on the writer's care.
For example, suppose you write a script `test.sh` like this:

    #$ -sync y
And your Snakefile looks like this:

    rule all:
Running this with Snakemake will never succeed, because the script writes a result file whose name does not match the target file name declared in Snakemake.
Moreover, because rule `test` is judged to have failed, by default Snakemake also deletes any of the rule's other declared result files that do exist…
In other words, this method requires you to keep the file names written by the scripts consistent with the target file names declared in Snakemake. Apart from this drawback, the approach fully leverages Snakemake's strengths in task management while sparing you the time needed to understand the wildcard mechanism when first getting started with Snakemake.
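One way to reduce the risk of mismatched names is a small check run before launching the workflow. The sketch below is an assumption-laden illustration, not part of Snakemake: the function `find_mismatches` and the `expected` mapping are hypothetical, and the check is a plain text search, so it only catches output paths that appear literally in a script.

```python
from pathlib import Path


def find_mismatches(expected):
    """Return (script, output) pairs where the script never mentions the output.

    `expected` maps each script path to the output paths that the matching
    Snakemake rule declares. The mapping must be maintained by hand (or produced
    by the same program that generates the scripts).
    """
    problems = []
    for script, outputs in expected.items():
        text = Path(script).read_text()
        for output in outputs:
            # Heuristic: the declared output path should appear verbatim.
            if str(output) not in text:
                problems.append((str(script), str(output)))
    return problems
```

An empty list means every declared output is at least mentioned in its script; any returned pair points at a script whose file name likely disagrees with the Snakemake target.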