Why TAFFISH Is Not Another Workflow Engine

1. The confusion is reasonable

TAFFISH has tools. TAFFISH has flows. TAFFISH can organize bioinformatics analyses. From the outside, it is natural to ask whether TAFFISH is simply another workflow engine.

The short answer is no. TAFFISH can help build workflows, but it starts from a smaller object: the command someone would otherwise type into a terminal, paste into a script, or place inside a larger system.

A workflow engine asks how steps should be connected. TAFFISH asks what kind of command each step is made from.

2. What workflow engines are good at

Workflow engines such as Nextflow and Snakemake solve important problems. They express which step runs after which, decide how steps are grouped, and help an analysis grow from a few commands into a larger plan that can run on real computing infrastructure.

Galaxy solves a different but equally important problem: it provides a web interface to analysis tools. CWL, WDL, Boutiques, and other descriptor systems give tool interfaces and tool runs a more explicit, written form.

We are not arguing against those layers. TAFFISH lives in the same world they do: command-line tools, containers, package records, scripts, and users who need a command to mean the same thing tomorrow.

3. TAFFISH works at the command layer

A workflow task usually ends with a command. The command may look simple:

samtools sort input.bam -o sorted.bam

But the visible line does not tell the whole story. Which samtools is this? Where does it come from? What must be present on the machine? What can another person check before trusting the line?
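
On any single host, the first two questions can be partly answered with standard shell tools, which also shows how much the bare name leaves open. The following is a minimal sketch for a POSIX shell; `describe_tool` is a hypothetical helper written for this article, not part of TAFFISH.

```shell
#!/bin/sh
# describe_tool NAME: report what a bare command name resolves to on this host.
# Hypothetical helper for illustration only; it is not a TAFFISH command.
describe_tool() {
    tool=$1
    if resolved=$(command -v "$tool" 2>/dev/null); then
        printf '%s resolves to %s\n' "$tool" "$resolved"
    else
        printf '%s is not on PATH\n' "$tool"
        return 1
    fi
}

# The answer depends entirely on the machine the script happens to run on.
if describe_tool samtools; then
    # First line of the version banner, for example "samtools 1.x".
    samtools --version | head -n 1
fi
```

Two machines can give two different answers for the same script line, and nothing in the line itself records which answer was intended.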

TAFFISH treats that line as something worth giving a sturdier form. A TAFFISH app lets the same kind of tool call become installable while still looking like a shell command:

taf install samtools
taf-samtools samtools sort input.bam -o sorted.bam

The command stays familiar, but it now has a name, a release trail, backend hints, platform notes, and Hub records behind it.

4. A TAFFISH command is still a command

The practical change can be very small. If an old shell, Perl, or Python script calls samtools, the first TAFFISH step is not to rewrite the whole script into a new language. It can be as simple as replacing one bare tool call with a versioned TAFFISH command:

Old script line:

samtools sort input.bam -o sorted.bam

Updated script line:

taf-samtools-v1.23.1-r1 samtools sort input.bam -o sorted.bam

The updated line is still an ordinary command from the viewpoint of shell, Perl, Python, Make, an HPC job script, or a workflow task. The surrounding script can stay the same: the command can still be called with system(...), subprocess.run(...), a pipe, a loop, or a larger script in the same way other command-line tools are called.
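
As a rough illustration of the loop case, the sketch below keeps the tool prefix in one variable so that the rest of the script does not change. The names `SAMTOOLS`, `DRY_RUN`, and `sort_bams` are our own choices for this example, not TAFFISH conventions, and the dry-run switch is there only so the sketch can run on a machine without either tool.

```shell
#!/bin/sh
# The tool prefix is the only thing that differs between the two variants:
#   bare host tool:        SAMTOOLS="samtools"
#   versioned TAFFISH app: SAMTOOLS="taf-samtools-v1.23.1-r1 samtools"
SAMTOOLS=${SAMTOOLS:-"taf-samtools-v1.23.1-r1 samtools"}

# sort_bams FILE...: sort each BAM file into <name>.sorted.bam.
# When DRY_RUN is set and non-empty, the commands are printed, not run.
sort_bams() {
    for bam in "$@"; do
        out="${bam%.bam}.sorted.bam"
        # $SAMTOOLS is unquoted on purpose so the prefix splits into words.
        ${DRY_RUN:+echo} $SAMTOOLS sort "$bam" -o "$out"
    done
}

DRY_RUN=1
sort_bams sample1.bam sample2.bam
```

Clearing `DRY_RUN` runs the same loop for real; the loop body is identical whichever prefix is chosen.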

What changes is not how users work, but how much execution context the command carries. The app name pins the TAFFISH package, the version and release identify the wrapper state, and the container backend provides a more controlled runtime than an accidental host installation.
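
To make those three parts concrete, the versioned name from the example can be split with plain shell parameter expansion. This sketch assumes the pattern `taf-<app>-v<version>-r<release>`, which we infer only from the single name shown above; the actual TAFFISH naming rules may differ, and `split_taf_name` is a hypothetical helper.

```shell
#!/bin/sh
# split_taf_name NAME: split a name such as taf-samtools-v1.23.1-r1.
# Assumes the pattern taf-<app>-v<version>-r<release> (an assumption
# inferred from one example, not a documented TAFFISH rule).
split_taf_name() {
    name=$1
    rest=${name#taf-}        # samtools-v1.23.1-r1
    release=${rest##*-r}     # 1
    rest=${rest%-r*}         # samtools-v1.23.1
    version=${rest##*-v}     # 1.23.1
    app=${rest%-v*}          # samtools
    printf 'app=%s version=%s release=%s\n' "$app" "$version" "$release"
}

split_taf_name taf-samtools-v1.23.1-r1
# prints: app=samtools version=1.23.1 release=1
```

The point is not the parsing itself but that the command name alone already records which package, which upstream version, and which wrapper release a script line refers to.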

This does not make biological inputs, hardware, or upstream tools disappear as sources of variation. Inputs, reference files, CPU architecture, backend behavior, external databases, time, and randomness can still matter. But when the input is the same, the TAFFISH app version is fixed, the architecture and backend are comparable, and the upstream tool is deterministic, the command is much more likely to mean the same thing on another machine.

That is the kind of portability we want TAFFISH to provide: not a demand that users learn a new workflow DSL or Docker command line first, but a small shell-level change that makes an existing script easier to carry forward.

5. Stable workflows need stable parts

A workflow can be beautifully written and still be fragile. A tool may be installed differently. An image may disappear. A helper program may be missing. A command that looked obvious on one server may mean something else on another.

TAFFISH starts one level earlier. Before asking how to arrange many steps, it asks how one step can become something reusable enough to be carried into the next project.

Once the small pieces are easier to install, inspect, pass around, and compose, higher-level workflows have less instability under them. The workflow file still matters; the commands inside it are simply less mysterious.

6. TAFFISH can work with workflow systems

TAFFISH commands are meant to be usable wherever ordinary commands are already accepted:

directly in an interactive terminal;
inside ordinary shell scripts;
inside lightweight taf flows;
inside HPC job scripts;
inside Nextflow, Snakemake, Galaxy, or other workflow tasks when TAFFISH is available in that task context.
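
The condition in the last item, that TAFFISH must be available in the task context, can be checked explicitly at the top of a job script or task body. The following is a minimal sketch using only standard shell; `require_command` and `run_sort_task` are hypothetical helper names, and the exit status 127 simply follows the shell's usual "command not found" convention.

```shell
#!/bin/sh
# require_command NAME: fail early, with a clear message, if NAME cannot be
# run in this environment (HPC job, workflow task, container, and so on).
require_command() {
    if ! command -v "$1" >/dev/null 2>&1; then
        printf 'error: %s not found in this task environment\n' "$1" >&2
        return 127
    fi
}

# run_sort_task INPUT OUTPUT: sort one BAM file with the versioned command,
# but only after confirming that the command exists here.
run_sort_task() {
    require_command taf-samtools-v1.23.1-r1 || return
    taf-samtools-v1.23.1-r1 samtools sort "$1" -o "$2"
}

run_sort_task input.bam sorted.bam ||
    echo "sort task did not complete (exit status $?)" >&2
```

In a real task the last line would more likely be `run_sort_task input.bam sorted.bam || exit`, so that the workflow engine or scheduler sees the failure instead of a confusing error further downstream.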

In that sense, TAFFISH is not a higher layer above workflow engines. It is a way of making the commands they call less exposed to whatever happens to be installed on the host.

7. We work at a smaller layer

This boundary is not a rejection of larger systems. TAFFISH works at a different and smaller layer. It is a small tool focused on a narrow part of the stack: the command-level boundary where a tool is packaged, versioned, installed, invoked, and shared.

For now, our work is to make that small layer solid. If a bioinformatics command can be easier to package, install, inspect, run, share, and compose without pulling researchers away from shell, then TAFFISH is fulfilling its role.

Let workflow engines arrange the steps; we make each step less fragile.

That is why we do not describe TAFFISH as another workflow engine. We are working on the smaller command layer underneath workflows, and trying to make that layer easier to carry forward.
