What Is TAFFISH?

1. Bioinformatics often begins as shell commands

In practice, many bioinformatics analyses do not begin with a workflow file. They begin with a line typed into a terminal, a small script copied from an older project, or a command found in a README:

fastqc sample.fq.gz
bwa mem ref.fa reads.fq.gz
samtools sort aln.bam
blastp -query protein.fa -db nr
minimap2 ref.fa reads.fa

Over time, those lines end up in server directories, shell history, methods sections, GitHub README files, lab notes, container recipes, HPC module systems, and informal knowledge passed between people. They may be enough to finish one analysis, but not enough for another person to rerun it reliably on another machine a year later.

The problem underneath workflow design is that many everyday commands have no durable execution object: no fixed version, no closed runtime, no clear distribution path, and no easy way to inspect what will actually run.

2. TAFFISH is a shell-native execution and delivery layer

This is why TAFFISH is best described at the command execution layer. It is not primarily another workflow system, a conventional package manager, a container wrapper, or a web platform. Its role is narrower and, for daily bioinformatics work, more basic: turn a shell-based tool invocation into an executable package that can be installed, inspected, rerun, and shared.

The name TAFFISH stands for Tools And Flows Framework Intensify SHell.

TAFFISH is a shell-native reproducible execution and delivery layer for bioinformatics command-line tools and lightweight workflows.

The goal is not to pull users out of the shell. It is to give the commands they already run there enough structure that someone else can reproduce them.

3. Why TAFFISH stays close to the shell

The shell is still the ground floor of much bioinformatics computing. Even when a project uses Nextflow, Snakemake, Galaxy, CWL, Python, R, or an HPC scheduler, many tasks eventually become command-line tool invocations.

Wrapping command-line tools is not new. CWL, Boutiques, Galaxy wrappers, Snakemake wrappers, nf-core modules, shpc, and BioContainers all address this space from different directions. TAFFISH takes a particular route: it keeps the shell as the first working surface.

TAFFISH brings reproducibility back to shell commands instead of moving those commands into a separate working surface first.

A TAFFISH command still looks like a normal shell command:

taf update
taf install samtools
taf info samtools
taf-samtools samtools view --help

The visible shape is familiar. What changes is the information attached to the command: version, container, parameters, backend, metadata, and Hub distribution records.

4. TAFFISH complements existing systems from a lower layer

The comparison is not a ranking. These systems solve different parts of reproducible computing. TAFFISH is meant to cover a lower layer that many of them eventually depend on: the command that actually runs.

System or technology	Main layer	Relationship to TAFFISH
Nextflow / Snakemake	Workflow orchestration	TAFFISH commands can serve as reproducible command bricks inside workflow tasks.
Galaxy	Web platform	Galaxy brings tools into a web platform; TAFFISH brings reproducibility back to shell commands.
Conda / Bioconda	Software installation	Bioconda installs software; TAFFISH packages how software is reproducibly executed as a command.
Docker / Podman / Apptainer	Runtime environment	Containers provide the runtime closure; TAFFISH binds that closure to command interfaces and Hub metadata.
CWL / WDL / Boutiques	Descriptor standards	These systems describe portable tools; TAFFISH emphasizes shell-native use and executable package delivery.
shpc / BioContainers	Containerized command exposure	TAFFISH exposes commands while also binding parameters, metadata, releases, trust signals, and flow interfaces.

5. The core model is the executable package

The important object in TAFFISH is not just a software binary, and not just an executable inside a container. The important object is the command as something that can be installed and run with known execution semantics.

In practice, a TAFFISH executable package can bind together:

tool identity and command interface;
parameter schema and help-facing semantics;
version and release metadata;
container image, backend resolution, mounts, and workdir behavior;
platform constraints and dependency information;
smoke metadata, digest information, and Hub trust records;
an installable shell command and optional composable flow interface.
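To make the list above more concrete, here is a minimal sketch of what binding a command to a container image, a backend, a mount, and a working directory can look like in plain shell. It is a generic illustration, not TAFFISH's implementation or package format: the function names `pick_backend` and `container_argv` and the image reference are hypothetical, and only Docker and Podman are considered because they accept the same `run` flags used here.

```shell
#!/bin/sh
# Generic illustration only: NOT TAFFISH's implementation or package format.
# IMAGE is a hypothetical placeholder, not a real published image.
IMAGE="registry.example.org/samtools:1.0"

# Backend resolution: print the first container engine found on PATH.
pick_backend() {
    for _engine in docker podman; do
        if command -v "$_engine" >/dev/null 2>&1; then
            printf '%s\n' "$_engine"
            return 0
        fi
    done
    return 1
}

# Print (but do not run) the argument vector, one argument per line.
# The current directory is mounted at the same path and used as the workdir.
container_argv() {
    _backend="$1"
    shift
    printf '%s\n' "$_backend" run --rm -v "$PWD:$PWD" -w "$PWD" "$IMAGE" "$@"
}

# Fall back to the name "docker" so the sketch prints something on any host.
container_argv "$(pick_backend || echo docker)" samtools view --help
```

Printing the argument vector instead of executing it keeps the sketch safe to run on a machine without any container engine. The point is only that the image, the mount, and the working directory are decided once, next to the command name, rather than retyped by each user.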

A traditional package manager mostly asks how software is installed. TAFFISH asks the next question: once the software exists, how does it become a reproducible command?

6. What TAFFISH is not

The boundaries matter. TAFFISH has flows, but it is not primarily a workflow orchestrator. Complex DAG scheduling, task caching, cloud execution, and scheduler-level resource orchestration remain the territory of workflow engines.

It is also not a claim that the whole shell universe can become perfectly reproducible. Shell is too open for that: filesystems, networks, randomness, time, permissions, hardware, and user state still matter. The claim is more precise:

TAFFISH provides reproducible execution packaging for shell-based bioinformatics commands.

7. Core use cases

For users, the first benefit is simple: tools become ordinary-looking taf-* commands, but their runtime environment and package metadata are already attached.

In everyday use, the change can be deliberately small. A bare command such as blastp ... can be replaced by a TAFFISH-provided entry such as taf-blast-v2.17.0-r1 blastp ..., or by the command name declared in that app's metadata. To a shell script, Perl script, Python subprocess call, or host-level Nextflow/Snakemake task, it is still just a shell command; the difference is that the command is now bound to TAFFISH metadata, container resolution, and release semantics.
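As a sketch of that substitution in an existing script, the helper below selects the TAFFISH entry when it is on PATH and otherwise falls back to the bare tool, which may be useful while a script is being migrated. The function name `resolve_entry` is made up for this example, and the entry name is simply the one quoted above; the actual name depends on the installed app.

```shell
#!/bin/sh
# Illustrative migration helper (hypothetical, not part of TAFFISH).
# $1 = TAFFISH entry name, $2 = bare tool name.
resolve_entry() {
    if command -v "$1" >/dev/null 2>&1; then
        # Entry is installed: prefix the tool name with it.
        printf '%s %s\n' "$1" "$2"
    else
        # Entry is missing: keep the original bare command.
        printf '%s\n' "$2"
    fi
}

BLASTP="$(resolve_entry taf-blast-v2.17.0-r1 blastp)"
echo "blastp will be invoked as: $BLASTP"

# In a script, expand the variable unquoted so it splits into words:
#   $BLASTP -query protein.fa -db nr
```

The fallback trades reproducibility for convenience, so a stricter script could exit with an error instead of silently using the bare tool.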

For tool developers, TAFFISH turns a command invocation, its parameters, container runtime, release, and validation metadata into something more durable than a README example.

For workflow builders, TAFFISH commands can be assembled into lightweight taf flows, embedded in shell scripts, or called from Nextflow and Snakemake as ordinary commands with stronger execution semantics. If the surrounding workflow task itself runs inside another container, the TAFFISH command must be available at that execution boundary; TAFFISH does not automatically enter an unrelated closed container.
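Because the command has to exist at the execution boundary where the task actually runs, a task script can check for it before doing any work. The following is a minimal preflight sketch in plain shell; `require_cmds` is a hypothetical helper name, and `taf-samtools` is the entry name used earlier in this document.

```shell
#!/bin/sh
# Preflight check (illustrative): confirm that every required command is
# visible on PATH in the environment where this script is running.
require_cmds() {
    _missing=0
    for _cmd in "$@"; do
        if ! command -v "$_cmd" >/dev/null 2>&1; then
            echo "missing command: $_cmd" >&2
            _missing=1
        fi
    done
    # Return 0 only if every command was found.
    return "$_missing"
}

if require_cmds taf-samtools; then
    echo "all required commands are available"
else
    echo "install the missing entries inside this execution boundary first"
fi
```

Run inside a workflow task's own container, the same check reports whether the TAFFISH entry is reachable there, not merely on the host.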

8. TAFFISH-HUB makes executable packages discoverable

TAFFISH-HUB is where these executable packages become discoverable. App repositories provide structured metadata, versioned releases, container image references, dependencies, platform constraints, smoke metadata, upstream source metadata, and trust signals.

The generated index is consumed by local taf commands. Users update the index, inspect package metadata, install apps, and run commands on their own machines or servers. The current Hub is intentionally static and GitHub-based. That is not the only possible future, but it keeps the publishing path transparent and auditable today.

9. In one sentence

A compact way to describe TAFFISH is this: it is a command-level reproducibility layer for bioinformatics tools and lightweight workflows.

TAFFISH brings reproducibility back to the shell commands bioinformaticians already use.

Its core is not replacement, but completion. It gives tools, flows, Hub metadata, security checks, read-only AI-facing inspection, and future ecosystem work a more stable command layer to stand on.
