TAFFISH History

1. 2022-2023: Prelude and origin, from biology to reusable work

TAFFISH did not begin directly as a programming language or a Hub. Its 2022 stage is better understood as a prelude: the starting point was biological research, especially attempts to connect different omics layers in a meaningful analysis. During this work, a practical problem became visible: many steps consumed time and energy even though they were repetitive, transferable, and suitable for preservation.

That realization led to a more explicit tool idea in 2023. The first multi-omics tool plan, dated 2023-06-21, described multi-omics analysis as a field with many data types, scattered tools, inconsistent formats, and repeated manual integration work. The target was not yet a package manager or a DSL. It was a framework for preserving reusable procedures, bringing existing tools together, cleaning intermediate data, and supporting downstream biological interpretation.

A second milestone came with the 2023-08-27 three-step plan. The direction became more restrained: start from a single-omics system, especially Hi-C or 3D genome analysis, then generalize toward multi-omics workflows. This plan already emphasized modularity: TAD and loop callers could be embedded, while the new work would focus on input, output, and intermediate format conversion.

This stage was still not TAFFISH, but several ideas that later survived were already present: tool integration, workflow reuse, modularity, reducing repeated work, and turning command-line procedures into reusable artifacts.

2. 2024: Workflow and tool hub prototypes and the environment problem

The next obstacle was not whether a workflow could be written. It was whether the same workflow could survive another server, another operating system, or another software version. Around 2024 this pushed the project toward two early prototypes: one for workflow description and one for a tool hub.

The workflow prototype focused on describing a single bioinformatics workflow. The tool hub prototype began to carry the meaning of a tool library and platform. By 2024, the platform direction had framed the core pain points clearly: installation was hard, command-line interfaces were inconsistent, tools were hard to combine, environments could conflict, and workflows were difficult to pass from one person to another.

This was the decisive turn. The project was no longer only about a multi-omics analysis method. It became a system for managing tools, workflows, environments, and standards.

3. 2024-10 to 2025-03: TAFFISH as a Shell-oriented DSL and Hub

From late 2024 to early 2025, these early workflow and hub prototypes evolved into TAFFISH. The name TAFFISH was settled in October 2024 during patent preparation, and by January 2025 it was used under the title “TAFFISH (2024)”. The project was then placed in the context of difficult installation, fragile environments, workflow construction, portability, reproducibility, collaboration, and inheritance.

TAFFISH was defined as Tools And Flows Framework Intensify SHell: a domain-specific language based on Shell for command-line workflows. By March 2025, the system had taken shape as two coordinated parts: the TAFFISH language and the TAFFISH-HUB. The language described tools and flows. The Hub stored taf tools and taf flows so that users could install and run them in a manner reminiscent of apt, yum, conda, or pip.

The core syntax ideas were already visible: pass parameters with command-line options, bind placeholders such as ::name::, use tags such as <container:...> for runtime environments, and expose apps as taf-xxx commands.
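The parameter-binding idea can be illustrated with a small sketch. This is not the TAFFISH interpreter: the function names, the placeholder regex, and the example command template are assumptions made for illustration, and the sketch ignores shell quoting and tags entirely. It only shows how `--name value` options could fill `::name::` placeholders in a command template.

```python
import re

# Assumed placeholder shape for illustration: ::name::
PLACEHOLDER = re.compile(r"::([A-Za-z_][A-Za-z0-9_-]*)::")


def parse_options(argv):
    """Collect `--name value` pairs into a dict (hypothetical, simplified)."""
    options = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--") and i + 1 < len(argv):
            options[arg[2:]] = argv[i + 1]
            i += 2
        else:
            raise ValueError(f"unexpected argument: {arg!r}")
    return options


def bind_placeholders(template, options):
    """Replace every ::name:: in the template with the matching option value."""
    def replace(match):
        name = match.group(1)
        if name not in options:
            raise KeyError(f"no value for placeholder ::{name}::")
        return options[name]

    return PLACEHOLDER.sub(replace, template)


if __name__ == "__main__":
    # Hypothetical template; a real taf script may look different.
    template = "samtools view -b ::input:: -o ::output::"
    opts = parse_options(["--input", "reads.sam", "--output", "reads.bam"])
    print(bind_placeholders(template, opts))
```

Failing loudly on an unbound placeholder mirrors the general goal of catching parameter problems before a command runs, although how TAFFISH itself reports such errors is not described here.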

4. 2025-03: Working prototype, package manager, and early apps

By early 2025, TAFFISH had a working interpreter and a taf package manager. It supported Debian 12, Ubuntu 18.04.1 or newer, and Apple Silicon Macs. The early ecosystem already contained more than 20 taf apps, including bedtools, samtools, BLAST, bowtie2, fastp, juicer, STAR, subread, trim_galore, and example flows for RNA-seq and gene family search.

This proved the central claim: command-line bioinformatics tools could be wrapped as taf scripts, tied to containerized runtime environments, installed by a package manager, and invoked as normal commands.

It was still a prototype. Some early tags were too heavy, several features were broader than the core problem, and GUI experiments arrived before the CLI semantics were fully stable. Later design work deliberately reduced the system back toward a smaller center: keep Shell composability, add only the tags needed for environment and parameter semantics, and let the Hub grow from that stable core.

5. 2025: Ecosystem growth, GitHub migration, and preprint

In 2025, TAFFISH moved from a prototype into an expanding ecosystem. Development continued around the interpreter and package manager, broader operating-system support, and a growing app collection covering genomics, proteomics, transcriptomics, 3D genomics, base images, and selected GUI tools.

The Hub also changed shape. What began as a private deployment gradually moved toward GitHub repositories and static indexes. This made the system easier to publish, inspect, mirror, and reuse without maintaining a dedicated backend for every stage of the project.

The preprint TAFFISH: A lightweight, modular, and containerized workflow framework for reproducible bioinformatics analyses marked the project as a coherent research output rather than only a local engineering tool.

6. 2026: Common Lisp, LispWorks, and a full-system refactor

TAFFISH has been built in Common Lisp from its early stage. A Rust rewrite was considered for distribution and systems engineering, but the project returned to Common Lisp because the language fits TAFFISH's core work: DSL design, interpreter structure, compiler passes, interactive development, and rapid refactoring.

The 2026 refactor also introduced LispWorks as the Linux delivery route. This solved a major portability problem on Linux and removed the need to maintain multiple Linux-specific installation packages, as the old version required. macOS packaging has not yet been carried into this new route and remains a later adaptation target.

The 2026 refactor was not a narrow cleanup. It was a full-system rebuild that reached from the compiler core to the app project model, taf-cli, package metadata, Hub indexes, and the GitHub-based publishing structure. The codebase was separated into clearer layers: taffish-core for lexing, parsing, parameter binding, emitters, and compilation; taffish-cli for the terminal entry point; taf-core and taf-cli for project management, installation, build, and publish flows; and taffish-hub for the app and flow ecosystem.

With Linux portability separated from app logic, the project could focus more directly on the language, package manager, app structure, and Hub index rather than on parallel installer maintenance.

A key change was the relationship between taf apps and Shell. Earlier recursive taf-app calls could create compile-time and runtime conflicts. The new model uses shell wrappers and delayed compilation, so commands such as taf-test can participate in ordinary pipelines:

echo 123 | taf-test cat

The <taffish> tag and [[taf: ...]] syntax remain meaningful because they let a flow explicitly declare TAFFISH app nodes. Their purpose is not merely to run commands, but to compile related taf apps before a long flow starts, catching missing apps, parameter problems, or compilation errors earlier.
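The value of declaring app nodes up front can be sketched as a pre-flight check. The sketch below is an assumption-heavy illustration, not TAFFISH's compiler: it assumes a node is written as `[[taf: <app-name> <args...>]]`, and it only checks app names against a given set instead of compiling anything. The function names and the example flow are hypothetical.

```python
import re

# Assumed node form, for illustration only: [[taf: <app-name> <args...>]]
TAF_NODE = re.compile(r"\[\[taf:\s*([^\s\]]+)[^\]]*\]\]")


def declared_apps(flow_text):
    """Return the app names declared in the flow, in first-seen order."""
    seen = []
    for name in TAF_NODE.findall(flow_text):
        if name not in seen:
            seen.append(name)
    return seen


def missing_apps(flow_text, installed):
    """List declared apps that are absent from the installed set."""
    return [name for name in declared_apps(flow_text) if name not in installed]


if __name__ == "__main__":
    # Hypothetical flow; the option names are made up for the example.
    flow = "\n".join([
        "[[taf: fastp --in raw.fq --out clean.fq]]",
        "[[taf: bowtie2 --reads clean.fq]]",
        "[[taf: samtools --input aligned.sam]]",
    ])
    missing = missing_apps(flow, installed={"fastp", "samtools"})
    if missing:
        print("missing taf apps, refusing to start:", missing)
```

Because every node is visible before the first step runs, a missing app is reported immediately rather than after earlier, possibly long-running, steps have finished.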

7. 2026-05: Positioning snapshot

By this point, TAFFISH had become a reproducible command-line ecosystem for bioinformatics tools and workflows. Shell kept the barrier to entry low and preserved composition. Containers bound the runtime environment. taf scripts described tools and flows. The taf package manager installed and distributed apps. TAFFISH-HUB made those apps discoverable through a static index.

The boundary had also become clearer. TAFFISH was not simply a replacement for Nextflow, Snakemake, or Galaxy, and it was not only a container launcher. It focused on a smaller command-level problem: making command-line tools exist in a more unified, reproducible, and inheritable form across systems, users, and projects.

2022 Biology-driven research prelude: multi-omics questions revealed repetitive computational work worth preserving.

2023 First multi-omics tool plan, followed by the three-step plan, which emphasized single-omics modules, database thinking, and modular integration.

2024 Early workflow and tool hub prototypes focused on tools, workflows, environments, and standards; the TAFFISH name was settled in October.

2025 TAFFISH became public-facing through its name, DSL, package manager, Hub migration, ecosystem growth, and preprint.

2026 Full-system refactor from compiler core to Hub indexes, LispWorks-based Linux portability, delayed compilation, pipeline-friendly apps, and standard taf project structure.

This page was written on 2026-05-08. For future updates, keep the dated sections as historical records, add new milestones before this positioning-snapshot section, and then rewrite this final section to reflect the updated stable view.

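1. 2022-2023: Prelude and origin, from biological questions to inheritable work

TAFFISH did not start out with its current name or form. The year 2022 is better understood as a prelude: the starting point was not a programming language or a Hub, but joint multi-omics analysis in biological research, an attempt to connect information across different omics layers.

Within this research direction, a practical problem gradually surfaced: multi-omics analysis involves a large amount of time-consuming computational work that should not have to be rediscovered each time. It consists of workflows, parameters, and operational knowledge that can be saved, reused, and passed on. This problem shifted the direction from one-off analysis tasks toward accumulating analysis workflows and tool-oriented structures.

By 2023-06-21, the idea of turning multi-omics analysis into tooling had begun to take shape. The problem background was already clear: multi-omics analysis faces many data types, scattered tools, messy formats, and inconsistent workflows, so existing tools needed to be integrated and new analysis tools developed on that basis. The plan broke the analysis process into data preprocessing, data processing with existing or self-developed tools, cleaning and integrating results, and downstream biological analysis.

The “three-step” plan of 2023-08-27 narrowed the direction further: start with single-omics tools, especially the more familiar Hi-C or 3D genome analysis, then gradually extend to other omics, and finally form a joint multi-omics analysis workflow. The plan also stated explicitly that the overall tool should be modular: existing analysis tools such as TAD and loop callers could be embedded, and the main work lay in converting intermediate formats and adjusting inputs and outputs.

This stage was not yet TAFFISH, but several core ideas that later persisted had already appeared: tool integration, workflow reuse, modularity, reducing repeated operations, and bringing existing command-line tools into a unified workflow.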

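2. 2024: Workflow and tool Hub prototypes, from workflow to environment

In later practice, the early tools ran into a more direct problem: the workflow itself could be written, but the runtime environment was hard to unify. Differences between servers, systems, and software versions caused the same workflow to break when it was moved.

Around 2024, this direction gradually produced two early prototypes: one oriented toward describing a single workflow file, and another that began to carry the meaning of a tool library and platform management. The platform direction also gradually focused on problems in bioinformatics analysis: tools were hard to install, hard to use, and hard to combine, and the field lacked management, inheritance, and standards.

At this stage, the project's focus shifted from “writing a multi-omics analysis tool” to “how to manage tools, workflows, and runtime environments.” This was also the direct foundation for TAFFISH's later move toward containers, a Hub, and a DSL.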

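3. 2024-10 to 2025-03: TAFFISH, a Shell-style DSL and Hub

From the second half of 2024 to early 2025, the project gradually moved from the early workflow and Hub prototypes to TAFFISH. The name TAFFISH was settled in October 2024 in preparation for a patent submission, and by 2025-01-15 the title “TAFFISH (2024)” had brought the name into public context. TAFFISH was also presented against the background of difficult software installation, difficult environment configuration, difficult workflow construction, poor portability and reproducibility, and difficulties in team collaboration and inheritance.

At this stage, TAFFISH was defined as Tools And Flows Framework Intensify SHell, a Shell-based domain-specific language for command-line workflows. By March 2025, TAFFISH consisted of two main parts: the TAFFISH domain-specific language and the TAFFISH-HUB platform. The former describes command-line workflows on the basis of Shell; the latter stores taf tools and taf flows written as taf scripts, which can be installed and used in the manner of apt, yum, conda, or pip.

TAFFISH already had several important design elements at the time: passing parameters on the command line with --xxx, substituting variables in scripts with ::xxx::, invoking containerized environments through tags such as <container:...>, and exposing taf software as taf-xxx so that users could run it like an ordinary command.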

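4. 2025-03: Runnable prototype, package management, and early tool ecosystem

By early 2025, TAFFISH had become a fairly complete runnable prototype. The taffish interpreter and the taf package management system were finished and adapted to Debian 12, Ubuntu 18.04.1 and later, and Macs with M-series chips. At the same time, the ecosystem already contained more than 20 taf apps, including bedtools, samtools, blast, bowtie2, fastp, juicer, STAR, subread, and trim_galore, as well as example flows such as RNA-seq and gene-family-search.

At this stage TAFFISH could already be used in practice. It proved a basic direction: command-line tools can be wrapped with taf scripts and container environments, then installed and invoked uniformly through the taf package management system.

However, the implementation at this stage still had clear prototype characteristics. The tag design was relatively complex, some features were heavy, and early GUI experiments arrived too soon. In later development these designs were gradually pared back. TAFFISH's core returned to a simpler direction: keep the Shell way of working and add environment and parameter semantics with a small number of tags, rather than building an overly sprawling platform.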

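5. 2025: Ecosystem expansion, GitHub migration, and preprint

In 2025, TAFFISH entered a phase of ecosystem expansion. The taffish interpreter and taf package management system continued to be maintained, with support for Debian 12, Ubuntu, Mac, and other environments; the taf apps grew from the early tool collection into a larger ecosystem covering genomics, proteomics, transcriptomics, 3D genomics, and base images.

Several important pieces of work were also completed in this phase: the official website and GitHub organization were set up, TAFFISH-HUB gradually migrated from a private cloud deployment to GitHub repositories, the base image layer began to be filled in, some GUI software such as PyMOL was adapted, and CLI features and underlying logic continued to be optimized.

In 2025, the TAFFISH preprint was also completed and posted, titled TAFFISH: A lightweight, modular, and containerized workflow framework for reproducible bioinformatics analyses. From this point on, TAFFISH was no longer only a local engineering tool but also a research output that both papers and the community could point to.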

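6. 2026: Common Lisp, LispWorks, and a full-system refactor

TAFFISH has been developed in Common Lisp since its early days. During development, rewriting the core system in Rust was considered, with the aim of better system-level distribution and a stronger engineering ecosystem. After trials and trade-offs, the project returned to the Common Lisp route.

The 2026 refactor also introduced LispWorks as the delivery route on Linux, thoroughly resolving the Linux portability problem and removing the need to maintain multiple Linux installation packages as in the old version. macOS has not yet been brought into this new delivery route, but can still be adapted later.

The choice of Common Lisp let TAFFISH keep a high level of interactive development efficiency and retain Lisp's advantages in DSLs, interpreters, compiler structure, and rapid refactoring. The 2026 refactor was not a local cleanup but a full-system rebuild, from the compiler core, app project model, taf-cli, package metadata, and Hub indexes to the GitHub-based publishing structure. TAFFISH was split into clearer layers: taffish-core handles lexing, parsing, parameter binding, emitters, and compilation logic; taffish-cli handles the terminal entry point; taf-core / taf-cli handle project management, installation, build, and publishing; taffish-hub handles the tool and workflow ecosystem.

Once Linux portability was separated from app logic, the project could turn more attention back to language semantics, package management, app project structure, and the Hub index itself, rather than maintaining multiple installation packages in parallel.

A key change in the new 2026 version was a rework of the relationship between taf apps and Shell. In earlier versions, recursive taf-app calls could cause conflicts at both compile time and run time. In the new version, a taf-app compiles and executes lazily through a shell wrapper, so taf-xxx can take part in pipelines like an ordinary command: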

echo 123 | taf-test cat

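At the same time, the <taffish> tag and [[taf: ...]] syntax were kept for explicitly declaring TAFFISH app nodes. Their main significance is not to make a command “able to run” but to compile the related taf-apps together before the flow executes, surfacing missing apps, parameter problems, or compilation errors early instead of letting a long flow fail late in its run.

This stage also introduced a standardized project structure, for example taffish.toml, src/main.taf, docs, docker, target, and README.md. Commands such as taf new, taf check, taf build, and taf publish will gradually take shape around this structure.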
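As a rough illustration of what a standard layout makes possible, the sketch below checks a directory for the entries named above. It is not the behavior of taf check, which is not specified here; the function name is hypothetical, and treating docs, docker, and target as directories and the other entries as files is an assumption.

```python
from pathlib import Path
import tempfile

# Entries named in the text; the file/directory split is an assumption.
EXPECTED_FILES = ["taffish.toml", "src/main.taf", "README.md"]
EXPECTED_DIRS = ["docs", "docker", "target"]


def missing_entries(project_root):
    """Return the expected project entries that are absent under project_root."""
    root = Path(project_root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    # Build a partial project in a temporary directory and report the gaps.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "main.taf").write_text("")
        (root / "taffish.toml").write_text("")
        print(missing_entries(root))  # ['README.md', 'docs', 'docker', 'target']
```

A fixed layout of this kind is what lets project-level commands find the manifest, the main script, and the build output without per-project configuration.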

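7. 2026-05: Positioning snapshot

By this stage, TAFFISH had grown from an idea for a multi-omics analysis tool into a reproducible system for command-line tools and workflows.

Its core positioning can be summarized as follows: use Shell to keep the low barrier and composability of the command line; use containers to pin down the runtime environment; use taf scripts to describe tools and flows; use the taf package management system to install and distribute apps; and use TAFFISH-HUB to build a reproducible tool ecosystem.

By now TAFFISH's boundary is also clearer: it is not simply a replacement for Nextflow, Snakemake, or Galaxy, nor is it just a container launcher. It focuses on a smaller, command-level problem: how to make command-line tools exist across different systems, users, and projects in a more unified, more reproducible, and more easily inherited form.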

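2022 Prelude in biological research: multi-omics questions gradually exposed computational work that was repetitive and time-consuming yet inheritable.

2023 The multi-omics tooling idea was proposed and the “three-step” plan took shape, emphasizing single-omics tools, databases, and modular tool integration.

2024 Early workflow and tool Hub prototype stage, with attention turning to workflows, tool libraries, and environment problems; the TAFFISH name was settled in October.

2025 The TAFFISH name entered public context and the system structure continued to form, advancing the interpreter, taf package management, Hub migration, app expansion, and the bioRxiv preprint.

2026 A full-system refactor from the compiler core to the Hub indexes, Linux portability resolved through LispWorks, and support for taf-app pipelines, delayed compilation, a standard project structure, and the new taf-cli system.

This page was written on 2026-05-08. For future maintenance, keep the existing time periods as historical records, continue appending new milestones and events before this section, and then rewrite the final “positioning snapshot.”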