Personal essay

My TAFFISH Story

TAFFISH did not begin as a platform. It began with biological questions, repeated computational work, and a simple but persistent wish to preserve experience instead of losing it in scripts and memory.

1. From genome and disease questions to reusable work

TAFFISH was not first imagined as a software engineering project, and it was not designed as a workflow platform from the beginning. Its origin was a biological question.

My training moved across several borders: physics as an undergraduate, bioinformatics during my master's study, and later electronic information and system design during doctoral work. That path kept me at an intersection. On one side were biological questions. On the other were data, tools, workflows, and computing environments.

The earliest question was not about tools. It was about the relationship between genome information and disease. If the genome state of a cell or sample is complex and multi-layered, could multiple omics layers help recover a more realistic picture of it?

At that stage, I roughly understood genome information through several layers: one-dimensional sequence information such as WGS, SNP, and CNV; three-dimensional spatial information such as Hi-C or 3C-derived technologies; and epigenetic information such as DNA methylation and histone modifications. These layers are not isolated. Sequence changes may affect three-dimensional structure, three-dimensional structure may affect gene expression, and epigenetic marks participate in regulatory processes.

That was a natural biological research route. But once the work began, another problem quickly became visible: too much time was spent on data cleaning, tool installation, parameter adjustment, format conversion, and repeating processes that should have been preserved after being solved once.

That was the first root of TAFFISH.

2. From biological questions to tool problems

In bioinformatics, researchers often care most about biological conclusions, but much of the actual effort goes into the computational machinery around those conclusions. A project may combine Hi-C, RNA-seq, variant data, and annotations. Each data type has its own format. Each step has its own tool. Each tool has its own dependencies, versions, and parameters.

A workflow may run on one server and fail on another. One person may know how to run it, while the next person has to rediscover the paths, environments, and scripts. Gradually, I realized that this work has inheritance value. If a research process accumulates experience, that experience should not remain only in personal scripts or memory. It should be saved, reused, and passed on.

So the first idea was simple: build a multi-omics analysis tool, integrate existing tools, collect common processing steps, and make them easier for myself and nearby users to run. At the same time, I was learning Common Lisp and writing small data-processing programs, for example, searching for information near selected genes or loci across multi-omics results.

This was not TAFFISH yet, but its shadow was already there: command-line work, data processing, tool integration, packaged execution, and a lower barrier for reuse.

3. Looking at Nextflow and Snakemake

I had also looked at existing workflow systems, especially Nextflow and Snakemake. Looking back, I see them as mature, powerful, and important tools. They solve many large workflow management problems in bioinformatics.

At that time, I was still learning Linux, Shell, bioinformatics tools, server environments, and programming while starting from biological questions. I could feel that Nextflow and Snakemake were powerful, but it was still difficult for me to make those complete systems part of my daily analysis practice.

So I chose the simplest route available to me at that stage: write Shell, call local tools, and use Common Lisp for helper programs. Those systems were not wrong, and Shell was not perfect; Shell was simply the tool I could use immediately. It was direct, close to the command-line tools themselves, and it ran.

Later, this Shell-plus-Lisp style revealed its own limits. Scripts record processes but do not manage environments well. Small tools simplify operations but do not unify installation. Running locally does not mean running elsewhere. Being understandable to one person does not mean being maintainable by a team. The problem returned in a new form.

4. From single tools to modular thinking

The original multi-omics tool idea was too broad to finish in one step. I began to break it down: start with a single omics field, namely the Hi-C and 3D genome analysis I knew best, and then expand toward other omics layers.

This was important because it showed that a sustainable system cannot be completed by one large design. It has to be modular. Existing tools should be embedded. Intermediate formats should be converted. Inputs and outputs should be adjusted. The system should not bind itself to one fixed tool, but should allow replacement, composition, and extension.

This direction has stayed with TAFFISH ever since. The goal is not to replace every existing tool, but to make existing command-line tools easier to package, reuse, distribute, and inherit.

5. BioFlow and BioHub: early forms of workflows and a tool library

As tool integration continued, BioFlow and BioHub appeared. BioFlow focused on workflows. BioHub started to mean a tool library and platform. The problems became clearer: installation was hard, usage was hard, composition was hard, management was weak, inheritance was fragile, and standards were missing.

At that point, the question was no longer only how to write a workflow. It was how to manage many tools and many workflows across environments and users. This is why TAFFISH-HUB later became necessary.

6. TAFFISH as a lightweight enhancement over Shell

BioFlow and BioHub gradually evolved into TAFFISH. One important realization was that “bio” was the current application scenario, not the core of the project.

TAFFISH means Tools And Flows Framework Intensify SHell. The most important word in that name is Shell. TAFFISH should not leave Shell behind completely. In bioinformatics, many tools are already command-line tools. If reproducibility requires users to learn a heavy new language, the system creates another burden.

The direction became: keep the shape of Shell, and add semantics only where needed.

ARGS
  <(--/-i)input>
RUN
<container:ghcr.io/taffish/example:1.0.0>
  tool --input ::input::

The command still looks close to Shell. ::input:: performs parameter substitution. <container:...> declares the runtime environment. TAFFISH can then handle parameters, containers, paths, and compilation behind the scenes.
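As a toy illustration only, not TAFFISH's actual compiler output, the effect of `::input::` substitution can be mimicked with plain shell string replacement; the variable names here are invented for the sketch:

```shell
# Toy sketch of ::name:: substitution; not TAFFISH's real output.
# 'template' and 'input' are invented names for this illustration.
template='tool --input ::input::'
input='reads.fastq'
# Bash pattern replacement expands the placeholder into the bound value.
expanded=${template//::input::/$input}
echo "$expanded"
```

In the real system, TAFFISH also resolves the <container:...> tag into a containerized invocation around the expanded command; this sketch covers only the parameter step.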

TAFFISH is not meant to replace Shell. It is meant to strengthen Shell where Shell is weak: environment boundaries, parameter structure, app packaging, and distribution.

7. Old TAFFISH: usable prototype and engineering boundaries

The old TAFFISH soon became runnable. It could compile taf scripts into shell, run tools through containers, and expose wrapped tools as taf-xxx commands so users could call them like ordinary commands.

That proved the original idea was feasible. The old version also exposed the engineering boundaries of the prototype stage. Early tag design was relatively complex, and the feature set tried to cover too many cases at once. Later, many of those functions were deliberately folded back into a smaller core, so the language semantics and runtime model could become easier to maintain and easier to understand.

GUI was an exploration that arrived a little too early. I thought about GUI very early, because many bioinformatics users are not familiar with the command line. If a graphical interface could select tools, fill parameters, and organize workflows, TAFFISH would be easier to use. But later I realized that before the CLI semantics and project structure were stable, a GUI would amplify the uncertainty underneath it.

So I temporarily put GUI aside and returned to CLI and core language design. Later, this proved necessary. Only when the CLI and core become stable can GUI be built on the right foundation.

8. Common Lisp, Rust, LispWorks, and engineering delivery

TAFFISH has mostly been written in Common Lisp. That is not the most common choice today, but it fits the problem. TAFFISH is not a regular business application. It is a small language system: it needs lexing, parsing, parameter binding, semantic expansion, and shell code generation.

Common Lisp's interactive development model, symbolic processing, and abstraction ability made it possible to test syntax ideas quickly and keep refactoring as the system changed. For TAFFISH, Common Lisp is not only an implementation language. It also shapes the way the system grows.

I also seriously considered rewriting TAFFISH in Rust. Rust has excellent engineering qualities, convenient binary distribution, and a type system suitable for long-term maintenance. That attempt was not wasted. Even though Rust did not become the implementation language, it influenced the newer TAFFISH structure.

taf new
taf check
taf build
taf run
taf publish

These commands carry a project-management mindset similar to Cargo. taffish.toml also acts as a project description file. The newer separation between core and cli layers, the project structure, the error boundaries, and the command behavior all benefited from that engineering perspective.

Eventually, I returned to Common Lisp. That was not a return to the starting point, but a confirmed choice after comparison. The core problem of TAFFISH is language design, semantic expansion, script generation, and interactive evolution. Common Lisp remains a strong fit for that.

LispWorks later became important for another reason: delivery. Early TAFFISH could rely on SBCL-generated executables for prototypes and personal use, but a tool intended for broader use has to be packaged, installed, updated, and run reliably in other users' environments. LispWorks gave TAFFISH a way to keep Common Lisp while solving Linux portability more seriously. macOS is not yet covered by that route, but support can be adapted later.

LispWorks also made me think more seriously about future delivery, deployment, and possible graphical interfaces while still staying with Common Lisp. TAFFISH is currently centered on command-line tools, but if GUI work returns later, LispWorks may become an important candidate route.

From this perspective, TAFFISH has also been a process of re-understanding Common Lisp. At first, I used Lisp because it was expressive and good for rapid development. Later, I kept using Lisp because it fit a DSL and compiler-like system. Then, when I began considering LispWorks, I realized that Common Lisp could also enter a more formal software delivery stage.

TAFFISH eventually kept Lisp's flexibility while absorbing the engineering structure that Rust had taught me. It did not fully follow one language paradigm. It found its own shape through repeated trials. For me, TAFFISH is also a proof that Common Lisp can still be used to build new, real, usable scientific software systems.

9. New TAFFISH: from prototype to engineering code

The real turning point was the new refactor. This time, the goal was not only to prove that features could run. It was to reorganize the system boundaries and engineering structure:

lexer     read taf source
parser    parse ARGS, RUN, and tags
binder    bind parameters and context
emitter   compile each tag into shell fragments
compiler  generate final shell
cli       receive terminal input and call the core

After that, TAFFISH began to look more like a real compiler than a temporary scripting tool. <shell>, <container>, <taf-app>, and <taffish> are no longer scattered special cases. They are handled through emitters, making future tags and backends easier to add.

The newer design also supports scoped parameters such as @xxx:, which makes staged parameters, defaults, and context binding more natural in complex flows. It also prepares a foundation for a future CLI-GUI integrated route.

After this refactor, TAFFISH had a clearer engineering foundation.

10. Pipe support: bringing container tools back to Shell

One of the most important design confirmations in the new version is pipe support. Containerized tools can easily become sealed boxes: they solve environment problems but lose their natural connection to Shell. TAFFISH should do the opposite. It should bring containerized tools back into ordinary Shell composition.

echo 123 | taf-test cat

When this worked, it confirmed an important design judgment: TAFFISH had not moved away from Shell. It had strengthened Shell. A taf app can be a stable, reproducible, distributable, and composable command-line part. Stable parts make stable workflows possible.
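The behavior can be sketched with a hypothetical wrapper function. The real taf-xxx commands wrap a container runtime; here the tool is run directly so the sketch stays self-contained:

```shell
# Hypothetical sketch of a pipe-friendly wrapper; not TAFFISH's code.
# A real wrapper would start the container with stdin attached
# (e.g. a container runtime's -i flag); here we exec the tool directly,
# so stdin and stdout still flow straight through the wrapper.
taf_test() {
  "$@"
}

echo 123 | taf_test cat
```

The essential point is only that the wrapper forwards stdin and stdout untouched; once that holds, a wrapped tool composes in pipes like any ordinary command.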

TAFFISH is also not trying to replace Nextflow, Snakemake, or Galaxy. It focuses on a different entry point: how each command-line tool can be packaged, installed, reproduced, and composed. When tools can be passed on, workflows can be passed on. When environments can be reproduced, results have a better chance of being reproduced too.

11. How my thinking shaped TAFFISH

TAFFISH has also been shaped by the way I think about systems. I have long been interested in Daoist thought, Zhuangzi, and the I Ching. They do not directly guide code implementation, but they do affect how I think about boundaries, position, growth, and change.

A good system is not necessarily the system with the most features. It is a system whose parts know what they are responsible for, whose boundaries are clear, and whose structure can keep growing.

Shell keeps composition.
Containers bound runtime environments.
taf-app packages tools.
taf-flow organizes workflows.
TAFFISH-HUB distributes tools and flows.
taf-cli manages creation, building, installation, and publishing.

This is why some early complicated tags were removed, why GUI work was paused, and why an over-large design was avoided. TAFFISH does not need to swallow everything. It needs tools, containers, scripts, the Hub, and the CLI to each stay in the right place.

If TAFFISH has changed constantly over the years, what has not changed is the problem it wants to solve: making command-line tools and workflows in research easier to reproduce, migrate, distribute, and inherit. What has changed is the way I have tried to answer that problem.

12. From a tool to an ecosystem

TAFFISH is now more than a .taf compiler. It is becoming a layered ecosystem:

taffish-core      compiler core
taffish-cli       terminal entry point
taf-core / taf-cli project management, install, build, publish
taffish-hub       tool and flow ecosystem

In the future, users should be able to create projects with taf new, check project structure with taf check, build local commands with taf build, publish tools or flows with taf publish, and install apps from the Hub with taf install.

The road has included several turns. The project changed names, rejected designs, postponed attractive features, and went through code cleanup and heavy refactoring. At times, the pace of building such a system mostly alone felt slow, but those stages also made the system's boundaries clearer.

But every time a real tool runs, every time a cross-environment execution succeeds, and every time a messy process becomes a callable taf-xxx command, the answer becomes clearer. TAFFISH is not only a doctoral project to me. It is a road from biological questions to reusable scientific software.

TAFFISH began from a practical need: while doing biological research and multi-omics analysis, I did not want to repeatedly spend time on work that could be preserved and inherited. Later it became a larger question: can command-line tools in research be better packaged, reproduced, distributed, and passed on?

TAFFISH is still evolving. But it is no longer only an idea. It is now a system that can run, be tested, and be extended. It grew from biological thinking about genome and disease, passed through multi-omics tool ideas, BioFlow, and BioHub, and arrived at today's TAFFISH.

It also records my own changing understanding of research, engineering, languages, tools, and long-term work.

That road is still long. But now it is no longer only an idea. It has reached the ground underfoot.