TAFFISH-USER-MANUAL

The full name of TAFFISH is "Tools And Flows Framework Intensify SHell". The term "FISH" also embodies the philosophy "Teach a man to fish".

1 Introduction

1.1 What is "TAFFISH"?

From a technical perspective, TAFFISH is essentially a cross-device and cross-platform "App Store" tailored both for users (who simply use the tools) and developers (who create their own tools and workflows). It mainly consists of three parts:

So, how do we use TAFFISH? It’s actually quite simple:

  1. Install container management software (Apptainer/Podman/Docker);
    # Choose the installation method according to your operating system
    sudo apt update
    # Select the appropriate container management software according to your needs
    sudo apt install apptainer
    
  2. Install TAFFISH;

    More information: [2. Install]

  3. Use "taf" to install the "taf app" you want to use;
    taf install xxx
    
  4. The "taf app" which you just installed can be used just like a normal command line tool:
    taf-xxx -h
    taf-xxx ...
    

    More information about installing and using: [2. Install] & [3. Quick Start(an example)]

  5. (For developers) You can use any editor (vim/emacs/...) to write your own "taf app", then run and manage it with "taffish" and "taf":
    vim xxx.taf
    taffish xxx.taf ...
    taf ...
    

    More Information: TAFFISH-DEVELOPMENT-MANUAL

Yes, "taffish" can be seen as a "software package management system", just like "apt"/"npm"/"homebrew"/"conda"/... The difference is that "taffish" neither changes nor relies on your system environment, and its apps are reproducible and portable.

1.2 What is "TAFFISH" good at?

At present, the main field of "TAFFISH" is bioinformatics. This field requires frequent use of command-line tools, the building and use of command-line workflows, and often involves cross-device or multi-device computation and cooperation, all of which "TAFFISH" excels at:

1.3 Is "TAFFISH" the right fit for you?

"TAFFISH" now is just made for users:

1.4 TAFFISH's Philosophy & Logic

2. Install

2.1 (optional but recommended) Install container management software

Much of the software in our "taf-hub/taf-app-store" runs on top of container management software, so we strongly recommend installing one first. The options are Apptainer (formerly known as Singularity; well suited to high-performance computing, but Linux only), Podman (a daemonless alternative to Docker that can run without root, suited to multi-user/non-root setups), and Docker (suited to individual Windows/Mac users with root privileges).

We recommend installing your chosen container management software by following the instructions on its official website:

Also make sure "curl" is installed on your computer, since the automatic installation in the next step is run with "curl" from the command line ("taffish" itself also needs curl to work).

Depending on your system, you can install curl yourself with a package manager such as apt or brew.
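
Before moving on, it can help to check which of these prerequisites are already on your PATH. The following is a minimal sketch for a POSIX shell; the helper name first_available is made up for this illustration and is not part of TAFFISH.

```shell
#!/bin/sh
# Print the first command from the argument list that exists on the PATH.
# first_available is a hypothetical helper, not a TAFFISH command.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Look for a container manager in the order this manual lists them.
if runtime=$(first_available apptainer podman docker); then
    echo "container manager found: $runtime"
else
    echo "no container manager found (apptainer/podman/docker)"
fi

# curl is needed by the installer in the next step and by taffish itself.
if first_available curl >/dev/null; then
    echo "curl found"
else
    echo "curl not found: install it with your package manager"
fi
```

The script only reports what it finds; it does not install anything.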

2.2 Install "TAFFISH"

At present, TAFFISH is only compatible with the following operating systems. If your device is not included, we are very sorry; you can email us your device details (operating system and hardware architecture), and we will consider adding an installation package for it at our discretion.

Note: To install "TAFFISH" for all users on your computer, run the installation as root or with "sudo":

sudo sh -c "..." -n

Note: The installation may report errors asking you to update or install some library dependencies. Install them yourself with a package manager such as apt or brew, depending on your system.

You can append one of the following parameters to change how the automatic installation behaves:

  • -n, --no-ask :: Default installation: installs/updates the software without overwriting config files, etc. (all prompts are skipped automatically, using the default options)
  • -y, --yes :: (Use with caution) Forced installation: installs/updates the software and overwrites configuration files, etc. (answers "yes" to all prompts)

2.3 Test whether the installation is successful

After installation, your system will have two commands/executables: "taffish" (the interpreter) and "taf" (the package management system). Enter the following to check whether the installation succeeded:

taf -v; taffish -v

If the installation succeeded, you should see two lines of version information similar to the following (your version number and date may differ):

taf      1.0.0-beta  KaiyuanHan(HermitHan)  2025-03-16
taffish  1.0.0-beta  KaiyuanHan(HermitHan)  2025-03-15

If the display matches the above format, then congratulations, the installation is successful!
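
If you want to check the version from a script rather than by eye, the version is the second whitespace-separated field of each line. The sketch below assumes the output keeps the layout shown above; version_of is a hypothetical helper name, and the sample line is copied from this manual rather than produced by a live call.

```shell
#!/bin/sh
# Extract the version (second whitespace-separated field) from a line in the
# format shown above. version_of is a made-up helper for this sketch.
version_of() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}

# Sample line copied from this manual.
sample='taf      1.0.0-beta  KaiyuanHan(HermitHan)  2025-03-16'
echo "parsed version: $(version_of "$sample")"

# With TAFFISH installed, the same helper can be fed real output.
if command -v taf >/dev/null 2>&1; then
    echo "installed taf version: $(version_of "$(taf -v)")"
fi
```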

You can use "taf -h" or "taffish -h" to see more details on how to use taf and taffish!
By the way, you can change the help language (English/Chinese) in the config file (global: /usr/local/etc/taffish/config.taffish.taf; local: ~/.config/taffish/config.taffish.taf).

2.4 (Optional) usage of the taf command

$ taf -h
taf      1.0.0-beta  KaiyuanHan(HermitHan)  2025-03-16
-----------------------------------------------
Usage:
    taf [options] [commands] ...
Options:
    -h, --help          show this help
    -v, --version       show taf's version
Commands:
    history                         <show all history of taf and taffish>
    update-taf                      <update taf and taffish>                             # Show how to install (command line order) on your computer
    search    [options] [app]       <Search for the app from the official website>       # Regular expressions are supported
                  -a, --all          ... Additional detailed descriptions of each app package are displayed
    install   [options] [app]       <Install the app>                                    # The non-root users are only installed locally
                  -f, --file [file]  ... Install the app from the local app-taf file or app-tar-gz file
                  -y, --yes          ... Use "yes" to all selections, no need any select
                  -n, --no           ... Use "yes" to all selections, no need any select
    taf-xxx                         <Use the app>                                        # You can use installed apps just by taf-[app]
                  -h, --help         ... show help of the xxx app
    uninstall [options] [app]       <Uninstall the app>                                  # The non-root user only uninstalls the local app
                  -y, --yes          ... Use "yes" to all selections, no need any select
    upgrade   [options] [app]       <Upgrade the app>                                    # Upgrade all apps if you don't give any app
                  -y, --yes          ... Use "yes" to all selections, no need any select
    clean     [options] [app]       <Clean apps' local things (Container and Image)>
                  [NULL]             ... If give nothing, it will clean something which are created during downloading
                  all                ... Delete all apps's things(the things are still controled by -a/--al)
                  -a, --all          ... If no -a/--all, it will only remove Container, and will remove Image too with -a/--all
    apps      [options]             <Displays all apps that are currently installed>
                  -g, --global       ... Displays the global public repositories
                  -l, --local        ... Showcase your personal local repository
    help/info [options] [app]       <View information about an app>                       # Regular expressions are supported
                  -g, --global       ... Match apps from a global public repository
                  -l, --local        ... Match apps from a local personal repository
    pull      [options] [app]       <Get the app from the installed app repository>
                  -g, --global       ... Get it from a global public repository
                  -l, --local        ... Get it from your local personal repository
* More Information: https://taffish.com

More Information: TAFFISH-DEVELOPMENT-MANUAL

2.5 (Optional) usage of the taffish command

$ taffish -h
taffish  1.0.0-beta  KaiyuanHan(HermitHan)  2025-03-15
-----------------------------------------------
Usage:
    taffish [options] [taf-file] [--args-name  args-value] ...

Options:
    -h, --help          show this help
    -v, --version       show taffish's version
    -t, --template      show a .taf template to help users to write their own .taf file
    -n, --dry-run       just show shell orders which are translated by .taf file
    -f, --force        [Carefully] ignore errors and still translate and run shells
    -s, --silent-run    silent run, silent all output which was automatically run by taffish

* More Information: https://taffish.com

More Information: TAFFISH-DEVELOPMENT-MANUAL

2.6 (Optional) Find out what "TAFFISH" did during the installation process

During installation, we mainly do the following:

  1. Create a home directory structure for TAFFISH:
    1. (root) global install: /usr/local/share/taffish/
    2. (non-root) local install: ~/.taffish/
  2. Install the executables (taf & taffish) to a location on the PATH:
    1. (root) global install: /usr/local/bin/
    2. (non-root) local install: ~/.taffish/bin/

      A local installation may require you to add this path to your shell rc file (such as "~/.bashrc" or "~/.zshrc"). The installer prints the exact steps for you to follow; the line you need to add is generally:

      export PATH=~/.taffish/bin/:$PATH
      
  3. Set up the autocomplete script for the executables:
    1. (root) global install: /etc/bash_completion.d/
    2. (non-root) local install: ~/.taffish/completion/

      As in the previous step, a local installation may require you to add a line to your shell rc file that sources the autocomplete script; the installer shows the exact line. After adding it for the first time, you may need to source the rc file manually or restart the terminal before local autocompletion works.

  4. Add the taffish config files to a specific path:
    1. (root) global install: /usr/local/etc/taffish/
    2. (non-root) local install: ~/.config/taffish/

    More information about config: TAFFISH-DEVELOPMENT-MANUAL

  5. (Optional) Add vim syntax highlighting for files ending in .taf

    This step may not succeed, depending on your operating system.
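
For the PATH change in step 2, it can be convenient to add the directory only when it is missing, so that sourcing your rc file repeatedly does not keep growing PATH. This is a small sketch, not what the installer itself writes; path_prepend is a hypothetical helper name.

```shell
#!/bin/sh
# Prepend a directory to PATH only if it is not already listed.
# path_prepend is a made-up helper name used for this sketch.
# Entries are compared literally, so "dir" and "dir/" count as different.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$HOME/.taffish/bin"
path_prepend "$HOME/.taffish/bin"    # second call is a no-op
export PATH
```

Placed in "~/.bashrc" or "~/.zshrc", this has the same effect as the export line shown in step 2, except that repeated sourcing leaves PATH unchanged.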

3. Quick Start (An Example)

Now that you have successfully installed "TAFFISH", we will go through a simple example to help you understand and use "TAFFISH" even further!

Hello, I'm fishka, a researcher in bioinformatics, and today a botanist friend of mine brought me some data:

  1. Data on some protein sequences detected in a plant sample (AT, Arabidopsis thaliana)
  2. Protein sequence data for a family of proteins (p450) associated with the synthesis of a medicinal ingredient

This is just a demonstration (a simplified version). A real case might involve a medicinal plant and protein or gene families that have not been studied or sequenced much, or a patient's genetic sequences and disease sequence data.

He wanted me to analyze, from a bioinformatics perspective, which proteins in Arabidopsis thaliana might belong to the p450 family. Ideally I should give him the corresponding protein IDs directly, rather than complex file contents! I'm going to show you how I can use TAFFISH to accomplish this task!

3.1 Clarify the solution and the tools you need to use

First, we should clarify how to solve this problem and what bioinformatics tools we will use in this problem:

The second step can actually use local tools such as cut and uniq, but this may reduce portability and reproducibility, so here we will use taf-app to complete all the steps.

So let's get hands-on with the problem.

3.1.1 Give a rough solution code first

First, before the detailed walkthrough, here is all the code needed to solve the problem. We will then expand on it step by step, so that you get an intuitive picture:

taf update
taf install blast debian
taf-blast --cmd blastp --dbin ./p450.fasta --in ./at.fasta
taf-debian cut -f 1 ./blast-out/out.blastp_matches_my-blast-db.txt \| sort \| uniq

That's right: it takes only a few lines of code to install the software, use the tools, and solve the problem. As long as you have TAFFISH and any one of the container management programs installed correctly (and the device has a working network connection), the above process can be replicated on any supported device without any conflicts in the software installation environment!

Installing software with TAFFISH differs from other software installations: it only downloads the TAF script (a plain text file) of the corresponding tool from the Internet and does not touch the system environment at all, so installing any taf-app is usually quick.

So let's start showing the logic behind the above process in detail!

3.2 Install the tools you want to use (taf install)

The software in the "TAFFISH" system is based on container management software. As long as you have at least one container management system installed correctly [2.1 (optional but recommended) Install container management software], you don't have to worry about dependencies or environments when installing software: just find the software in "Taf-Hub/App-Store" and install it with a single command!

taf update
taf search blast debian

In fact, the taf-blast environment already includes tools such as cut and uniq, but not every taf-app environment includes such general tools.

You might get an output similar to the following:

[All apps searched]:
blast debian

As you can see, we have the corresponding tools in the "App Store", so let's quickly install these two tools:

taf install blast debian

If a tool is already installed, you may be asked whether you want to overwrite and reinstall it.

If the installation is successful, you may see something like this:

[√] blast ..................................... [Installed]
[√] debian .................................... [Installed]

Congratulations, the installation is successful! Then we can start using these two tools:

3.3 Perform sequence alignment with TAF-BLAST (taf-blast)

Now that we have successfully installed taf-blast, we can use the following command to see how the tool works:

taf-blast -h

You might get something like this:

# <blast:latest | KaiyuanHan | 2025-01-06>
### Optional ##############################################################################
    <main>
        if ( echo '::dbin::' | grep "\.fasta$" > /dev/null 2>&1 ); then makeblastdb ::db-opts::; fi;
        ::cmd:: ::opts::
    <outdir>
        ::*WORKDIR*::blast-out
    <dbout>
        "::outdir::/my-blast-db/::dbtitle::"
    <db-opts>
        -in       ::dbin::       # (auto)                 for building blast-database
        -dbtype   ::dbtype::     # (default: auto)        database type ,prot=>protain,nucl=nucleic-acid
        -title    ::dbtitle::    # (default: my-blast-db) database title
        -out      ::dbout::      # (default: "./blast-out/my-blast-db/::dbtitle::") database output
        ::db-opts-add::
    <opts>
        -db               ::db::              # (need) database for blast
        -query            ::in::              # (need) blast seqs' file
        -out              ::out::             # (default: "./blast-out/out.cmd_matches_::dbtitle::.txt") output
        -evalue           ::evalue::          # (default: 1e-5) e-value
#       -num_aligntments  ::num-align::       # (default: 10)   seqs' number for blast
#       -max_target_seqs  ::blast-maxnum::    # (default: 4)    most target seqs' mapped number
#       -perc_identity    ::identity::        # (default: 90)   perc identity
        -num_threads      ::threads::         # (default: 4)    cpu threads
        -outfmt=::outfmt::                    # (default: 6)    output format: 0~18, usually 0,5,6,7
                                              # [0: same to online] [5: XML] [6: table] [7: table with anno]...
        ::opts-add::
    <out>
        "::outdir::/out.::cmd::_matches_::dbtitle::.txt"
### NEED ##############################################################################
    <cmd>
        # [blastn:DNA=>DNA-db] [blastp:Protain=>Protain-db] [blastx:DNA=>Protain-db] ...
    <dbin>
        # fasta file for building blast-database
    <in>
        # input fasta file
### RUN ##############################################################################
    <container:taf-blast:docker.io/ncbi/blast:latest>
        ::*MAIN*::

In fact, the core content is the NEED parameters and the RUN command. We only need to provide:

  1. cmd: Which alignment tool to use (protein: blastp, nucleic acid: blastn, ...);
  2. dbin: The sequence file to align against, i.e. the library file (in the real-world analogy, the disease sequence file; in this example, the p450 file);
  3. in: The sequence file to be aligned (in the analogy, the patient's sequence file; in this example, the Arabidopsis file);
  4. The remaining parameters can also be set if you want finer control.

So let's use taf-blast like any other command-line tool:

taf-blast --cmd blastp --dbin ./p450.fasta --in ./at.fasta

At runtime, the taf script runs all the code under RUN. Any '::xxx::' parameter that appears in it can be assigned from the command line with 'taf-cmd --xxx xxx-value', whether or not a default value is given in ARGS (the built-in parameters are the exception).
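
To make this substitution rule concrete, here is a rough shell imitation of it. It is only a sketch of the behavior described above, not taffish's actual implementation; the function name fill and the shortened template are made up for the illustration.

```shell
#!/bin/sh
# Imitate '::name::' substitution: fill TEMPLATE NAME VALUE [NAME VALUE]...
# Pairs are applied left to right, so for a repeated name the earlier pair
# wins (its placeholder is already gone when the later pair is applied).
# Simplification: values must not contain '|', '&' or backslashes.
fill() {
    text=$1
    shift
    while [ "$#" -ge 2 ]; do
        text=$(printf '%s\n' "$text" | sed "s|::$1::|$2|g")
        shift 2
    done
    printf '%s\n' "$text"
}

template='::cmd:: -query ::in:: -evalue ::evalue::'

# The command-line value (evalue 1e-10) is listed before the defaults,
# so it takes precedence over the default evalue 1e-5.
fill "$template" evalue 1e-10 cmd blastp in ./at.fasta evalue 1e-5
```

The last line prints "blastp -query ./at.fasta -evalue 1e-10": the value given first replaces the placeholder, and the default for the same name then finds nothing left to replace.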

The first run may need to pull the image from the official registry; later runs skip this step.

In this run we showed the "developer-suggested usage" of a taf-app: the built-in usage that the developer has optimized, as opposed to the tool's original usage, which would require users to learn the tool separately. It is available in taf-apps whose developers have "optimized" them for the wrapped tool. In the next step, we'll show another, more universal usage.

When the run finishes and we use ls, we should see a new folder ./blast-out/ under the current working path. The alignment result for the two data sets is in this folder: ./blast-out/out.blastp_matches_my-blast-db.txt. We can view it with less -SN or any command you prefer.

sp|Q9ASR3|C7091_ARATH	tr|Q0X087|Q0X087_SOLLC	38.178	516	307	5	6	517	8	515	7.07e-135	396
sp|Q9ASR3|C7091_ARATH	sp|O48786|C734A_ARATH	37.739	522	297	8	9	519	13	517	3.25e-133	392
sp|Q9ASR3|C7091_ARATH	sp|Q05047|SLS1_CATRO	38.760	516	296	8	15	518	16	523	1.95e-118	354
……

It can be seen that BLAST aligns each AT sequence against the sequences in the p450 library, and one AT sequence may be highly similar to several p450 sequences. The default parameters of the previous blast step have already kept only hits with relatively high similarity, so now we only need cut to take the first column (the AT sequence IDs) and sort + uniq to remove duplicates, which gives the AT protein sequences that may be p450!
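
Output format 6 is BLAST's tab-separated table; by default its twelve columns are query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value and bit score. If you want to filter hits further before extracting IDs, a sketch with awk could look like this. The rows are the three sample lines shown above, and both the helper name filter_hits and the 38% threshold are chosen only for illustration.

```shell
#!/bin/sh
# Three rows copied from the sample output above (tab-separated, outfmt 6).
rows() {
    printf 'sp|Q9ASR3|C7091_ARATH\ttr|Q0X087|Q0X087_SOLLC\t38.178\t516\t307\t5\t6\t517\t8\t515\t7.07e-135\t396\n'
    printf 'sp|Q9ASR3|C7091_ARATH\tsp|O48786|C734A_ARATH\t37.739\t522\t297\t8\t9\t519\t13\t517\t3.25e-133\t392\n'
    printf 'sp|Q9ASR3|C7091_ARATH\tsp|Q05047|SLS1_CATRO\t38.760\t516\t296\t8\t15\t518\t16\t523\t1.95e-118\t354\n'
}

# Keep rows whose percent identity (column 3) is at least the given minimum
# and print query ID, subject ID and percent identity.
# filter_hits is a hypothetical helper name.
filter_hits() {
    awk -F '\t' -v OFS='\t' -v min="$1" '$3 >= min { print $1, $2, $3 }'
}

rows | filter_hits 38
```

With a real result file, the rows function would be replaced by reading ./blast-out/out.blastp_matches_my-blast-db.txt.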

3.4 Use taf-debian for statistics and data display (taf-debian)

In a normal local environment, our code would look like this:

cut -f 1 ./blast-out/out.blastp_matches_my-blast-db.txt | sort | uniq

However, to ensure portability and reproducibility as far as possible, we use the cut, sort and uniq from the official Debian image instead:

taf-debian cut -f 1 ./blast-out/out.blastp_matches_my-blast-db.txt \| sort \| uniq

Comparing the two, we have made two changes:

  1. Added taf-debian at the front: this runs the code that follows inside the taf-debian environment, and container software such as docker ensures that this step is portable and reproducible;
  2. Changed the pipe | to \| so that the pipe is passed to taf-debian as part of the code, rather than being interpreted by the local shell as a pipe into the local sort and uniq.

    If you still use |, it will probably work, but it will use your local sort and uniq.
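
The difference between | and \| can be tried out without any container. In the sketch below, run_inside is a hypothetical stand-in for a wrapper such as taf-debian: it simply joins its arguments and runs them in a child shell, whereas a real taf-app would run them inside a container.

```shell
#!/bin/sh
# run_inside is a made-up stand-in for a wrapper such as taf-debian:
# it joins its arguments into one command line and runs it in a child shell.
# Simplification: arguments must not contain spaces or quotes.
run_inside() {
    sh -c "$*"
}

data=$(mktemp)
printf 'b\t1\na\t2\nb\t3\n' > "$data"

# Escaped pipes arrive as plain "|" arguments, so the whole pipeline
# (cut, sort and uniq) runs in the child shell.
run_inside cut -f 1 "$data" \| sort \| uniq

# Unescaped pipes are consumed by the current shell: only cut runs in the
# child shell, while sort and uniq run locally.
run_inside cut -f 1 "$data" | sort | uniq

rm -f "$data"
```

Both commands print the same two lines here, because the child shell and the local shell share the same tools. With a container in place of the child shell, the first form would use the container's sort and uniq and the second your local ones.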

After running it, we might get something like this:

sp|O65785|C71B3_ARATH
sp|P92994|TCMO_ARATH
sp|Q9ASR3|C7091_ARATH
tr|Q9LEX2|Q9LEX2_ARATH
tr|Q9STI1|Q9STI1_ARATH

And so we have completed the task for our botanist friend!

3.5 (Optional) Use taffish to build a workflow (taffish)

But what if he comes to us again? It would be better to package the above process so that he can run the steps himself! Usually we rarely share our work directly and let others run the calculation themselves, because computer environments and installed software differ so much that getting our scripts to run smoothly on their devices often takes a lot of effort (environment checks, software installation, and so on). But TAFFISH changes that: we only need "container management software" + "TAFFISH" to reproduce our work on most devices!

If your tool is not provided for both the x86 and arm64 architectures, there may be minor compatibility issues.

So again, let's code first, and then explain!

3.5.1 Let's start with the code for the taf script

+FLOW:blast-get-IDs
ARGS
    <cmd>
        blastp
    <dbin>
        ./db.fasta
    <in>
        ./in.fasta
RUN
    <auto-flow>
        taf-blast --cmd ::cmd:: --dbin ::dbin:: --in ::in:: > /dev/null 2>&1
        taf-debian cut -f 1 ./blast-out/out.::cmd::_matches_my-blast-db.txt \| sort \| uniq

It is easy to see that, compared to the original work, we have only made the following changes:

  1. Some default values are given: even without them, the user can pass values in from the command line via --cmd ...
  2. The RUN-<auto-flow> tag is set: there is no need to install the software manually, since taf-blast and taf-debian will be installed automatically
  3. In the taf-blast step we appended > /dev/null 2>&1, which suppresses this step's screen output so that only the final ID list is printed
  4. The file path in the taf-debian step was changed, because the output filename of taf-blast depends on cmd (see taf-blast -h for details)

Then all we need to do is send this script file to our botanist friend and ask him to run it himself next time! He doesn't need to install any additional software, configure any environment, or modify any code. In this way, we took the few simple commands we used while working and turned them almost directly into portable, reproducible, easily shareable and installable software!
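
As a sketch of what running the script could look like on his side: the commands below save the flow from 3.5.1 to a file and, if taffish is installed, preview it with the -n/--dry-run option from the taffish help in 2.5. The file name blast-get-IDs.taf is our own choice for this example, and the invocation simply follows the usage line "taffish [options] [taf-file] [--args-name args-value]".

```shell
#!/bin/sh
# Save the flow from 3.5.1 under a file name of our choosing
# (blast-get-IDs.taf is a hypothetical name).
cat > blast-get-IDs.taf <<'EOF'
+FLOW:blast-get-IDs
ARGS
    <cmd>
        blastp
    <dbin>
        ./db.fasta
    <in>
        ./in.fasta
RUN
    <auto-flow>
        taf-blast --cmd ::cmd:: --dbin ::dbin:: --in ::in:: > /dev/null 2>&1
        taf-debian cut -f 1 ./blast-out/out.::cmd::_matches_my-blast-db.txt \| sort \| uniq
EOF

# With TAFFISH installed, preview the generated shell commands without
# running them, overriding two of the defaults on the command line.
if command -v taffish >/dev/null 2>&1; then
    taffish -n blast-get-IDs.taf --dbin ./p450.fasta --in ./at.fasta
fi
```

Dropping the -n option would run the flow instead of only showing the translated shell commands.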

We hope this example has given you a deeper understanding of TAFFISH. If you've read this far, try using TAFFISH to use, build, and share your own tools, workflows and work!

3.5.2 Introduction to TAFFISH Grammar

Now I'm going to give you a brief introduction to taffish as a language. If there are parts of the above that you didn't understand, they should be easier to follow after reading this section.

The Taffish language is similar to a markup language and its syntax is simple: elements are separated by line breaks, and specific tags (single-line elements) separate the functional structure of the code. This structure can be divided into four levels from top to bottom:

  1. taf declaration at the beginning of the file: use "+TOOL:tool-name" or "+FLOW:flow-name" to declare what kind of file it is. This is required and generally appears only once, at the beginning of the file;
  2. First-level tags: "ARGS", "RUN", "LOAD", etc. A first-level tag must occupy a line of its own, in full uppercase. We only use "ARGS" and "RUN" this time; in fact, most taf scripts involve only these two first-level tags;
    1. ARGS: This is the parameter space, which stores all parameters and their default values. Note that TAFFISH has some built-in parameters, which can be called directly via '::xxx::' without being defined in ARGS or passed on the command line, and which cannot be modified or redefined in any way. They are:
      1. *WORKDIR*: The current user's working path
      2. *USER*: The username of the current user
      3. *CPUS*: The number of threads on the current device
      4. *CMD-ARGS*: All command-line parameters of this run (including taffish configuration parameters, such as --force, --silent-run, etc.)
      5. *APP-ARGS*: The parameters accepted by the app in this run (excluding the taffish configuration parameters, only the parameters accepted by the taf-app)
      6. *LOAD-DIR*: The path of the taffish file being run (for a taf-app, the path where the app's taf script is located). Run-related code sometimes sits at a fixed relative path to the script, and this parameter gives you a stable way to reference it
      7. *MAIN*: (1) If the user passes *APP-ARGS* (i.e. parameters after the taf-app name) and the first one starts with --, this parameter is replaced with ::main::, and the taf script needs to define the ARGS-<main> parameter to customize the usage; (2) If the first of the user's parameters does not start with '--', this parameter is replaced with ::*APP-ARGS*::, i.e. the user's input is used directly as the code to run (the use of taf-debian in 3.4 relies on this); (3) If the user gives no app parameters at all, i.e. runs just taf-blast, this parameter is replaced with ::else::, and the code for this no-input case can be set through ARGS-<else>.

      These are the main built-in parameters, and it is recommended to use *MAIN* in every taf-app to write code!

    2. RUN: This is the run/code space, which stores all the code to be run together with its runtime environment (tag). Each tag is a way of processing the code beneath it, and using tags well can simplify our code.
  3. Second-level tags: enclosed in a pair of angle brackets <>. The function and meaning of their content differ under different first-level tags; here is a brief explanation for the two first-level tags "ARGS" and "RUN":
    1. ARGS: The second-level tag under ARGS is a variable name. For example, given taf code like this:

      +TOOL:test
      ARGS
          <xxx>
              -a     1
              --name 23
      ……
      

      You can then use ::xxx:: (the variable name xxx wrapped in a pair of :: double colons) anywhere else in the taf file, and it will be replaced with the variable's content at that position. Note that when the variable is substituted, line breaks in the content under the second-level tag are automatically replaced with spaces. That is, if you have code such as echo "::xxx::", it becomes echo "-a 1 --name 23". "ARGS" therefore forms a parameter substitution system;

    2. RUN: As the name suggests, RUN holds the code we are going to run. The taf language is thus somewhat similar to some static languages: you declare variables first (ARGS) and then use them to write code (RUN). The differences are that our variables are stored under the "ARGS" tag, and that at run time the second-level tags of "RUN" can process the code in different ways. Finally, all the processed code is collected into one "shell script" (no such file is actually written; the resulting shell code is passed directly to the bash command to run):

      1. <local>/<sh>/<shell>: The code under this tag is copied into the final shell script as plain shell code;
      2. <container/apptainer/podman/docker($cmd):taf-container-name:docker-image>: taffish adds the shell code that automatically creates/runs the container from the container name and docker image name you provide, and passes the shell code under this tag to the corresponding cmd in the container (bash by default) through a heredoc;
      3. <flow>: The shell code under this tag can call taf-apps, which are converted into the corresponding shell code and embedded into the current shell script, instead of being left as taf-app calls for the shell to run;
      4. <auto-flow>: Works like <flow>, but also detects automatically whether each taf-app is installed and installs it if not.
  4. Third-level content: the content directly under a second-level tag. Its meaning and handling depend on the tag: under "ARGS" it is the value of the variable, and under "RUN" it is generally code.

    Under "RUN" the content is usually shell code, but under tags such as <python> it is code in another language, and some custom second-level RUN tags can even hold special code structures. Users can also define their own second-level RUN tags programmatically; see TAFFISH-DEVELOPMENT-MANUAL.

More Information about TAFFISH: TAFFISH-DEVELOPMENT-MANUAL
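
Two of the rules from 3.5.2 can be imitated in a few lines of shell: the three cases that decide what ::*MAIN*:: expands to, and the way a multi-line ARGS value is flattened when it is substituted. This is only a sketch of the described behavior, not taffish's own code; main_source and flatten are made-up helper names.

```shell
#!/bin/sh
# Decide what ::*MAIN*:: would expand to, following the three cases
# described for *MAIN*. main_source is a hypothetical helper.
main_source() {
    if [ "$#" -eq 0 ]; then
        echo '::else::'                 # case 3: no app arguments at all
        return
    fi
    case "$1" in
        --*) echo '::main::' ;;         # case 1: first argument starts with --
        *)   echo '::*APP-ARGS*::' ;;   # case 2: arguments are used as the code
    esac
}

# Flatten a multi-line ARGS value: indentation and repeated spaces are
# squeezed, and line breaks become single spaces. flatten is hypothetical.
flatten() {
    printf '%s\n' "$1" | awk '{ $1 = $1; print }' | paste -s -d ' ' -
}

main_source --cmd blastp --dbin ./p450.fasta   # prints ::main::
main_source cut -f 1 ./out.txt                 # prints ::*APP-ARGS*::
main_source                                    # prints ::else::

# The <xxx> value from the +TOOL:test example.
xxx='    -a     1
    --name 23'
flatten "$xxx"                                 # prints -a 1 --name 23
```

The first call corresponds to the taf-blast usage in 3.3, the second to the taf-debian usage in 3.4.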