How to make a tool
This chapter is long and needs to be read carefully. Where possible, we recommend following along with the operations as you read, which will help you understand them. If you run into problems, you can contact us at flowhub_team@flowhub.com.cn.
Creation of local base image/tool image
Once you understand Docker, you will know that a Docker image can be produced by writing a Dockerfile. The main function of a Dockerfile is to tell Docker which system to use as the base image (such as ubuntu16.04), which dependencies the tool requires, and where to pull the tool's code from (such as GitHub) to complete the deployment of the tool. If you don't know how to use a Dockerfile, you can also make a tool image in the following ways:
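As a minimal sketch of the three things a Dockerfile declares (base system, dependencies, and where the tool's code comes from), the following shell snippet writes one into a scratch directory. The repository URL, the install path /opt/mytool, and the tag mytool:0.1 are hypothetical placeholders, and Alpine is used as the base only as an example.

```shell
# Write a minimal Dockerfile into a scratch build directory.
build_dir="$(mktemp -d)"
cat > "$build_dir/Dockerfile" <<'EOF'
# 1. Base system image
FROM alpine:latest
# 2. Dependencies required by the tool
RUN apk add --no-cache git
# 3. Fetch the tool's code (hypothetical repository URL)
RUN git clone https://github.com/example/mytool.git /opt/mytool
EOF

echo "Dockerfile written to $build_dir/Dockerfile"

# To build the image (requires Docker; the tag is an example):
# docker build -t mytool:0.1 "$build_dir"
```

The build command is left commented out so that the snippet can be run on a machine without Docker to inspect the generated file first.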
Pull images from remote warehouses
For specific Docker commands, please refer to the document 'What is docker?'. Here we focus on some basic images. For more images, you can visit https://registry.hub.docker.com/. For Docker images related to bioinformatics, you can visit https://biocontainers.pro/. For the operating system, we recommend Alpine. Its Docker image is very small, only about 5 MB, and it provides the APK package management tool, which makes it convenient to search for, install, delete, and upgrade software. Its small size and complete functionality make it well suited as the base image of a container.
Operating system images: these contain the operating system you want, and you can develop on that basis
Image name | Command to get the image |
---|---|
Ubuntu:16.04 | docker pull ubuntu:16.04 |
Ubuntu:18.04 | docker pull ubuntu:18.04 |
CentOS 7 | docker pull centos:7 |
CentOS 8 | docker pull centos:8 |
Alpine (the smallest Linux system) | docker pull alpine |
Debian | docker pull debian |
Images with part of the development environment installed: these contain the operating system you want plus part of the development environment, such as R or Python
Image name | Command to get the image |
---|---|
Latest version of R | docker pull r-base:latest |
Latest version of Python | docker pull python:latest |
Latest version of Perl | docker pull perl:latest |
Images with analysis software installed: these contain the operating system and the analysis software you want
Image name | Command to get the image |
---|---|
OpenCue (image rendering manager) | docker pull opencue/rqd |
GATK4 (bioinformatics analysis pipeline) | docker pull quay.io/biocontainers/gatk4:4.2.0.0--hdfd78af_1 |
Develop analysis tools based on existing images
If the Docker image already contains the software you need and runs successfully, you can skip this step. If you need to repackage an existing image, you can do it in one of the following ways.
Method 1: Start a container from the image, enter the container, and write or install the analysis software and scripts. Once testing succeeds, exit the container and package it into a new image to complete the tool image. For some of these steps, refer to the document 'What is docker?'.
Method 2: Package the tool image from a Dockerfile. This method manages the Docker image in layers, keeps the image as small as possible, and makes it easier to transfer.
Method 3: Pull a mature tool image directly with the Pull Docker function in the FlowHub tool list. For details, please read 'Tool development - pull images directly'. After reading it, you can skip the instructions below for uploading a local tool and go straight to the tool parameter configuration after upload.
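Method 1 can be sketched with standard Docker commands as follows. This is only an illustration under stated assumptions: the container name, the installed package (bash), and the image tag in the example call are placeholders, and the install step is run non-interactively with docker exec instead of typing commands inside the container. Setting DOCKER=echo prints the commands instead of running them.

```shell
# DOCKER can be overridden (for example DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# package_by_commit BASE_IMAGE NEW_IMAGE
package_by_commit() {
  base_image="$1"
  new_image="$2"
  container="tool-build-$$"   # hypothetical container name

  # Start a long-lived container from the base image.
  $DOCKER run -d --name "$container" "$base_image" sleep 3600 || return 1
  # Install software inside the container (bash is only an example package).
  $DOCKER exec "$container" sh -c 'apk add --no-cache bash' || return 1
  # Stop the container and save its filesystem as a new image.
  $DOCKER stop "$container" || return 1
  $DOCKER commit "$container" "$new_image" || return 1
  # Remove the temporary container.
  $DOCKER rm "$container"
}

# Example call (requires a running Docker daemon):
# package_by_commit alpine:latest mytool:0.1
```

Compared with a Dockerfile, an image made this way does not record how it was built, which is one reason Method 2 is easier to maintain.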
Upload local tool
Upload the image to the FlowHub platform. Let's take the md5sum verification tool as an example. md5sum is built into most Linux systems, so a tool image for md5sum is very easy to obtain: you only need a Linux system image. Here we simply use the Alpine image. Since the Alpine image has a built-in md5sum command, we first use Docker to pull the Alpine image.
sudo docker pull alpine:latest
docker images
# REPOSITORY TAG IMAGE ID CREATED SIZE
# alpine latest 28f6e2705743 4 weeks ago 5.61MB
We can see that this image is surprisingly small, only 5.61MB. Next we will use the fkit tool to upload the tool image. If you haven't installed fkit yet or don't know how to use it, you can read the Fkit command line tool introduction.
# Note: use fkit to log in to the specified project: fkit login -k [AccessKey] -s [AccessSecret]
# AccessKey and AccessSecret can be obtained in the user center at the top right of the platform web page
fkit login -k dd3e5b06896711eb8cfb12420bfe23a5 -s ccd4bf88bb0411ebb2227ae21af08499
# login successful
# index: 1 Project id: aa9e9899985311ebbbf302420bfe23a5 Project Name: test/demo
# -----------Please select project id--------------
# Select the corresponding project. There is only one project here, with index 1, so enter 1 and press Enter
1
# Note: after a successful login, push the tool image from the local image repository to the selected project and give the tool a name:
# fkit createTool [IMAGE_NAME] [TOOL_NAME]
fkit createTool alpine:latest md5sum
# Current project:demo
# {'status': 'The push refers to repository [develop.flowhub.com.cn:5000/81406e14897711ebacec3946668b50ad]'}
# {'status': 'Preparing', 'progressDetail': {}, 'id': 'cb381a32b229'}
# {'status': 'Mounted from 7bbc246c858d11eba9b4fb4dea426e31', 'progressDetail': {}, 'id': 'cb381a32b229'}
# {'status': 'latest: digest: sha256:4661fb57f7890b9145907a1fe2555091d333ff3d28db86c3bb906f6a2be93c87 size: 528'}
# {'progressDetail': {}, 'aux': {'Tag': 'latest', 'Digest': 'sha256:4661fb57f7890b9145907a1fe2555091d333ff3d28db86c3bb906f6a2be93c87', 'Size': 528}}
# push successful
Once the upload succeeds, return to the corresponding project on the FlowHub platform and refresh the tool list. You will find that a new tool named md5sum has been added to the list.
Tool parameter configuration after upload
We want tool configuration to be as convenient as possible, so creating a tool only requires uploading the tool image. After the tool is uploaded, you can work through the web interface: click the tool to open its details page, then click the edit button (current version) on the right to start editing the tool's parameter information. The editable content mainly includes the tool description, tool command line, tool inputs, tool outputs, and tool running configuration.
Tool description
The tool description is written in a Markdown editor; you can consult the Markdown documentation to learn the specific syntax. We provide a default content template that you can refer to when editing.
Fill in the tool command line
We need to write a command line so that the tool can run according to it. Take the md5sum tool as an example. On a Linux system, we usually run the following command to use it.
md5sum        -t          /input/data.txt   >   /output/md5sum.result.txt
------------  ---------   ---------------       -------------------------
Tool command  parameter   input file            output file
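To try the same command shape locally before configuring it on the platform, here is a small sketch that uses a scratch directory in place of /input and /output. The directory layout and file contents are assumptions for illustration. The -t flag is left out because text mode is already the default in GNU md5sum.

```shell
# Create scratch input/output folders that stand in for /input and /output.
workdir="$(mktemp -d)"
mkdir -p "$workdir/input" "$workdir/output"
printf 'hello flowhub\n' > "$workdir/input/data.txt"

# Tool command + input file, with the result redirected to the output file.
md5sum "$workdir/input/data.txt" > "$workdir/output/md5sum.result.txt"

# Show the result line (checksum followed by the file path).
cat "$workdir/output/md5sum.result.txt"

# Verify the file against the recorded checksum.
md5sum -c "$workdir/output/md5sum.result.txt"
```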
When editing a tool in FlowHub, we need to capture and configure the input files, output files, and tool parameters, so we have agreed on the following command-line writing convention to make the tool more flexible to use.
type | command-line writing convention |
---|---|
input/output file and its path | #{prefix<:>VALUE<|>io<:>KEY<|>option} |
parameter of string type | #{prefix<:>VALUE<|>string<:>KEY<:>VALUE<|>option} |
parameter of number type | #{prefix<:>VALUE<|>number<:>KEY<:>VALUE<|>option} |
- #{} is the capture symbol. Use #{} to capture the input files, output files, and tool parameters that we want to expose from the command line.
- prefix<:>VALUE is the flag identifier. It is optional.
- string<:>KEY<:>VALUE and number<:>KEY<:>VALUE are the parameter value identifiers. They can be omitted if the parameter has no value.
- io<:>KEY is the I/O identifier.
- option is the option identifier. If there is no flag identifier and the item is required, you do not need to fill it in.
- <|> is the separator. Since one capture symbol can contain several identifiers, this symbol is used to split them.
Therefore, the above tool command can be written as:
md5sum        #{prefix<:>-t}   #{io<:>inputkey}   >   #{io<:>outputkey}
------------  --------------   ----------------       -----------------
Tool command  parameter        input file             output file
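To make the relationship between the template and the real command concrete, the sketch below substitutes each capture symbol by hand with sed. It only illustrates the idea. It is not how the FlowHub platform resolves templates, and the paths are the ones from the earlier md5sum example.

```shell
# The tool command written with capture symbols.
template='md5sum #{prefix<:>-t} #{io<:>inputkey} > #{io<:>outputkey}'

# Replace each capture symbol with the text it stands for.
# '@' is used as the sed delimiter because the paths contain '/'.
resolved="$(printf '%s\n' "$template" | sed \
  -e 's@#{prefix<:>-t}@-t@' \
  -e 's@#{io<:>inputkey}@/input/data.txt@' \
  -e 's@#{io<:>outputkey}@/output/md5sum.result.txt@')"

printf '%s\n' "$resolved"
# prints: md5sum -t /input/data.txt > /output/md5sum.result.txt
```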
For more detailed instructions, refer to the table below
Field | Type | Description |
---|---|---|
prefix<:>VALUE | Prefix identifier | prefix is fixed, and VALUE is the content of the prefix, such as -s or -N, giving prefix<:>-s or prefix<:>-N. Note that the prefix is spliced directly onto the content that follows it: if the two are joined by a space, include the space in VALUE; if they are joined by an equals sign, include the = in VALUE, as in prefix<:>-N= |
string<:>KEY<:>VALUE, number<:>KEY<:>VALUE | Parameter identifier | string denotes a string-type parameter and number denotes a number-type parameter. KEY is the parameter's name in the parameter list and must be unique for each parameter. VALUE is the parameter's default value, shown in the parameter list to the right of the command line, and it should be consistent with the declared string or number type |
io<:>KEY | I/O identifier | TYPE has two possible values: input and output. The KEY uniquely identifies the input or output, and no KEY may be repeated |
option | Optional identifier | The presence of option in the capture symbol indicates that the item is optional, and the presence of require indicates that it is required |
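The splicing rule for prefix<:>VALUE can be illustrated with plain string concatenation. The helper name splice and the sample values below are hypothetical. The point is only that nothing is inserted between the prefix and the value, so the space or = has to be part of the prefix itself.

```shell
# splice PREFIX VALUE: join the two with nothing in between.
splice() {
  printf '%s%s\n' "$1" "$2"
}

splice '-N=' '8'     # prints: -N=8
splice '-s ' 'abc'   # prints: -s abc
splice '-s' 'abc'    # prints: -sabc (no separator was included in the prefix)
```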
Since writing out the full capture pattern every time is cumbersome, we have also agreed on some shorthands:
- The parameter is just a prefix identifier, with no value or other content after it. For example, the -t parameter of md5sum means to read the file in text mode and takes no argument, so it can be written as #{prefix<:>-t}. If it is optional, it can be written as #{prefix<:>-t<|>option}.
- The parameter has no prefix identifier and only needs a value or string. It can be written as #{string<:>KEY<:>VALUE} or #{number<:>KEY<:>VALUE}.
- If an input or output has no prefix identifier, it can be written as #{TYPE<:>KEY}.
- If the item is required, the option part can be omitted.
Input/output settings
After the command line is written, the parameter list is generated automatically from it, but we still need to configure the corresponding files in the input and output lists.
First, how to fill in the input list:
field | description |
---|---|
dirName | Corresponds to the KEY of the io parameter and indicates the folder in the container where the file will be placed. If in cmd is checked, the absolute path of the file is put in the command line. If in cmd is not checked, the absolute path of its directory is put in the command line, without the file name |
port | The port name that the tool exposes in the flowchart; see the process creation document for details |
portType | There are three options: item, array, and dir. item means the input is a single file, and array means the input is multiple files; if in cmd is checked, their absolute file paths are joined together, separated by spaces. dir means the input is a folder |
pattern | * accepts input files of any format. XXX accepts files ending with XXX, such as jpg, png, txt, or fasta |
IO path type | There are two options: filePath and dirPath. filePath is the full path including the file name. dirPath is the path of the directory where the input file is located |
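The difference between the two path forms, and the space-separated joining described for array ports, can be seen in a few lines of shell. The file names are made up for illustration, and this only mimics the resulting strings, not the platform's own logic.

```shell
# A single input file as it might appear inside the container.
input_file='/input/data.txt'

file_path="$input_file"               # filePath: full path with file name
dir_path="$(dirname "$input_file")"   # dirPath: containing directory only

echo "filePath: $file_path"   # prints: filePath: /input/data.txt
echo "dirPath:  $dir_path"    # prints: dirPath:  /input

# An array port: absolute file paths joined by single spaces.
array_paths=''
for f in /input/a.fasta /input/b.fasta; do
  array_paths="${array_paths:+$array_paths }$f"
done
echo "array:    $array_paths" # prints: array:    /input/a.fasta /input/b.fasta
```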
Next, how to fill in the output list:
field | description |
---|---|
dirName | Corresponds to the KEY of the io parameter and indicates the folder in the container where the file will be placed. If in cmd is checked, the absolute path of the file is put in the command line. If in cmd is not checked, the absolute path of its directory is put in the command line, without the file name |
path | Subdirectory path, which can be empty. If there is a subfolder under the dirName path, you can fill in the subdirectory below the dirName location here, such as result/data |
port | The port name that the tool exposes in the flowchart; see the process creation document for details |
portType | There are three options: item, array, and dir. item means the output is a single file, and array means the output is multiple files; if in cmd is checked, their absolute file paths are joined together, separated by spaces. dir means the output is a folder |
pattern | * accepts files of any format. XXX accepts files ending with XXX, such as jpg, png, txt, or fasta. If the port type is item and in cmd is checked, the specific file name must be given here. The combination of port type array with in cmd checked is not supported by the system |
IO path type | There are two options: filePath and dirPath. filePath is the full path including the file name. dirPath is the path of the directory where the output file is located |
Let's go back to the md5sum tool. When the command line on the tool edit page is the following command, fill in the corresponding input list and output list as shown in the figure below to finish building the tool.
md5sum #{prefix<:>-t} #{io<:>inputkey} > #{io<:>outputkey}
Command line preview
On the tool configuration page, we provide a command configuration section with built-in shortcut buttons for command-line configuration, such as inserting string parameters, numeric parameters, or IO parameters. To help users quickly see what the real command line looks like, the page also displays the corresponding actual command and its parameters in sync.
Machine type selection
Select the machine type that is used by default when the tool runs. This reflects the developer's estimate of the tool's resource needs, so choose the most suitable type. If the tool's memory and CPU cost depend on the file size, the machine type can be modified again according to the actual situation when the task is submitted. Since md5sum is relatively simple, we can choose the minimum configuration here. You can also request a GPU if your tool needs a GPU machine to run.
Machine network settings
While tasks are being computed on the FlowHub platform, the computing machine is not connected to the network by default, in order to protect the security of user data. If your tool needs to obtain data online, find the Network option on the project Setting page and enable the networking protocol. Once it is enabled, a network settings section appears on the tool editing page, where you can set the bandwidth according to the needs of the tool.
When you finish editing the tool, click the Update button at the top right to save the tool configuration. The tool is now ready to use. If you have any other questions during the operation, please contact us or leave a message by email at flowhub_team@flowhub.com.cn, and we will give help and feedback as soon as possible.