What is Docker?
This section introduces Docker images and the basic image operations you will need on this platform.
The origin of Docker
Imagine developing an artificial intelligence project on "how to make people laugh" in the days before Docker. The project has many dependencies: compiled libraries, packages for several languages, and a number of supporting software and services. Building the environment from scratch may take a full day before any code is written. Once the code is written, it is handed to a colleague for testing. If the tester cannot use your development environment (you can only share the code, for example through GitHub), they must build the same environment from scratch. Differences in operating system, system version, and dependency versions then cost the tester a lot of time: the code fails to run, or strange problems appear while it runs. As developers, all we can offer is the weak reply, "But it obviously runs in my environment." And that is only testing. When the product goes online, is released, or is shared with others, the environment has to be built yet again, and the number of things that can go wrong remains large.
Across the whole process, the same environment is built three times, which wastes time and effort. Smart programmers are never satisfied with the status quo, so it was time for programmers to change the world: container technology arrived.
Docker is an open source project implemented in the Go language and a powerful open software platform. Once software is packaged into a Docker image, it can run on different platforms and environments. Images can also be stored in public (Docker Hub) or private Docker repositories and shared with others. Together, these features greatly simplify installing, sharing, and running software.
Docker installation
The link below describes how to install Docker. Docker provides download and installation tutorials for Linux, Windows, and macOS; select your platform to view the details. https://docs.docker.com/engine/install/
Example: installing Docker on Ubuntu Linux
This example uses Ubuntu. The full tutorial is available at https://docs.docker.com/engine/install/ubuntu/.
Operating system minimum requirements
For Docker to run normally, your system must be one of the following Ubuntu releases:
Ubuntu Groovy 20.10
Ubuntu Focal 20.04 (LTS)
Ubuntu Bionic 18.04 (LTS)
Ubuntu Xenial 16.04 (LTS)
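The release check can be scripted before starting the installation. The following is a minimal sketch that treats the list above as an allow-list and assumes the host is Ubuntu; the function name is_supported_ubuntu is hypothetical.

```shell
# is_supported_ubuntu is a hypothetical helper.
# Returns 0 if the given VERSION_ID is one of the releases listed above.
is_supported_ubuntu() {
  case "$1" in
    16.04|18.04|20.04|20.10) return 0 ;;
    *) return 1 ;;
  esac
}

# Read VERSION_ID from /etc/os-release inside a command substitution,
# so the variables of the current shell are left untouched.
current_version=""
if [ -r /etc/os-release ]; then
  current_version=$(. /etc/os-release && printf '%s' "${VERSION_ID:-}")
fi

if is_supported_ubuntu "$current_version"; then
  echo "Ubuntu $current_version is on the supported list"
else
  echo "Release '${current_version:-unknown}' is not on the supported list"
fi
```

Newer Ubuntu releases are not covered by this list, so check the official tutorial before relying on the result.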
Uninstall old versions
Older versions of Docker were called docker, docker.io, or docker-engine. If any of these are installed, uninstall them:
sudo apt-get remove docker docker-engine docker.io containerd runc
Docker dependency installation
Before installing Docker for the first time, you need to set up the Docker repository. After that, you can install and update Docker from that repository.
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg \
lsb-release
Add Docker's official GPG key (if it fails, please visit https://docs.docker.com/engine/install/ubuntu/):
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
Set up the stable repository. If your system is x86_64 / amd64, you can use the command below. For other architectures, please refer to the tutorial above.
echo \
"deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
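For architectures other than amd64, the repository line can be generated instead of hard-coded. The sketch below shows one way to do that; the function name docker_repo_line is hypothetical, and the key path is the one used in the command above.

```shell
# docker_repo_line is a hypothetical helper, not part of Docker.
# $1: package architecture (for example amd64 or arm64)
# $2: Ubuntu codename (for example focal)
docker_repo_line() {
  printf 'deb [arch=%s signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu %s stable\n' "$1" "$2"
}

# Show the line for an amd64 machine running Ubuntu Focal.
docker_repo_line amd64 focal

# On a real host the two arguments can be detected and the result written out:
#   docker_repo_line "$(dpkg --print-architecture)" "$(lsb_release -cs)" \
#     | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```

Whether Docker publishes packages for a given architecture and release still has to be checked in the official tutorial.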
Install Docker Engine
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
To install a specific version of Docker Engine, list the available versions in the repository, then select and install:
# list the available versions
apt-cache madison docker-ce
# Example output
# docker-ce | 5:18.09.1~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
# docker-ce | 5:18.09.0~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
# docker-ce | 18.06.1~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
# docker-ce | 18.06.0~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
Use the version string in the second column to install a specific version, for example 5:18.09.1~3-0~ubuntu-xenial
sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io
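Copying the version string by hand is easy to get wrong, so it can be extracted from the listing instead. The sketch below assumes the column layout shown in the sample output above; pick_version and sample_versions are hypothetical names.

```shell
# pick_version is a hypothetical helper: it reads `apt-cache madison` output on
# stdin and prints the first version string (second column) that contains $1.
pick_version() {
  awk -F'|' -v want="$1" '
    { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v) }   # trim the padding around column 2
    index(v, want) { print v; exit }             # first match wins
  '
}

# Demonstration on two lines copied from the sample output above.
sample_versions=' docker-ce | 5:18.09.1~3-0~ubuntu-xenial | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
 docker-ce | 18.06.1~ce~3-0~ubuntu | https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages'
printf '%s\n' "$sample_versions" | pick_version 18.06

# On a real host:
#   VERSION_STRING=$(apt-cache madison docker-ce | pick_version 18.09.1)
#   sudo apt-get install docker-ce="$VERSION_STRING" docker-ce-cli="$VERSION_STRING" containerd.io
```

The demonstration prints 18.06.1~ce~3-0~ubuntu, the second column of the matching line.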
After the installation is complete, you can test whether the installation is successful:
sudo docker run hello-world
Using Docker
For a detailed Docker tutorial, visit https://docs.docker.com/get-started/. A few basic commands are demonstrated here.
Pull a base image from a remote repository
# docker pull <image name>:<version>
# Pull the base image ubuntu:16.04 from the remote repository
docker pull ubuntu:16.04
# Pull the base image r-base:3.6.3, which has the R language installed, from the remote repository
docker pull r-base:3.6.3
# Pull the bioinformatics analysis tool FastQC from the remote repository; visit https://biocontainers.pro/registry for more bioinformatics tool images
docker pull biocontainers/fastqc:v0.11.9_cv7
View the local image repository
# List local images
docker images
# The result is a list of images. The first line holds the column names: repository / tag / image ID / creation time / image size
# REPOSITORY             TAG           IMAGE ID       CREATED         SIZE
# biocontainers/fastqc   v0.11.9_cv7   e5e3008d2bd1   5 months ago    834MB
# r-base                 3.6.3         cec2502269fb   11 months ago   682MB
# ubuntu                 16.04         dec3202189t1   1 months ago    1.5GB
Start a container from an image
# Start a container from the r-base:3.6.3 image, map the server's /data directory to /data inside the container, and name the container DEMO
docker run -it --name DEMO -v /data:/data r-base:3.6.3 /bin/bash
# To leave the container, type
exit
View the list of all containers
docker ps -a
# The result is a list of containers. The first line holds the column names: container ID / image / command / creation time / status / ports / container name
# CONTAINER ID   IMAGE          COMMAND       CREATED         STATUS                     PORTS   NAMES
# 8759b9205aba   r-base:3.6.3   "/bin/bash"   9 seconds ago   Exited (0) 6 seconds ago           DEMO
# To restart an exited container: docker start <container ID>
docker start 8759b9205aba
docker ps -a
# The STATUS column now shows the container as running (Up)
# CONTAINER ID   IMAGE          COMMAND       CREATED         STATUS                     PORTS   NAMES
# 8759b9205aba   r-base:3.6.3   "/bin/bash"   9 seconds ago   Up 4 seconds                       DEMO
# To open a shell in the running container: docker exec -it <container ID> /bin/bash
docker exec -it 8759b9205aba /bin/bash
Save a container as a new image
# docker commit <container ID> <new image name>:<version>
docker commit 8759b9205aba new_r:1
docker images
# The new tool image now appears in the list
# REPOSITORY             TAG           IMAGE ID       CREATED         SIZE
# new_r                  1             1877eedea49d   1 second ago    682MB
# biocontainers/fastqc   v0.11.9_cv7   e5e3008d2bd1   5 months ago    834MB
# r-base                 3.6.3         cec2502269fb   11 months ago   682MB
# ubuntu                 16.04         dec3202189t1   1 months ago    1.5GB
Delete a container
# If the container is running, stop it first: docker stop <container ID>
docker stop 8759b9205aba
# Delete the container: docker rm <container ID>
docker rm 8759b9205aba
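The stop-then-remove sequence can be wrapped in a small function. This is only a sketch: remove_container and the DOCKER override variable are hypothetical names, introduced here so that the commands can be previewed without touching a real container.

```shell
# remove_container is a hypothetical helper. DOCKER is an assumed override:
# set DOCKER=echo to print the commands instead of running them.
remove_container() {
  # $1: container ID or name
  "${DOCKER:-docker}" stop "$1" && "${DOCKER:-docker}" rm "$1"
}

# Dry run in a subshell: prints "stop DEMO" and "rm DEMO" without calling Docker.
( DOCKER=echo; remove_container DEMO )

# Real use, once you are sure the container can go:
#   remove_container 8759b9205aba
```

Because the two commands are chained with &&, the container is only removed if stopping it succeeded.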
Delete an image
# If any containers depend on the image to be deleted, delete those containers first (see the section on deleting containers above)
# Delete the image: docker rmi <image name>:<version>
docker rmi new_r:1
Build an image from a Dockerfile
A Dockerfile is a text file that contains the instructions for building an image. A tool usually needs more than a plain base image: the base image has to be modified and extra programs added before the tool runs correctly. If you start a container from the base image, make the changes inside it, and save the result as a new image, that image tends to be large, which makes it harder to transfer. We therefore recommend creating tool images with a Dockerfile. For details, see https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
What is a Dockerfile?
A Dockerfile is a text document containing the commands used to assemble an image. Any command you can run on the command line can be used. Docker builds an image automatically by reading the instructions in the Dockerfile, and the docker build command starts that build.
The -f flag of docker build points to a Dockerfile anywhere in the file system, and -t sets the name and tag of the resulting image. Example:
# The final argument (.) is the build context, here the current directory
docker build -f /path/to/a/Dockerfile -t fastp:latest .
The basic structure of Dockerfile
A Dockerfile is generally divided into four parts: base image information, maintainer information, image operation instructions, and the instruction executed when a container starts. Lines beginning with # are comments.
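To make the four parts concrete, the sketch below writes a minimal Dockerfile into a scratch directory. The image contents, the variable demo_dir, and the tag demo:1 are illustrative assumptions. Maintainer information is given with LABEL here, which is the form the current Docker documentation recommends over MAINTAINER.

```shell
# Write a minimal four-part Dockerfile into a scratch directory.
demo_dir=$(mktemp -d)
cat > "$demo_dir/Dockerfile" <<'EOF'
# 1. Base image information
FROM ubuntu:16.04
# 2. Maintainer information
LABEL maintainer="flowhub <flowhub_team@flowhub.com.cn>"
# 3. Image operation instructions
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*
ENV myCat=fluffy
# 4. Instruction executed when a container starts
CMD ["/bin/bash"]
EOF
echo "Dockerfile written to $demo_dir/Dockerfile"

# To build it (requires a running Docker daemon):
#   docker build -t demo:1 "$demo_dir"
```

Passing the scratch directory as the build context keeps unrelated files out of the build.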
Dockerfile file description
Docker runs the instructions in a Dockerfile from top to bottom. To specify the base image, the first instruction must be FROM; the only instruction that may come before it is ARG, as in the example at the end of this section. Lines beginning with # are treated as comments. Instructions such as RUN, CMD, FROM, EXPOSE, and ENV can be used in a Dockerfile. Some commonly used instructions are described here.
FROM description
FROM: Specifies the base image, which must be the first instruction. Format:
FROM <image>
FROM <image>:<tag>
FROM <image>@<digest>
For example:
FROM gatk:4.0
Note: the tag and digest are optional. If neither is given, the latest tag of the base image is used.
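The defaulting rule in the note can be mimicked in a few lines of shell. This illustrates the rule only and is not how Docker itself parses image references; with_default_tag is a hypothetical name.

```shell
# with_default_tag is a hypothetical helper that appends ":latest" to an image
# reference when it carries neither a tag nor a digest.
with_default_tag() {
  ref=$1
  last=${ref##*/}   # last path component, so a registry port such as :5000 is ignored
  case "$last" in
    *@*|*:*) printf '%s\n' "$ref" ;;           # tag or digest already present
    *)       printf '%s\n' "${ref}:latest" ;;  # neither given: default to latest
  esac
}

with_default_tag ubuntu                   # prints ubuntu:latest
with_default_tag gatk:4.0                 # prints gatk:4.0
with_default_tag localhost:5000/ubuntu    # prints localhost:5000/ubuntu:latest
```

Pinning an explicit tag or digest in FROM avoids surprises when the latest tag moves to a newer image.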
MAINTAINER description
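MAINTAINER: Maintainer information. Format:
MAINTAINER <name>
Example:
MAINTAINER flowhub
MAINTAINER flowhub_team@flowhub.com.cn
MAINTAINER flowhub <flowhub_team@flowhub.com.cn>
Note: the current Docker documentation marks MAINTAINER as deprecated and recommends a LABEL instruction instead.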
ADD description
ADD: Adds local files to the image. Local tar archives are extracted automatically (compressed resources fetched from the network are not), and the source may be a network resource, similar to wget. Format:
ADD <src>... <dest>
ADD ["<src>",... "<dest>"] is used to support paths containing spaces
Example:
ADD hom* /mydir/
# Add all files starting with "hom"
ADD hom?.txt /mydir/
# ? matches any single character, for example "home.txt"
ADD test relativeDir/
# Add "test" to `WORKDIR`/relativeDir/
ADD test /absoluteDir/
# Add "test" to /absoluteDir/
COPY description
COPY: Similar to ADD, but archives are not extracted automatically and network resources cannot be accessed.
RUN description
RUN: Executes a command inside the image while the image is being built. There are two forms:
Shell form:
RUN <command>
Exec form:
RUN ["executable", "param1", "param2"]
Example:
RUN ["executable", "param1", "param2"]
RUN apk update
RUN ["/etc/execfile", "arg1", "arg1"]
Note: The intermediate image layers created by RUN instructions are cached and reused in the next build. To build without the cache, pass the --no-cache parameter, for example: docker build --no-cache
ENV description
ENV: Sets environment variables. Format:
ENV <key> <value>
# Everything after <key> is treated as the <value>, so this form sets only one variable per instruction
ENV <key>=<value> ...
# This form can set several variables, each written as a "<key>=<value>" pair. If a value contains spaces, escape them with \ or wrap the value in double quotes. A backslash can also be used for line continuation
Example:
ENV myName John Doe
ENV myCat=fluffy
The following is a small example of building a PyTorch tool image with a Dockerfile:
# After a variable is declared with ARG, its value can be referenced as ${xxx}
ARG BASE_IMAGE=ubuntu:18.04
ARG PYTHON_VERSION=3.8
# The image is built on ubuntu:18.04
FROM ${BASE_IMAGE} as dev-base
# RUN executes a command at build time, much like typing it in a terminal; the --mount option used here requires BuildKit
RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
apt-get update && apt-get install -y --no-install-recommends \
build-essential \
ca-certificates \
ccache \
cmake \
curl \
git \
libjpeg-dev \
libpng-dev && \
rm -rf /var/lib/apt/lists/*
RUN /usr/sbin/update-ccache-symlinks
RUN mkdir /opt/ccache && ccache --set-config=cache_dir=/opt/ccache
# Set conda environment variables
ENV PATH /opt/conda/bin:$PATH
# Multiple FROM instructions do not produce an image with several roots. The final image is based only on the last FROM, and the earlier stages are discarded. So what are the earlier FROM instructions for?
# Each FROM instruction starts a build stage, and several FROM instructions make a multi-stage build, even though the final image is the result of the last stage only.
# The main benefit of a multi-stage build is that files can be copied from an earlier stage into a later one.
# The most common use is to separate the compilation environment from the runtime environment
FROM dev-base as conda
ARG PYTHON_VERSION=3.8
RUN curl -fsSL -v -o ~/miniconda.sh -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b -p /opt/conda && \
rm ~/miniconda.sh && \
/opt/conda/bin/conda install -y python=${PYTHON_VERSION} conda-build pyyaml numpy ipython && \
/opt/conda/bin/conda clean -ya
FROM dev-base as submodule-update
# Set the current working directory, which can be understood as cd /opt/pytorch
WORKDIR /opt/pytorch
# Copy the build context (the directory passed to docker build) into the current working directory of the image
COPY . .
RUN git submodule update --init --recursive
FROM conda as build
WORKDIR /opt/pytorch
COPY --from=conda /opt/conda /opt/conda
COPY --from=submodule-update /opt/pytorch /opt/pytorch
RUN --mount=type=cache,target=/opt/ccache \
TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX 8.0" TORCH_NVCC_FLAGS="-Xfatbin -compress-all" \
CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
python setup.py install
FROM conda as conda-installs
ARG PYTHON_VERSION=3.8
ARG CUDA_VERSION=11.1
ARG CUDA_CHANNEL=nvidia
ARG INSTALL_CHANNEL=pytorch-nightly
ENV CONDA_OVERRIDE_CUDA=${CUDA_VERSION}
RUN /opt/conda/bin/conda install -c "${INSTALL_CHANNEL}" -c "${CUDA_CHANNEL}" -y python=${PYTHON_VERSION} pytorch torchvision torchtext "cudatoolkit=${CUDA_VERSION}" && \
/opt/conda/bin/conda clean -ya
RUN /opt/conda/bin/pip install torchelastic
FROM ${BASE_IMAGE} as official
ARG PYTORCH_VERSION
LABEL com.nvidia.volumes.needed="nvidia_driver"
RUN --mount=type=cache,id=apt-final,target=/var/cache/apt \
apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
libjpeg-dev \
libpng-dev && \
rm -rf /var/lib/apt/lists/*
COPY --from=conda-installs /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
ENV PYTORCH_VERSION ${PYTORCH_VERSION}
WORKDIR /workspace
FROM official as dev
# Should override the already installed version from the official-image stage
COPY --from=build /opt/conda /opt/conda
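The multi-stage idea described in the comments of the example above, a compilation stage whose output is copied into a separate runtime stage, is easier to see in a much shorter Dockerfile. The sketch below writes one to a scratch directory. The stage name build, the hello program, the variable ms_dir, and the tag multistage-demo:1 are all illustrative and not taken from the PyTorch example.

```shell
# Write a two-stage Dockerfile into a scratch directory.
ms_dir=$(mktemp -d)
cat > "$ms_dir/Dockerfile" <<'EOF'
# Stage 1: compilation environment with the full toolchain
FROM ubuntu:18.04 AS build
RUN apt-get update && apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /src
RUN printf '#include <stdio.h>\nint main(void) { puts("hello"); return 0; }\n' > hello.c && \
    gcc -o hello hello.c
# Stage 2: runtime environment; only the compiled binary is copied over
FROM ubuntu:18.04
COPY --from=build /src/hello /usr/local/bin/hello
CMD ["hello"]
EOF
echo "Dockerfile written to $ms_dir/Dockerfile"

# To build and run it (requires a running Docker daemon):
#   docker build -t multistage-demo:1 "$ms_dir"
#   docker run --rm multistage-demo:1
```

The compiler and its dependencies stay in the first stage, so the final image contains only the base system and the binary.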