Docker Overview
December 24, 2020
Check out my project code @ Github
Abstract
We often try to simplify things but usually end up making them much more difficult. Similar is the case with code. We code, install additional dependencies, and remove redundancies. With this 3-step process, we sometimes end up with a very difficult process to explain how to reproduce the results and rerun the experiments. This blog explains Docker, which is a tool designed to make it easier to create, deploy, and run applications by using containers.
Earlier Work
Before Docker was introduced, virtualization of resources was used which provided independent virtual machines for clients to work upon. But this came with the price of heavy operating systems which may easily exceed over 1GB despite supporting light applications (around 300MB). This drawback led to the advent of containers (Docker).
What is Docker?
Docker is based on containers which run on shared resources of your PC but in isolation as shown in the following architecture. A container is an efficient mechanism to keep your software components together and maintainable. You can also run multiple containers at the same time to support a service. Docker also provides a mechanism to start all the containers concerned with that service with one command using Docker Compose. We will talk about it later.
Dockerfile
A Dockerfile is a simple text file that contains a list of commands that the Docker client calls while creating an image.
Working Dockerfile for conda environemnt.
FROM continuumio/miniconda3
WORKDIR /app
# Create the environment:
COPY environment.yml .
RUN conda env create -f environment.yml
# Make RUN commands use the new environment:
SHELL ["conda", "run", "-n", "myenv", "/bin/bash", "-c"]
RUN python -c "import numpy"
# The code to run when container is started:
COPY run.py .
ENTRYPOINT ["conda", "run", "-n", "myenv", "python", "run.py"]
- FROM creates a layer from the continuumio/miniconda3 Docker image.
- COPY adds files from your Docker client’s current directory.
- RUN builds your application with make.
- CMD specifies what command to run within the container.
- ENTRYPOINT is to set the image’s main command, allowing that image to be run as though it was that command.
- SHELL instruction allows the default shell used for the shell form of commands to be overridden. The default shell on Linux is ["/bin/sh", "-c"], and on Windows is ["cmd", "/S", "/C"]
Container Image is built using
$ docker build -t <app_name>:<label_name> .
To make sure certain package is installed we can add,
RUN echo "Make sure flask is installed:"
RUN python -c "import flask"
The image defined by your Dockerfile generate containers that have ephemeral states. It gets destroyed as soon as process is over. To access files in container we may either bash into container to run the command which generates a new file which we need on our local file system and then use $docker cp
to tranfer to local file system. And if we wish to use files from our local system in docker container we may mount those files either using COPY
when generating container or we may mount it at runtime using volume. We can also specify volume which can be utilized by both local file system and docker.
$ docker build -t dockerized-run .
$ docker run --rm -it -v <PATH-TO_IMAGES>/images:/app/images --entrypoint=/bin/bash dockerized-run
(base) root@b74706db6f68:/app# ls
environment.yml images
(base) root@b74706db6f68:/app# cd images
(base) root@b74706db6f68:/app/images# ls
out.png test.png
Volume
- Mounting volume at runtime.
$ docker run --rm -it -v <source-path>:<target-path> <docker-container-name>
The above command will run docker with specified volume (-v) mounted in the
- Mounting volume at build time using docker compose.
version: "3.9"
services:
deeplearning:
build: .
volumes:
- ./images:/app/images
Here volume
keyword specifies to mount current directory on local file system to /images on container. So the changes made to those files mounted at /images will also be reflected in local file system.
Docker Compose
It is used to start multiple containers as a single service. You may start services like react and flask server together as a service.
version: "3.9"
services:
web:
build: .
ports:
- "5000:5000"
redis:
image: "redis:alpine"
Taken from docker compose example at docker compose docs. Here we are starting 2 services web and redis. Web
is build using dockerfile as specified by . (dot)
pointing towards dockerfile and port
binds the container and the host machine to the exposed port, 5000. This can also be done using dockerfile by using EXPOSE
.
version
is used to specify that we want the details of the version of Docker Compose.
Application
Lets also talk about dockers application in AI research. Now given todays deep learning systems and its other applications, the need for using sameversion of library becomes necessary for inducing reproducibilty in these models. The most common package manager used for python is anaconda.
name: myenv
channels:
- conda-forge
dependencies:
- python=3.8
- numpy
We build it with below specified dockerfile.
FROM continuumio/miniconda3
COPY environment.yml .
RUN conda env create -f environment.yml
ENTRYPOINT ["conda", "run", "-n", "example", \
"python", "-c", \
"import numpy; print('success!')"]
In this environment, we install Python 3.8 and NumPy, and when we run the image it imports NumPy to make sure everything is working. This can bloat upto 950MB. Where is all the disk space being going?
- Conda caches downloaded packages.
- Conda base environment where toolchain is installed takes huge space. For example, when we install
continuumio/miniconda3
, it comes with its own python which we dont intend use.
First problem can be solved by removing those cached files. Second problem is conda specific and hence unavoidable but we can do away with it at runtime.
# The build-stage image:
FROM continuumio/miniconda3 AS build
# Install the package as normal:
COPY environment.yml .
RUN conda env create -f environment.yml
# Install conda-pack:
RUN conda install -c conda-forge conda-pack
# Use conda-pack to create a standalone enviornment
# in /venv:
RUN conda-pack -n example -o /tmp/env.tar && \
mkdir /venv && cd /venv && tar xf /tmp/env.tar && \
rm /tmp/env.tar
# We've put venv in same path it'll be in final image,
# so now fix up paths:
RUN /venv/bin/conda-unpack
# The runtime-stage image; we can use Debian as the
# base image since the Conda env also includes Python
# for us.
FROM debian:buster AS runtime
# Copy /venv from the previous stage:
COPY --from=build /venv /venv
# When image is run, run the code with the environment
# activated:
SHELL ["/bin/bash", "-c"]
ENTRYPOINT source /venv/bin/activate && \
python -c "import numpy; print('success!')"
The above solutions is provided here.
Follow me on twitter @theujjwal9