Containers & Kubernetes

Docker Fundamentals


Learning Objectives

  • You know how Docker works and know how to build and run Docker images.
  • You understand how Docker images, containers, volumes, and networks work.
  • You understand the connection between Docker and Docker Compose.
  • You know how to clean up Docker resources.

Docker and Docker Engine

Docker uses a client-server architecture. The Docker daemon (dockerd) runs on the host machine and manages images, containers, networks, and storage. Interaction with the daemon happens through the Docker client (docker). The daemon builds and runs containers, and communicates with Docker registries (e.g. Docker Hub) to pull and push container images.

Docker uses containerd to manage containers and to handle their lifecycle actions (create, start, stop, etc.). containerd delegates the work to runc, a lower-level runtime that creates isolated Linux processes. The high-level architecture is shown in Figure 1.

Figure 1 — Docker’s client-server architecture. Docker CLI sends commands to the Docker daemon, which manages images and containers.

For example, when starting a container, the Docker client sends a request to the daemon, which then creates and starts a container with the help of containerd and runC. The container uses an image that has the necessary information to run the application.


Docker images

Docker images are packaged snapshots of an application. Images include the application code or binaries, runtime, libraries, and file system with configuration. Docker images are identified by a unique identifier (a hash) and typically a human-readable name and tag (e.g., myapp:1.0).

Images can be listed with docker images and inspected with docker inspect. Under the hood, images are stored on the host’s filesystem.

By default, in Linux, Docker image data is stored under /var/lib/docker/overlay2/.

Technically, an image is a read-only multilevel filesystem that is formed from a series of layers. This mechanism allows having multiple images where (some of) the underlying layers are shared.
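
The layers of a locally available image can be examined with the Docker CLI. A sketch, assuming an image tagged myapp:latest exists on the host:

```shell
# List the layers of the image, most recent first. Each row corresponds
# to a Dockerfile instruction and shows the size it added.
docker image history myapp:latest

# Show low-level details of the image, including layer digests, as JSON.
docker inspect myapp:latest
```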

Dockerfile and layers

Docker images are built from a Dockerfile, which is a text file that contains a series of instructions for creating the image. A Dockerfile might look as follows:

FROM denoland/deno:alpine-2.0.2

WORKDIR /app

COPY deno.json .

RUN DENO_FUTURE=1 deno install

COPY . .

CMD [ "run", "--allow-env", "--allow-net", "--watch", "app-run.js" ]

Each instruction (e.g. FROM, WORKDIR, COPY, RUN) in a Dockerfile creates a new layer that builds on top of the previous one. The layers are cached: whenever an image is built, Docker can reuse layers if the instructions up to the specific layer have not changed. This speeds up the build process.

The caching mechanism is based on the instructions and the contents of the files they reference, i.e. if the contents of a copied file change, the cache is invalidated from that instruction onwards.

For the above image, if we changed the contents of the deno.json file, Docker would rebuild the image starting from the COPY deno.json . instruction, but would reuse the layers before it. This is why it is common to place instructions that change less frequently (e.g. installing dependencies) near the top of the Dockerfile.
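
The caching is visible in the build output. When rebuilding after changing only deno.json, BuildKit marks the untouched layers as cached (the output below is abbreviated and illustrative; exact lines vary by Docker version):

```shell
docker build -t myapp:latest .
# => CACHED [2/5] WORKDIR /app              (reused layer)
# => [3/5] COPY deno.json .                 (rebuilt from here onwards)
# => [4/5] RUN DENO_FUTURE=1 deno install
# => [5/5] COPY . .
```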

Building an image

Images are built with the command docker build -t myapp:latest ., where -t tags the image with a name and tag, and . specifies the build context (the directory containing the Dockerfile). The tag latest is a common tag for the most recent version of an image.

It is sensible to use versioned tags for stability in production, e.g. myapp:1.0.

Images are stored in public and private registries and on the local host machine. When you reference an image with docker run or in a Dockerfile using FROM, Docker pulls it from a registry (unless it is already available locally) and caches it.

It is also possible to push images to a registry.
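
Pushing involves tagging the image with the registry's address and then uploading it. A sketch, where registry.example.com is a hypothetical registry address:

```shell
# Log in to the registry (prompts for credentials).
docker login registry.example.com

# Tag the local image with the registry address, then push it.
docker tag myapp:1.0 registry.example.com/myapp:1.0
docker push registry.example.com/myapp:1.0
```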


Docker containers

A Docker container is a running (or stopped) instance of an image. For example, docker run myapp:latest will start a container from the image. The container will run the command specified in the Dockerfile (CMD instruction).

The image must be compatible with the host’s architecture and OS. For example, an image built for Linux will not run on Windows.

When you run a container from an image, Docker adds a thin writable layer on top of the image layers to allow the container to write data. Each container runs in isolation from the host system. It has its own filesystem, network, and process space. Containers can be stopped, started, paused, and removed. When a container is removed, any data within the writable layer of the container is lost.

Common container lifecycle commands are shown in the table below.

Command          Description
docker run       Create and start a new container
docker stop      Stop a running container
docker start     Start an existing (stopped) container
docker rm        Remove a container
docker ps        List running containers
docker ps -a     List all containers (running and stopped)

The commands also take arguments, where the last argument is typically the image name. For example, docker run -d myapp:latest starts a container in detached mode (in the background).

While running, the standard output and error output of the container are captured by Docker. You can view the logs with docker logs <container>. Similarly, the resource usage of a container can be monitored with docker stats. If the main process crashes or exits, the container stops with an exit code.
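
Putting the lifecycle commands together, working with a single container might look as follows (a sketch, assuming an image myapp:latest exists locally):

```shell
# Create and start a container in the background, giving it a name.
docker run -d --name myapp-test myapp:latest

# Follow the container's standard output and error output.
docker logs -f myapp-test

# Show live CPU and memory usage of running containers.
docker stats

# Stop the container; its exit code is then visible in `docker ps -a`.
docker stop myapp-test
docker ps -a

# Remove the stopped container (its writable layer is discarded).
docker rm myapp-test
```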


Docker volumes

Docker volumes are used to persist data. As mentioned earlier, when a container is removed, its writable layer is also discarded, so any changes made within it are lost. Volumes provide a way to store data outside the container's filesystem, typically on the host filesystem, and mount it into the container.

A mounted volume appears as a directory inside the container, although in reality, the data is stored on the host.

There are two main ways to mount data into containers: named (or managed) volumes and bind mounts.

Named volumes

Named volumes are created and managed by Docker, and Docker chooses where to store them (usually under /var/lib/docker/volumes/). Named volumes can be created with the command docker volume create mydata, and the volume can be mounted to a container using the -v flag when starting the container, e.g.

docker run -v mydata:/path/in/container myapp:latest

Named volumes abstract away the host path, making it easier to share volumes without worrying about the exact host directory.
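
Named volumes can be listed and inspected like other Docker resources. A sketch, continuing the example above:

```shell
# Create a named volume and list the existing volumes.
docker volume create mydata
docker volume ls

# Show details of the volume, including its mountpoint on the host
# (usually under /var/lib/docker/volumes/).
docker volume inspect mydata

# Data written to /path/in/container now survives container removal.
docker run -v mydata:/path/in/container myapp:latest
```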

Bind mounts

Bind mounts allow mounting a specific directory or file from the host filesystem into a container. Bind mounts are also used with the -v flag. Instead of providing a name of the volume, we provide the local path, e.g.

docker run -v /path/on/host:/path/in/container myapp:latest

The above command mounts the host directory /path/on/host into the container at /path/in/container.

Volumes and applications

Using volumes also requires some knowledge of how applications handle their files. For example, to persist PostgreSQL's data in a volume, we need to know that PostgreSQL stores its data in the /var/lib/postgresql/data directory within the container. To persist the data, we run the PostgreSQL container with a volume mounted at that directory, e.g.

docker run -v /path/on/host:/var/lib/postgresql/data postgres:latest
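
The same can be done with a named volume, which lets Docker manage the host-side location instead of binding to a specific host path:

```shell
# Create a named volume and mount it at PostgreSQL's data directory.
docker volume create pgdata
docker run -d -v pgdata:/var/lib/postgresql/data postgres:latest
```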

Docker networks

Containerized web applications often consist of multiple services (for example, an API and a database) that need to communicate. Docker has a networking feature that allows creating networks through which containers can communicate with each other. There are three main types of networks:

  • none - A network that isolates a container completely, providing no external network access.

  • bridge - The default network for containers. If a container is run without specifying a network, the container will be in the bridge network. Containers in the same bridge network can communicate with each other.

  • host - Host network mode shares the network stack with the host machine. On Linux, this means the container process uses the host's IP addresses and ports directly.

For web development, the most common approach is using user-defined bridge networks, which allow containers to communicate with each other by name. That is, when a container is attached to a user-defined network, it can refer to other containers by their service name — this way, there’s no need to hardcode IP addresses that can change.

Creating and using networks

Networks are created with the command docker network create mynetwork, where mynetwork is the name of the network. By default, the type of the network is bridge. When running a container, it can be given a name with the --name flag, and a network with the --network flag. The name acts as the hostname that other containers on the same network can use for communication.

As an example, to create a network and run two containers on that network, we could do:

docker network create -d bridge mynetwork
docker run -d --name backend --network mynetwork myapp/backend:latest
docker run -d --name frontend --network mynetwork -p 8080:80 myapp/frontend:latest

In the above example, the frontend container can communicate with the backend container by using the hostname backend, while the frontend container is exposed to the host machine on port 8080.

Individual containers can be attached to multiple networks if needed.
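
Name resolution on a user-defined network can be checked by running a command inside one of the containers. A sketch, assuming the backend serves HTTP on port 8000 inside its container and that wget is available in the frontend image:

```shell
# From inside the frontend container, resolve and call the backend
# by its container name instead of an IP address.
docker exec frontend wget -qO- http://backend:8000/
```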

Networks and connectivity

When running multiple containers on a network, it is possible to expose services to the outside world by publishing ports of individual containers. This is done with the -p flag when running a container. The flag maps a port on the host machine to a port on the container, where the ports are separated by a colon.

In general, only containers that need to be accessed from the outside world should have their ports published. For internal communication between containers, it is sufficient to have them on the same network.

Exposing only necessary ports reduces the attack surface of the application.

As the ports are exposed from the host machine, it is necessary to ensure that there are no port conflicts. If two containers try to use the same port on the host machine, one of them will fail to start.

At the same time, there are no restrictions on using a specific port within a container. If two containers both have a server that responds to requests on the port 8080, it is completely fine, as the containers have their own local network. That is, the port 8080 in one container is not the same as the port 8080 in another container.
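
For example, two containers can both listen on port 8080 internally, as long as they are published to different host ports (the image names below are hypothetical):

```shell
# Host port 8081 maps to this container's port 8080.
docker run -d -p 8081:8080 myapp/service-a:latest

# Host port 8082 maps to the other container's port 8080: no conflict.
docker run -d -p 8082:8080 myapp/service-b:latest
```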

Often, load balancers like Traefik are used to route traffic to the correct container based on the request.


Docker Compose

Docker Compose is a tool that simplifies the management of multi-container applications. It uses a configuration file (e.g. compose.yaml) to configure the application's services, networks, and volumes. The following example shows a configuration with a PostgreSQL database and a web server.

services:
  server:
    build: server
    restart: unless-stopped
    env_file:
      - project.env
    volumes:
      - ./server:/app

  database:
    image: postgres:17.0
    restart: unless-stopped
    env_file:
      - project.env
    volumes:
      - ./pgdata:/var/lib/postgresql/data

Above, the configuration defines two services, server and database. The server service is built from the server directory, and the database service uses the postgres:17.0 image. The volumes section mounts the local directories to the containers, and the env_file section loads environment variables from a file.

The above configuration also assumes that /app is defined as the working directory for the server service.

By default, Docker Compose creates a user-defined bridge network for the services, so they can communicate with each other by name. If two projects have their own compose.yaml files, running them leads to separate networks, ensuring isolation.
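
A Compose application is managed with the docker compose subcommands. For the configuration above:

```shell
# Build (if needed) and start all services in the background.
docker compose up -d

# Follow the logs of all services, or of a single service.
docker compose logs -f server

# Stop and remove the containers and the default network.
docker compose down
```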

Docker Compose can also be used to set resource constraints for services, such as limiting the amount of memory or CPU a service can use. This is done by adding a deploy section to the service configuration. Below, the server service is limited to at most 1 GB of memory.

services:
  server:
    build: server
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 1G
    env_file:
      - project.env
    volumes:
      - ./server:/app

There are plenty of advantages to using Docker Compose over plain Docker commands:

  • simplified multi-container management — multiple services can be started and stopped with a single command.
  • documentation and configuration-as-code — the compose.yaml file serves as a configuration file for the project, which can be checked into version control.
  • isolation — networks or volumes are not mixed with other projects.
  • reusability and extensibility — Compose files can be shared or overridden for different environments.

Docker Compose and systemd

On a local Linux machine, a Dockerized application can be deployed e.g. through systemd, which is a system and service manager for Linux. systemd can be used to start, stop, and manage services, and it can be configured to run individual Docker containers, or a full Docker Compose application, as services. With systemd, the application can be started automatically on boot and managed like any other system service.
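
As a sketch, a systemd unit for a Compose application could look as follows. The unit name, the /opt/myapp path, and the docker binary location are assumptions and depend on the installation:

```ini
# /etc/systemd/system/myapp.service
[Unit]
Description=myapp Docker Compose application
Requires=docker.service
After=docker.service

[Service]
# Directory containing the compose.yaml file.
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/docker compose up
ExecStop=/usr/bin/docker compose down
Restart=always

[Install]
WantedBy=multi-user.target
```

After saving the unit file, the application can be enabled to start on boot with systemctl enable --now myapp.service.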

Cleaning up resources

Docker can accumulate stopped containers, unused images, and volumes over time. To check how much disk space Docker is using, run the command docker system df. The command shows the disk usage of images, containers, and volumes.

To clean up unused resources, run the command docker system prune. The command removes stopped containers, dangling images, unused networks, and the build cache. To also remove all unused images (not just dangling ones), run the command with the -a flag, e.g. docker system prune -a.

Be cautious when running prune commands, as they remove resources permanently.

Further, to remove unused volumes, the --volumes flag can be added to the docker system prune command. Alternatively, the docker volume prune command can be used.

Overall, it is sensible to clean up regularly to keep the system tidy and to avoid running out of disk space.
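
A periodic cleanup session might look as follows. Note that the last command is destructive: it removes anything not used by a running container:

```shell
# Show disk usage per resource type (images, containers, volumes).
docker system df

# Remove stopped containers, unused networks, dangling images,
# and the build cache.
docker system prune

# Additionally remove all unused images and unused volumes.
docker system prune -a --volumes
```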
