Statelessness and Horizontal Scaling with Containers
Learning Objectives
- You revisit the concepts of stateful and stateless applications.
- You know how to scale a web application horizontally using Docker.
- You know some of the reasons why containers are used for scalability.
Stateful and Stateless Applications
Stateful applications maintain state across multiple requests or interactions, while stateless applications treat each request as an independent transaction. HTTP is a stateless protocol, which means that each request is independent of any previous request at the protocol level.
There are a range of applications that need to maintain state across requests. Real-time chat applications, gaming servers, financial transactions, and applications that provide personalized user experiences are a few examples. Even this platform needs to maintain state across requests to keep track of your progress through the course.
Maintaining state with HTTP effectively means adding information to the request or response that allows the server to recognize the client and maintain context across requests. This can be done using cookies, sessions, or tokens.
- Cookies are small pieces of data stored in the client’s browser that are sent with each request to the server. They can be used to store session IDs, user preferences, or other information.
- Sessions store session data on the server, either in memory or in external session stores such as databases or caching systems. The server generates a session ID that is sent to the client, and the client sends the session ID with each request to the server. This is often handled with cookies.
- Tokens, such as JSON Web Tokens (JWTs), are self-contained tokens that store session data. They are sent with each request to the server and can be used to manage state across requests. Tokens are signed with a secret key known only to the server, which allows verifying the token’s authenticity.
Tokens are often used in stateless applications to maintain state across requests without the need for server-side storage. This makes them ideal for scaling applications horizontally, as each request can be handled by any server in the cluster without relying on shared state.
As discussed in the Web Software Development course, JWTs can also be sent as cookie values. Since the browser sends cookies with every request automatically, using JWTs as cookie values reduces the need for additional headers or query parameters to carry the token.
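To make the token approach concrete, here is a minimal sketch of JWTs as cookie values, assuming a Hono server (as the middleware example later in this part suggests). The routes, cookie name, and secret are made up for illustration; in practice, the secret would be read from an environment variable.
import { Hono } from "hono";
import { getCookie, setCookie } from "hono/cookie";
import { sign, verify } from "hono/jwt";

const app = new Hono();
// made-up secret for the sketch; load it from the environment in practice
const JWT_SECRET = "replace-me";

app.post("/login", async (c) => {
  // sign a token that carries the session state and store it in a cookie
  const token = await sign({ user: "jane" }, JWT_SECRET);
  setCookie(c, "auth", token, { httpOnly: true });
  return c.json({ message: "Logged in!" });
});

app.get("/profile", async (c) => {
  // any server that knows the secret can verify the token,
  // so no shared server-side session store is needed
  const token = getCookie(c, "auth");
  if (!token) {
    return c.json({ message: "Unauthorized" }, 401);
  }
  try {
    const payload = await verify(token, JWT_SECRET);
    return c.json({ user: payload.user });
  } catch {
    return c.json({ message: "Unauthorized" }, 401);
  }
});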
Horizontal Scaling with Docker
Sooner or later, when scaling web software to meet growing demand, horizontal scaling becomes necessary.
Classic horizontal scaling
As a classic example of horizontal scaling, the following frame contains a snapshot of NASA’s website from late 1996 when people were actively looking for information about the Mars Pathfinder Mission.
To meet the high demand, NASA’s website provided a list of links to servers, where each server hosted a copy of the actual NASA website (i.e. a mirror of the original site). Jointly, the listed servers could handle slightly more than 80 million hits per day, and the largest individual server, at Silicon Graphics, could handle approximately 20 million hits per day. The term “hit” meant an individual page load, including loading the resources on the page (e.g. images). Averaged out, 80 million hits per day corresponds to approximately 926 hits per second.
Traffic on modern websites? 🤔
The company Similarweb offers insight into the traffic that websites receive. Based on their data, the most popular website in the world in December 2024 was google.com, with approximately 83.3 billion visits during the month. The second most popular website was youtube.com with some 30.3 billion visits, and the third most popular website was facebook.com with some 12.3 billion visits.
Visits to modern websites are rather different from visits to websites in the 1990s. In the 1990s, when servers were mainly responsible for delivering static data, a page hit consisted of loading a page and its contents. With modern websites, a single page visit may consist of hundreds of requests, and modern web applications also offer dynamic functionality such as streaming search results, media, or other types of content to the users. When considering the terms “visit” and “hit”, a single visit may include traversing multiple pages, where each page load could be considered a “hit” in the traditional terminology.
In practice, the 83.3 billion visits per month correspond to roughly 2.7 billion visits per day, or about 31,000 visits per second on average, with each visit potentially triggering hundreds of requests. Such traffic would not have been handled even by a thousandfold increase in the number of servers similar to those used to handle the 80 million hits per day in the 1990s.
Horizontal scaling with Docker
Let’s again take a peek at the walking skeleton that we worked on in the previous part. The walking skeleton has a service called server that responds to requests at port 8000. The service server is defined in the compose.yaml file as follows.
services:
  # ... additional content
  server:
    build: server
    restart: unless-stopped
    volumes:
      - ./server:/app
    ports:
      - 8000:8000
    env_file:
      - project.env
    depends_on:
      - database
  # ... additional content
The folder server, relative to the compose.yaml file, is mounted as a volume into the container of the service server. The definition also contains a port mapping: the host port 8000 is used to expose the port 8000 from within the service.
Docker Compose comes with configuration options related to the deployment and running of services. Using the deploy configuration, we can specify the number of container replicas for a service. Let’s adjust the configuration for the service server to match the following. That is, using the deploy configuration, we state that the service server has a single replica.
services:
  # ... additional content
  server:
    build: server
    restart: unless-stopped
    volumes:
      - ./server:/app
    ports:
      - 8000:8000
    env_file:
      - project.env
    depends_on:
      - database
    deploy:
      replicas: 1
  # ... additional content
When you run the command docker compose up, the application starts up as expected and responds to requests at port 8000.
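You can check this with curl; the walking skeleton responds with its hello-world message:
curl localhost:8000
{"message":"Hello world!"}%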
Adding replicas
To scale the application horizontally, we increase the number of replicas for the service server. Change the number of replicas to two and run the command docker compose up again.
services:
  # ... additional content
  server:
    build: server
    restart: unless-stopped
    volumes:
      - ./server:/app
    ports:
      - 8000:8000
    env_file:
      - project.env
    depends_on:
      - database
    deploy:
      replicas: 2
  # ... additional content
Now, the application tries to start up, but fails. Upon trying to start the second server, Docker is unable to allocate a port to it, as the port is already in use by the first server.
docker compose up
// ..
Error response from daemon: driver failed programming
external connectivity on endpoint dab-walking-skeleton-server-1 (...):
Bind for 0.0.0.0:8000 failed: port is already allocated
This makes sense: a host port can be bound by only one container at a time.
Allocating port ranges
Docker allows allocating port ranges for a service. A port range is given as a string of the form "<start>-<end>:<port>", where start and end indicate the host ports that can be used for the service, and port indicates the internal port that the service uses.
Modify the configuration for the service server to use a port range. The following configuration uses the port range 8000-8001:8000, which means that the service server can use either port 8000 or 8001 for exposing the port 8000 from within the service.
services:
  # ... additional content
  server:
    build: server
    restart: unless-stopped
    volumes:
      - ./server:/app
    ports:
      - 8000-8001:8000
    env_file:
      - project.env
    depends_on:
      - database
    deploy:
      replicas: 2
  # ... additional content
Now, when we start the application, it starts up successfully, and we can access it at both ports 8000 and 8001. The port 8000 is used by the first server, and the port 8001 is used by the second server.
curl localhost:8000
{"message":"Hello world!"}%
curl localhost:8001
{"message":"Hello world!"}%
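If it is unclear which host port a given replica received, docker compose port can report the mapping. As a sketch, assuming the --index flag for selecting a replica:
docker compose port --index 1 server 8000
docker compose port --index 2 server 8000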
Verifying the replicas
With this, we have two replicas of the server. To verify that they are indeed replicas, let’s adjust the app.js file and add a middleware that adds a header indicating the server that is responding to the request.
// generated once at startup, so each replica gets its own stable identifier
const REPLICA_ID = crypto.randomUUID();

// attach the identifier to every response as a custom header
app.use("*", async (c, next) => {
  c.res.headers.set("X-Replica-Id", REPLICA_ID);
  await next();
});
With this in use, when we access the servers at the different ports, we see that the responses carry different replica headers and hence come from different servers.
curl -v localhost:8000
// ...
< x-replica-id: 8d5409f3-be35-40d0-8302-ab94fd68cbff
// ...
{"message":"Hello world!"}%
curl -v localhost:8001
// ...
< x-replica-id: ada3c27b-6f60-4ad1-8c47-59957b45f492
// ...
{"message":"Hello world!"}%
Swarm mode for Docker
As the number of replicas increases, additional servers will be needed for running them. Docker has a Swarm mode that allows managing a cluster of Docker Engines (i.e. a swarm in Docker terminology). We’ll return to this and similar concepts when looking into container orchestration.
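As a small taste of what is ahead, a compose file with replicas can in principle be deployed to a swarm along the following lines. This is a sketch: the stack name walking-skeleton is made up, and a compose file may need swarm-specific adjustments before it deploys cleanly.
# initialize a single-node swarm on the current machine
docker swarm init
# deploy the services in compose.yaml as a stack
docker stack deploy -c compose.yaml walking-skeleton
# list the services in the swarm and their replica counts
docker service ls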
Using Containers for Scalability
Although we started the Web Software Development course by setting up a walking skeleton, one might still wonder why we use containers for scalability. Alternative options for scaling include virtual machines and traditional physical servers. Here are some reasons for choosing containers:
- Lightweight: Containers are lightweight compared to virtual machines and traditional physical servers, as they share the host system’s operating system kernel. This allows for higher density and faster deployment.
- Portability and consistency: Containers are portable and can run on any system that supports the container runtime, making them suitable for multi-cloud and hybrid cloud environments. Containers ensure consistency across different environments, simplifying deployment and debugging.
- Quick to start: When contrasted with setting up a virtual machine or a traditional physical server, containers are fast to start up, especially when they have been built beforehand. This makes them suitable for situations that require adapting to varying workloads.
- Resource efficiency: Containers optimize resource utilization by sharing the host system’s resources, allowing for better resource management and cost savings.
There are also alternatives to containers, such as serverless computing, which abstracts the infrastructure management further. We will look into serverless computing in later parts of the course.