Scaling applications manually
Learning objectives
- You know how to manually scale applications with Kubernetes.
Kubernetes supports both manual and automatic scaling of applications. Manual scaling is done by adjusting the number of replicas in the deployment (similar to what we previously did with Docker Compose). Automatic scaling, on the other hand, requires collecting metrics from the application and using them to adjust the number of replicas.
We continue here from our first Kubernetes deployment -- to start it up again, we launch minikube and apply the two configurations.
minikube start
😄 minikube v1.29.0 on Ubuntu 22.04
...
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
kubectl apply -f kubernetes/my-app-deployment.yaml,kubernetes/my-app-service.yaml
deployment.apps/my-app-deployment created
service/my-app-service created
When we list the pods, we see one pod.
kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-deployment-85bcb74bcd-fpwr4 1/1 Running 0 38s
And, when we ask minikube for the URL of the service, we receive an address that responds to requests.
minikube service my-app-service --url
http://192.168.49.2:32512
curl http://192.168.49.2:32512
Hello from server: 3746
curl http://192.168.49.2:32512
Hello from server: 3746
Manual scaling of applications with Kubernetes is easy. We can either use the kubectl scale command to set a number of replicas, or adjust the configuration file to specify a number of replicas.
Using kubectl scale
Let's first try the kubectl scale command. The kubectl scale command takes the number of replicas and the configuration file of the deployment (or the name of the deployment) as arguments. In our case, we wish to adjust the deployment that was created from the my-app-deployment.yaml file -- let's create two replicas for the deployment.
kubectl scale --replicas=2 -f kubernetes/my-app-deployment.yaml
deployment.apps/my-app-deployment scaled
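As noted above, kubectl scale also accepts the name of the deployment instead of its configuration file. A sketch of the equivalent command, assuming the deployment name used in this chapter:

```shell
# Scale the deployment to two replicas by referencing it by name
# rather than by its configuration file.
kubectl scale deployment/my-app-deployment --replicas=2
```

Both forms result in the same change; the name-based form is convenient when the configuration file is not at hand.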
Now, when we list the pods, we see two pods instead of one. As you may notice, one of them is older, while the other has been started recently.
kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-deployment-85bcb74bcd-fpwr4 1/1 Running 0 10m
my-app-deployment-85bcb74bcd-wtbnc 1/1 Running 0 33s
As our service has been created as a load balancer, Kubernetes takes care of balancing the load between the pods for us. When we send requests to the server, we notice that the response is coming from both pods (our application creates a random number that it always includes in the responses).
curl http://192.168.49.2:32512
Hello from server: 8320
curl http://192.168.49.2:32512
Hello from server: 8320
curl http://192.168.49.2:32512
Hello from server: 3746
curl http://192.168.49.2:32512
Hello from server: 8320
curl http://192.168.49.2:32512
Hello from server: 3746
curl http://192.168.49.2:32512
Hello from server: 3746
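Sending requests one at a time works, but a short loop makes the distribution between pods easier to see. A sketch, assuming the service URL that minikube printed above:

```shell
# Send ten requests to the service and count how many responses
# came from each pod (each pod replies with its own random number).
URL=http://192.168.49.2:32512
for i in $(seq 1 10); do
  curl -s "$URL"
  echo
done | sort | uniq -c
```

With two replicas, the output should show two distinct response lines, each with roughly half of the requests.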
With this approach, however, our configuration file has not changed, and our deployment configuration is still as follows.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  labels:
    app: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          imagePullPolicy: Never
          ports:
            - containerPort: 7777
Defining replicas in configuration
An alternative approach to manual scaling is adjusting the configuration file to include a number of replicas, and then applying the configuration file again. The number of replicas is defined in the spec section of the configuration file -- if the value is not set, a default of one replica is used. Below, we adjust the deployment configuration, setting the number of replicas to three.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  labels:
    app: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          imagePullPolicy: Never
          ports:
            - containerPort: 7777
Now, if we apply the above configuration and list the pods again, we see that there are now three pods.
kubectl apply -f kubernetes/my-app-deployment.yaml
deployment.apps/my-app-deployment configured
kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-deployment-85bcb74bcd-fpwr4 1/1 Running 0 21m
my-app-deployment-85bcb74bcd-g7d6j 1/1 Running 0 5s
my-app-deployment-85bcb74bcd-wtbnc 1/1 Running 0 12m
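Besides listing the pods, the deployment itself reports its desired and ready replica counts. Assuming the deployment name used above, they can be checked with:

```shell
# Show the deployment's replica status; the READY column reports
# ready replicas out of the desired count (e.g. 3/3 once all pods run).
kubectl get deployment my-app-deployment
```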
Similarly to before, the requests to the pods are balanced. Now, when we make requests to the servers, we see that there is a third server that is responding to the requests.
curl http://192.168.49.2:32512
Hello from server: 3746
curl http://192.168.49.2:32512
Hello from server: 8320
curl http://192.168.49.2:32512
Hello from server: 8320
curl http://192.168.49.2:32512
Hello from server: 5831
curl http://192.168.49.2:32512
Hello from server: 3746
curl http://192.168.49.2:32512
Hello from server: 3746
Configuration files over command line use
When we scale the deployment using the kubectl scale command, the changes are not reflected in the configuration file. To maintain a specific scale, it makes sense to modify the configuration file and apply it. This way, we can always use the same configuration file to create deployments, and the file can be stored in e.g. a Git repository.
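As a sketch of this configuration-file-first workflow, assuming the file path used above and that the file already contains a replicas line, one could change the replica count in the file and re-apply it (the sed expression is just one way to make the edit; editing the file by hand works equally well):

```shell
# Update the replica count in the configuration file, then apply it,
# so the file in version control always matches the cluster state.
sed -i 's/^  replicas: .*/  replicas: 3/' kubernetes/my-app-deployment.yaml
kubectl apply -f kubernetes/my-app-deployment.yaml
```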