Scaling applications automatically
Learning objectives
- You know how to automatically scale applications with Kubernetes.
Kubernetes also supports automatic scaling of applications. As with manual scaling, we can either use kubectl to provide the scaling details or specify the scaling in a configuration file.
Scaling with kubectl
Let's first try the kubectl autoscale command.
The kubectl autoscale command takes the configuration file of the deployment (or the name of the deployment), the minimum and maximum number of replicas, and the target average CPU utilization. In our case, we again wish to adjust the deployment that was created from the my-app-deployment.yaml file. Let's create an autoscaling policy where the minimum number of replicas is 1, the maximum number of replicas is 5, and the target average CPU utilization is 5%.
kubectl autoscale -f kubernetes/my-app-deployment.yaml --min=1 --max=5 --cpu-percent=5
horizontalpodautoscaler.autoscaling/my-app-deployment autoscaled
We intentionally use a low average CPU utilization target of 5% to make it easier to see the scaling in action. In production, the target would typically be somewhere around 50-75%, depending on the scenario.
The active autoscaling policies can be listed with the command kubectl get hpa.
kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app-deployment   Deployment/my-app-deployment   1%/5%     1         5         1          3m27s
The output shows that the autoscaling policy is applied to the deployment my-app-deployment, the current average CPU utilization is 1% (which is lower than the target of 5%), and the minimum and maximum number of replicas are 1 and 5, respectively. The number of currently running replicas is 1.
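The HPA bases its decisions on CPU readings from the metrics server. If the TARGETS column shows &lt;unknown&gt; instead of a percentage, the metrics server is likely not running; on minikube it is available as an addon. The commands below are a sketch assuming a minikube cluster:

```shell
# Enable the metrics server addon (minikube-specific).
minikube addons enable metrics-server

# After a short wait, per-pod CPU and memory usage should be visible.
kubectl top pods
```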
Testing automatic scaling
To test the automatic scaling, we can create a k6 script, similar to the one we used for measuring performance. To get the URL of the service to test, we use the command minikube service my-app-service --url.
minikube service my-app-service --url
http://192.168.49.2:32512
With the above URL, a simple test script would be as follows.
import http from "k6/http";

export const options = {
  duration: "30s",
  vus: 10,
};

export default function () {
  http.get("http://192.168.49.2:32512");
}
Let's save the above file with the name k6-test.js to a folder called k6-tests. Now, with k6 installed, we can run the test with the command k6 run k6-tests/k6-test.js.
k6 run k6-tests/k6-test.js
With the command running in one terminal window, we can open another terminal and check the state of the application. Using the command kubectl get hpa, we see that the number of replicas has increased to 5.
kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app-deployment   Deployment/my-app-deployment   100%/5%   1         5         5          18m
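The scale-up is also visible in the pod list: once the autoscaler has acted, the deployment should have five pods running. The label selector below is an assumption about how the deployment labels its pods; adjust it to match your my-app-deployment.yaml:

```shell
# List the pods of the deployment; five replicas should be running.
# The label app=my-app is assumed here, not taken from the course files.
kubectl get pods -l app=my-app
```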
Scaling takes a while. The scaling decisions are made based on the metrics collected by the metrics server, which gathers data periodically.
Once the k6 test has finished and we wait for a while longer, we notice that the number of replicas has decreased to 1.
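To follow the scale-up and scale-down as they happen, the --watch flag of kubectl get prints a new line whenever the resource changes, which saves us from re-running the command by hand:

```shell
# Stream HPA status changes until interrupted with Ctrl+C.
kubectl get hpa my-app-deployment --watch
```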
kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app-deployment   Deployment/my-app-deployment   1%/5%     1         5         1          22m
To clean up the autoscaling configuration, we run the command kubectl delete hpa my-app-deployment.
kubectl delete hpa my-app-deployment
horizontalpodautoscaler.autoscaling "my-app-deployment" deleted
Configuring automatic scaling
Automatic scaling is configured in a separate file that defines the deployment to scale and the scaling targets. The example below outlines this: we scale a deployment called my-app-deployment, defining the minimum number of replicas as 1, the maximum number of replicas as 5, and the target CPU utilization as 10%.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 10
Save the above contents to a file called my-app-deployment-hpa.yaml and place it in the kubernetes folder. To apply the configuration, we run the command kubectl apply -f kubernetes/my-app-deployment-hpa.yaml.
kubectl apply -f kubernetes/my-app-deployment-hpa.yaml
horizontalpodautoscaler.autoscaling/my-app-deployment-hpa created
Now, when we check the status of the autoscaling configuration, we see that the configuration is applied to the deployment my-app-deployment.
kubectl get hpa
NAME                    REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app-deployment-hpa   Deployment/my-app-deployment   1%/10%    1         5         1          59s
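The autoscaling/v1 API used above only supports a CPU utilization target. Newer clusters also offer the autoscaling/v2 API, where the same policy is expressed through a list of metrics, which additionally allows targets such as memory or custom metrics. A sketch of an equivalent configuration for our deployment:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 10
```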
HorizontalPodAutoscaler
The term HPA that we have used a few times is a shorthand for HorizontalPodAutoscaler. The HorizontalPodAutoscaler is a Kubernetes resource and controller that reads in metrics (from the metrics server) and adjusts the deployment to match the demand (within the bounds of our given configuration).
Here, your task is to read the Kubernetes documentation on Horizontal Pod Autoscaling at https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ and to create one question on it.
For writing the question, refer also to the notes on good questions.
Write the question using the widget shown below.
Once you have created the question, answer three or more peer-authored questions below. After each question, you are given the opportunity to rate the question; please rate each question that you answer.
Vertical scaling
Note that although we have focused on horizontal scaling, vertical scaling is also possible. The VerticalPodAutoscaler can be used to achieve this.
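The VerticalPodAutoscaler is not part of core Kubernetes and has to be installed into the cluster separately (its components come from the Kubernetes autoscaler project). Assuming it is installed, a minimal sketch of a vertical scaling policy for our deployment could look as follows:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-deployment-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  updatePolicy:
    # In Auto mode the VPA may evict pods to apply new resource requests.
    updateMode: "Auto"
```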