Scaling applications automatically
Learning objectives
- You know how to automatically scale applications with Kubernetes.
Kubernetes also supports automatic scaling of applications. As with manual scaling, we can either use kubectl to provide the scaling details or specify the scaling in a configuration file.
Scaling with kubectl
Let's first try the kubectl autoscale command.
The kubectl autoscale command takes the configuration file of the deployment (or the name of the deployment), the minimum and maximum number of replicas, and the target average CPU utilization. In our case, we again wish to adjust the deployment that was created from the my-app-deployment.yaml file. Let's create an autoscaling policy where the minimum number of replicas is 1, the maximum number of replicas is 5, and the target average CPU utilization is 5%.
kubectl autoscale -f kubernetes/my-app-deployment.yaml --min=1 --max=5 --cpu-percent=5
horizontalpodautoscaler.autoscaling/my-app-deployment autoscaled
We intentionally use a low average CPU utilization target of 5% to make it easier to see the scaling in action. In production, the target would typically be somewhere around 50-75%, depending on the scenario.
The active autoscaling policies can be listed with the command kubectl get hpa.
kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app-deployment   Deployment/my-app-deployment   1%/5%     1         5         1          3m27s
The output shows that the autoscaling policy is applied to the deployment my-app-deployment, the current average CPU utilization is 1% (which is lower than the target of 5%), and the minimum and maximum number of replicas are 1 and 5, respectively. The number of currently running replicas is 1.
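The HPA bases its decisions on CPU readings from the metrics server. If the TARGETS column shows &lt;unknown&gt; instead of a percentage, the metrics server is likely not running; on minikube it is available as an addon. The commands below are a sketch assuming a minikube cluster:

```shell
# Enable the metrics server addon (minikube-specific).
minikube addons enable metrics-server

# After a short wait, per-pod CPU and memory usage should be visible.
kubectl top pods
```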
Testing automatic scaling
To test the automatic scaling, we can create a k6 script, similar to the one we used for measuring performance. To get the URL of the service to test, we use the command minikube service my-app-service --url.
minikube service my-app-service --url
http://192.168.49.2:32512
With the above URL, a simple test script would be as follows.
import http from "k6/http";

export const options = {
  duration: "30s",
  vus: 10,
};

export default function () {
  http.get("http://192.168.49.2:32512");
}
Let's save the above file with the name k6-test.js to a folder called k6-tests. Now, with k6 installed, we can run the test with the command k6 run k6-tests/k6-test.js.
k6 run k6-tests/k6-test.js
With the command running in one terminal window, we can open another terminal and check the state of the application. Using the command kubectl get hpa, we see that the number of replicas has increased to 5.
kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app-deployment   Deployment/my-app-deployment   100%/5%   1         5         5          18m
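The scale-up is also visible in the pod list: once the autoscaler has acted, the deployment should have five pods running. The label selector below is an assumption about how the deployment labels its pods; adjust it to match your my-app-deployment.yaml:

```shell
# List the pods of the deployment; five replicas should be running.
# The label app=my-app is assumed here, not taken from the course files.
kubectl get pods -l app=my-app
```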
Scaling takes a while. The scaling decisions are made based on the metrics collected by the metrics server, which gathers data periodically.
Once the k6 test has finished and we wait for a while longer, we notice that the number of replicas has decreased to 1.
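To follow the scale-up and scale-down as they happen, the --watch flag of kubectl get prints a new line whenever the resource changes, which saves us from re-running the command by hand:

```shell
# Stream HPA status changes until interrupted with Ctrl+C.
kubectl get hpa my-app-deployment --watch
```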
kubectl get hpa
NAME                REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app-deployment   Deployment/my-app-deployment   1%/5%     1         5         1          22m
To clean up the autoscaling configuration, we run the command kubectl delete hpa my-app-deployment.
kubectl delete hpa my-app-deployment
horizontalpodautoscaler.autoscaling "my-app-deployment" deleted
Configuring automatic scaling
Automatic scaling is configured in a separate file that defines the deployment to scale and the scaling targets. The example below outlines this: we scale a deployment called my-app-deployment, defining the minimum number of replicas as 1, the maximum number of replicas as 5, and the target CPU utilization as 10%.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 10
Save the above contents to a file called my-app-deployment-hpa.yaml and place it in the kubernetes folder. To apply the configuration, we run the command kubectl apply -f kubernetes/my-app-deployment-hpa.yaml.
kubectl apply -f kubernetes/my-app-deployment-hpa.yaml
horizontalpodautoscaler.autoscaling/my-app-deployment-hpa created
Now, when we check the status of the autoscaling configuration, we see that the configuration is applied to the deployment my-app-deployment.
kubectl get hpa
NAME                    REFERENCE                      TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-app-deployment-hpa   Deployment/my-app-deployment   1%/10%    1         5         1          59s
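The autoscaling/v1 API used above only supports a CPU utilization target. Newer clusters also offer the autoscaling/v2 API, where the same policy is expressed through a list of metrics, which additionally allows targets such as memory or custom metrics. A sketch of an equivalent configuration for our deployment:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 10
```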
HorizontalPodAutoscaler
The term HPA that we have used a few times is a shorthand for HorizontalPodAutoscaler. The HorizontalPodAutoscaler is a Kubernetes resource and controller that reads in metrics (from the metrics server) and adjusts the deployment to match the demand (within the bounds of our given configuration).
Here, your task is to read the Kubernetes documentation on Horizontal Pod Autoscaling at https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ and to create one question on it.
For writing the question, refer also to the notes on good questions.
Write the question using the widget shown below.
Once you have created the question, answer three or more peer-authored questions below. After each question, you are given the opportunity to rate the question; please rate each question that you answer.
Vertical scaling
Note that although we have focused on horizontal scaling, vertical scaling is also possible. The VerticalPodAutoscaler can be used to achieve this.
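The VerticalPodAutoscaler is not part of core Kubernetes and has to be installed into the cluster separately (its components come from the Kubernetes autoscaler project). Assuming it is installed, a minimal sketch of a vertical scaling policy for our deployment could look as follows:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-deployment-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-deployment
  updatePolicy:
    # In Auto mode the VPA may evict pods to apply new resource requests.
    updateMode: "Auto"
```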