Metrics and Autoscaling
Learning Objectives
- You know what the metrics server is and you know how to enable it in Minikube.
- You know how to define resource requests and limits in a pod configuration.
- You know how to configure automatic scaling in Kubernetes using a HorizontalPodAutoscaler.
- You understand how the HorizontalPodAutoscaler decides the number of replicas based on the metrics collected.
So far, our focus has been on understanding Kubernetes and working through an example where an application is deployed to a Kubernetes cluster. Kubernetes also offers features that allow automatic scaling of applications based on demand. Here, we look into the metrics used for scaling and how to configure automatic scaling in Kubernetes.
Metrics for scaling
The kubelet agent that runs on each node collects data about the node and the pods running on it. This data is sent to a metrics server and exposed through the Metrics API. To collect the data, we need to enable a metrics server addon. In Minikube, run the command:
minikube addons enable metrics-server
The output should be similar to the following:
💡  metrics-server is an addon maintained by Kubernetes. For any concerns contact minikube on GitHub.
You can view the list of minikube maintainers at: https://github.com/kubernetes/minikube/blob/master/OWNERS
    ▪ Using image registry.k8s.io/metrics-server/metrics-server:v0.7.2
🌟  The 'metrics-server' addon is enabled
The metrics server is added to the kube-system namespace. To verify that the metrics server is running, use the command kubectl get pods -n kube-system. The output should show a pod named metrics-server running.
$ kubectl get pods -n kube-system | grep metrics
metrics-server-7fbb699795-fjmqv   1/1     Running   0          7m26s
With the metrics server running, we can now collect metrics about the pods and nodes in the cluster. The metrics collected include CPU and memory usage, which are used for scaling decisions.
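As a quick check, the collected data can also be inspected by hand: kubectl top node shows node-level usage, and the underlying Metrics API can be queried directly (the output is JSON describing the current node metrics):
kubectl top node
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes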
By default, Minikube starts as a Docker container with a default CPU limit of 2. This is sufficient for our present purposes, but if we want to explicitly test automatic scaling, we can increase the CPU limit by passing a --cpus parameter to the minikube start command.
As an example, the following command would start the Minikube instance with 4 CPUs. This would, however, require removing the existing Minikube cluster.
minikube start --kubernetes-version=v1.32.0 --cpus 4
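Removing the existing cluster and recreating it with more CPUs would look as follows. Note that minikube delete wipes everything deployed to the cluster, so the earlier resources would have to be applied again.
minikube delete
minikube start --kubernetes-version=v1.32.0 --cpus 4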
Resource requests
Each container in a pod can define resource requests and limits. The resource requests are used by the scheduler to find a suitable node for the pod, and the limits are used to prevent a container from using more resources than allowed. The resource requests and limits are defined in the pod configuration file. In the following, we add a resource request of 100m CPU units per container (0.1 CPU units) and limit the usage to 200m CPU units per container (0.2 CPU units). Modify the minikube-demo-server-deployment.yaml to match the following:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minikube-demo-server-deployment
  labels:
    app: minikube-demo-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minikube-demo-server
  template:
    metadata:
      labels:
        app: minikube-demo-server
    spec:
      containers:
        - name: minikube-demo-server
          image: minikube-demo-server:1.2
          imagePullPolicy: Never
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 100m
            limits:
              cpu: 200m
          envFrom:
            - configMapRef:
                name: minikube-demo-configmap
            - secretRef:
                name: minikube-demo-secret
          env:
            - name: PGHOST
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: host
                  optional: false
            - name: PGPORT
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: port
                  optional: false
            - name: PGDATABASE
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: dbname
                  optional: false
            - name: PGUSERNAME
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: user
                  optional: false
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: password
                  optional: false
          volumeMounts:
            - name: data-storage
              mountPath: "/app/data"
      volumes:
        - name: data-storage
          persistentVolumeClaim:
            claimName: minikube-demo-local-persistentvolume-claim
Once adjusted, apply the configuration file.
$ kubectl apply -f k8s/minikube-demo-server-deployment.yaml
deployment.apps/minikube-demo-server-deployment configured
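To confirm that the requests and limits took effect, we can describe the deployment and check the Requests and Limits fields of the container:
kubectl describe deployment minikube-demo-server-deployment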
Now, with the metrics server and resource definitions in place, we can check the resource usage of the pods. This can be done with the command kubectl top pod, which lists the pods and their current resource usage.
$ kubectl top pod
NAME                                                       CPU(cores)   MEMORY(bytes)
minikube-demo-database-cluster-1                           8m           64Mi
minikube-demo-database-cluster-2                           8m           76Mi
minikube-demo-server-deployment-7bb5b4cbf4-pb6rq           7m           20Mi
minikube-demo-server-fetcher-deployment-6548f75dd4-pcjsb   1m           24Mi
Based on the above output, there are four pods. The database cluster pods both use 8m CPU units (0.008 CPU units) and 64Mi and 76Mi of memory, respectively. The server deployment pod uses 7m CPU units (0.007 CPU units) and 20Mi of memory. The fetcher deployment pod uses 1m CPU units (0.001 CPU units) and 24Mi of memory.
While we did not explicitly set limits on memory usage, this could also be done by adding a memory entry to the requests and limits sections, as sketched below.
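The following sketch shows the resources section with memory added; the memory values are illustrative only, not tuned for our application:
resources:
  requests:
    cpu: 100m
    memory: 64Mi    # illustrative value
  limits:
    cpu: 200m
    memory: 128Mi   # illustrative value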
Automatically scaling a deployment
To automatically scale a deployment, we configure a HorizontalPodAutoscaler (HPA). HPA is a Kubernetes resource that monitors data from the metrics server and automatically scales the number of pods in a deployment based on observed CPU utilization (or other metrics).
Create a file called minikube-demo-server-deployment-hpa.yaml in the k8s folder with the following content:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: minikube-demo-server-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: minikube-demo-server-deployment
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 10
Above, we define an HPA that targets the deployment minikube-demo-server-deployment, sets the minimum number of replicas to 1, the maximum number of replicas to 5, and the target CPU utilization to 10%. The 10% target is deliberately low to make the scaling easier to observe. In production, the target would typically be somewhere around 50-75%, depending on the actual scenario.
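As an aside, the autoscaling/v1 API used here supports only CPU utilization. The newer autoscaling/v2 API expresses the same target through a more general metrics list; a sketch of an equivalent configuration would be:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: minikube-demo-server-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: minikube-demo-server-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    # CPU utilization as a percentage of the containers' requested CPU
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 10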
Apply the configuration file with the command kubectl apply -f k8s/minikube-demo-server-deployment-hpa.yaml.
$ kubectl apply -f k8s/minikube-demo-server-deployment-hpa.yaml
horizontalpodautoscaler.autoscaling/minikube-demo-server-deployment-hpa created
Now, the deployment is configured to scale automatically based on CPU utilization. The configured autoscalers can be listed with the command kubectl get hpa.
$ kubectl get hpa
NAME                                  REFERENCE                                     TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
minikube-demo-server-deployment-hpa   Deployment/minikube-demo-server-deployment   cpu: 1%/10%   1         5         1          65s
The output shows that the autoscaling policy is applied to the deployment minikube-demo-server-deployment, the current average CPU utilization is 1% (well below the target of 10%), and the minimum and maximum number of replicas are 1 and 5, respectively. The number of currently running replicas is 1.
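If we want more detail, such as the events emitted when the autoscaler changes the replica count, the HPA can be described:
kubectl describe hpa minikube-demo-server-deployment-hpa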
Testing automatic scaling
To test the automatic scaling, we can create a k6 script like the ones we used earlier for measuring performance. To get the URL of the service to test, we use the command minikube service my-app-service --url.
$ minikube service my-app-service --url
http://192.168.49.2:32512
With the above URL, a simple test script would be as follows.
import http from "k6/http";

// Run the test for 30 seconds with 10 concurrent virtual users.
export const options = {
  duration: "30s",
  vus: 10,
};

// Each virtual user repeatedly sends GET requests to the service URL.
export default function () {
  http.get("http://192.168.49.2:32512");
}
Let's save the above file to a folder called k6-tests with the name k6-test.js. Now, with k6 installed, we can run the test with the command k6 run k6-tests/k6-test.js.
k6 run k6-tests/k6-test.js
With the command running in one terminal window, we can open another terminal and check the state of the application. Using the command kubectl get hpa, we can see that the number of replicas has increased to 5.
$ kubectl get hpa
NAME                                  REFERENCE                                     TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
minikube-demo-server-deployment-hpa   Deployment/minikube-demo-server-deployment   cpu: 100%/10%   1         5         5          18m
The output shows that the CPU utilization is 100% (well above the target of 10%), and the number of replicas has been increased to 5. Scaling takes a while, as scaling decisions are based on the metrics collected by the metrics server, which gathers data periodically.
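To follow the scaling as it happens, we can also ask kubectl to watch the HPA status, which prints a new line whenever the status changes:
kubectl get hpa --watch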
Once the k6 test has finished and we wait a while longer, we notice that the number of replicas has decreased back to 1.
$ kubectl get hpa
NAME                                  REFERENCE                                     TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
minikube-demo-server-deployment-hpa   Deployment/minikube-demo-server-deployment   cpu: 1%/10%   1         5         1          22m
The HPA algorithm
The HPA algorithm is based on the following formula:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
The currentMetricValue is the average of the metrics collected from the pods, the desiredMetricValue is the target value set in the HPA configuration, and currentReplicas is the number of replicas currently running.
As an example, suppose there are currently three replicas running, the average CPU utilization is 50%, and the target CPU utilization is 10%. The desired number of replicas would then be calculated as follows:
desiredReplicas = ceil[3 * (50 / 10)] = ceil[3 * 5] = ceil[15] = 15
The desired number of replicas would be 15, which is the number of replicas needed to handle the load based on the current CPU utilization. The actual number of replicas is, however, capped by the maxReplicas value set in the HPA configuration; with the configuration above, the deployment would scale to at most 5 replicas.
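As a concrete illustration, the following is a minimal sketch of the formula in JavaScript (the function and its arguments are our own, not part of Kubernetes). The real controller additionally applies a tolerance around the target and stabilization windows, which are omitted here.
// A sketch of the HPA scaling formula; tolerance and stabilization omitted.
function desiredReplicas(currentReplicas, currentMetricValue, desiredMetricValue, minReplicas, maxReplicas) {
  const desired = Math.ceil(currentReplicas * (currentMetricValue / desiredMetricValue));
  // Clamp the result to the bounds set in the HPA configuration.
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// The example from the text: 3 replicas, 50% average CPU, 10% target, bounds 1-5.
console.log(desiredReplicas(3, 50, 10, 1, 5)); // ceil(15) = 15, capped to 5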