Containers & Kubernetes

Metrics and Autoscaling

Learning Objectives

  • You know what the metrics server is and how to enable it in Minikube.
  • You know how to define resource requests and limits in a pod configuration.
  • You know how to configure automatic scaling in Kubernetes using a HorizontalPodAutoscaler.
  • You understand how the HorizontalPodAutoscaler decides the number of replicas based on the metrics collected.

So far, our focus has been on understanding Kubernetes and working through an example where an application is deployed to a Kubernetes cluster. Kubernetes also has plenty of features for automatically scaling applications based on demand. Here, we look into the metrics used for scaling and into how to configure automatic scaling in Kubernetes.

Metrics for scaling

The kubelet agent that runs on each node collects data about the node and the pods running on it. This data is sent to a metrics server and exposed through the Metrics API. To collect the data, we need to enable the metrics-server addon. For Minikube, run the command:

minikube addons enable metrics-server

The output should be similar to the following:

šŸ’”  metrics-server is an addon maintained by Kubernetes. For any concerns contact minikube on GitHub.
You can view the list of minikube maintainers at: https://github.com/kubernetes/minikube/blob/master/OWNERS
    ā–Ŗ Using image registry.k8s.io/metrics-server/metrics-server:v0.7.2
🌟  The 'metrics-server' addon is enabled

The metrics server is added to the kube-system namespace. To verify that the metrics server is running, use the command kubectl get pods -n kube-system. The output should show a running pod whose name starts with metrics-server.

$ kubectl get pods -n kube-system | grep metrics
metrics-server-7fbb699795-fjmqv    1/1     Running   0               7m26s

With the metrics server running, we can now collect metrics about the pods and nodes in the cluster. The metrics collected include CPU and memory usage, which are used for scaling decisions.
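
The Metrics API can also be queried directly. As an example, the following command fetches the raw node metrics that commands such as kubectl top build on (this assumes the metrics-server addon is running):

kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes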

Available resources

By default, Minikube starts as a Docker container with a CPU limit of 2. This is sufficient for our present purposes, but if we want to more thoroughly test automatic scaling, we can increase the CPU limit by passing the --cpus parameter to the minikube start command.

As an example, the following command would start a Minikube instance with 4 CPUs. This would, however, require removing the existing Minikube cluster first.

minikube start --kubernetes-version=v1.32.0 --cpus 4
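
If you opt to do this, the existing cluster can first be removed with the minikube delete command:

minikube delete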

Resource requests

Each container in a pod can define resource requests and limits. The resource requests are used by the scheduler to find a suitable node for the pod, while the limits prevent a container from using more resources than allowed. Both are defined in the pod configuration file. In the following, we add a resource request of 100m CPU units (0.1 CPUs) per container and limit the usage to 200m CPU units (0.2 CPUs) per container. Modify the minikube-demo-server-deployment.yaml to match the following:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: minikube-demo-server-deployment
  labels:
    app: minikube-demo-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minikube-demo-server
  template:
    metadata:
      labels:
        app: minikube-demo-server
    spec:
      containers:
        - name: minikube-demo-server
          image: minikube-demo-server:1.2
          imagePullPolicy: Never
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 100m
            limits:
              cpu: 200m
          envFrom:
            - configMapRef:
                name: minikube-demo-configmap
            - secretRef:
                name: minikube-demo-secret
          env:
            - name: PGHOST
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: host
                  optional: false
            - name: PGPORT
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: port
                  optional: false
            - name: PGDATABASE
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: dbname
                  optional: false
            - name: PGUSERNAME
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: user
                  optional: false
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: minikube-demo-database-cluster-app
                  key: password
                  optional: false
          volumeMounts:
            - name: data-storage
              mountPath: "/app/data"
      volumes:
        - name: data-storage
          persistentVolumeClaim:
            claimName: minikube-demo-local-persistentvolume-claim

Once adjusted, apply the configuration file.

$ kubectl apply -f k8s/minikube-demo-server-deployment.yaml
deployment.apps/minikube-demo-server-deployment configured
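
To verify that the requests and limits are in effect, we can describe the deployment and check the container's resources:

kubectl describe deployment minikube-demo-server-deployment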

Now, with the metrics server and resource definitions in place, we can check the resource usage of the pods. This can be done with the command kubectl top pods, which lists the pods and their current resource usage.

$ kubectl top pods
NAME                                                       CPU(cores)   MEMORY(bytes)
minikube-demo-database-cluster-1                           8m           64Mi
minikube-demo-database-cluster-2                           8m           76Mi
minikube-demo-server-deployment-7bb5b4cbf4-pb6rq           7m           20Mi
minikube-demo-server-fetcher-deployment-6548f75dd4-pcjsb   1m           24Mi

Based on the above output, there are four pods. The database cluster pods both use 8m CPU units (0.008 CPU units) and 64Mi and 76Mi of memory, respectively. The server deployment pod uses 7m CPU units (0.007 CPU units) and 20Mi of memory. The fetcher deployment pod uses 1m CPU units (0.001 CPU units) and 24Mi of memory.

While we did not explicitly set limits on memory usage, this could also be done by adding a memory entry to the requests and limits sections, as sketched below.
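
As an illustration, a resources section that constrains both CPU and memory could look like the following; the memory values here (a 64Mi request and a 128Mi limit) are hypothetical and would need to be tuned to the application:

          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 128Mi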

Automatically scaling a deployment

To automatically scale a deployment, we configure a HorizontalPodAutoscaler (HPA). HPA is a Kubernetes resource that monitors data from the metrics server and automatically scales the number of pods in a deployment based on observed CPU utilization (or other metrics).

Create a file called minikube-demo-server-deployment-hpa.yaml in the k8s folder with the following content:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: minikube-demo-server-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: minikube-demo-server-deployment
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 10

Above, we define an HPA that targets the deployment minikube-demo-server-deployment, sets the minimum number of replicas to 1 and the maximum number of replicas to 5, and sets the target CPU utilization to 10%. The 10% is a deliberately low value that makes it easier to see the scaling in action. In production, the target would likely be somewhere around 50-75%, depending on the actual scenario.
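
The manifest above uses the older autoscaling/v1 API, which supports only CPU-based scaling. As a sketch, the same policy written against the newer autoscaling/v2 API (which also supports memory and custom metrics) would look roughly as follows:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: minikube-demo-server-deployment-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: minikube-demo-server-deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 10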

Apply the configuration file with the command kubectl apply -f k8s/minikube-demo-server-deployment-hpa.yaml.

$ kubectl apply -f k8s/minikube-demo-server-deployment-hpa.yaml
horizontalpodautoscaler.autoscaling/minikube-demo-server-deployment-hpa created

Now, the deployment is configured to scale automatically based on CPU utilization. The configured autoscalers can be listed with the command kubectl get hpa.

$ kubectl get hpa
NAME                                  REFERENCE                                    TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
minikube-demo-server-deployment-hpa   Deployment/minikube-demo-server-deployment   cpu: 1%/10%   1         5         1          65s

The output shows that the autoscaling policy is applied to the deployment minikube-demo-server-deployment, that the current average CPU utilization is 1% (below the 10% target), and that the minimum and maximum number of replicas are 1 and 5, respectively. The number of currently running replicas is 1.
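
For more detail on an individual autoscaler, including the current metrics and recent scaling events, we can use kubectl describe:

kubectl describe hpa minikube-demo-server-deployment-hpa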

Testing automatic scaling

To test the automatic scaling, we can create a k6 script, similar to the ones we used earlier for measuring performance. To get the URL of the service to test, we use the command minikube service my-app-service --url.

$ minikube service my-app-service --url
http://192.168.49.2:32512

With the above URL, a simple test script would be as follows.

import http from "k6/http";

// Run the test for 30 seconds with 10 concurrent virtual users.
export const options = {
  duration: "30s",
  vus: 10,
};

// Each virtual user repeatedly sends GET requests to the service URL.
export default function () {
  http.get("http://192.168.49.2:32512");
}

Let's save the above script to a folder called k6-tests with the name k6-test.js. Now, with k6 installed, we can run the test with the command k6 run k6-tests/k6-test.js.

k6 run k6-tests/k6-test.js

With the command running in one terminal window, we can open another terminal and check the state of the application. Using the command kubectl get hpa, we can see that the number of replicas has been increased to 5.

$ kubectl get hpa
NAME                                  REFERENCE                                     TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
minikube-demo-server-deployment-hpa   Deployment/minikube-demo-server-deployment   cpu: 100%/10%   1         5         5          18m

The output shows that the average CPU utilization is 100% (well above the 10% target) and that the number of replicas has been increased to 5. Scaling does take a while, as scaling decisions are based on the metrics collected by the metrics server, which gathers data periodically; by default, the HPA controller also evaluates the metrics only every 15 seconds.
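
To follow the scaling in near real time, we can also watch the autoscaler and the pods with the --watch flag:

kubectl get hpa --watch
kubectl get pods --watch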

Once the k6 test has finished and we wait for a while longer, we notice that the number of replicas has been decreased back to 1. Scaling down is intentionally slower than scaling up: by default, the HPA waits for a five-minute stabilization window before removing replicas.

$ kubectl get hpa
NAME                                  REFERENCE                                    TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
minikube-demo-server-deployment-hpa   Deployment/minikube-demo-server-deployment   cpu: 1%/10%   1         5         1          22m

HPA Algorithm

The HPA algorithm is based on the following formula:

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

The currentMetricValue is the average of the metrics collected from the pods, and the desiredMetricValue is the target value set in the HPA configuration. The currentReplicas is the number of replicas currently running.

As an example, if there were currently three replicas running, the average CPU utilization was 50%, and the target CPU utilization was 10%, the desired number of replicas would be calculated as follows:

desiredReplicas = ceil[3 * (50 / 10)] = ceil[3 * 5] = ceil[15] = 15

The desired number of replicas would be 15, i.e. the number of replicas needed to handle the load at the current CPU utilization. The actual number of replicas is, however, capped by the maxReplicas value set in the HPA configuration.
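
The same formula also explains the behavior we saw in the k6 test. With one replica running at 100% average CPU utilization against a 10% target, the calculation gives:

desiredReplicas = ceil[1 * (100 / 10)] = ceil[10] = 10

As the configured maximum is 5 replicas, the autoscaler scales the deployment to 5 instead.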
