Storage and Operators
Learning Objectives
- You know how to use PersistentVolumes and PersistentVolumeClaims.
- You know of Kubernetes operators.
Persistent storage
As pods are ephemeral, any files written to the container filesystem of a pod are lost when the pod is deleted or when a new pod is scheduled. For applications that need to store data, Kubernetes provides abstractions for persistent storage.
The idea is similar to Docker volumes. With Kubernetes, however, the abstraction is stronger and more configuration-oriented.
There are two main abstractions for persistent storage: PersistentVolumes and PersistentVolumeClaims. A PersistentVolume is a piece of storage in the cluster that has been provisioned by an administrator of the cluster. A PersistentVolumeClaim is a request for storage by a user.
PersistentVolumes are resources, PersistentVolumeClaims are resource requests.
Creating a PersistentVolume
To provision a PersistentVolume, we need to create a PersistentVolume configuration file. Create a file called `minikube-demo-persistentvolume.yaml` in the `k8s` directory with the following content:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: minikube-demo-local-persistentvolume
spec:
  storageClassName: "standard"
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
```
The above configuration declares a PersistentVolume named `minikube-demo-local-persistentvolume` with 1Gi of storage capacity. The storage uses the StorageClass "standard" (the default in Minikube) and has the access mode ReadWriteOnce, meaning the volume can be mounted read-write by a single node at a time. The storage is backed by a hostPath, which is a directory on the Minikube VM at `/mnt/data`.
While hostPath is useful for local development, it is not recommended for production use. In production, one would use a cloud provider’s storage solution or a network storage solution.
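Relatedly, a PersistentVolume also has a reclaim policy that controls what happens to the underlying storage once its claim is deleted. As a sketch, adding the following field to the PersistentVolume spec would keep the data around for manual cleanup instead of letting it be erased:

```yaml
# Fragment of a PersistentVolume spec: the Retain policy keeps the
# backing storage (and its data) when the bound claim is deleted,
# instead of the storage being recycled or deleted.
spec:
  persistentVolumeReclaimPolicy: Retain
```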
Next, apply the configuration to create the PersistentVolume:
```shell
kubectl apply -f k8s/minikube-demo-persistentvolume.yaml
```
You can check the status of the PersistentVolume with the following command:
```shell
kubectl get pv
```
The output is something like the following:

```shell
NAME                                   CAPACITY   ...
minikube-demo-local-persistentvolume   1Gi        ...
```
Now, the cluster has a PersistentVolume available for use.
Creating a PersistentVolumeClaim
Next, we need to create a PersistentVolumeClaim to request storage. Create a file called `minikube-demo-persistentvolume-claim.yaml` in the `k8s` directory with the following content:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minikube-demo-local-persistentvolume-claim
spec:
  storageClassName: "standard"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
```
The above configuration declares a PersistentVolumeClaim named `minikube-demo-local-persistentvolume-claim` requesting 100Mi of storage capacity. The specification is similar to the PersistentVolume configuration, but this time it is a request for storage. As the claim's StorageClass and access mode match our PersistentVolume and the requested 100Mi fits within its 1Gi capacity, Kubernetes can bind the claim to that volume.
Next, apply the configuration to create the PersistentVolumeClaim:
```shell
kubectl apply -f k8s/minikube-demo-persistentvolume-claim.yaml
```
You can check the status of the PersistentVolumeClaim with the following command:
```shell
kubectl get pvc
```
The output is something like the following:

```shell
NAME                                         STATUS   VOLUME                                     ...
minikube-demo-local-persistentvolume-claim   Bound    pvc-51875849-29d7-4118-b541-47f98ad16eb2   ...
```
Using the PersistentVolumeClaim
To use a PersistentVolumeClaim in a pod, we need to mount the claim as a volume in the deployment specification. Let's modify the file `minikube-demo-server-deployment.yaml` to include a volume mount for the PersistentVolumeClaim:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: minikube-demo-server-deployment
  labels:
    app: minikube-demo-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minikube-demo-server
  template:
    metadata:
      labels:
        app: minikube-demo-server
    spec:
      containers:
        - name: minikube-demo-server
          image: minikube-demo-server:1.1
          imagePullPolicy: Never
          ports:
            - containerPort: 8000
          envFrom:
            - configMapRef:
                name: minikube-demo-configmap
            - secretRef:
                name: minikube-demo-secret
          volumeMounts:
            - name: data-storage
              mountPath: "/app/data"
      volumes:
        - name: data-storage
          persistentVolumeClaim:
            claimName: minikube-demo-local-persistentvolume-claim
```
Now, the configuration states that the container in the pod should have a volume mounted at `/app/data`, which is backed by the PersistentVolumeClaim `minikube-demo-local-persistentvolume-claim`.
Let’s apply the modified deployment configuration to see whether this holds.
```shell
kubectl apply -f k8s/minikube-demo-server-deployment.yaml
```
Now, when we access the pod, we should see a directory `/app/data` which is backed by the PersistentVolumeClaim.
To concretely test this, we can first get the name of the pod, and then exec into the pod to check the directory:
```shell
$ kubectl get pods
NAME                                                       READY   STATUS    RESTARTS   AGE
minikube-demo-server-deployment-554f9fcf65-vmxtp           1/1     Running   0          104s
minikube-demo-server-fetcher-deployment-6548f75dd4-pcjsb   1/1     Running   0          6m4s
$ kubectl exec -it minikube-demo-server-deployment-554f9fcf65-vmxtp -- /bin/sh
/app # ls -lt
total 24
drwxrwxrwx    2 root     root          4096 Mar 13 15:57 data
-rw-r--r--    1 root     root           176 Mar 12 16:14 Dockerfile
-rw-r--r--    1 root     root           325 Mar 12 16:13 app.js
-rw-r--r--    1 root     root           293 Mar 11 13:57 deno.lock
-rw-r--r--    1 root     root            64 Mar 11 11:39 deno.json
-rw-r--r--    1 root     root            52 Mar 11 11:39 app-run.js
```
As we can see from the above output, the `/app/data` directory is present in the pod, backed by the persistent volume. Any files the app writes there will persist even if the pod is rescheduled or restarted.
Dynamic provisioning
Above, we manually created a PersistentVolume and a PersistentVolumeClaim. In practice, especially when using Kubernetes on the cloud, dynamic provisioning is used. This means that we do not manually create persistent volumes, but instead, we create a persistent volume claim, and Kubernetes automatically creates a persistent volume to satisfy it.
The main change is setting the StorageClass in the PersistentVolumeClaim configuration to one that dynamically provisions persistent volumes; the available classes are cloud vendor specific.
Often, when working with cloud vendors, the storage is also not local to the cluster but is network storage provided by the vendor. This is a more robust and scalable solution.
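As a sketch, a dynamically provisioned claim differs from our earlier one mainly in its StorageClass. The class name below is a made-up placeholder, as the real names depend on the vendor and cluster setup:

```yaml
# Hypothetical claim on a cloud cluster; "gp3-example" stands in for
# whatever dynamically provisioning StorageClass the vendor offers.
# Applying this would create a matching PersistentVolume on demand,
# with no manually created PersistentVolume needed.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minikube-demo-dynamic-claim
spec:
  storageClassName: "gp3-example"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```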
Operators
In much of web development, the goal is to keep the application as stateless as possible. However, there are cases where stateful applications are necessary. For example, databases like PostgreSQL or Redis are stateful services that need to maintain data across restarts. Managing such services in Kubernetes can, however, be complex. This is where operators come in.
Kubernetes operators are Kubernetes extensions that work with custom resources to manage applications. They allow automating configuration, deployment, and maintenance of software, and in general help in defining deployable software components for Kubernetes.
For additional information on operators, see the Cloud Native Computing Foundation’s Operator White Paper.
There exists a variety of Kubernetes operators for setting up databases. For PostgreSQL, for example, there exist multiple operators, including CloudNativePG, Zalando's Postgres Operator, Kubegres, Stolon, and Crunchy Data's PGO. Similarly, for Redis, there is a handful of options to choose from, including the official (non-free) Redis Enterprise version, Spotahome's redis operator, and a redis operator from Opstree Solutions.
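To illustrate, with an operator installed, a database is typically declared through a custom resource rather than through deployments and claims written by hand. The following sketch follows the shape of a CloudNativePG Cluster resource; verify the fields against the documentation of the operator version you install:

```yaml
# A minimal three-instance PostgreSQL cluster declared as a
# CloudNativePG custom resource. The operator watches for such
# resources and creates and manages the pods, persistent volume
# claims, and services on our behalf.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: minikube-demo-postgres
spec:
  instances: 3
  storage:
    size: 1Gi
```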
When using operators, one is typically bound to a specific configuration and way of doing things, which may not always align with project requirements. Furthermore, like with any external dependency, using an operator in an unorthodox way may lead to issues. Regardless, using an operator is sensible when it aligns with the project requirements.
As an example of potential challenges, read the Palark blog post Our failure story with Redis operator for K8s.