Scaling databases
Learning objectives
- You know what Kubernetes operators are.
- You know how to set up a PostgreSQL database cluster using a Kubernetes operator.
- You know how to use database migrations for a PostgreSQL cluster.
- You know how to access a PostgreSQL cluster from a Deno application.
The basic elements of scaling databases consist of caching database query results, denormalizing data, partitioning tables, and making sure that the indexes are in order and fit for the purpose. Eventually, when scaling applications, we are bound to end up with volumes of data (pun intended 🥁) and need to distribute it.
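As a rough, hypothetical illustration of the last two techniques -- the users and events tables below are made up for this example -- an index and a partitioned table could look as follows in PostgreSQL.
-- A hypothetical users table with an index for looking up users by email.
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  email TEXT NOT NULL
);
CREATE INDEX users_email_idx ON users (email);
-- A hypothetical events table partitioned by month.
CREATE TABLE events (
  id BIGINT GENERATED ALWAYS AS IDENTITY,
  created_at TIMESTAMPTZ NOT NULL,
  payload TEXT
) PARTITION BY RANGE (created_at);
CREATE TABLE events_2024_01 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');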
Scaling databases when using Kubernetes does not differ from this. Eventually, we are bound to need to distribute the database data over multiple servers, for example by using primary-secondary replication. Setting this up is not trivial, however, as it requires configuring processes that handle database failures, processes that handle load balancing between servers, and processes that handle the replication of data, not to mention making sure that the data is stored somewhere. As an example of the concrete steps for this in the case of PostgreSQL, see the PostgreSQL documentation on High Availability, Load Balancing, and Replication.
Fortunately, we do not have to set everything up from scratch.
Kubernetes operators
Kubernetes operators are Kubernetes extensions that work with custom resources to manage applications. They allow automating configuration, deployment, and maintenance of software, and in general help in defining deployable software components for Kubernetes.
For additional information on operators, see the Cloud Native Computing Foundation's Operator White Paper.
There exists a variety of Kubernetes operators for setting up databases. For PostgreSQL alone, there exist multiple operators, including CloudNativePG, Zalando's Postgres Operator, Kubegres, Stolon, and Crunchy Data's PGO. Similarly, for Redis, there is a handful of options to choose from, including the official (non-free) Redis Enterprise version, Spotahome's redis operator, and a redis operator from Opstree solutions.
When using operators, one is typically bound to a specific configuration and way of doing things, which may not always align with project requirements. Furthermore, like with any external dependency, using operators in an unorthodox way may also lead to issues. As an example, read the Palark blog post Our failure story with Redis operator for K8s. Regardless, using an operator is a good way to get started.
Using a PostgreSQL operator
Here, we'll briefly look into using the CloudNativePG operator for setting up a PostgreSQL cluster. Before starting, make sure that Minikube is running and that you have the kubectl command line tool installed.
Installing the operator
To install the operator, we apply the operator configuration from the CloudNativePG repository.
kubectl apply -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.19/releases/cnpg-1.19.1.yaml
namespace/cnpg-system created
customresourcedefinition.apiextensions.k8s.io/backups.postgresql.cnpg.io created
customresourcedefinition.apiextensions.k8s.io/clusters.postgresql.cnpg.io created
customresourcedefinition.apiextensions.k8s.io/poolers.postgresql.cnpg.io created
customresourcedefinition.apiextensions.k8s.io/scheduledbackups.postgresql.cnpg.io created
serviceaccount/cnpg-manager created
clusterrole.rbac.authorization.k8s.io/cnpg-manager created
clusterrolebinding.rbac.authorization.k8s.io/cnpg-manager-rolebinding created
configmap/cnpg-default-monitoring created
service/cnpg-webhook-service created
deployment.apps/cnpg-controller-manager created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cnpg-mutating-webhook-configuration created
validatingwebhookconfiguration.admissionregistration.k8s.io/cnpg-validating-webhook-configuration created
This installs the operator and starts a controller that can be used to manage the databases. The operator is installed into the cnpg-system namespace, which we can take a peek at using kubectl get all -n cnpg-system.
kubectl get all -n cnpg-system
NAME READY STATUS RESTARTS AGE
pod/cnpg-controller-manager-f9d9ccfb-jmdrx 1/1 Running 1 (18s ago) 59s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cnpg-webhook-service ClusterIP 10.102.201.150 <none> 443/TCP 59s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cnpg-controller-manager 1/1 1 1 59s
NAME DESIRED CURRENT READY AGE
replicaset.apps/cnpg-controller-manager-f9d9ccfb 1 1 1 59s
Deploying a cluster
With the operator installed, we can deploy a cluster. To do this, we create a configuration that defines a CloudNativePG cluster, and apply the configuration. The following configuration defines a cluster with two instances, and a storage size of 64Mi (64 Mebibytes).
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-app-database-cluster
spec:
  instances: 2
  storage:
    size: 64Mi
We save the configuration into a file called my-app-database-cluster.yaml, which we place in the kubernetes folder. Finally, we apply the configuration.
kubectl apply -f kubernetes/my-app-database-cluster.yaml
cluster.postgresql.cnpg.io/my-app-database-cluster created
Once the configuration has been applied, the cluster is deployed. The status of the cluster can be checked with the command kubectl get cluster -- setting the cluster up can take a while.
kubectl get cluster
NAME AGE INSTANCES READY STATUS PRIMARY
my-app-database-cluster 42s Setting up primary
kubectl get cluster
NAME AGE INSTANCES READY STATUS PRIMARY
my-app-database-cluster 64s Setting up primary
kubectl get cluster
NAME AGE INSTANCES READY STATUS PRIMARY
my-app-database-cluster 106s 2 2 Cluster in healthy state my-app-database-cluster-1
Now, we have a database cluster that is up and running.
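As a side note, the operator also makes scaling the cluster itself straightforward: at least in principle, adding an instance is a matter of increasing the instances value in the configuration and re-applying it, after which the operator takes care of creating the new replica. Sketched out, the change would look as follows.
# kubernetes/my-app-database-cluster.yaml -- instances increased from 2 to 3
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-app-database-cluster
spec:
  instances: 3
  storage:
    size: 64Mi
kubectl apply -f kubernetes/my-app-database-cluster.yaml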
Storage size
Above, we created the storage with a size of 64 mebibytes, which is not a lot. In actual production use, the storage size would naturally be much larger. One of the pain points of working with databases is migrating to a larger disk when the space runs out. CloudNativePG supports this -- for additional information, refer to their documentation on Volume expansion.
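Sketched roughly -- and assuming the underlying storage class allows volume expansion (allowVolumeExpansion: true) -- growing the disk later would boil down to increasing the size in the cluster configuration and applying it again, as we did above.
storage:
  # previously 64Mi
  size: 1Gi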
Peeking into the database
CloudNativePG has a kubectl plugin that can be used to connect to the database cluster with psql. To install the plugin, follow the guidelines on the CloudNativePG Plugin documentation page. With the plugin installed, we can check the status of our cluster with the command kubectl cnpg status my-app-database-cluster, where my-app-database-cluster is the name of the cluster. The command prints information about the cluster certificates, instances, replication status, and so on.
kubectl cnpg status my-app-database-cluster
Cluster Summary
Name: my-app-database-cluster
Namespace: default
System ID: ..
PostgreSQL Image: ghcr.io/cloudnative-pg/postgresql:15.2
Primary instance: my-app-database-cluster-1
Status: Cluster in healthy state
Instances: 2
Ready instances: 2
Current Write LSN: 0/5000000 (Timeline: 1 - WAL File: 000000010000000000000004)
Certificates Status
..
Streaming Replication status
..
Instances status
Name Database Size Current LSN Replication role Status QoS Manager Version Node
---- ------------- ----------- ---------------- ------ --- --------------- ----
my-app-database-cluster-1 29 MB 0/5000000 Primary OK BestEffort 1.19.1 minikube
my-app-database-cluster-2 29 MB 0/5000000 Standby (async) OK BestEffort 1.19.1 minikube
The plugin also allows us to connect to the database and look into its data. There's nothing there yet, though.
kubectl cnpg psql my-app-database-cluster
psql (15.2 (Debian 15.2-1.pgdg110+1))
Type "help" for help.
postgres=# \dt
Did not find any relations.
postgres=# \q
At this point, our project folder looks as follows.
tree --dirsfirst
.
├── k6-tests
│ └── k6-test.js
├── kubernetes
│ ├── my-app-database-cluster.yaml
│ ├── my-app-deployment-hpa.yaml
│ ├── my-app-deployment.yaml
│ └── my-app-service.yaml
└── my-app
├── app.js
└── Dockerfile
Usernames and passwords
When we created the database cluster, usernames and passwords were automatically created for it. They are stored as Secrets that can be accessed from the cluster. To describe the secrets created for the cluster, we can run the command kubectl describe secret my-app-database-cluster. The output contains information about the username and password for the cluster.
kubectl describe secret my-app-database-cluster
...
Data
====
username: 8 bytes
password: 64 bytes
pgpass: 108 bytes
The values in the secrets can be injected into our containers as environment variables, which we will do shortly.
Further, when the cluster is created, a username called app is created for the cluster, which is the one that we will be using in our application. The secrets for the username app are available in my-app-database-cluster-app (the name of the cluster followed by the suffix -app).
kubectl describe secret my-app-database-cluster-app
...
Data
====
password: 64 bytes
pgpass: 105 bytes
username: 3 bytes
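The output above only shows the value sizes. The data in a Secret is base64-encoded, so if we wish to peek at, say, the username, we can decode it -- for example as follows.
kubectl get secret my-app-database-cluster-app -o jsonpath='{.data.username}' | base64 -d
app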
Peeking into services
The CloudNativePG operator also creates a handful of services, which are used to direct traffic to the database instances. The services are prefixed with my-app-database-cluster, the name of our database cluster, and they offer different levels of access, ranging from read only (-ro) to read and write (-rw). The services are created with the type ClusterIP, which means that they are only accessible from within the cluster.
We can see all the running services with the command kubectl get services.
kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d18h
my-app-database-cluster-any ClusterIP 10.101.234.244 <none> 5432/TCP 68m
my-app-database-cluster-r ClusterIP 10.108.65.7 <none> 5432/TCP 68m
my-app-database-cluster-ro ClusterIP 10.104.210.101 <none> 5432/TCP 68m
my-app-database-cluster-rw ClusterIP 10.107.183.39 <none> 5432/TCP 68m
my-app-service LoadBalancer 10.101.223.73 <pending> 7777:32512/TCP 40h
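Within the cluster, the services are also reachable by name through Kubernetes' DNS -- for example, the read-write service answers at my-app-database-cluster-rw (or, fully qualified, my-app-database-cluster-rw.default.svc.cluster.local) on port 5432. To see which pod a service currently routes to, we can, for example, list its endpoints.
kubectl get endpoints my-app-database-cluster-rw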
Database migrations
Previously, in our applications, we've used Flyway for database migrations. Why break the good habit? In the case of our present Kubernetes application, we want to run the database migrations within the cluster. A straightforward approach to achieve this is to create a container for running the migrations, and run the container as a Job.
Flyway container
Let's first create our Flyway container. Create a folder called flyway and, inside it, a folder called sql. Place a file called V1___initial_schema.sql into the sql folder, and store the following content in the file.
CREATE TABLE items (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL
);
Next, create a Dockerfile in the flyway folder and place the following contents in it.
FROM flyway/flyway:9.8.3-alpine
# Assuming we're building the image inside the `flyway` -folder
COPY sql/ /flyway/sql/
# Use shell form for entrypoint to get access to env variables
ENTRYPOINT ./flyway migrate -user=$FLYWAY_USER -password=$FLYWAY_PASSWORD -url="jdbc:postgresql://${MY_APP_DATABASE_CLUSTER_RW_SERVICE_HOST}:${MY_APP_DATABASE_CLUSTER_RW_SERVICE_PORT}/${FLYWAY_USER}"
The above Dockerfile copies the SQL files to the container, and defines an entrypoint for running the Flyway migrations. The variables FLYWAY_USER, FLYWAY_PASSWORD, MY_APP_DATABASE_CLUSTER_RW_SERVICE_HOST, and MY_APP_DATABASE_CLUSTER_RW_SERVICE_PORT will be injected into the container by Kubernetes (as soon as we've configured that..). The latter two are so-called service environment variables: Kubernetes exposes each active service to pods as <SERVICE_NAME>_SERVICE_HOST and <SERVICE_NAME>_SERVICE_PORT variables, with the service name upper-cased and dashes replaced by underscores -- here, they point to the read-write service of the database cluster.
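If we want to convince ourselves that such variables really are available inside the cluster, one way is to start a short-lived pod and list its environment -- something along the following lines (the pod name env-test is arbitrary).
kubectl run env-test -i --rm --restart=Never --image=busybox -- env | grep MY_APP_DATABASE_CLUSTER_RW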
Now, to build the image, we use the minikube image build command that both builds the image and copies the image to Minikube's container registry. The command needs to be run within the flyway folder. Let's call the image my-app-database-migrations.
minikube image build -t my-app-database-migrations -f ./Dockerfile .
..
Step 1/3 : FROM flyway/flyway:9.8.3-alpine
..
Step 2/3 : COPY sql/ /flyway/sql/
..
Step 3/3 : ENTRYPOINT ...
..
Successfully tagged my-app-database-migrations:latest
Once the image has been built, we check that it is visible within Minikube's container registry using the minikube image list command.
minikube image list
..
docker.io/library/my-app-database-migrations:latest
..
Now that the image is in the container registry, we can use it in our Kubernetes cluster. Let's next create the needed configuration for it.
Flyway Job configuration
Database migrations are a natural fit for Kubernetes Jobs, which are workloads that are meant to run to completion once -- when a new migration is needed, we simply run a new job.
Create a file called my-app-database-migration-job.yaml in the kubernetes folder, and place the following contents in it.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-app-database-migration-job
spec:
  template:
    metadata:
      name: my-app-database-migration-job
    spec:
      containers:
        - name: my-app-database-migrations
          image: my-app-database-migrations:latest
          imagePullPolicy: Never
          env:
            - name: FLYWAY_USER
              valueFrom:
                secretKeyRef:
                  name: my-app-database-cluster-app
                  key: username
                  optional: false
            - name: FLYWAY_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: my-app-database-cluster-app
                  key: password
                  optional: false
      restartPolicy: Never
  backoffLimit: 2
The above configuration creates a job that runs the my-app-database-migrations container. The container is configured to use the username and password from the secret my-app-database-cluster-app, which are injected into the job as environment variables called FLYWAY_USER and FLYWAY_PASSWORD.
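As a side note, finished jobs are not cleaned up automatically with this configuration. If we did not want to delete old migration jobs by hand, we could, for example, ask Kubernetes to remove them after a while by adding a TTL to the job spec.
spec:
  # remove the job automatically five minutes after it finishes
  ttlSecondsAfterFinished: 300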
At this point, our project folder structure is as follows.
tree --dirsfirst
.
├── flyway
│ ├── sql
│ │ └── V1___initial_schema.sql
│ └── Dockerfile
├── k6-tests
│ └── k6-test.js
├── kubernetes
│ ├── my-app-database-cluster.yaml
│ ├── my-app-database-migration-job.yaml
│ ├── my-app-deployment-hpa.yaml
│ ├── my-app-deployment.yaml
│ └── my-app-service.yaml
└── my-app
├── app.js
└── Dockerfile
Applying the migration
When we apply the above configuration, the job is created in the cluster. Let's apply the configuration using the kubectl apply command.
kubectl apply -f kubernetes/my-app-database-migration-job.yaml
job.batch/my-app-database-migration-job created
Now, when we list the pods, we can see that the job has been completed.
kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-database-cluster-1 1/1 Running 0 90m
my-app-database-cluster-2 1/1 Running 0 90m
my-app-database-migration-job-jndk9 0/1 Completed 0 2m9s
my-app-deployment-595b7ddc84-7zl9w 1/1 Running 2 (102m ago) 25h
The above output also shows the two database cluster instances and our deployment. The database migration job has completed -- the 0/1 in the READY column indicates that the job's pod is no longer running a container, as the work is done.
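If we wish to double-check what the migration did, we can look at the logs of the job, which contain Flyway's output.
kubectl logs job/my-app-database-migration-job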
To run a new migration, we build a new image, remove the old job, and run the kubectl apply command again.
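Concretely -- assuming, say, a hypothetical new migration file V2___add_something.sql had been added to the sql folder -- the steps would look roughly as follows (the image build is run in the flyway folder, the other commands in the project root).
minikube image build -t my-app-database-migrations -f ./Dockerfile .
kubectl delete job my-app-database-migration-job
kubectl apply -f kubernetes/my-app-database-migration-job.yaml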
Checking the database
Now that we've run the migration, we should be able to see the table items in the database. Let's connect to the database using the cnpg plugin and list the database tables.
kubectl cnpg psql my-app-database-cluster
psql (15.2 (Debian 15.2-1.pgdg110+1))
Type "help" for help.
postgres=# \dt
Did not find any relations.
When we connect to the database, we do not see any tables. This is because by default we connect as the postgres user to the postgres database, while our migrations were run as the user app against the database app. Let's switch the database with the \c shorthand and check again.
postgres=# \c app
You are now connected to database "app" as user "postgres".
app=#
app=# \dt
List of relations
Schema | Name | Type | Owner
--------+-----------------------+-------+-------
public | flyway_schema_history | table | app
public | items | table | app
(2 rows)
app=# \q
Now, we see the tables -- let's next adjust our application to connect to the database programmatically.
Connecting to the database
Let's next modify our application to connect to the database.
Dockerfile and dependencies
Let's first create a deps.js file for importing the PostgreSQL driver. Create a file called deps.js in the my-app folder, and place the following contents in it.
import postgres from "https://deno.land/x/postgresjs@v3.4.4/mod.js";
export { postgres };
Next, modify the Dockerfile in the my-app folder to match the following. Essentially, we cache the deps.js in the image so that the dependencies are not downloaded every time we build the image. We further adjust the run command so that the application is run with the --allow-net flag, which is needed for connecting to the database, and the --allow-env flag, which is needed for reading the database connection details from the environment variables. The --unstable flag is needed for the connection pool of the PostgreSQL driver.
FROM denoland/deno:alpine-1.42.2
EXPOSE 7777
WORKDIR /app
COPY deps.js .
RUN deno cache deps.js
COPY . .
CMD [ "run", "--unstable", "--watch", "--allow-net", "--allow-read", "--allow-env", "app.js" ]
Deployment configuration
Next, we need to inject the pgpass secret from my-app-database-cluster-app into our application. To achieve this, modify the my-app-deployment.yaml file to match the following. The modification essentially adds an environment variable called PGPASS to the application.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  labels:
    app: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          imagePullPolicy: Never
          ports:
            - containerPort: 7777
          resources:
            requests:
              cpu: 100m
            limits:
              cpu: 200m
          env:
            - name: PGPASS
              valueFrom:
                secretKeyRef:
                  name: my-app-database-cluster-app
                  key: pgpass
                  optional: false
Modifying the application
Finally, we need to modify the application code. Let's create a file called database.js that reads in the PGPASS environment variable, uses it to connect to the database, and exports the sql function for use in other locations. The contents of database.js are as follows.
import { postgres } from "./deps.js";
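// PGPASS follows the .pgpass format: hostname:port:database:username:password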
const PGPASS = Deno.env.get("PGPASS").trim();
const PGPASS_PARTS = PGPASS.split(":");
const host = PGPASS_PARTS[0];
const port = PGPASS_PARTS[1];
const database = PGPASS_PARTS[2];
const username = PGPASS_PARTS[3];
const password = PGPASS_PARTS[4];
const sql = postgres({
  host,
  port,
  database,
  username,
  password,
});
export { sql };
Place the database.js file in the my-app folder.
Next, modify the app.js so that it imports the sql function from database.js and queries the database on each request, asking for the number of items in the database. The count is appended to the response from the server. This would look as follows.
import { sql } from "./database.js";
const helloMessage = `Hello from server: ${Math.floor(10000 * Math.random())}`;
const handleRequest = async (request) => {
  const rows = await sql`SELECT COUNT(*) FROM items`;
  const count = rows[0].count;
  return new Response(`${helloMessage} -- count: ${count}`);
};
Deno.serve({ hostname: "0.0.0.0", port: 7777 }, handleRequest);
Building and deploying the image
With all the changes in place, it's time to build the image. When in the my-app folder, we run the following command that builds the my-app image into the Minikube registry.
minikube image build -t my-app -f ./Dockerfile .
(this takes a while...)
Successfully tagged my-app:latest
Now, to deploy the image, we apply the updated configuration.
kubectl apply -f kubernetes/my-app-deployment.yaml
deployment.apps/my-app-deployment configured
And check the status of the pods.
kubectl get pods
NAME READY STATUS RESTARTS AGE
my-app-database-cluster-1 1/1 Running 0 133m
my-app-database-cluster-2 1/1 Running 0 133m
my-app-database-migration-job-jndk9 0/1 Completed 0 45m
my-app-deployment-8557c44664-f85r6 1/1 Running 0 5s
We see that the pod is running.
Querying the application
Finally, with the application deployed, it's time to query the server. We again run the command minikube service my-app-service --url to determine the URL for the deployment.
minikube service my-app-service --url
http://192.168.49.2:32512
And then query the server.
curl http://192.168.49.2:32512
Hello from server: 1915 -- count: 0
curl http://192.168.49.2:32512
Hello from server: 1915 -- count: 0
As you notice, there are no items in the database. Let's quickly add something to the database.
kubectl cnpg psql my-app-database-cluster
psql (15.2 (Debian 15.2-1.pgdg110+1))
Type "help" for help.
postgres=# \c app
You are now connected to database "app" as user "postgres".
app=# \dt
List of relations
Schema | Name | Type | Owner
--------+-----------------------+-------+-------
public | flyway_schema_history | table | app
public | items | table | app
(2 rows)
app=# \d items
Table "public.items"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+-----------------------------------
id | integer | | not null | nextval('items_id_seq'::regclass)
name | text | | not null |
Indexes:
"items_pkey" PRIMARY KEY, btree (id)
app=# INSERT INTO items (name) VALUES ('hamburger');
INSERT 0 1
app=# \q
Now, when we query the server, we see that there indeed is one item in the database.
curl http://192.168.49.2:32512
Hello from server: 1915 -- count: 1
curl http://192.168.49.2:32512
Hello from server: 1915 -- count: 1