Database Setup, Studying Queries, and Connection Pools
Learning Objectives
- You practice setting up a PostgreSQL database and running migrations.
- You know tools for studying slow queries and database connections.
- You understand the concept of connection pools and their benefits.
Adding a PostgreSQL database
Building on the walking skeleton with Deno and Astro from the end of the last part, follow the chapter PostgreSQL database and migrations from the Web Software Development course and add a PostgreSQL database with database migrations to the skeleton.
Once ready, you should have an application with a database that responds to requests. In particular, a query to the path localhost:8000/todos should return the todos from the database, or an empty array if there are no todos in the database.
curl localhost:8000/todos
[]%
curl localhost:8000
{"message":"Hello world!"}%
Studying database queries
PostgreSQL comes with ready-made functionality for studying database queries. The auto_explain module logs the execution plans of slow queries, while the pg_stat_statements extension records execution statistics of SQL statements.
Both of these add a small overhead, so they should be used with care, typically only in development or when investigating performance problems in production.
Enabling auto_explain and pg_stat_statements
To take them into use, modify the project's compose.yaml to add a command that preloads the libraries.
database:
  container_name: postgresql_database
  image: postgres:17.0
  restart: unless-stopped
  env_file:
    - project.env
  command: >
    postgres
    -c shared_preload_libraries=auto_explain,pg_stat_statements
    -c auto_explain.log_min_duration=10
Now, when you restart the application, PostgreSQL will log the execution plans of statements that take more than 10 milliseconds to run. Preloading pg_stat_statements also makes it possible to collect statement statistics in the database.
The logs are available in the Docker logs of the database container. When running the application with docker compose up, the logs are shown directly in the terminal; otherwise, you can view them with the following command, which uses the container name from compose.yaml.
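docker logs postgresql_database
A logged auto_explain entry has roughly the following shape; this sample is illustrative, as the exact query text and plan depend on your data:
LOG:  duration: 12.34 ms  plan:
  Query Text: SELECT * FROM todos
  Seq Scan on todos  (cost=0.00..22.70 rows=1270 width=36)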
Enabling statement statistics
The statement statistics need to be enabled for each database that you want to monitor. To enable them, create a new migration file, say V2__enable_pg_stat_statements.sql, and add the following content to it:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
When you again restart the application and this migration file is run, a view called pg_stat_statements can be queried in the database. It has numerous columns with query statistics, including the query itself, the number of times the query has been run, the total time spent running the query, and the average time the query has taken.
For additional details on the columns, see the documentation of pg_stat_statements.
As an example, if we wish to see the ten queries with the highest average execution time among those that have been run more than five times, we can run the following query:
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE calls > 5
ORDER BY mean_exec_time DESC
LIMIT 10;
The column mean_exec_time shows the average query time in milliseconds; under minimal load, most, if not all, of the queries shown in the result should be sub-millisecond.
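The view also records, among other columns, calls and total_exec_time. Sorting by total time helps spot queries that are individually fast but run so often that they dominate the overall load; for example:
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;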
When you initially look at the query results, you may notice that most rows are not relevant to the application. This is partly because the database has not been queried heavily yet, and partly because the default user is also used for many other tasks. To better see queries from your application, visit localhost:8000/todos a handful of times and run the above query again. You should now see the query that selects the todos from the database.
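If you want to start measuring from a clean slate, the collected statistics can be discarded with the pg_stat_statements_reset function; by default, only superusers (or roles specifically granted the privilege) may call it:
SELECT pg_stat_statements_reset();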
To limit the listed queries to a specific user, you can filter on the userid column, which corresponds to the usesysid column of the pg_user view. As an example, the following would limit the queries to those made by the user username.
SELECT query, mean_exec_time
FROM pg_stat_statements
WHERE calls > 5
  AND userid = (
    SELECT usesysid
    FROM pg_user
    WHERE usename = 'username'
  )
ORDER BY mean_exec_time DESC
LIMIT 10;
Note that as username is the default superuser created when the database was set up in the walking skeleton, it is also used to create databases, run migrations, and so on. In a production setting, it's best practice to create a separate, non-superuser role specifically for your application.
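As a sketch, such a role could be created along the following lines; the role name, password, and the exact privileges to grant are placeholders that depend on your setup:
-- Hypothetical application role; the name and password are placeholders.
CREATE ROLE todo_app WITH LOGIN PASSWORD 'change-me';
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO todo_app;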
Studying connections
We can also look into database connections by examining the view pg_stat_activity. It contains information about the current connections to the database, including the query being run, the user who is running it, and how long it has been running.
For example, the following query shows the current connections to the database, including the connection ID, the name of the user, the time when the connection was initiated, the time when the last query was run, the state of the connection, and the last query executed on that connection:
SELECT pid, usename, backend_start, query_start, state, query
FROM pg_stat_activity
WHERE state IS NOT NULL;
When you run the above query, you see at least one connection: the one that you are using to run the query. The state column shows the state of the connection: active means that the connection is currently running a query, while idle means that it is not.
Now, open up the address localhost:8000/todos in your browser or via curl, and run the above query again. You should now see an additional connection, the one from the server, which will be in the idle state once its query completes.
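For a quick overview of how many connections are in each state, you can also aggregate over the view:
SELECT state, COUNT(*) AS connections
FROM pg_stat_activity
GROUP BY state;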
Connection pools
The reason why the connection remains open (in the idle state) is that the Postgres.js driver that we use employs a connection pool. A connection pool acts as an intermediary between the web server and the database server, managing the connections to the database server.
When an application needs to execute a database query, it first requests a connection from the pool. If an idle connection is available, it’s used. Otherwise, as long as the pool has not reached its maximum size, it creates a new physical connection and uses that. The query is then executed over the connection, and once complete, the connection is returned to the pool. This approach saves the overhead of repeatedly opening and closing connections.
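With Postgres.js, the pool can be tuned when creating the sql instance. Below is a minimal sketch; the option names max and idle_timeout come from the Postgres.js documentation, and the values here are illustrative:
import postgres from "postgres";

// At most five pooled connections; idle connections are closed after
// 30 seconds. Connection details are read from the PG* environment
// variables, as in the walking skeleton.
const sql = postgres({
  max: 5,
  idle_timeout: 30,
});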
Pool-based vs non-pool-based connections
Using a connection pool significantly improves the performance of an application because it removes the overhead of repeatedly establishing new database connections. On a local machine with minimal load, the difference might not feel big, but in a production environment, or when handling numerous requests per second, connection pooling has a major impact.
Below is a version of our server-side code (in server/app.js) that uses a connection pool. The connection pool is created when the application starts and is used for all database queries; the pool is closed when the application is shut down.
import { Hono } from "@hono/hono";
import { cors } from "@hono/hono/cors";
import { logger } from "@hono/hono/logger";
import postgres from "postgres";

const app = new Hono();

// A connection pool shared by all requests; Postgres.js reads the
// connection settings from the PG* environment variables.
const sql = postgres();

app.use("/*", cors());
app.use("/*", logger());

app.get("/", (c) => c.json({ message: "Hello world!" }));

app.get("/todos", async (c) => {
  const todos = await sql`SELECT * FROM todos`;
  return c.json(todos);
});

export default app;
When running the application and making requests to it, we see from the server logs that the first query takes a while — as it establishes a connection — but the subsequent queries are faster:
server-1 | Listening on http://0.0.0.0:8000/
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 87ms
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 1ms
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 1ms
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 1ms
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 1ms
The following, on the other hand, establishes a new connection for each request.
import { Hono } from "@hono/hono";
import { cors } from "@hono/hono/cors";
import { logger } from "@hono/hono/logger";
import postgres from "postgres";

const app = new Hono();

app.use("/*", cors());
app.use("/*", logger());

app.get("/", (c) => c.json({ message: "Hello world!" }));

app.get("/todos", async (c) => {
  // Open a new connection for this request only...
  const sql = postgres();
  const todos = await sql`SELECT * FROM todos`;
  // ...and close it before responding.
  await sql.end();
  return c.json(todos);
});

export default app;
Similarly, when running the above code and making requests to the server, we see that each query takes a while, as a new connection is established for each query:
server-1 | Listening on http://0.0.0.0:8000/
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 87ms
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 62ms
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 64ms
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 49ms
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 56ms
server-1 | <-- GET /todos
server-1 | --> GET /todos 200 74ms
In the version that uses a connection pool, each query takes on the order of a millisecond once the connection has been established, while in the version without the pool, each request takes tens of milliseconds. The difference can be even larger in environments where the servers are further apart or under heavier load.