Monitoring and Security

LGTM Stack


Learning Objectives

  • You know what the LGTM stack is and how to set it up.
  • You know how to use the LGTM stack to monitor Deno and PostgreSQL.

To concretely illustrate monitoring, we add the LGTM stack to the walking skeleton. The LGTM stack consists of the Loki log database, Grafana for data visualization, Tempo for traces, and Prometheus for metrics.

All of the products in the LGTM stack come from Grafana Labs, which is a company that specializes in observability and monitoring tools. Their business model focuses on providing the products as managed services, but the projects are also open-source, which makes them nice for learning.

All of the data is collected using OpenTelemetry, which is an open-source observability framework that provides functionality for capturing traces, logs, and metrics from applications.

Adding LGTM

To add LGTM to the walking skeleton, add the following lines to the compose.yaml file.

  lgtm:
    image: grafana/otel-lgtm:0.8.6
    ports:
      - "3000:3000"
      - "4318:4318"

Port 3000 is used for Grafana, and port 4318 for the OpenTelemetry HTTP collector.

Then, run docker-compose up --build to start the LGTM stack. The stack takes a while to start — you can see the progress in the terminal.

// ...
lgtm-1                 | Waiting for Grafana to start up...
lgtm-1                 | Waiting for Grafana to start up...
lgtm-1                 | Grafana is up and running.
lgtm-1                 | Waiting for Loki to start up...
lgtm-1                 | Waiting for Loki to start up...
// ...

Once LGTM is up and running, opening up the address localhost:3000 shows the Grafana dashboard, similar to the one in Figure 1. The default username and password for Grafana are admin and admin.

Fig. 1 - Grafana running at http://localhost:3000

Collecting data from Deno

Setting up data collection

Recent Deno versions come with built-in support for collecting logs, traces, and metrics with OpenTelemetry. First, modify the Dockerfile in the folder server to include the flag --unstable-otel and update the Deno version to at least 2.2.0. The updated Dockerfile should look, for example, like this:

FROM denoland/deno:alpine-2.2.3

WORKDIR /app

COPY deno.json .

RUN DENO_FUTURE=1 deno install

COPY . .

CMD [ "run", "--allow-env", "--allow-net", "--allow-read", "--unstable-otel", "--watch", "app-run.js" ]

Then, modify the project.env file to enable OpenTelemetry for Deno and to explicitly tell Deno where to send the data. In our case, the data collector of the lgtm service listens on port 4318, so we use http://lgtm:4318 to reach it within the Docker network. Further, we explicitly set the name of the service so that it has a name we can look for; below, we set the name as deno_server.

Add the following lines to the project.env file:

OTEL_DENO=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://lgtm:4318
OTEL_SERVICE_NAME=deno_server

Now, the Deno application sends traces and logs to the OpenTelemetry collector at http://lgtm:4318. To test this out, modify the app.js of the server to include the following endpoint.

app.get("/api/lgtm-test", (c) => {
  console.log("Hello log collection :)");
  return c.json({ message: "Hello, world!" });
});

Next, restart the application and make a few requests to the endpoint http://localhost:8000/api/lgtm-test. This generates logs and traces that are sent to the OpenTelemetry collector.
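
If you prefer generating the requests with a script, a small Deno helper like the one below works too. This is just a convenience sketch; the file name make-requests.js is made up, and simply opening the endpoint in a browser a few times is equally fine.

// make-requests.js (hypothetical helper): generate a few requests to the
// test endpoint so that there is some log and trace data to look at.
// Run with: deno run --allow-net make-requests.js
for (let i = 0; i < 5; i++) {
  const response = await fetch("http://localhost:8000/api/lgtm-test");
  console.log(i, await response.json());
}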

For more information, including creating your own metrics and traces, see Deno’s OpenTelemetry documentation.
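
As a sketch of what a custom metric could look like, the snippet below adds a counter to the test endpoint from above. It assumes that the npm package @opentelemetry/api is used alongside Deno’s built-in OpenTelemetry support, and the names lgtm-demo and lgtm_test_requests are only illustrative.

import { metrics } from "npm:@opentelemetry/api@1";

// Create a meter and a counter for our own metric (names are illustrative).
const meter = metrics.getMeter("lgtm-demo");
const requestCounter = meter.createCounter("lgtm_test_requests", {
  description: "Number of requests to the /api/lgtm-test endpoint",
});

app.get("/api/lgtm-test", (c) => {
  // Increment the counter on every request; the value is exported through
  // the same OpenTelemetry pipeline as the logs and traces.
  requestCounter.add(1);
  console.log("Hello log collection :)");
  return c.json({ message: "Hello, world!" });
});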

Viewing logs and metrics

Now, log data is aggregated to the Loki database, which is part of the LGTM stack. To view the logs, go to the Grafana dashboard at localhost:3000. Open “Explore” on the left-hand side, and select “Logs”.

You should see the logs generated by the Deno application, similar to what is shown in Figure 2 below (the Figure is rather small, sorry about that — the gist is that the logs are shown in the Grafana dashboard).

Fig. 2 - Selecting “Logs” under “Explore” in Grafana shows logs from Loki. As we have configured the service name to “deno_server”, the logs relate to a service called “deno_server”.

Collecting log data to a centralized location makes it easier to monitor and analyze logs. This is especially useful when running multiple services, as it allows viewing logs from all services in one place. In addition, even if a container crashes, the logs are still available in the centralized location.

Similarly, metrics that Deno sends to the OpenTelemetry collector are aggregated to Prometheus. To view the metrics, open “Explore” and select “Metrics”. In the metrics view, you can filter the metrics with the prefix “http_”, which limits the view to metrics related to HTTP requests. The metrics view is shown in Figure 3 below.

Fig. 3 - Selecting “Metrics” under “Explore” in Grafana shows collected metrics from Prometheus.

In our case, we have made just a handful of requests to the server, so the metrics are not very interesting. However, as the application scales and more requests are made, the metrics view becomes more useful.

Alerts based on metrics

It is also possible to set up alerts based on metrics in the Grafana dashboard. Alerts can be used to notify when certain thresholds are exceeded. For example, you could set up an alert to notify you when the average response time of the server exceeds 500ms. However, we do not go into this in more detail.

Viewing PostgreSQL data

Earlier, when setting up PostgreSQL, we also added the functionality for collecting data from the database. This involved enabling the extension pg_stat_statements.

Setting up data connection

Grafana, which is part of the LGTM stack, comes with a built-in data source for PostgreSQL, which can be used to query data directly from the database. To use it, open “Connections” from the left-hand side menu and click “Add new connection”. Search for PostgreSQL and, once found, click it. This opens a view for creating a data source for PostgreSQL.

For testing, we can use the same connection details as we normally use to connect to PostgreSQL. That is, use the following:

  • Host URL: database:5432
  • Database: database
  • User: username
  • Password: password

For the TLS/SSL mode, use disable, and select the same PostgreSQL version as the one in your compose.yaml file.

Finally, click “Save & Test”. If the connection is successful, you should see a message saying “Database Connection OK”.

Studying database data

Next, under “Connections”, click “Data sources”. The list should now include the PostgreSQL data source that you just created. Click “Explore” for the data source. This opens a view where you can write database queries.

Given that the pg_stat_statements extension is enabled, you can select pg_stat_statements as the table, and e.g. select the columns “query” and “mean_exec_time”. When you press “Run query”, the query statistics are retrieved from PostgreSQL and shown in a table.
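
If you prefer writing the query by hand, the query editor can typically be switched from the builder to a code view, where a roughly equivalent query could look like the sketch below (the ordering and LIMIT are just one reasonable choice).

-- Show the statements with the highest average execution time.
SELECT query, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;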

PostgreSQL data exporters

There are also PostgreSQL-specific data exporters available for Prometheus, such as postgres_exporter. OpenTelemetry also has a community-contributed receiver for collecting PostgreSQL data.

In general, these need to be run as separate services, which then collect the metrics. Exporters can be configured to collect a wide range of metrics, such as query execution times, cache hit ratios, and so on.
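
As a rough sketch, running postgres_exporter alongside the walking skeleton could look something like the following compose.yaml service. The image and the DATA_SOURCE_NAME connection string below are assumptions based on the exporter’s documentation, and the LGTM stack’s Prometheus would additionally need to be configured to scrape the exporter, which we do not cover here.

  postgres_exporter:
    image: quay.io/prometheuscommunity/postgres-exporter
    environment:
      # Reuses the same connection details as the Grafana data source above.
      DATA_SOURCE_NAME: "postgresql://username:password@database:5432/database?sslmode=disable"
    depends_on:
      - database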

Dashboards

Grafana also allows constructing dashboards that provide a high-level overview of the system. As creating tailored dashboards is somewhat out of the scope of this course, we only mention the possibility here.

To get started with dashboards in Grafana, see e.g. the Understanding Dashboards in Grafana YouTube video from Grafana Labs.

Software teams often have dashboards that are visible to everyone in the team, e.g. on a big screen in the office. This way, everyone can see the status of the system at a glance.

All sorts of data

Although the examples here focused just on Deno and PostgreSQL, the LGTM stack with OpenTelemetry can be used to collect data from a wide range of services. As an example, if we wished to enable OpenTelemetry metric collection for Traefik, we would just need to add the following two lines to the command part of Traefik in compose.yaml:

- "--metrics.otlp=true"
- "--metrics.otlp.http.endpoint=http://lgtm:4318/v1/metrics"

With the above, Traefik metrics would be available in Grafana with the prefix traefik_. For further options and for understanding OpenTelemetry in depth, a good starting point is the OpenTelemetry documentation.