Update WAL_*_TXN metrics names #218

Open: wants to merge 7 commits into base: main

10 changes: 7 additions & 3 deletions documentation/operations/monitoring-alerting.md
@@ -3,7 +3,11 @@ title: Monitoring and alerting
description: Shows you how to set up to monitor your database for potential issues, and how to raise alerts
---

-There are many variables to consider when monitoring an active production database. This document is designed to be a helpful starting point. We plan to expand this guide to be more helpful. If you have any recommendations, feel free to [create an issue](https://github.com/questdb/documentation/issues) or a PR on GitHub.
+There are many variables to consider when monitoring an active production
+database. This document is designed to be a helpful starting point. We plan to
+expand this guide to be more helpful. If you have any recommendations, feel free
+to [create an issue](https://github.com/questdb/documentation/issues) or a PR on
+GitHub.

## Basic health check

@@ -55,8 +59,8 @@ take longer if the data is out of order, or touches different time partitions.
You can monitor the overall performance of this process of applying the WAL
data to tables. QuestDB exposes two Prometheus counters for this:

-1. `questdb_wal_apply_seq_txn_total`: sum of all committed transaction sequence numbers
-2. `questdb_wal_apply_writer_txn_total`: sum of all transaction sequence numbers applied to tables
+1. `questdb_wal_apply_seq_txn`: sum of all committed transaction sequence numbers
+2. `questdb_wal_apply_writer_txn`: sum of all transaction sequence numbers applied to tables

Both of these numbers are continuously growing as the data is ingested. When
they are equal, all WAL data has been applied to the tables. While data is being
7 changes: 5 additions & 2 deletions documentation/third-party-tools/prometheus.md
@@ -241,11 +241,14 @@ The following metrics are available:
| `questdb_wal_apply_rows_per_second` | gauge | Rate of rows applied per second during WAL apply. |
| `questdb_wal_apply_written_rows_total` | counter | Total number of rows written during WAL apply. |
| `questdb_wal_written_rows_total` | counter | Total number of rows written to WAL. |
+| `questdb_wal_seq_txn` | gauge | Sum of all committed transaction sequence numbers. Used in conjunction with `questdb_wal_writer_txn`. |
+| `questdb_wal_writer_txn` | gauge | Sum of all transaction sequence numbers applied to tables. With no pending transactions in the WAL, equal to `questdb_wal_seq_txn`. When its lag behind `questdb_wal_seq_txn` is steadily growing, indicates QuestDB is unable to keep up with writes. |
| `questdb_workers_job_start_micros_max` | gauge | Maximum time taken to start a worker job in microseconds. |
| `questdb_workers_job_start_micros_min` | gauge | Minimum time taken to start a worker job in microseconds. |

-All of the above metrics are volatile, i.e. they're collected since the current
-database start.
+Most of the above metrics are volatile, i.e. they're collected since the current
+database start. The exception are `questdb_wal_seq_txn` and
+`questdb_wal_writer_txn`, because transaction sequence numbers are persistent.
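
As a reviewer's illustration of the lag check described in the table rows above, here is a minimal Python sketch. It assumes the metrics arrive as plain Prometheus text exposition lines without labels, and it takes the metric names from the `prometheus.md` table in this diff. The helper names (`parse_metric`, `wal_lag`, `is_lag_growing`) and the sample payload are hypothetical and not part of QuestDB.

```python
# Hypothetical sample payload, invented for illustration only.
SAMPLE = """\
# TYPE questdb_wal_seq_txn gauge
questdb_wal_seq_txn 120
# TYPE questdb_wal_writer_txn gauge
questdb_wal_writer_txn 115
"""


def parse_metric(exposition: str, name: str) -> float:
    """Return the value of the first unlabelled sample called `name`."""
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    raise KeyError(name)


def wal_lag(
    exposition: str,
    seq_name: str = "questdb_wal_seq_txn",
    writer_name: str = "questdb_wal_writer_txn",
) -> float:
    """Committed minus applied; 0 means all WAL data has been applied."""
    return parse_metric(exposition, seq_name) - parse_metric(exposition, writer_name)


def is_lag_growing(lags: list[float]) -> bool:
    """True when every lag sample is strictly larger than the previous one."""
    return len(lags) >= 2 and all(b > a for a, b in zip(lags, lags[1:]))


if __name__ == "__main__":
    print(wal_lag(SAMPLE))
    print(is_lag_growing([1.0, 3.0, 7.0]))
```

The metric names are parameters because the two files in this diff spell them differently (`questdb_wal_apply_seq_txn` versus `questdb_wal_seq_txn`); pass whichever name the running server actually exposes. In practice the same comparison would more likely be written as an alerting rule over the difference of the two series than as a script.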

## Configuring Prometheus Alertmanager
