Commit efbbaa2

doc/07-Operations.md: New Operations section
The newly introduced Operations section is a first attempt at operational documentation for Icinga DB. It covers

- essential Icinga DB monitoring,
- backups and corner cases for MySQL/MariaDB,
- optional systemd service restarts after failures,
- MySQL/MariaDB configuration options, including AWS RDS and Galera,
- and memory overcommitment for Redis.

As such a section can never be completed, this documentation is a target for continuous improvement. Fixes #745.
1 parent ade4a66 commit efbbaa2

1 file changed

+128
-0
lines changed

doc/07-Operations.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
1+
# Operations
2+
3+
Once Icinga DB is installed and configured, it usually runs silently in the background.
4+
This section is a loose collection of various topics to keep it that way.
5+
6+
## Monitor Icinga DB
7+
8+
It is strongly recommended to monitor the monitoring.
9+
10+
There is a built-in [`icingadb` check command](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#icingadb) in the Icinga 2 ITL.
11+
It covers several potential errors, including operations that take too long or invalid high availability scenarios.
12+
Even if Icinga DB has crashed, checks still run and Icinga 2 still generates notifications.
13+
14+
In addition, both the Redis® and the relational database should be monitored.
15+
There are predefined check commands in the ITL to choose from.
16+
17+
- [`redis`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#redis)
18+
- [`mysql`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#mysql)
19+
- [`mysql_health`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#mysql_health)
20+
- [`postgres`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#postgres)
21+
22+
A simpler approach would be to check if the processes are running, e.g.,
23+
with [`procs`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#procs) or [`systemd`](https://icinga.com/docs/icinga-2/latest/doc/10-icinga-template-library/#systemd).
24+
25+
## Backups
26+
27+
There are only two things to back up in Icinga DB.
28+
29+
1. The configuration file in `/etc/icingadb` and
30+
2. the relational database, using `mysqldump`, `mariadb-dump` or `pg_dump`.
31+
32+
!!! warning
33+
34+
When creating a database dump for MySQL or MariaDB with `mysqldump` or `mariadb-dump`,
35+
use the [`--single-transaction` command line flag](https://dev.mysql.com/doc/refman/8.4/en/mysqldump.html#option_mysqldump_single-transaction)
36+
to not lock the whole database while the backup is running.
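
The advice above can be sketched as a small backup helper. This is only an illustration: the database name `icingadb`, the directory `/var/backups/icingadb`, and the function names are assumptions, and credentials are expected to come from a client option file rather than the command line. For MariaDB, `mariadb-dump` can be substituted for `mysqldump`.

```shell
#!/bin/sh
# Assumed values (not taken from the Icinga DB docs); adjust to your setup.
DB_NAME="${DB_NAME:-icingadb}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/icingadb}"

# Print the target file name for a given timestamp.
dump_file() {
    printf '%s/%s-%s.sql\n' "$BACKUP_DIR" "$DB_NAME" "$1"
}

# Dump the database inside a single transaction so that the
# whole database is not locked while the backup is running.
# Prints the path of the dump file on success.
backup_icingadb() {
    target="$(dump_file "$(date +%Y%m%d-%H%M%S)")"
    mkdir -p "$BACKUP_DIR" &&
        mysqldump --single-transaction "$DB_NAME" > "$target" &&
        printf '%s\n' "$target"
}
```

The functions are only defined here, not executed. Calling `backup_icingadb` from a cron job or systemd timer would be one way to run it regularly.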
37+
38+
## Automatic Restart
39+
40+
<!-- NOTE: Would be obsolete after https://git.icinga.com/packages/icingadb/-/merge_requests/10 -->
41+
42+
The Icinga DB daemon retries certain errors for up to five minutes before giving up and stopping.
43+
While an error persisting for five minutes in most cases indicates a critical error,
44+
in some environments this happens occasionally and continuing is a valid solution.
45+
46+
In this case, the `icingadb.service` unit can be modified to include a [`Restart` option](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#Restart=).
47+
48+
!!! warning
49+
50+
Making systemd retry fatal Icinga DB errors may help in some cases, but it usually hides the symptom of a deeper problem.
51+
Please do not treat this as a silver bullet. Instead, try to understand the error, search for already [reported issues](https://github.com/Icinga/icingadb/issues) or report one yourself.
52+
53+
```shell
systemctl edit icingadb.service
```
56+
57+
Add a `[Service]` block like the following.
58+
59+
```ini
[Service]
Restart=on-failure
RestartSec=60
```
64+
65+
This will create an additional `/etc/systemd/system/icingadb.service.d/override.conf` file that overrides the defaults.
66+
A final unit restart is required for the changes to take effect.
67+
68+
```shell
systemctl restart icingadb.service
```
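
Where an interactive editor is impractical, for example under configuration management, the same override could be written directly. The following is a minimal sketch under that assumption; the function name `write_restart_override` is hypothetical, and the target directory is passed as an argument so it can be tried against a scratch directory first.

```shell
#!/bin/sh
# Write the [Service] override shown above without an interactive editor.
# $1 is the drop-in directory, e.g. /etc/systemd/system/icingadb.service.d
write_restart_override() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" &&
        printf '%s\n' '[Service]' 'Restart=on-failure' 'RestartSec=60' \
            > "$dropin_dir/override.conf"
}
```

On a real host this might be called as `write_restart_override /etc/systemd/system/icingadb.service.d`, followed by `systemctl daemon-reload` and `systemctl restart icingadb.service`. Unlike `systemctl edit`, writing the file by hand does not reload systemd, hence the explicit `daemon-reload`.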
71+
72+
## Third-Party Configuration
73+
74+
Icinga DB relies on external components to work.
75+
The following collection is based on experience.
76+
It is a target for continuous improvement.
77+
78+
### MySQL and MariaDB
79+
80+
#### `max_allowed_packet`
81+
82+
The `max_allowed_packet` system variable limits the size of messages between MySQL/MariaDB servers and clients.
83+
More information is available in
84+
[MySQL's "Replication and max_allowed_packet" documentation section](https://dev.mysql.com/doc/refman/8.4/en/replication-features-max-allowed-packet.html),
85+
[MySQL's variable documentation](https://dev.mysql.com/doc/refman/8.4/en/server-system-variables.html#sysvar_max_allowed_packet) and
86+
[MariaDB's variable documentation](https://mariadb.com/kb/en/server-system-variables/#max_allowed_packet).
87+
88+
The database configuration should have `max_allowed_packet` set to at least `64M`.
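
To verify the recommendation on a running server, a check along the following lines may help. It is a sketch, not part of Icinga DB: the function names are hypothetical, and it assumes the `mysql` client can log in without further options (for example via an option file). The server reports `max_allowed_packet` in bytes, so `64M` is compared as 67108864.

```shell
#!/bin/sh
# 64M expressed in bytes, the unit the server reports.
MIN_PACKET_BYTES=$((64 * 1024 * 1024))

# Succeeds if the given byte value meets the 64M recommendation.
packet_size_ok() {
    [ "$1" -ge "$MIN_PACKET_BYTES" ]
}

# Query the global value from the running server.
# Assumes the mysql client can authenticate without extra arguments.
current_packet_size() {
    mysql --batch --skip-column-names -e 'SELECT @@global.max_allowed_packet'
}
```

A possible use is `packet_size_ok "$(current_packet_size)" || echo "max_allowed_packet is below 64M"`. The value itself is typically raised by setting `max_allowed_packet = 64M` in the `[mysqld]` section of the server configuration and restarting the server.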
89+
90+
#### Amazon RDS for MySQL
91+
92+
When importing the MySQL schema into Amazon RDS for MySQL, the following error may occur.
93+
94+
```
Error 1419: You do not have the SUPER privilege and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable)
```
97+
98+
This error can be mitigated by following the related [AWS article](https://repost.aws/knowledge-center/rds-mysql-functions).
99+
100+
#### Galera Cluster
101+
102+
Galera support was added to the Icinga DB daemon in version 1.2.0.
103+
Its specific database configuration is described in the [Galera configuration section](03-Configuration.md#galera-cluster).
104+
105+
As mentioned in [MariaDB's known Galera cluster limitations](https://mariadb.com/kb/en/mariadb-galera-cluster-known-limitations/),
106+
transactions are limited in both number of rows (128K) and size (2 GiB).
107+
A busy Icinga setup can cause Icinga DB to create transactions that exceed these limits with the default configuration.
108+
109+
If you get an error like `Error 1105 (HY000): Maximum writeset size exceeded`
110+
and your Galera node logs something like `WSREP: transaction size limit (2147483647) exceeded`,
111+
decrease the values of `max_placeholders_per_statement` and `max_rows_per_transaction` in Icinga DB's
112+
[Database Options](https://icinga.com/docs/icinga-db/latest/doc/03-Configuration/#database-options).
113+
114+
### Redis®
115+
116+
On Linux, enable [memory overcommitting](https://www.kernel.org/doc/Documentation/vm/overcommit-accounting).
117+
118+
```shell
sysctl vm.overcommit_memory=1
```
121+
122+
To persist this setting across reboots, add the following line either to `/etc/sysctl.conf` or to a file in the `/etc/sysctl.d/` directory.
123+
124+
```
vm.overcommit_memory = 1
```
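
Whether the setting is active can be checked by reading the kernel's current value. The helper below is a small sketch; the function name `overcommit_enabled` is hypothetical, and the optional path argument exists only so the check can be tried against a test file.

```shell
#!/bin/sh
# Succeeds if the overcommit mode is 1.
# $1 optionally overrides the file to read; on a real Linux host
# the default /proc/sys/vm/overcommit_memory is used.
overcommit_enabled() {
    [ "$(cat "${1:-/proc/sys/vm/overcommit_memory}")" = "1" ]
}
```

For example, `overcommit_enabled || echo "vm.overcommit_memory is not 1"` could be run after a reboot to confirm that the persisted setting was applied. On most distributions, `sysctl --system` should reload the configuration files without a reboot.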
127+
128+
In addition, the official [Redis® administration documentation](https://redis.io/docs/latest/operate/oss_and_stack/management/admin/) is quite useful.
