From e57a291eff9e7ee04c8789d3fb98274854412987 Mon Sep 17 00:00:00 2001
From: Conall O'Brien
Date: Tue, 15 Jul 2025 12:26:21 +0100
Subject: [PATCH 1/2] Add Important Labels subsection, with job and instance called out

Signed-off-by: Conall O'Brien
---
 docs/practices/naming.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/docs/practices/naming.md b/docs/practices/naming.md
index 4999feb5d..1ad1625da 100644
--- a/docs/practices/naming.md
+++ b/docs/practices/naming.md
@@ -80,7 +80,18 @@ the underlying metric type and unit you work with.
 * **Metric collisions**: With growing adoption and metric changes over time, there are cases where lack of unit and type information in the metric name will cause certain series to collide (e.g. `process_cpu` for seconds and milliseconds).
 
-## Labels
+### Important Labels
+
+* `job`
+  * The `job` label is a primary key that differentiates metrics from each other.
+  * If `job` is not specified in a PromQL expression, the expression will match unrelated metrics with the same name. This is especially true in a multi-system or multi-tenant installation.
+
+WARNING: When using `without`, be careful not to strip out the `job` label accidentally.
+
+* `instance`
+  * The `instance` label will include the `ip:port` that was scraped, providing a crucial breadcrumb for debugging scrape-time issues.
+
+### General Labelling Advice
 
 Use labels to differentiate the characteristics of the thing that is being measured:
 
From a40367a3df6e898a1df7bad47994d3babd6a4072 Mon Sep 17 00:00:00 2001
From: Conall O'Brien
Date: Tue, 15 Jul 2025 13:34:11 +0100
Subject: [PATCH 2/2] Add a new section about labels

Signed-off-by: Conall O'Brien
---
 docs/practices/rules.md | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/docs/practices/rules.md b/docs/practices/rules.md
index f91fef5eb..eebe880ed 100644
--- a/docs/practices/rules.md
+++ b/docs/practices/rules.md
@@ -19,6 +19,8 @@ This page documents proper naming conventions and aggregation for recording rule
 Keeping the metric name unchanged makes it easy to know what a metric is and
 easy to find in the codebase.
 
+IMPORTANT: The `job` label acts as a primary key. It is **strongly** recommended that you use it to scope your PromQL expressions to the system you are monitoring.
+
 To keep the operations clean, `_sum` is omitted if there are other operations,
 as `sum()`. Associative operations can be merged (for example `min_min` is the
 same as `min`).
@@ -27,6 +29,18 @@ If there is no obvious operation to use, use `sum`. When taking a ratio by
 doing division, separate the metrics using `_per_` and call the operation
 `ratio`.
 
+## Labels
+
+NOTE: Omitting a label matcher in a PromQL expression is functionally equivalent to matching every value of that label (`label=~".*"`).
+
+* In both recording rules and alerting expressions, always specify a `job` label to prevent expression mismatches from occurring.
+  This is especially important in multi-tenant systems where the same metric names may be exported by different jobs or the
+  same job (e.g. `node_exporter`) in multiple, distinct deployments.
+
+* Always specify a `without` clause with the labels you are aggregating away.
+This is to preserve all the other labels such as `job`, which will avoid
+conflicts and give you more useful metrics and alerts.
+
 ## Aggregation
 
 * When aggregating up ratios, aggregate up the numerator and denominator
@@ -40,10 +54,6 @@ Instead keep the metric name without the `_count` or `_sum` suffix and replace
 the `rate` in the operation with `mean`. This represents the average
 observation size over that time period.
 
-* Always specify a `without` clause with the labels you are aggregating away.
-This is to preserve all the other labels such as `job`, which will avoid
-conflicts and give you more useful metrics and alerts.
-
 ## Examples
 
 _Note the indentation style with outdented operators on their own line between