
Commit f613ab4

UTF-8 support in metric and label names (#1255)
Adds UTF-8 support for metric and label names.

These changes are based on the work done on the Prometheus common libraries [here](prometheus/common#537) and [here](prometheus/common#570).

- The `prometheus-metrics-exposition-formats` module will use the new quoting syntax `{"foo"}` iff the metric does not conform to the legacy name format (`foo{}`).
- The `prometheus-metrics-model` module has a new flag (`nameValidationScheme`) that determines if validation is done using the legacy or the UTF-8 scheme. This flag can be set via a property in the properties file.
- Scrapers can announce via content negotiation that they support UTF-8 names by adding `escaping=allow-utf-8` in the Accept header. In cases where UTF-8 is not available, metric providers can be configured to escape names in a few different ways:
  - values (`U__` UTF value escaping for perfect round-tripping)
  - underscores (all invalid chars become `_`)
  - dots (dots become `_dot_`, `_` becomes `__`, all other values become `___`)

  Escaping has a global default (`PrometheusNaming.DEFAULT_ESCAPING_SCHEME`) or can also be specified in the Accept header with the `escaping=` term, which can be `allow-utf-8` (for UTF-8-compatible), `underscores`, `dots`, or `values`. This should still be a noop for existing configurations because scrapers will not be passing the escaping key in the Accept header. Existing functionality is maintained.
- The `prometheus-metrics-exporter-pushgateway` module will [escape](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping) UTF-8 grouping keys in the URL path used when pushing metrics (see prometheus/pushgateway#689).

Work towards prometheus/prometheus#13095

---------

Signed-off-by: Federico Torres <[email protected]>
Signed-off-by: Gregor Zeitlinger <[email protected]>
Co-authored-by: Gregor Zeitlinger <[email protected]>
1 parent ff23461 commit f613ab4
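
The content negotiation described in the commit message can be illustrated with a minimal sketch. It is not the library's implementation: the class `EscapingNegotiationSketch`, the local `Scheme` enum, and the method `fromAcceptHeader` are hypothetical names introduced here, and the fallback is passed in explicitly rather than read from `PrometheusNaming.DEFAULT_ESCAPING_SCHEME`. Only the four `escaping=` values come from the commit message.

```java
import java.util.Locale;

final class EscapingNegotiationSketch {

  /** Local stand-in for the escaping schemes named in the commit message (hypothetical type). */
  enum Scheme {
    ALLOW_UTF8("allow-utf-8"),
    UNDERSCORES("underscores"),
    DOTS("dots"),
    VALUES("values");

    final String token;

    Scheme(String token) {
      this.token = token;
    }
  }

  /** Returns the scheme named by an escaping= term, or the fallback if the term is absent or unknown. */
  static Scheme fromAcceptHeader(String acceptHeader, Scheme fallback) {
    if (acceptHeader == null) {
      return fallback;
    }
    // Media ranges are separated by ',' and their parameters by ';'.
    for (String part : acceptHeader.split("[;,]")) {
      String term = part.trim().toLowerCase(Locale.ROOT);
      if (term.startsWith("escaping=")) {
        String value = term.substring("escaping=".length()).trim();
        for (Scheme scheme : Scheme.values()) {
          if (scheme.token.equals(value)) {
            return scheme;
          }
        }
      }
    }
    return fallback;
  }

  public static void main(String[] args) {
    String header = "application/openmetrics-text; version=1.0.0; escaping=allow-utf-8";
    System.out.println(fromAcceptHeader(header, Scheme.UNDERSCORES));
    System.out.println(fromAcceptHeader("text/plain", Scheme.UNDERSCORES));
  }
}
```

A scraper that sends no `escaping=` term gets the fallback, which matches the commit message's point that existing configurations are unaffected.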


61 files changed: +1979 -537 lines changed


.github/super-linter.env

Lines changed: 0 additions & 2 deletions
@@ -14,8 +14,6 @@ VALIDATE_GO_MODULES=false
 VALIDATE_HTML=false
 # done by checkstyle
 VALIDATE_JAVA=false
-# contradicting with prettier
-VALIDATE_JAVASCRIPT_STANDARD=false
 # we have many duplicate code in our codebase for demo purposes
 VALIDATE_JSCPD=false
 VALIDATE_PYTHON_PYLINT=false

benchmarks/src/main/java/io/prometheus/metrics/benchmarks/TextFormatUtilBenchmark.java

Lines changed: 7 additions & 4 deletions
@@ -1,5 +1,6 @@
 package io.prometheus.metrics.benchmarks;
 
+import io.prometheus.metrics.config.EscapingScheme;
 import io.prometheus.metrics.expositionformats.ExpositionFormatWriter;
 import io.prometheus.metrics.expositionformats.OpenMetricsTextFormatWriter;
 import io.prometheus.metrics.expositionformats.PrometheusTextFormatWriter;
@@ -69,14 +70,15 @@ public OutputStream openMetricsWriteToByteArray(WriterState writerState) throws
     // avoid growing the array
     ByteArrayOutputStream byteArrayOutputStream = writerState.byteArrayOutputStream;
     byteArrayOutputStream.reset();
-    OPEN_METRICS_TEXT_FORMAT_WRITER.write(byteArrayOutputStream, SNAPSHOTS);
+    OPEN_METRICS_TEXT_FORMAT_WRITER.write(
+        byteArrayOutputStream, SNAPSHOTS, EscapingScheme.ALLOW_UTF8);
     return byteArrayOutputStream;
   }
 
   @Benchmark
   public OutputStream openMetricsWriteToNull() throws IOException {
     OutputStream nullOutputStream = NullOutputStream.INSTANCE;
-    OPEN_METRICS_TEXT_FORMAT_WRITER.write(nullOutputStream, SNAPSHOTS);
+    OPEN_METRICS_TEXT_FORMAT_WRITER.write(nullOutputStream, SNAPSHOTS, EscapingScheme.ALLOW_UTF8);
     return nullOutputStream;
   }
 
@@ -85,14 +87,15 @@ public OutputStream prometheusWriteToByteArray(WriterState writerState) throws I
     // avoid growing the array
     ByteArrayOutputStream byteArrayOutputStream = writerState.byteArrayOutputStream;
     byteArrayOutputStream.reset();
-    PROMETHEUS_TEXT_FORMAT_WRITER.write(byteArrayOutputStream, SNAPSHOTS);
+    PROMETHEUS_TEXT_FORMAT_WRITER.write(
+        byteArrayOutputStream, SNAPSHOTS, EscapingScheme.ALLOW_UTF8);
     return byteArrayOutputStream;
   }
 
   @Benchmark
   public OutputStream prometheusWriteToNull() throws IOException {
     OutputStream nullOutputStream = NullOutputStream.INSTANCE;
-    PROMETHEUS_TEXT_FORMAT_WRITER.write(nullOutputStream, SNAPSHOTS);
+    PROMETHEUS_TEXT_FORMAT_WRITER.write(nullOutputStream, SNAPSHOTS, EscapingScheme.ALLOW_UTF8);
     return nullOutputStream;
   }
 
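
The writers above now receive `EscapingScheme.ALLOW_UTF8`. According to the commit message, a name that does not conform to the legacy format is then emitted in the quoted syntax `{"foo"}` instead of `foo{}`. The following is a minimal sketch of that decision, not the writers' actual code: `QuotedNameSketch`, `isLegacyName`, and `render` are hypothetical names, and the quote, backslash, and newline escaping inside the quoted name is an assumption based on the usual text-format string rules.

```java
import java.util.regex.Pattern;

final class QuotedNameSketch {

  // Legacy metric name format: letters, digits, '_' and ':', not starting with a digit.
  private static final Pattern LEGACY_NAME = Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");

  static boolean isLegacyName(String name) {
    return LEGACY_NAME.matcher(name).matches();
  }

  /** Renders a label-less metric selector, quoting the name only when the legacy syntax cannot express it. */
  static String render(String name) {
    if (isLegacyName(name)) {
      return name + "{}";
    }
    // Assumed escaping: keep the quoted name parseable by escaping '\', '"' and newlines.
    String escaped = name.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    return "{\"" + escaped + "\"}";
  }

  public static void main(String[] args) {
    System.out.println(render("http_server_duration"));
    System.out.println(render("http.server.duration"));
  }
}
```

Quoting only when required keeps the output byte-for-byte identical for names that were already valid, which is what makes the change a no-op for existing setups.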
docs/content/config/config.md

Lines changed: 13 additions & 8 deletions
@@ -15,7 +15,7 @@ Future releases will add more options, like configuration via environment variab
 Example:
 
 ```properties
-io.prometheus.exporter.httpServer.port = 9401
+io.prometheus.exporter.httpServer.port=9401
 ```
 
 The property above changes the port for the
@@ -71,13 +71,13 @@ metric only by specifying the metric name. Example:
 Let's say you have a histogram named `latency_seconds`.
 
 ```properties
-io.prometheus.metrics.histogramClassicUpperBounds = 0.2, 0.4, 0.8, 1.0
+io.prometheus.metrics.histogramClassicUpperBounds=0.2, 0.4, 0.8, 1.0
 ```
 
 The line above sets histogram buckets for all histograms. However:
 
 ```properties
-io.prometheus.metrics.latency_seconds.histogramClassicUpperBounds = 0.2, 0.4, 0.8, 1.0
+io.prometheus.metrics.latency_seconds.histogramClassicUpperBounds=0.2, 0.4, 0.8, 1.0
 ```
 
 The line above sets histogram buckets only for the histogram named `latency_seconds`.
@@ -170,10 +170,15 @@ See Javadoc for details.
 
 <!-- editorconfig-checker-disable -->
 
-| Name | Javadoc | Note |
-| ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------- | ---- |
-| io.prometheus.exporter.pushgateway.address | [PushGateway.Builder.address()](</client_java/api/io/prometheus/metrics/exporter/pushgateway/PushGateway.Builder.html#address(java.lang.String)>) | |
-| io.prometheus.exporter.pushgateway.scheme | [PushGateway.Builder.scheme()](</client_java/api/io/prometheus/metrics/exporter/pushgateway/PushGateway.Builder.html#scheme(java.lang.String)>) | |
-| io.prometheus.exporter.pushgateway.job | [PushGateway.Builder.job()](</client_java/api/io/prometheus/metrics/exporter/pushgateway/PushGateway.Builder.html#job(java.lang.String)>) | |
+| Name | Javadoc | Note |
+| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---- |
+| io.prometheus.exporter.pushgateway.address | [PushGateway.Builder.address()](</client_java/api/io/prometheus/metrics/exporter/pushgateway/PushGateway.Builder.html#address(java.lang.String)>) | |
+| io.prometheus.exporter.pushgateway.scheme | [PushGateway.Builder.scheme()](</client_java/api/io/prometheus/metrics/exporter/pushgateway/PushGateway.Builder.html#scheme(java.lang.String)>) | |
+| io.prometheus.exporter.pushgateway.job | [PushGateway.Builder.job()](</client_java/api/io/prometheus/metrics/exporter/pushgateway/PushGateway.Builder.html#job(java.lang.String)>) | |
+| io.prometheus.exporter.pushgateway.escapingScheme | [PushGateway.Builder.escapingScheme()](</client_java/api/io/prometheus/metrics/exporter/pushgateway/PushGateway.Builder.html#escapingScheme(io.prometheus.metrics.config.EscapingScheme)>) | (1) |
 
 <!-- editorconfig-checker-enable -->
+
+(1) Escaping scheme can be `allow-utf-8`, `underscores`, `dots`, or `values` as described in
+[escaping schemes](https://github.com/prometheus/docs/blob/main/docs/instrumenting/escaping_schemes.md#escaping-schemes) <!-- editorconfig-checker-disable-line -->
+and in the [Unicode documentation]({{< relref "../exporters/unicode.md" >}}).

docs/content/exporters/filter.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 title: Filter
-weight: 2
+weight: 3
 ---
 
 All exporters support a `name[]` URL parameter for querying only specific metric names. Examples:

docs/content/exporters/httpserver.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 title: HTTPServer
-weight: 3
+weight: 4
 ---
 
 The `HTTPServer` is a standalone server for exposing a metric endpoint. A minimal example

docs/content/exporters/pushgateway.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 title: Pushgateway
-weight: 5
+weight: 6
 ---
 
 The [Prometheus Pushgateway](https://github.com/prometheus/pushgateway) exists to allow ephemeral

docs/content/exporters/servlet.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 title: Servlet
-weight: 4
+weight: 5
 ---
 
 The

docs/content/exporters/spring.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 title: Spring
-weight: 5
+weight: 7
 ---
 
 ## Alternative: Use Spring's Built-in Metrics Library

docs/content/exporters/unicode.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+---
+title: Unicode
+weight: 2
+---
+
+The Prometheus Java client library allows all Unicode characters, that can be encoded as UTF-8.
+
+At scrape time, some characters are replaced based on the `encoding` header according
+to
+the [Escaping scheme](https://github.com/prometheus/docs/blob/main/docs/instrumenting/escaping_schemes.md). <!-- editorconfig-checker-disable-line -->
+
+For example, if you use the `underscores` escaping scheme, dots in metric and label names are
+replaced with underscores, so that the metric name `http.server.duration` becomes
+`http_server_duration`.
+
+Prometheus servers that do not support Unicode at all will not pass the `encoding` header, and the
+Prometheus Java client library will replace dots, as well as any character that is not in the legacy
+character set (`a-zA-Z0-9_:`), with underscores by default.
+
+When `escaping=allow-utf-8` is passed, add valid UTF-8 characters to the metric and label names
+without replacing them. This allows you to use dots in metric and label names, as well as
+other UTF-8 characters, without any replacements.
+
+## PushGateway
+
+When using the [Pushgateway](/exporters/pushgateway/), Unicode support has to be enabled
+explicitly by setting `io.prometheus.exporter.pushgateway.escapingScheme` to `allow-utf-8` in the
+Pushgateway configuration file - see
+[Pushgateway configuration]({{< relref "/config/config.md#exporter-pushgateway-properties" >}})
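
The `underscores` behaviour that the new `unicode.md` describes for metric names can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: `UnderscoreEscapingSketch` and its methods are hypothetical names, the legacy character set `a-zA-Z0-9_:` is taken from the page above, and treating a leading digit as invalid is an assumption based on the legacy metric name format.

```java
final class UnderscoreEscapingSketch {

  /** True if the code point may appear at this position in a legacy metric name. */
  static boolean isLegacyCodePoint(int cp, boolean first) {
    boolean letter = (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
    boolean digit = cp >= '0' && cp <= '9';
    // Assumption: a digit is only valid after the first position.
    return letter || cp == '_' || cp == ':' || (digit && !first);
  }

  /** Replaces every code point outside the legacy character set with a single underscore. */
  static String escapeUnderscores(String name) {
    StringBuilder escaped = new StringBuilder(name.length());
    int i = 0;
    while (i < name.length()) {
      int cp = name.codePointAt(i);
      if (isLegacyCodePoint(cp, i == 0)) {
        escaped.appendCodePoint(cp);
      } else {
        escaped.append('_');
      }
      // Advance by one or two chars so supplementary characters count once.
      i += Character.charCount(cp);
    }
    return escaped.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeUnderscores("http.server.duration"));
    System.out.println(escapeUnderscores("temp\u00e9rature"));
  }
}
```

The mapping is lossy: `http.server.duration` and `http_server_duration` escape to the same string, which is why the commit message singles out the `values` scheme for round-tripping.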

docs/content/otel/names.md

Lines changed: 5 additions & 11 deletions
@@ -17,7 +17,7 @@ the Prometheus server as if you had exposed Prometheus metrics directly.
 
 The main steps when converting OpenTelemetry metric names to Prometheus metric names are:
 
-- Replace dots with underscores.
+- Escape illegal characters as described in [Unicode support]
 - If the metric has a unit, append the unit to the metric name, like `_seconds`.
 - If the metric type has a suffix, append it, like `_total` for counters.
 
@@ -29,14 +29,8 @@ OpenTelemetry's [Semantic Conventions for HTTP Metrics](https://opentelemetry.io
 say that if you instrument an HTTP server with OpenTelemetry, you must have a histogram named
 `http.server.duration`.
 
-Most names defined in semantic conventions use dots. In the Prometheus server, the dot is an illegal
-character (this might change in future versions of the Prometheus server).
+Most names defined in semantic conventions use dots.
+Dots in metric and label names are now supported in the Prometheus Java client library as
+described in [Unicode support].
 
-The Prometheus Java client library allows dots, so that you can use metric names and label names as
-defined in OpenTelemetry's semantic conventions.
-The dots will automatically be replaced with underscores if you expose metrics in Prometheus format,
-but you will see the original names with dots if you push your metrics in OpenTelemetry format.
-
-That way, you can use OTel-compliant metric and label names today when instrumenting your
-application with the Prometheus Java client, and you are prepared in case your monitoring backend
-adds features in the future that require OTel-compliant instrumentation.
+[Unicode support]: {{< relref "../exporters/unicode.md" >}}
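
The three conversion steps listed in the `names.md` hunk above can be sketched as follows. This is a simplified illustration, not the library's conversion logic: `OtelNameSketch` and `toPrometheusName` are hypothetical names, the escaping step assumes the `underscores` scheme, and skipping a unit or suffix that is already present is an assumption made here to avoid duplicated endings.

```java
final class OtelNameSketch {

  /** Converts an OpenTelemetry metric name to a Prometheus-style name (sketch, underscores scheme assumed). */
  static String toPrometheusName(String otelName, String unit, boolean isCounter) {
    // Step 1: escape characters outside the legacy character set.
    String name = otelName.replaceAll("[^a-zA-Z0-9_:]", "_");
    // Step 2: append the unit, like _seconds, unless it is already there (assumption).
    if (unit != null && !unit.isEmpty() && !name.endsWith("_" + unit)) {
      name = name + "_" + unit;
    }
    // Step 3: append the type suffix, like _total for counters.
    if (isCounter && !name.endsWith("_total")) {
      name = name + "_total";
    }
    return name;
  }

  public static void main(String[] args) {
    System.out.println(toPrometheusName("http.server.duration", "seconds", false));
    System.out.println(toPrometheusName("http.server.requests", null, true));
  }
}
```

With `escaping=allow-utf-8` negotiated, the first step would leave the dots in place and only the unit and suffix steps would apply.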
