| 1 | +--- |
| 2 | +title: Kubernetes Metrics |
| 3 | +further_reading: |
| 4 | +- link: "/opentelemetry/setup/" |
| 5 | +  tag: "Documentation" |
| 6 | +  text: "Send OpenTelemetry Data to Datadog" |
| 7 | +- link: "https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging/" |
| 8 | +  tag: "Documentation" |
| 9 | +  text: "Unified Service Tagging" |
| 10 | +- link: "https://github.com/DataDog/opentelemetry-examples/tree/main/guides/kubernetes" |
| 11 | +  tag: "GitHub" |
| 12 | +  text: "Example Collector Configurations" |
| 13 | +--- |
| 14 | + |
| 15 | +<div class="alert alert-info">The OpenTelemetry Kubernetes integration is in Preview. To request access, contact your Datadog account team.</div> |
| 16 | + |
| 17 | +## Overview |
| 18 | + |
| 19 | +Collect Kubernetes metrics using the OpenTelemetry Collector to gain comprehensive insights into your cluster's health and performance. This integration uses a combination of OpenTelemetry receivers to gather data, which populates the [Kubernetes - Overview][1] dashboard. |
| 20 | + |
| 21 | +{{< img src="/opentelemetry/collector_exporter/kubernetes_metrics.png" alt="The 'Kubernetes - Overview' dashboard, showing metrics for containers, including status and resource usage of your cluster and its containers." style="width:100%;" >}} |
| 22 | + |
| 23 | +This integration requires the [`kube-state-metrics`][8] service and uses a two-collector architecture to gather data. |
| 24 | + |
| 25 | +The `kube-state-metrics` service generates detailed metrics about the state of Kubernetes objects such as deployments, nodes, and pods. The architecture uses two separate OpenTelemetry Collectors: |
| 26 | +- A Cluster Collector, deployed as a Kubernetes Deployment, gathers cluster-wide metrics (for example, the total number of deployments). |
| 27 | +- A Node Collector, deployed as a Kubernetes DaemonSet, runs on each node to collect node-specific metrics (for example, CPU and memory usage per node). |
| 28 | + |
| 29 | +This approach ensures that cluster-level metrics are collected only once, preventing data duplication, while node-level metrics are gathered from every node in the cluster. |
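| | + |
| | +In Helm terms, the split comes down to how each Collector is deployed. The fragments below are a minimal sketch of that idea, assuming the `mode` and `replicaCount` values of the `open-telemetry/opentelemetry-collector` chart used later in this guide. They are illustrative only and are not the contents of the example configuration files. |
| | + |
| | +```yaml |
| | +# Cluster Collector values (illustrative): a single instance for the whole cluster |
| | +mode: deployment |
| | +replicaCount: 1 |
| | +--- |
| | +# Node Collector values (illustrative): one Collector pod on every node |
| | +mode: daemonset |
| | +``` |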
| 30 | + |
| 31 | +## Setup |
| 32 | + |
| 33 | +To collect Kubernetes metrics with OpenTelemetry, deploy `kube-state-metrics` and configure both of the OpenTelemetry Collectors described above in your cluster. |
| 34 | + |
| 35 | +### Prerequisites |
| 36 | + |
| 37 | +* **Helm**: The setup uses Helm to deploy resources. To install Helm, see the [official Helm documentation][2]. |
| 38 | +* **Collector Image**: This guide uses the `otel/opentelemetry-collector-contrib` image, version `0.130.0` or newer. |
| 39 | + |
| 40 | +### Installation |
| 41 | + |
| 42 | +#### 1. Install kube-state-metrics |
| 43 | + |
| 44 | +Run the following commands to add the `prometheus-community` Helm repository and install `kube-state-metrics`: |
| 45 | +```sh |
| 46 | +helm repo add prometheus-community https://prometheus-community.github.io/helm-charts |
| 47 | +helm repo update |
| 48 | +helm install kube-state-metrics prometheus-community/kube-state-metrics |
| 49 | +``` |
| 50 | + |
| 51 | +#### 2. Create a Datadog API Key Secret |
| 52 | + |
| 53 | +Create a Kubernetes secret to store your Datadog API key securely. |
| 54 | +```sh |
| 55 | +export DD_API_KEY="<YOUR_DATADOG_API_KEY>" |
| 56 | +kubectl create secret generic datadog-secret --from-literal=api-key="$DD_API_KEY" |
| 57 | +``` |
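| | + |
| | +The Collectors read this secret at runtime. As a minimal sketch, assuming the Collector Helm chart's `extraEnvs` value and the Datadog exporter's `api.key` setting, a values file could reference the secret as follows. The in-pod variable name `DD_API_KEY` is an assumption made for illustration; check the example configuration files for the names they actually use. |
| | + |
| | +```yaml |
| | +# Expose the secret created above to the Collector container |
| | +extraEnvs: |
| | +  - name: DD_API_KEY            # assumed variable name |
| | +    valueFrom: |
| | +      secretKeyRef: |
| | +        name: datadog-secret    # secret name from the kubectl command |
| | +        key: api-key            # key set with --from-literal |
| | +config: |
| | +  exporters: |
| | +    datadog: |
| | +      api: |
| | +        key: ${env:DD_API_KEY}  # resolved by the Collector at startup |
| | +``` |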
| 58 | + |
| 59 | +#### 3. Install the OpenTelemetry Collectors |
| 60 | + |
| 61 | +1. Add the OpenTelemetry Helm chart repository: |
| 62 | +   ```sh |
| 63 | +   helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts |
| 64 | +   helm repo update |
| 65 | +   ``` |
| 66 | + |
| 67 | +1. Download the configuration files for the two Collectors: |
| 68 | +   - [cluster-collector.yaml][3] |
| 69 | +   - [daemonset-collector.yaml][4] |
| 70 | + |
| 71 | +1. Set your cluster name as an environment variable and use Helm to deploy both the Cluster and Node Collectors. Make sure the paths to the YAML files are correct. |
| 72 | + |
| 73 | +   ```bash |
| 74 | +   # Set your cluster name |
| 75 | +   export K8S_CLUSTER_NAME="<YOUR_CLUSTER_NAME>" |
| 76 | + |
| 77 | +   # Install the Node Collector (DaemonSet) |
| 78 | +   helm install otel-daemon-collector open-telemetry/opentelemetry-collector \ |
| 79 | +     -f daemonset-collector.yaml \ |
| 80 | +     --set image.repository=otel/opentelemetry-collector-contrib \ |
| 81 | +     --set image.tag=0.130.0 \ |
| 82 | +     --set-string "config.processors.resource.attributes[0].key=k8s.cluster.name" \ |
| 83 | +     --set-string "config.processors.resource.attributes[0].value=${K8S_CLUSTER_NAME}" |
| 84 | + |
| 85 | +   # Install the Cluster Collector (Deployment) |
| 86 | +   helm install otel-cluster-collector open-telemetry/opentelemetry-collector \ |
| 87 | +     -f cluster-collector.yaml \ |
| 88 | +     --set image.repository=otel/opentelemetry-collector-contrib \ |
| 89 | +     --set image.tag=0.130.0 \ |
| 90 | +     --set-string "config.processors.resource.attributes[0].key=k8s.cluster.name" \ |
| 91 | +     --set-string "config.processors.resource.attributes[0].value=${K8S_CLUSTER_NAME}" |
| 92 | +   ``` |
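| | + |
| | +The two `--set-string` flags in each command write the cluster name into the first entry of the `resource` processor's attribute list. As a rough sketch, they correspond to the values fragment below. The `action` field is an assumption shown for completeness, because the flags themselves only set `key` and `value`. |
| | + |
| | +```yaml |
| | +config: |
| | +  processors: |
| | +    resource: |
| | +      attributes: |
| | +        - key: k8s.cluster.name |
| | +          value: "<YOUR_CLUSTER_NAME>" |
| | +          action: upsert   # assumption: not set by the flags above |
| | +``` |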
| 93 | + |
| 94 | +## Metric metadata configuration |
| 95 | + |
| 96 | +Some metrics require manual metadata updates in Datadog to ensure they are interpreted and displayed correctly. |
| 97 | + |
| 98 | +To edit a metric's metadata: |
| 99 | +1. Go to **[Metrics > Summary][6]**. |
| 100 | +1. Select the metric you want to edit. |
| 101 | +1. Click **Edit** in the side panel. |
| 102 | +1. Edit the metadata as needed. |
| 103 | +1. Click **Save**. |
| 104 | + |
| 105 | +Repeat this process for each of the metrics listed in the following table: |
| 106 | + |
| 107 | +| Metric Name | Metric Type | Unit | |
| 108 | +|--------------------------|-------------|------------------------------------------| |
| 109 | +| `k8s.pod.cpu.usage` | `Gauge` | `core` | |
| 110 | +| `k8s.pod.network.io` | `Gauge` | `byte_in_binary_bytes_family per second` | |
| 111 | +| `k8s.pod.network.errors` | `Gauge` | `byte_in_binary_bytes_family per second` | |
| 112 | + |
| 113 | +## Correlating traces with infrastructure metrics |
| 114 | + |
| 115 | +To correlate your APM traces with Kubernetes infrastructure metrics, Datadog uses [unified service tagging][7]. This requires setting three standard resource attributes on telemetry from both your application and your infrastructure. Datadog automatically maps these OpenTelemetry attributes to the standard Datadog tags (`env`, `service`, and `version`) used for correlation. |
| 116 | + |
| 117 | +The required OpenTelemetry attributes are: |
| 118 | + |
| 119 | +- `service.name` |
| 120 | +- `service.version` |
| 121 | +- `deployment.environment.name` (formerly `deployment.environment`) |
| 122 | + |
| 123 | +Setting these attributes ensures that telemetry from your application is consistently tagged, allowing Datadog to link traces, metrics, and logs to the same service. |
| 124 | + |
| 125 | +### Application configuration |
| 126 | + |
| 127 | +Set the following environment variables in your application's container specification to tag outgoing telemetry: |
| 128 | + |
| 129 | +```yaml |
| 130 | +spec: |
| 131 | +  containers: |
| 132 | +    - name: my-container |
| 133 | +      env: |
| 134 | +        - name: OTEL_SERVICE_NAME |
| 135 | +          value: "<SERVICE_NAME>" |
| 136 | +        - name: OTEL_SERVICE_VERSION |
| 137 | +          value: "<SERVICE_VERSION>" |
| 138 | +        - name: OTEL_ENVIRONMENT |
| 139 | +          value: "<ENVIRONMENT>" |
| 140 | +        - name: OTEL_RESOURCE_ATTRIBUTES |
| 141 | +          value: "service.name=$(OTEL_SERVICE_NAME),service.version=$(OTEL_SERVICE_VERSION),deployment.environment.name=$(OTEL_ENVIRONMENT)" |
| 142 | +``` |
| 143 | + |
| 144 | +### Infrastructure configuration |
| 145 | + |
| 146 | +Add the corresponding annotations to your Kubernetes `Deployment` metadata. The `k8sattributes` processor in the Collector uses these annotations to enrich infrastructure metrics with service context. |
| 147 | + |
| 148 | +```yaml |
| 149 | +apiVersion: apps/v1 |
| 150 | +kind: Deployment |
| 151 | +metadata: |
| 152 | +  name: my-app |
| 153 | +  annotations: |
| 154 | +    # Use resource.opentelemetry.io/ for the k8sattributes processor |
| 155 | +    resource.opentelemetry.io/service.name: "<SERVICE_NAME>" |
| 156 | +    resource.opentelemetry.io/service.version: "<SERVICE_VERSION>" |
| 157 | +    resource.opentelemetry.io/deployment.environment.name: "<ENVIRONMENT>" |
| 158 | +spec: |
| 159 | +  template: |
| 160 | +    metadata: |
| 161 | +      annotations: |
| 162 | +        resource.opentelemetry.io/service.name: "<SERVICE_NAME>" |
| 163 | +        resource.opentelemetry.io/service.version: "<SERVICE_VERSION>" |
| 164 | +        resource.opentelemetry.io/deployment.environment.name: "<ENVIRONMENT>" |
| 165 | +# ... rest of the manifest |
| 166 | +``` |
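| | + |
| | +On the Collector side, the `k8sattributes` processor has to be told to read these annotations. The fragment below is a minimal sketch, assuming the processor's `extract.otel_annotations` option is available in your Collector version. Treat it as illustrative and verify the setting against the configuration files you deployed. |
| | + |
| | +```yaml |
| | +processors: |
| | +  k8sattributes: |
| | +    extract: |
| | +      # Assumption: maps resource.opentelemetry.io/<key> annotations |
| | +      # to <key> resource attributes (for example, service.name) |
| | +      otel_annotations: true |
| | +``` |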
| 167 | + |
| 168 | +## Data collected |
| 169 | + |
| 170 | +This integration collects metrics using several OpenTelemetry receivers. |
| 171 | + |
| 172 | +### kube-state-metrics (using Prometheus receiver) |
| 173 | + |
| 174 | +Metrics scraped from the `kube-state-metrics` endpoint provide information about the state of Kubernetes API objects. |
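| | + |
| | +As a minimal sketch, a Prometheus receiver that scrapes this endpoint might look like the following. The target address assumes the Helm release name `kube-state-metrics` from the installation step, the `default` namespace, and the chart's default service port `8080`. The job name and scrape interval are placeholders. |
| | + |
| | +```yaml |
| | +receivers: |
| | +  prometheus: |
| | +    config: |
| | +      scrape_configs: |
| | +        - job_name: kube-state-metrics   # placeholder job name |
| | +          scrape_interval: 30s           # placeholder interval |
| | +          static_configs: |
| | +            - targets: |
| | +                # assumed service DNS name and port; adjust to your install |
| | +                - kube-state-metrics.default.svc.cluster.local:8080 |
| | +``` |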
| 175 | + |
| 176 | +### Kubelet stats receiver |
| 177 | + |
| 178 | +The `kubeletstatsreceiver` collects metrics from the Kubelet on each node, focusing on pod, container, and volume resource usage. |
| 179 | + |
| 180 | +{{< mapping-table resource="kubeletstats.csv">}} |
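| | + |
| | +For orientation, a Kubelet stats receiver configuration in the Node Collector could be sketched as follows. The `K8S_NODE_NAME` environment variable is an assumption (it would need to be injected into the pod, for example through the Kubernetes downward API), and the interval and TLS settings are placeholders rather than recommendations. |
| | + |
| | +```yaml |
| | +receivers: |
| | +  kubeletstats: |
| | +    collection_interval: 30s       # placeholder interval |
| | +    auth_type: serviceAccount      # authenticate with the pod's service account |
| | +    endpoint: https://${env:K8S_NODE_NAME}:10250   # assumed env var; Kubelet secure port |
| | +    insecure_skip_verify: true     # placeholder; skips Kubelet certificate checks |
| | +    metric_groups: |
| | +      - node |
| | +      - pod |
| | +      - container |
| | +      - volume |
| | +``` |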
| 181 | + |
| 182 | +### Kubernetes cluster receiver |
| 183 | + |
| 184 | +The `k8sclusterreceiver` collects cluster-level metrics, such as the status and count of nodes, pods, and other objects. |
| 185 | + |
| 186 | +{{< mapping-table resource="k8scluster.csv">}} |
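| | + |
| | +A minimal sketch of this receiver in the Cluster Collector, with placeholder values that are not taken from the example configuration files: |
| | + |
| | +```yaml |
| | +receivers: |
| | +  k8s_cluster: |
| | +    collection_interval: 30s        # placeholder interval |
| | +    auth_type: serviceAccount       # authenticate with the pod's service account |
| | +    node_conditions_to_report:      # illustrative selection |
| | +      - Ready |
| | +      - MemoryPressure |
| | +    allocatable_types_to_report:    # illustrative selection |
| | +      - cpu |
| | +      - memory |
| | +``` |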
| 187 | + |
| 188 | +### Host metrics receiver |
| 189 | + |
| 190 | +The `hostmetricsreceiver` gathers system-level metrics from each node in the cluster. |
| 191 | + |
| 192 | +{{< mapping-table resource="host.csv">}} |
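| | + |
| | +As an illustration, a host metrics receiver in the Node Collector might be configured like this. The `/hostfs` mount path is an assumption about how the node's filesystem is mounted into the Collector pod, and the scraper selection is a placeholder. |
| | + |
| | +```yaml |
| | +receivers: |
| | +  hostmetrics: |
| | +    collection_interval: 30s   # placeholder interval |
| | +    root_path: /hostfs         # assumed mount point of the host filesystem |
| | +    scrapers: |
| | +      cpu: |
| | +      memory: |
| | +      load: |
| | +      filesystem: |
| | +      network: |
| | +``` |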
| 193 | + |
| 194 | +See [OpenTelemetry Metrics Mapping][5] for more information. |
| 195 | + |
| 196 | +## Further reading |
| 197 | + |
| 198 | +{{< partial name="whats-next/whats-next.html" >}} |
| 199 | + |
| 200 | +[1]: https://app.datadoghq.com/dash/integration/86/kubernetes---overview |
| 201 | +[2]: https://helm.sh/docs/intro/install/ |
| 202 | +[3]: https://github.com/DataDog/opentelemetry-examples/blob/main/guides/kubernetes/configuration/cluster-collector.yaml |
| 203 | +[4]: https://github.com/DataDog/opentelemetry-examples/blob/main/guides/kubernetes/configuration/daemonset-collector.yaml |
| 204 | +[5]: /opentelemetry/schema_semantics/metrics_mapping/ |
| 205 | +[6]: https://app.datadoghq.com/metric/summary |
| 206 | +[7]: /getting_started/tagging/unified_service_tagging/?tab=kubernetes#opentelemetry |
| 207 | +[8]: https://github.com/kubernetes/kube-state-metrics |