docs/source/reference/api-server/examples/api-server-gpu-metrics-setup.rst (21 additions, 22 deletions)
@@ -18,33 +18,11 @@ Before you begin, make sure your Kubernetes cluster meets the following
requirements:
* **NVIDIA GPUs** are available on your worker nodes.
-* The Prometheus Operator is installed.
* The `NVIDIA device plugin <https://github.com/NVIDIA/k8s-device-plugin>`_ or the NVIDIA **GPU Operator** is installed.
* **DCGM-Exporter** is running on the cluster and exposes metrics on
  port ``9400``. Most GPU Operator installations already deploy DCGM-Exporter for you.
* `Node Exporter <https://prometheus.io/docs/guides/node-exporter/>`_ is running on the cluster and exposes metrics on port ``9100``. This is required only if you want to monitor CPU and memory metrics.
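
As a rough way to check the two exporter requirements above, the sketch below tests whether a metrics endpoint's output contains a given metric. It assumes you have already forwarded the exporter port to your machine (for example with ``kubectl port-forward``). The helper name ``has_metric`` is hypothetical, and ``DCGM_FI_DEV_GPU_UTIL`` and ``node_cpu_seconds_total`` are used only as representative metric names.

```shell
#!/usr/bin/env bash
# Sketch only: has_metric is a hypothetical helper, not part of SkyPilot.
# It reads Prometheus text-format output on stdin and succeeds if at least
# one sample line for the named metric is present.
has_metric() {
  # A sample line starts with the metric name followed by "{" or a space.
  grep -Eq "^$1(\{| )"
}

# Example usage (commented out because it needs a forwarded port):
#   curl -s http://localhost:9400/metrics | has_metric DCGM_FI_DEV_GPU_UTIL
#   curl -s http://localhost:9100/metrics | has_metric node_cpu_seconds_total
```

A non-zero exit status suggests the exporter is either not reachable on that port or not exposing the metric you asked for.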

-Installing the Prometheus Operator and Node Exporter
docs/source/reference/api-server/examples/example-deploy-gke-nebius-okta.rst (28 additions, 4 deletions)
@@ -430,6 +430,34 @@ If you are using Nebius Kubernetes cluster, you can setup GPU metrics in the clu
1. Install the Prometheus operator.

+On the Nebius console, on the detail page of the Nebius Kubernetes cluster, go to ``Applications`` -> search for ``Prometheus Operator`` -> ``Deploy`` -> enter ``skypilot`` for the ``Namespace`` -> ``Deploy application``.
@@ -445,11 +473,7 @@ If you are using Nebius Kubernetes cluster, you can setup GPU metrics in the clu
--set nodeExporter.enabled=true \
--set prometheusOperator.enabled=true

-You can check the Pod status to verify the installation.

-.. code-block:: bash
-
-    kubectl get pods -n skypilot
By default, the CPU and memory metrics exported by node exporter do not include the ``node`` label, which is required for the SkyPilot dashboard to display the metrics. You can add the ``node`` label to the metrics by applying the following config to the node exporter service monitor resource:
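
The config itself is not included in this excerpt. As a hedged illustration of what such a relabeling could look like, the sketch below prints a ``relabelings`` fragment for a Prometheus Operator ``ServiceMonitor`` endpoint. It assumes the node exporter targets are pod-backed, so that the ``__meta_kubernetes_pod_node_name`` discovery label is available. The helper name ``render_node_relabeling`` is hypothetical, and the fragment may differ from the config the documentation actually uses.

```shell
#!/usr/bin/env bash
# Sketch only: render_node_relabeling is a hypothetical helper that prints a
# relabeling fragment. It does not contact the cluster or modify anything.
render_node_relabeling() {
  cat <<'EOF'
relabelings:
  - sourceLabels: [__meta_kubernetes_pod_node_name]
    targetLabel: node
    action: replace
EOF
}

# Print the fragment so it can be merged by hand under the relevant
# spec.endpoints[] entry of the node exporter ServiceMonitor.
render_node_relabeling
```

The fragment would sit under an entry of ``spec.endpoints`` in the service monitor. Listing the resources with ``kubectl get servicemonitors -n skypilot`` is one way to find the node exporter's service monitor, assuming it was installed in the ``skypilot`` namespace as in the steps above.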