accuracy

rohansonecha · rohansonecha · commit b8c2227eed58 · 2025-11-24T10:50:15.000-08:00
diff --git a/docs/source/reference/api-server/examples/api-server-gpu-metrics-setup.rst b/docs/source/reference/api-server/examples/api-server-gpu-metrics-setup.rst
@@ -18,33 +18,11 @@ Before you begin, make sure your Kubernetes cluster meets the following
 requirements:
 
 * **NVIDIA GPUs** are available on your worker nodes.
-* The Prometheus Operator is installed.
 * The `NVIDIA device plugin <https://github.com/NVIDIA/k8s-device-plugin>`_ or the NVIDIA **GPU Operator** is installed.
 * **DCGM-Exporter** is running on the cluster and exposes metrics on
   port ``9400``.  Most GPU Operator installations already deploy DCGM-Exporter for you.
 * `Node Exporter <https://prometheus.io/docs/guides/node-exporter/>`_ is running on the cluster and exposes metrics on port ``9100``. This is required only if you want to monitor the CPU and Memory metrics.
 
-Installing the Prometheus Operator and Node Exporter
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The Prometheus Operator is necessary for the DCGM-Exporter to start properly. The Prometheus Operator and Node Exporter can be
-deployed using the prometheus community helm chart:
-
-.. code-block:: bash
-
-    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
-    helm repo update
-
-    helm upgrade --install kube-prometheus prometheus-community/kube-prometheus-stack \
-    --namespace skypilot \
-    --create-namespace \
-    --set prometheus.enabled=false \
-    --set alertmanager.enabled=false \
-    --set grafana.enabled=false \
-    --set kubeStateMetrics.enabled=false \
-    --set nodeExporter.enabled=true \
-    --set prometheusOperator.enabled=true
-
 Check the dcgm exporter setup
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -104,6 +82,27 @@ If any are missing, edit the Service to add them.
 
 where ``$NAMESPACE`` is the DCGM-Exporter namespace.
 
+Deploying the Prometheus Operator and Node Exporter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Prometheus Operator and Node Exporter can be
+deployed using the Prometheus community Helm chart:
+
+.. code-block:: bash
+
+    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+    helm repo update
+
+    helm upgrade --install kube-prometheus prometheus-community/kube-prometheus-stack \
+    --namespace skypilot \
+    --create-namespace \
+    --set prometheus.enabled=false \
+    --set alertmanager.enabled=false \
+    --set grafana.enabled=false \
+    --set kubeStateMetrics.enabled=false \
+    --set nodeExporter.enabled=true \
+    --set prometheusOperator.enabled=true
+
 Check the node exporter setup
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/docs/source/reference/api-server/examples/example-deploy-gke-nebius-okta.rst b/docs/source/reference/api-server/examples/example-deploy-gke-nebius-okta.rst
@@ -430,6 +430,34 @@ If you are using Nebius Kubernetes cluster, you can setup GPU metrics in the clu
 
 1. Install the Prometheus operator.
 
+In the Nebius console, open the detail page of the Nebius Kubernetes cluster, then go to ``Applications``, search for ``Prometheus Operator``, click ``Deploy``, enter ``skypilot`` for the ``Namespace``, and click ``Deploy application``.
+
+.. image:: ../../../images/metrics/search-prom-operator.png
+    :alt: Search for Prometheus Operator
+    :align: center
+    :width: 60%
+
+.. image:: ../../../images/metrics/deploy-prom-operator.png
+    :alt: Deploy Prometheus Operator
+    :align: center
+    :width: 60%
+
+Wait for the Prometheus operator to finish installing. The status badge changes to ``Deployed`` when it is ready.
+
+.. image:: ../../../images/metrics/status-prom-operator.png
+    :alt: Status of Prometheus Operator
+    :align: center
+    :width: 60%
+
+You can also check the Pod status to verify the installation:
+
+.. code-block:: bash
+
+  kubectl get pods -n skypilot
+
+If the installation runs into issues, such as pods stuck in ``ErrImagePull`` or ``ImagePullBackOff``,
+you can install the Prometheus operator manually using the community Helm chart:
+
 .. code-block:: bash
 
   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
@@ -445,11 +473,7 @@ If you are using Nebius Kubernetes cluster, you can setup GPU metrics in the clu
     --set nodeExporter.enabled=true \
     --set prometheusOperator.enabled=true
 
-You can check the Pod status to verify the installation.
 
-.. code-block:: bash
-
-  kubectl get pods -n skypilot
 
 By default, the CPU and memory metrics exported by node exporter do not include the ``node`` label, which is required for the SkyPilot dashboard to display the metrics. You can add the ``node`` label to the metrics by applying the following config to the node exporter service monitor resource: