sxd
Member

@sxd sxd commented Oct 2, 2024

Every cluster now has the podMonitor enabled, and the script scripts/setup_monitoring.sh was created to deploy the Prometheus Operator and the Grafana Operator. Using these operators, we create:

  • Prometheus instance
  • Grafana Deployment
  • Grafana Datasource (connecting to Prometheus instance)
  • Grafana Dashboard (from the project cloudnative-pg/grafana-dashboard)

After the clusters are updated, the Grafana dashboard can be accessed by forwarding port 3000 of the Grafana deployment (namespace grafana) on either of the two kind clusters.
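
For reviewers who want to try this, the port forwarding could look roughly like the sketch below. The namespace `grafana` and port 3000 come from the description above; the context name `kind-k8s-eu` and the deployment name `grafana-deployment` are assumptions for illustration and may differ in the actual setup.

```shell
# Forward local port 3000 to the Grafana deployment in the "grafana" namespace.
# $1: kubectl context      (assumed default: kind-k8s-eu)
# $2: deployment name      (assumed default: grafana-deployment)
forward_grafana() {
    kubectl --context "${1:-kind-k8s-eu}" --namespace grafana \
        port-forward "deployment/${2:-grafana-deployment}" 3000:3000
}
```

Running `forward_grafana kind-k8s-us` (again, an assumed context name) keeps the forward open until interrupted; Grafana should then answer on http://localhost:3000.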

The dashboard is loaded from the URL of the JSON file in the repository, which means that if the content changes, the dashboard is updated as well.
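
To illustrate what "loaded from the URL" means, here is a minimal sketch of a GrafanaDashboard resource that points at a JSON file by URL. The resource name and namespace match the ones that appear in the operator log later in this thread; the `instanceSelector` label is an assumption and has to match whatever labels the Grafana resource in this PR actually carries. The URL is passed in as an argument rather than guessed.

```shell
# Print a GrafanaDashboard manifest that references a dashboard by URL.
# $1: raw URL of the dashboard JSON file
print_dashboard_manifest() {
    cat <<EOF
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: cloudnativepg-dashboard
  namespace: grafana
spec:
  # Assumed label; it must match the labels set on the Grafana resource.
  instanceSelector:
    matchLabels:
      dashboards: grafana
  url: "$1"
EOF
}
```

The output can be piped into `kubectl apply -f -`. Because the operator fetches the JSON itself, the operator pod needs outbound access to the host serving the file.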

@sxd sxd requested a review from a team as a code owner October 2, 2024 16:27
@smiyc

smiyc commented Oct 4, 2024

I tested setup_monitoring.sh and it failed because my laptop didn't have kustomize installed.
Therefore I would add kustomize to the prereqs in the setup.sh script, or add a prereq check to setup_monitoring.sh.
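
A prereq check of the kind suggested here could be sketched as follows. This is only an illustration, not the check used in the repository; the function name `missing_tools` is hypothetical, and the list of tools to check is left to the caller.

```shell
# Report every listed tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing prerequisite: %s\n' "$tool" >&2
            status=1
        fi
    done
    return "$status"
}
```

A script could call `missing_tools kubectl kind || exit 1` near the top (the tool names here are examples), so that it fails early with a clear message instead of halfway through the deployment.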

@gbartolini
Contributor

I would like to keep the requirements to a minimum, without requiring development tools. We shouldn't forget that the audience is not made up of developers.

@smiyc

smiyc commented Oct 4, 2024

so instead of kustomize, kubectl apply -f < the directory >?
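
For reference, applying a directory with plain kubectl could look like the sketch below. The function name and both arguments are hypothetical; the directory layout of this PR is not assumed.

```shell
# Apply every manifest in a directory with plain kubectl,
# without needing a standalone kustomize binary.
# $1: kubectl context, $2: directory containing the YAML manifests
apply_manifests() {
    kubectl --context "$1" apply -f "$2"
}
```

If a kustomization.yaml is still wanted, `kubectl apply -k <directory>` uses the kustomize support built into kubectl, which also avoids the extra binary.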

@sxd
Member Author

sxd commented Oct 7, 2024

@smiyc I made some minor changes and removed the need for kustomize, can you check again please? :D

@smiyc

smiyc commented Oct 7, 2024

@sxd the monitoring stack is set up correctly without kustomize.
The cloudnativepg-dashboard GrafanaDashboard is applied, but I don't see it in Grafana.

@sxd
Member Author

sxd commented Oct 7, 2024

@smiyc it should be in the playlists section, can you check please? :D

@smiyc

smiyc commented Oct 7, 2024

> @smiyc it should be in the playlists section, can you check please? :D

nope

@sxd
Member Author

sxd commented Oct 7, 2024

@smiyc I will create a fresh environment to test and check, but in the meantime, can you check the status of the Grafana operator, looking for errors or anything unusual?
None of the commands failed, right?
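
One way to look for errors in the operator is to filter its logs for ERROR-level lines. The sketch below assumes the operator runs as a Deployment; the context, namespace, and deployment name are all passed in as arguments because they are not stated in this thread.

```shell
# Keep only ERROR-level lines from logs read on stdin.
only_errors() {
    grep -E '[[:space:]]ERROR[[:space:]]' || true
}

# Print ERROR-level log lines of an operator deployment.
# $1: kubectl context, $2: namespace, $3: deployment name
operator_errors() {
    kubectl --context "$1" --namespace "$2" logs "deployment/$3" | only_errors
}
```

An empty result only means that no line was logged at ERROR level; it does not prove the dashboard was reconciled.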

@smiyc

smiyc commented Oct 7, 2024

@sxd

the operator wants to download the dashboard from my external IP address...

2024-10-07T20:37:58Z    ERROR    GrafanaDashboardReconciler    error fetching dashboard    {"controller": "grafanadashboard", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDashboard", "GrafanaDashboard": {"name":"cloudnativepg-dashboard","namespace":"grafana"}, "namespace": "grafana", "name": "cloudnativepg-dashboard", "reconcileID": "818bb948-ce37-4bdb-8507-9b5730b23cb8", "dashboard": "cloudnativepg-dashboard", "error": "dial tcp XXX:443: i/o timeout"}

An nslookup of raw.githubusercontent.com from inside the pod, as well as from outside Docker, looks good.

...after connecting via hotspot, the dashboard was downloaded...
So the problem is in my setup, never mind.
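
For anyone hitting the same symptom: a name that resolves while the connection still times out points at the network path rather than DNS. A quick check of the HTTPS connection itself could be sketched like this, run from a machine or container that has curl available (the operator image may not). The function name is hypothetical.

```shell
# Try an HTTPS connection to a host with a short connect timeout.
# $1: host name, $2: connect timeout in seconds (default: 5)
# HTTP error statuses still count as reachable; only connection
# failures (DNS, timeout, TLS) make curl return non-zero here.
check_https() {
    if curl --silent --show-error --output /dev/null \
            --connect-timeout "${2:-5}" "https://$1/"; then
        echo "reachable: $1"
    else
        echo "unreachable: $1"
        return 1
    fi
}
```

For example, `check_https raw.githubusercontent.com` reports whether a TCP/TLS connection on port 443 can be established within five seconds.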

@gbartolini gbartolini force-pushed the dev/add_monitoring_stack branch 2 times, most recently from e0bb90c to 8e3d2de Compare October 20, 2024 20:00
@gbartolini
Contributor

@sxd can you please review this? Please do not merge yet, as we need to add Leonardo as co-author.

@sxd
Member Author

sxd commented Oct 20, 2024

@gbartolini on it!! will modify the commit message too ;) BTW nice changes!

sxd and others added 10 commits December 31, 2024 11:04
Every cluster now has the podMonitor enabled, and the script
scripts/setup_monitoring.sh was created to deploy the Prometheus Operator
and the Grafana Operator. Using these operators, we create:

* Prometheus instance
* Grafana Deployment
* Grafana Datasource (connecting to Prometheus instance)
* Grafana Dashboard (from the project cloudnative-pg/grafana-dashboard)

After the clusters are updated, the Grafana dashboard can be accessed by
forwarding port 3000 of the Grafana deployment (namespace grafana) on
either of the two kind clusters.

The dashboard is loaded from the URL of the JSON file in the repository,
which means that if the content changes, the dashboard is updated as well.

Signed-off-by: Jonathan Gonzalez V. <[email protected]>
Signed-off-by: Gabriele Bartolini <[email protected]>
@gbartolini gbartolini force-pushed the dev/add_monitoring_stack branch from 06fe773 to 64fa4d6 Compare December 31, 2024 10:11
@smiyc

smiyc commented Jan 1, 2025

There is a variable mismatch (context vs. region) in the Grafana deployment.

I can't commit to this PR. Will you fix it on your side, or should I create a fork and commit there?

@ardentperf
Contributor

what's the latest status on this PR?

@ardentperf
Contributor

ardentperf commented Aug 4, 2025

I gave this PR an end-to-end test, twice in a row and then a third time on a completely clean setup (starting with a completely fresh OS install before installing all the prereqs and then the CNPG playground). The results were consistent in all three attempts.

EU cluster: only the prometheus operator pod starts
US cluster: four pods - prometheus operator, prometheus deployment, grafana operator, grafana deployment
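
A comparison like this can be reproduced with a short loop over the kubectl contexts, sketched below. The function name is hypothetical, and the context names shown in the usage note are assumptions about how the two kind clusters are registered locally.

```shell
# For each kubectl context given, list pods whose line mentions
# prometheus or grafana, across all namespaces.
list_monitoring_pods() {
    for context in "$@"; do
        echo "== $context =="
        kubectl --context "$context" get pods --all-namespaces --no-headers \
            | grep -Ei 'prometheus|grafana' || echo "(none found)"
    done
}
```

Calling `list_monitoring_pods kind-k8s-eu kind-k8s-us` (assumed context names) prints the matching pods per cluster, which makes a missing Prometheus or Grafana pod on one side easy to spot.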

After forwarding the port, I can connect to Grafana in the US cluster. It has the CNPG dashboard; however, it does not show any metrics. I haven't debugged it yet, but wanted to leave the initial feedback here FYI.

N.B. When I cut and paste the Helm-based instructions from the quick start in the official docs, they work: everything starts in both EU and US, and the metrics appear on the CNPG dashboard. So my overall environment seems to work; I need to look closer to see what's happening with the monitoring script in this PR.

Three other notes:
