feat: add monitoring stack #5

sxd · 2024-10-02T16:27:44Z

Every cluster now has the PodMonitor enabled, and a new script, scripts/setup_monitoring.sh, deploys the Prometheus Operator and the Grafana Operator. Using these operators, the script creates:

Prometheus instance
Grafana Deployment
Grafana Datasource (connecting to the Prometheus instance)
Grafana Dashboard (from the project cloudnative-pg/grafana-dashboard)

Once the clusters are updated, the Grafana dashboard can be accessed by port-forwarding port 3000 of the Grafana deployment (namespace grafana) on either of the two kind clusters.
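A minimal sketch of that port-forward step. It assumes the kind contexts are named kind-k8s-eu and kind-k8s-us and that the Grafana deployment is called grafana-deployment. Both names are assumptions, not values taken from the PR; `kubectl config get-contexts` and `kubectl get deployments -n grafana` show the real ones.

```shell
#!/usr/bin/env bash
# Sketch: forward local port 3000 to the Grafana deployment in namespace
# "grafana" of one kind cluster.
# ASSUMPTIONS (not taken from the PR): contexts are named "kind-k8s-<region>"
# and the Grafana deployment is called "grafana-deployment".
grafana_port_forward() {
  local region="${1:?usage: grafana_port_forward <eu|us>}"
  local context="kind-k8s-${region}"     # hypothetical context naming
  local deployment="grafana-deployment"  # hypothetical deployment name

  # Blocks until interrupted; Grafana is then reachable on http://localhost:3000
  kubectl port-forward \
    --context "${context}" \
    --namespace grafana \
    "deployment/${deployment}" 3000:3000
}
```

Usage: run `grafana_port_forward us` (or `eu`) and open http://localhost:3000 in a browser while the command is running.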

The dashboard is loaded from the URL of the JSON file in that repository, so if the file's content changes, the dashboard is updated as well.
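Loading a dashboard by URL means the GrafanaDashboard resource points at the JSON file instead of embedding it. A sketch of such a resource, applied through a shell function. The name and namespace match the operator log quoted later in this thread; the instanceSelector label is hypothetical and must match the labels on your Grafana resource, and the dashboard URL is passed in as an argument rather than guessed. The PR's own manifest may differ.

```shell
#!/usr/bin/env bash
# Sketch: create a GrafanaDashboard that the Grafana Operator fetches from a
# URL. The "dashboards: grafana" label is a HYPOTHETICAL selector value.
apply_dashboard_from_url() {
  local url="${1:?usage: apply_dashboard_from_url <json-url> [context]}"
  local context="${2:-}"
  local -a ctx_args=()
  if [ -n "${context}" ]; then
    ctx_args=(--context "${context}")
  fi

  # The manifest is passed to kubectl on stdin via a heredoc.
  kubectl apply "${ctx_args[@]}" -f - <<EOF
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: cloudnativepg-dashboard
  namespace: grafana
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana   # hypothetical label, must match the Grafana CR
  url: "${url}"
EOF
}
```

Because only the URL is stored in the cluster, the operator has to reach that URL from inside the cluster, which is exactly what failed in the network issue reported further down.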

smiyc · 2024-10-04T20:12:46Z

I tested setup_monitoring.sh and it failed because my laptop didn't have kustomize installed.
Therefore I would add kustomize to the prerequisites in the setup.sh script, or add a prerequisite check to setup_monitoring.sh.
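A prerequisite check along the lines suggested here could look like the following sketch. The function name and the tools listed in the usage line are assumptions, not part of the PR; the check only reports missing tools and adds no new dependency.

```shell
#!/usr/bin/env bash
# Sketch: fail early with a clear message when a required CLI is missing.
# Returns non-zero if any of the given commands is not found on PATH.
check_prereqs() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "${cmd}" >/dev/null 2>&1; then
      echo "Missing prerequisite: ${cmd}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Usage near the top of a setup script: `check_prereqs kubectl kind || exit 1`.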

gbartolini · 2024-10-04T20:15:39Z

I would like to keep the requirements to a minimum, without requiring development tools. We shouldn't forget that the audience is not developers.

smiyc · 2024-10-04T20:26:11Z

So instead of kustomize, kubectl apply -f <the directory>?

sxd · 2024-10-07T08:04:14Z

@smiyc I made some minor changes and removed the need for kustomize. Can you check again, please? :D
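kubectl has Kustomize built in, so a kustomization directory can be rendered and applied without the standalone binary; the later commit "chore: replace kustomize with kubectl kustomize" points in this direction. A minimal sketch, where the directory and context arguments are placeholders rather than paths from the PR:

```shell
#!/usr/bin/env bash
# Sketch: render a kustomization with kubectl's built-in Kustomize and apply
# the result, so no standalone kustomize binary is required.
# (`kubectl apply -k <dir>` is a one-step alternative.)
apply_kustomization() {
  local dir="${1:?usage: apply_kustomization <dir> [context]}"
  local context="${2:-}"
  local -a ctx_args=()
  if [ -n "${context}" ]; then
    ctx_args=(--context "${context}")
  fi

  # Rendering is purely local; stop here if the kustomization is invalid.
  local rendered
  rendered="$(kubectl kustomize "${dir}")" || return 1

  printf '%s\n' "${rendered}" | kubectl apply "${ctx_args[@]}" -f -
}
```

Rendering first and applying second keeps a broken kustomization from being silently ignored, which a plain `kubectl kustomize dir | kubectl apply -f -` pipeline would do without `pipefail`.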

smiyc · 2024-10-07T19:32:57Z

@sxd the monitoring stack is set up correctly without kustomize.
The cloudnativepg-dashboard GrafanaDashboard is applied, but I don't see it in Grafana.

sxd · 2024-10-07T19:53:00Z

@smiyc it should be in the playlists section, can you check please? :D

smiyc · 2024-10-07T20:19:10Z

> @smiyc it should be in the playlists section, can you check please? :D

nope

sxd · 2024-10-07T20:23:15Z

@smiyc I will create a fresh environment to test and check, but in the meantime, can you check the status of the Grafana operator? Look for errors or anything unusual.
None of the commands failed, right?
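To look for errors, the pods in the grafana namespace and the GrafanaDashboard resource itself can be inspected. A minimal sketch: the resource name and namespace come from the operator log quoted in this thread, while the context argument is a placeholder. Where the Grafana Operator pod itself runs is not stated in the PR, so its logs are not covered here.

```shell
#!/usr/bin/env bash
# Sketch: show the Grafana pods and the CNPG dashboard resource in one
# cluster. `describe` prints the resource's status and any related events.
inspect_grafana_dashboard() {
  local context="${1:?usage: inspect_grafana_dashboard <context>}"
  kubectl --context "${context}" --namespace grafana get pods
  kubectl --context "${context}" --namespace grafana \
    describe grafanadashboard cloudnativepg-dashboard
}
```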

smiyc · 2024-10-07T20:52:38Z

@sxd

The operator is trying to download the dashboard from my external IP address...

2024-10-07T20:37:58Z    ERROR    GrafanaDashboardReconciler    error fetching dashboard    {"controller": "grafanadashboard", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDashboard", "GrafanaDashboard": {"name":"cloudnativepg-dashboard","namespace":"grafana"}, "namespace": "grafana", "name": "cloudnativepg-dashboard", "reconcileID": "818bb948-ce37-4bdb-8507-9b5730b23cb8", "dashboard": "cloudnativepg-dashboard", "error": "dial tcp XXX:443: i/o timeout"}

An nslookup of raw.githubusercontent.com from inside the pod, as well as outside Docker, looks good.

...after connecting via a hotspot, the dashboard was downloaded...
So the problem is in my setup, never mind.
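Since DNS resolution worked but the operator timed out on port 443, outbound HTTPS from inside the cluster can be tested directly with a throwaway pod. A sketch, assuming the public curlimages/curl image can be pulled by the cluster; the pod name egress-check is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: start a throwaway pod that requests a URL over HTTPS and prints the
# HTTP status code; --rm deletes the pod once the command exits.
# ASSUMPTION: the public curlimages/curl image is pullable from the cluster.
check_egress() {
  local context="${1:?usage: check_egress <context> [url]}"
  local url="${2:-https://raw.githubusercontent.com/}"
  kubectl --context "${context}" run egress-check \
    --rm -i --restart=Never --image=curlimages/curl --command -- \
    curl -sS -o /dev/null --max-time 10 -w '%{http_code}\n' "${url}"
}
```

Any HTTP status code means the HTTPS connection itself succeeded; a timeout points at the network path (as with the hotspot fix above) rather than at the Grafana Operator.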

gbartolini · 2024-10-20T20:01:00Z

@sxd can you please review this? Please do not merge, as we need to add Leonardo as co-author.

sxd · 2024-10-20T20:06:36Z

@gbartolini on it!! will modify the commit message too ;) BTW nice changes!

Signed-off-by: Jonathan Gonzalez V. <[email protected]>

Signed-off-by: Gabriele Bartolini <[email protected]>

smiyc · 2025-01-01T14:29:31Z

There is a variable mismatch (context vs. region) in the Grafana deployment.

I can't commit to this PR. Will you fix it on your side, or should I create a fork and commit there?

ardentperf · 2025-06-11T19:35:10Z

What's the latest status on this PR?

ardentperf · 2025-08-04T03:40:00Z

I gave this PR an end-to-end test, twice in a row and then a third time on a completely clean setup (starting from a fresh OS install, then installing all the prerequisites and then the CNPG playground). The results were consistent across all three attempts.

EU cluster: only the Prometheus Operator pod starts
US cluster: four pods (Prometheus Operator, Prometheus deployment, Grafana Operator, Grafana deployment)

After forwarding the port, I can connect to Grafana in the US cluster. It has the CNPG dashboard, but it does not show any metrics. I haven't debugged it yet, but wanted to leave this initial feedback here, FYI.

N.B. When I copy and paste the Helm-based instructions from the quick start in the official docs, they work: everything starts in both EU and US, and the metrics appear on the CNPG dashboard. My overall environment seems to work; I need to look closer to see what's happening with the monitoring script in this PR.

Three other notes:

Missing a teardown script
The default user/password should be set explicitly; I would suggest the same password that's used in the official quick start documentation. The user/password should also be added to the README here
Similar to "cert-manager and cnpg operator can run on app node instead of infra node" #29: I also saw this running on app nodes, when it probably should be on infra or control plane nodes
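To compare what starts in each cluster and on which node each pod lands (both points raised in this comment), the pods of the two clusters can be listed one after the other. A sketch, assuming the context names kind-k8s-eu and kind-k8s-us (hypothetical defaults, overridable as arguments); `-o wide` adds the NODE column.

```shell
#!/usr/bin/env bash
# Sketch: list pods in all namespaces, with the node each one runs on, for
# each kind cluster in turn. Default context names are an ASSUMPTION.
list_pods_per_cluster() {
  local -a contexts=("$@")
  if [ "${#contexts[@]}" -eq 0 ]; then
    contexts=(kind-k8s-eu kind-k8s-us)   # hypothetical default names
  fi
  local ctx
  for ctx in "${contexts[@]}"; do
    echo "=== ${ctx} ==="
    kubectl --context "${ctx}" get pods --all-namespaces -o wide
  done
}
```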

sxd requested a review from a team as a code owner October 2, 2024 16:27

gbartolini force-pushed the main branch from b19010b to 98d3cb3 October 7, 2024 13:39

gbartolini force-pushed the main branch from b175a27 to 7735b30 October 20, 2024 15:18

gbartolini force-pushed the dev/add_monitoring_stack branch 2 times, most recently from e0bb90c to 8e3d2de October 20, 2024 20:00

sxd and others added 10 commits December 31, 2024 11:04

chore: replace kustomize with kubectl kustomize

c6fad0b

Signed-off-by: Jonathan Gonzalez V. <[email protected]>

fix: typos

ebc406f

Signed-off-by: Gabriele Bartolini <[email protected]>

fix: restart CNPG operator

b065ed6

Signed-off-by: Gabriele Bartolini <[email protected]>

fix: prometheus node selector

332cc5c

Signed-off-by: Gabriele Bartolini <[email protected]>

fix: node selector grafana operator

a2b626d

Signed-off-by: Gabriele Bartolini <[email protected]>

fix: node selector grafana instances

d00b1a1

Signed-off-by: Gabriele Bartolini <[email protected]>

docs

d42774c

Signed-off-by: Gabriele Bartolini <[email protected]>

fix

296afdc

Signed-off-by: Gabriele Bartolini <[email protected]>

fix: update

64fa4d6

Signed-off-by: Gabriele Bartolini <[email protected]>

gbartolini force-pushed the dev/add_monitoring_stack branch from 06fe773 to 64fa4d6 December 31, 2024 10:11
