Update monitoring stack to SDP 25.7 and scrape all products #284

Merged 26 commits on Aug 1, 2025
Commits
11de4c5 WIP: Scrape everything (sbernauer, Jul 23, 2025)
559a765 Move files to monitoring stack folder (sbernauer, Jul 23, 2025)
1f3d2bd Update MinIO chart (sbernauer, Jul 23, 2025)
5c23243 Add metric collection overview dashboard (sbernauer, Jul 23, 2025)
e093bb9 Update HBase (sbernauer, Jul 23, 2025)
6dac95f Update HDFS (sbernauer, Jul 23, 2025)
0672248 Update Kafka (sbernauer, Jul 23, 2025)
7582831 Update Trino (sbernauer, Jul 23, 2025)
688ebf6 Scrape NiFi 2 using mTLS (sbernauer, Jul 23, 2025)
424b5a8 Remove leftover code (sbernauer, Jul 23, 2025)
4a2b1f2 Add comment (sbernauer, Jul 23, 2025)
d273c34 Merge branch 'main' into chore/improve-monitoring-stack (sbernauer, Jul 23, 2025)
1da9eb3 Add simple NiFi Dashboard (sbernauer, Jul 24, 2025)
71be4a3 typos (sbernauer, Jul 24, 2025)
50ed7bd Merge branch 'main' into chore/improve-monitoring-stack (sbernauer, Jul 25, 2025)
2ad5dca mention cert rotation (sbernauer, Jul 25, 2025)
8a8b391 change links (sbernauer, Jul 25, 2025)
71e8ab1 give nifi dashboard a different id (sbernauer, Jul 25, 2025)
6c75850 update nifi dashboard (sbernauer, Jul 25, 2025)
64b7dd9 fix: Add Kafka to metric overview dashboard (sbernauer, Jul 31, 2025)
03a78fc fix: Remove kafka from general ServiceMonitor (sbernauer, Jul 31, 2025)
3ea7f4e mention what the list is (sbernauer, Jul 31, 2025)
1a1ba37 scrape Spark drivers (sbernauer, Jul 31, 2025)
093bb45 Update stacks/monitoring/prometheus-service-monitors.yaml (sbernauer, Jul 31, 2025)
d3e1a39 Improve comment (sbernauer, Jul 31, 2025)
57fc5df Merge branch 'main' into chore/improve-monitoring-stack (sbernauer, Jul 31, 2025)
39 changes: 0 additions & 39 deletions stacks/_templates/prometheus-service-monitor.yaml

This file was deleted.

@@ -0,0 +1,30 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: create-prometheus-tls-certificate-serviceaccount
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: create-prometheus-tls-certificate-rolebinding
subjects:
  - kind: ServiceAccount
    name: create-prometheus-tls-certificate-serviceaccount
    namespace: {{ NAMESPACE }}
roleRef:
  kind: Role
  name: create-prometheus-tls-certificate-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: create-prometheus-tls-certificate-role
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "create", "patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["delete"]
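The Role above grants `get`, `create` and `patch` on Secrets (what `kubectl apply` needs) and `delete` on Pods (what the self-deletion needs). A minimal sketch of how these grants could be checked from a workstation, assuming `kubectl` is on the PATH and the caller is allowed to impersonate ServiceAccounts. The helper names `sa_subject` and `check_rbac` are hypothetical and not part of this PR:

```shell
# Build the impersonation subject for a ServiceAccount:
# system:serviceaccount:<namespace>:<name>
sa_subject() {
  printf 'system:serviceaccount:%s:%s' "$1" "$2"
}

# Hypothetical helper: ask the API server whether the ServiceAccount
# may perform each verb/resource pair listed in the Role.
check_rbac() {
  namespace="$1"
  subject="$(sa_subject "$namespace" create-prometheus-tls-certificate-serviceaccount)"
  for check in get:secrets create:secrets patch:secrets delete:pods; do
    verb="${check%%:*}"
    resource="${check##*:}"
    printf '%s %s: ' "$verb" "$resource"
    # "kubectl auth can-i" prints yes/no and exits non-zero on "no",
    # so keep going to report every pair.
    kubectl auth can-i "$verb" "$resource" \
      --namespace "$namespace" --as "$subject" || true
  done
}

# Usage (replace my-namespace with the namespace the stack is installed in):
#   check_rbac my-namespace
```

Each line of output should end in `yes` if the RoleBinding is in place for that namespace.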
76 changes: 76 additions & 0 deletions stacks/monitoring/create-prometheus-tls-certificate.yaml
@@ -0,0 +1,76 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: create-prometheus-tls-certificate
  labels:
    app: create-prometheus-tls-certificate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: create-prometheus-tls-certificate
  template:
    metadata:
      labels:
        app: create-prometheus-tls-certificate
    spec:
      serviceAccountName: create-prometheus-tls-certificate-serviceaccount
      containers:
        - name: create-prometheus-tls-certificate
          image: oci.stackable.tech/sdp/tools:1.0.0-stackable0.0.0-dev
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command:
            - bash
            - -euo
            - pipefail
            - -c
            - |
              # "kubectl create secret" fails on existing Secrets, so we "kubectl apply" instead
              kubectl create secret generic prometheus-tls-certificate \
                --from-file=/prometheus-tls-certificate/ca.crt \
                --from-file=/prometheus-tls-certificate/tls.crt \
                --from-file=/prometheus-tls-certificate/tls.key \
                --dry-run=client -o yaml \
                | kubectl apply -f -

              echo Sleeping 6 hours before deleting my own Pod
              sleep 21600 # 6 * 60 * 60

              echo "Deleting our own Pod, so that it gets re-created and secret-operator issues a new certificate (only crash-looping the container is not enough!)"
              kubectl --namespace "$POD_NAMESPACE" delete pod "$POD_NAME"
              exit 0
          volumeMounts:
            - name: prometheus-tls-certificate
              mountPath: /prometheus-tls-certificate
      volumes:
        - name: prometheus-tls-certificate
          ephemeral:
            volumeClaimTemplate:
              metadata:
                annotations:
                  # Highly professional tests have shown that Prometheus is able to handle the
                  # certificate rotation :)
                  # You can change the certificate lifetime here for easier testing:
                  # secrets.stackable.tech/backend.autotls.cert.lifetime: "1d"
                  secrets.stackable.tech/class: "tls"
                  secrets.stackable.tech/format: "tls-pem"
                  secrets.stackable.tech/scope: "service=prometheus"
              spec:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: "1"
                storageClassName: secrets.stackable.tech
                volumeMode: Filesystem
      securityContext:
        fsGroup: 1000
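The Deployment above re-publishes the certificate as the `prometheus-tls-certificate` Secret every 6 hours by deleting its own Pod. One way to observe the rotation from outside is to read the certificate's expiry date from the Secret before and after a Pod restart. This is only a sketch: it assumes `kubectl`, `base64` and `openssl` are on the PATH and that the caller may read the Secret, and the helper names `hours_to_seconds` and `cert_end_date` are hypothetical:

```shell
# Convert a rotation interval in hours to the seconds value passed to "sleep".
hours_to_seconds() {
  echo $(( $1 * 60 * 60 ))
}

# Print the expiry date (notAfter) of the certificate stored in the Secret.
# "--from-file=.../tls.crt" stores the file under the key "tls.crt",
# so the dot in the key has to be escaped in the JSONPath expression.
cert_end_date() {
  namespace="$1"
  kubectl --namespace "$namespace" get secret prometheus-tls-certificate \
    --output 'jsonpath={.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -enddate
}

# Usage (replace my-namespace with the namespace the stack is installed in):
#   hours_to_seconds 6        # the interval used by the Deployment
#   cert_end_date my-namespace
```

To shorten the feedback loop while testing, the commented-out `secrets.stackable.tech/backend.autotls.cert.lifetime` annotation in the manifest can be enabled with a shorter lifetime.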