Kubernetes Observability Stack — Prometheus + Grafana + Alertmanager

A production-grade observability stack for Amazon EKS, deployed via Helm and managed through GitOps. It covers metrics collection (Prometheus), visualization (Grafana), and alerting (Alertmanager). Built for multi-tenant clusters, it provides per-namespace dashboards and team-scoped alert routing.

Architecture Overview

┌──────────────────────────────────────────────────────────────────────┐
│                          EKS Cluster                                 │
│                                                                      │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                    monitoring namespace                        │  │
│  │                                                                │  │
│  │  ┌─────────────┐   scrapes   ┌──────────────────────────────┐ │  │
│  │  │  Prometheus │ ──────────► │  ServiceMonitor / PodMonitor │ │  │
│  │  │  (metrics)  │             │  (per-namespace targets)     │ │  │
│  │  └──────┬──────┘             └──────────────────────────────┘ │  │
│  │         │ evaluates                                            │  │
│  │         ▼                                                      │  │
│  │  ┌────────────────┐  fires   ┌──────────────────────────────┐  │  │
│  │  │ PrometheusRule │ ───────► │     Alertmanager             │  │  │
│  │  │ (alert rules)  │          │  routes → Slack / PagerDuty  │  │  │
│  │  └────────────────┘          └──────────────────────────────┘  │  │
│  │                                                                │  │
│  │  ┌─────────────┐                                              │  │
│  │  │   Grafana   │ ◄── queries Prometheus datasource           │  │
│  │  │ (dashboards)│     reads ConfigMaps for dashboard JSON      │  │
│  │  └─────────────┘                                              │  │
│  └────────────────────────────────────────────────────────────────┘  │
│                                                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐              │
│  │  team-alpha  │  │  team-beta   │  │  team-gamma  │              │
│  │  (metrics)   │  │  (metrics)   │  │  (metrics)   │              │
│  └──────────────┘  └──────────────┘  └──────────────┘              │
└──────────────────────────────────────────────────────────────────────┘
                              │
               ┌──────────────┼──────────────┐
               ▼              ▼              ▼
            Slack         PagerDuty      CloudWatch
         (#alerts)        (on-call)      (audit trail)

Features

kube-prometheus-stack: widely used community Helm chart that bundles Prometheus Operator, Grafana, Alertmanager, node-exporter, and kube-state-metrics
ServiceMonitor/PodMonitor: per-team scrape configs selected by label, so no manual edits to the Prometheus config are needed
PrometheusRules: alert rules for node health, pod crash loops, quota exhaustion, and API server latency
Grafana dashboards as code: dashboards stored as ConfigMaps and provisioned automatically at startup
Multi-tenant alert routing: Alertmanager routes alerts to team-specific Slack channels based on the namespace label
Persistent storage: Prometheus data and Grafana config on EBS gp3 volumes
IRSA: Prometheus assumes an IAM role through IRSA (IAM Roles for Service Accounts) for CloudWatch remote write, so no static credentials are needed
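
To make the ServiceMonitor item concrete, a per-team scrape config might look like the sketch below. The Service label `app: orders-api`, the port name `metrics`, and the object name are hypothetical. The `release: kube-prometheus-stack` label is an assumption too: it matches the chart's default selector only when the Helm release uses that name.

```yaml
# Hypothetical example: scrape a team-alpha Service that exposes /metrics.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: orders-api                   # hypothetical name
  namespace: team-alpha
  labels:
    release: kube-prometheus-stack   # assumed: matches the chart's default selector
spec:
  selector:
    matchLabels:
      app: orders-api                # assumed label on the target Service
  namespaceSelector:
    matchNames:
      - team-alpha
  endpoints:
    - port: metrics                  # assumed name of the Service port
      path: /metrics
      interval: 30s
```

Because targets are chosen by label, a team can add or remove scrape targets by editing objects in its own namespace, without touching the Prometheus configuration.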

Repository Structure

.
├── prometheus/
│   ├── config/           # kube-prometheus-stack Helm values
│   └── rules/            # PrometheusRule manifests (alert rules)
├── grafana/
│   ├── dashboards/       # Dashboard JSON stored as ConfigMaps
│   └── datasources/      # Grafana datasource config
├── alertmanager/         # Alertmanager routing config
└── manifests/            # ServiceMonitor and PodMonitor examples
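
As a rough idea of what the Helm values under prometheus/config/ could contain, the sketch below wires up the gp3 storage and dashboard provisioning described in Features. Every value here (retention, volume sizes, the StorageClass name `gp3`) is an assumption for illustration; the file in this repository may differ.

```yaml
# Sketch of kube-prometheus-stack values; all concrete values are assumptions.
prometheus:
  prometheusSpec:
    retention: 15d
    # Select ServiceMonitors, PodMonitors, and rules regardless of their release label
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
    ruleSelectorNilUsesHelmValues: false
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3        # assumes a StorageClass named gp3 exists
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
grafana:
  persistence:
    enabled: true
    storageClassName: gp3
    size: 10Gi
  sidecar:
    dashboards:
      enabled: true                    # load dashboards from labeled ConfigMaps
      label: grafana_dashboard
```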

Quick Start

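# Add the prometheus-community chart repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create the namespace that hosts the stack
kubectl create namespace monitoring

# Install (or upgrade) kube-prometheus-stack with this repo's values
helm upgrade --install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus/config/helm-values.yaml \
  --wait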

# Apply custom alert rules
kubectl apply -f prometheus/rules/

# Apply dashboards
kubectl apply -f grafana/dashboards/

# Apply ServiceMonitor and PodMonitor examples
kubectl apply -f manifests/
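
For orientation, a manifest under prometheus/rules/ could resemble the following sketch of a crash-loop alert. The rule name, threshold, and durations are assumptions, not values taken from this repository. The metric `kube_pod_container_status_restarts_total` is exported by kube-state-metrics.

```yaml
# Hypothetical crash-loop rule; thresholds and names are illustrative only.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-crash-loop               # hypothetical name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed: matches the chart's default rule selector
spec:
  groups:
    - name: pod-health
      rules:
        - alert: PodCrashLooping
          # More than 3 container restarts within 15 minutes
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
```

The resulting alert carries the `namespace` label of the affected pod, which is the label that namespace-based routing in Alertmanager can match on.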

Access Grafana

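# Forward local port 3000 to the Grafana service (runs until interrupted)
kubectl port-forward svc/kube-prometheus-stack-grafana \
  -n monitoring 3000:80

# Grafana is then reachable at http://localhost:3000
# Default username: admin
# In a second terminal, read the admin password from the chart's secret
# (change it immediately after the first login):
kubectl get secret kube-prometheus-stack-grafana \
  -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d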

Alert Routing

Alerts are routed to teams based on the namespace label:

Namespace	Slack Channel	PagerDuty
team-alpha	#team-alpha-alerts	team-alpha-pd
team-beta	#team-beta-alerts	team-beta-pd
team-gamma	#team-gamma-alerts	team-gamma-pd
platform/*	#platform-alerts	platform-sre-pd
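
The table above could be expressed in Alertmanager configuration roughly as follows. This is a sketch under assumptions: the receiver names, the Slack webhook URL, and the PagerDuty routing keys are placeholders, and `platform/*` is interpreted here as "any namespace whose name starts with platform". Only team-alpha is spelled out; team-beta and team-gamma would follow the same pattern.

```yaml
# Sketch only; receiver names, URLs, and keys are placeholders.
route:
  receiver: platform                 # fallback when no child route matches
  group_by: ["alertname", "namespace"]
  routes:
    - matchers:
        - 'namespace="team-alpha"'
      receiver: team-alpha
    - matchers:
        - 'namespace=~"platform.*"'  # assumed reading of platform/*
      receiver: platform
receivers:
  - name: team-alpha
    slack_configs:
      - channel: "#team-alpha-alerts"
        api_url: "https://hooks.slack.com/services/REPLACE_ME"
    pagerduty_configs:
      - routing_key: "REPLACE_ME"    # Events API v2 integration key
  - name: platform
    slack_configs:
      - channel: "#platform-alerts"
        api_url: "https://hooks.slack.com/services/REPLACE_ME"
    pagerduty_configs:
      - routing_key: "REPLACE_ME"
```

A receiver that lists both `slack_configs` and `pagerduty_configs` notifies both destinations for every alert it receives. Splitting by severity, for example paging only on critical alerts, would need additional child routes.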

Key Dashboards

Dashboard	What It Shows
EKS Cluster Overview	Node CPU/memory, pod count, API server latency
Namespace Resource Usage	Per-team CPU/memory vs quota, pod saturation
Pod Health	Restart counts, OOMKill events, pending pods
Kubernetes API Server	Request rates, error rates, etcd latency
Node Exporter	Disk I/O, network throughput, filesystem usage
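
A dashboard stored as a ConfigMap might be shaped like the sketch below. The ConfigMap name, dashboard `uid`, and the single panel are hypothetical and far smaller than a real dashboard. The `grafana_dashboard: "1"` label is the one the chart's Grafana sidecar looks for by default, assuming that default has not been overridden in the Helm values.

```yaml
# Hypothetical minimal dashboard; real dashboard JSON is usually exported from Grafana.
apiVersion: v1
kind: ConfigMap
metadata:
  name: dashboard-pod-health         # hypothetical name
  namespace: monitoring
  labels:
    grafana_dashboard: "1"           # assumed: default label watched by the sidecar
data:
  pod-health.json: |
    {
      "title": "Pod Health",
      "uid": "pod-health",
      "panels": [
        {
          "type": "timeseries",
          "title": "Container restarts (1h)",
          "gridPos": {"h": 8, "w": 24, "x": 0, "y": 0},
          "targets": [
            {"expr": "sum by (namespace) (increase(kube_pod_container_status_restarts_total[1h]))"}
          ]
        }
      ]
    }
```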

Related Repositories

Repo	Purpose
aws-eks-platform	Terraform — VPC, EKS, IAM
gitops-eks-platform	GitOps — ArgoCD workloads
k8s-security-platform	Security — Gatekeeper + Falco
k8s-multi-tenancy	Multi-tenancy — RBAC, Quotas
k8s-observability-stack (this repo)	Observability — Prometheus + Grafana
