Kepler is a Kubernetes-based Efficient Power Level Exporter that measures energy consumption at container, pod, VM, process, and node levels. This guide provides essential context for AI agents contributing to the project.
# Build and validate (run before any PR)
make all # clean -> fmt -> lint -> vet -> build -> test
# Individual targets
make fmt # Format code (go fmt)
make vet # Static analysis (go vet)
make lint # Linting (golangci-lint, 5m timeout locally, 3m in CI)
make test # Tests with race detection
make build # Production binary
make build-debug # Debug binary with race detection
make coverage # HTML coverage report
make deps # Tidy and verify go.mod
make clean # Clean artifacts
make gen-metrics-docs # Regenerate docs/user/metrics.md (do NOT edit manually)
# Test a specific package (preferred when working in one area)
CGO_ENABLED=1 go test -v -race ./internal/monitor/...
CGO_ENABLED=1 go test -v -race ./internal/device/...
CGO_ENABLED=1 go test -v -race ./config/...
# Local integration testing
cd compose/dev && docker compose up --build -d # Kepler + Prometheus + Grafana
cd compose/dev && docker compose down --volumes # cleanup

Access points (Docker Compose):
- Kepler Metrics: http://localhost:28283/metrics
- Prometheus: http://localhost:29090
- Grafana: http://localhost:23000 (credentials: admin/admin)
Allowed without asking:
- Read any project files
- Run: make fmt, make vet, make lint, make test, make build, make coverage
- Run: docker compose up/down in compose/dev/
- Create or update test files (with SPDX headers)
- Update code documentation and comments
- Fix linter errors and race conditions
- Refactor code following existing patterns
- Update docs/ (except auto-generated docs/user/metrics.md)

Ask first:
- Committing changes (NEVER commit unless explicitly asked)
- Any git push, rebase, or reset operation
- Modifying go.mod or adding new dependencies
- Modifying CI/CD configs (.github/, Makefile)
- Modifying AGENTS.md, CLAUDE.md, or GOVERNANCE.md
- Deploying to clusters (make deploy, kubectl apply)
- Architectural changes (create Enhancement Proposal first; see docs/developer/proposal/EP_TEMPLATE.md)

Never:
- Force push to main
- Skip hooks (--no-verify, --no-gpg-sign)
- Commit without DCO sign-off (-s flag)
- Disable or skip race detection in tests
- Manually edit docs/user/metrics.md (auto-generated)
All Go files MUST have these SPDX headers as the first two lines:
// SPDX-FileCopyrightText: 2025 The Kepler Authors
// SPDX-License-Identifier: Apache-2.0

All code MUST be thread-safe. Tests run with -race and this is non-negotiable.
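As a minimal sketch of what the thread-safety requirement means in practice, the following guards shared state with a mutex so that concurrent increments stay correct under the race detector. `SafeCounter` and `countConcurrently` are hypothetical names used only for illustration; they are not taken from the Kepler codebase.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeCounter is a hypothetical example type (not part of Kepler).
// The mutex protects n, so concurrent calls do not race.
type SafeCounter struct {
	mu sync.Mutex
	n  int
}

// Inc adds one to the counter while holding the lock.
func (c *SafeCounter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value returns the current count while holding the lock.
func (c *SafeCounter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently increments one counter from several goroutines
// and returns the final value once all of them have finished.
func countConcurrently(workers, perWorker int) int {
	var c SafeCounter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countConcurrently(8, 1000)) // prints 8000
}
```

Removing the lock from `Inc` would make the result nondeterministic, and a run with the race detector enabled would be expected to flag the unsynchronized write.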
Always use make targets over raw go commands — only run raw commands if no make target exists for that operation.
kepler/
├── cmd/kepler/ # Main entry point
├── config/ # Configuration (builder, validation, CLI flags)
├── internal/ # Core implementation
│ ├── device/ # Hardware abstraction (RAPL, HWMon, GPU)
│ ├── exporter/ # Prometheus, stdout exporters
│ ├── k8s/ # Kubernetes integration
│ ├── logger/ # Logging setup
│ ├── monitor/ # Power monitoring and attribution
│ ├── platform/ # Platform integrations (Redfish)
│ ├── resource/ # Process/container tracking
│ ├── server/ # HTTP server
│ └── service/ # Service framework
├── test/ # E2E test suites
├── docs/ # Documentation
│ ├── user/ # User guides
│ └── developer/ # Architecture, proposals
├── manifests/ # Kubernetes/Helm manifests
└── hack/ # Development scripts
- Race Detection: All tests MUST pass with -race flag (enforced by make test)
- Coverage: Maintain or improve existing coverage (tracked via Codecov)
- Framework: Use testify for assertions and mocking
  - Import: github.com/stretchr/testify/assert
  - Import: github.com/stretchr/testify/mock
// SPDX-FileCopyrightText: 2025 The Kepler Authors
// SPDX-License-Identifier: Apache-2.0

// Verify interface implementations
var _ PowerDataProvider = (*MockPowerMonitor)(nil)

// Use t.Helper() in test helpers
func assertMetricValue(t *testing.T, expected, actual float64) {
	t.Helper()
	assert.Equal(t, expected, actual)
}

// Include cleanup logic
func TestExample(t *testing.T) {
	resource := setupResource()
	t.Cleanup(func() {
		resource.Close()
	})
	// test logic
}

make test-e2e && sudo ./bin/kepler-e2e.test -test.v # Bare-metal (requires RAPL)
make test-e2e-k8s # Kubernetes (requires: make cluster-up image deploy)

See docs/developer/e2e-testing.md for details.
Default branch: main. PRs target main unless stated otherwise.
Commit format: Conventional Commits with DCO sign-off. Enforced by the commitlint pre-commit hook.
git commit -s -m "feat(monitor): add terminated workload tracking"
git commit -s -m "fix(exporter): resolve race condition in metrics handler"
git commit -s -m "docs: update architecture diagram"
# Types: feat, fix, docs, style, refactor, test, chore, ci, perf

- Title: Use conventional commit format (e.g., feat: add feature X)
- Description: Reference related issues (Closes #123)
- Scope: One feature/fix per PR (keep focused)
- Tests: Include tests for new features and bug fixes
- Documentation: Update docs if behavior changes
- No Breaking Changes: Do not introduce breaking changes without proper deprecation and a migration guide
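To make the commit and PR title format concrete, here is a rough sketch of a checker for a "type(scope): subject" title plus a DCO sign-off trailer. The regular expression is an approximation written for this example and is not the project's commitlint configuration; `checkCommit` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// titleRe approximates the title format using the types listed in this
// guide. It is an illustrative assumption, not the commitlint rule set.
var titleRe = regexp.MustCompile(
	`^(feat|fix|docs|style|refactor|test|chore|ci|perf)(\([a-z0-9-]+\))?: \S.*$`)

// checkCommit reports why a commit message would be rejected by this
// sketch, or nil if the title and sign-off trailer both look valid.
func checkCommit(msg string) error {
	title, _, _ := strings.Cut(msg, "\n")
	if !titleRe.MatchString(title) {
		return fmt.Errorf("title %q is not in type(scope): subject form", title)
	}
	if !strings.Contains(msg, "\nSigned-off-by: ") {
		return errors.New("missing Signed-off-by trailer (commit with -s)")
	}
	return nil
}

func main() {
	good := "feat(monitor): add terminated workload tracking\n\n" +
		"Signed-off-by: A Developer <dev@example.com>"
	fmt.Println(checkCommit(good))          // prints <nil>
	fmt.Println(checkCommit("update docs")) // prints a title format error
}
```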
All PRs must pass:
- make fmt - Code formatting
- make vet - Static analysis
- make lint - Linting (golangci-lint with 3m timeout in CI)
- make test - Tests with race detection
- make gen-metrics-docs - Metrics documentation must be up-to-date
- Pre-commit hooks (markdownlint, yamllint, commitlint, reuse-lint, shellcheck)
- Container image builds
- OpenSSF Scorecard
- Metrics Documentation: Auto-generated via make gen-metrics-docs; don't manually edit docs/user/metrics.md
- Race Conditions: All code must be thread-safe; tests run with -race
- Commit Sign-off: Forgot -s? Amend: git commit --amend -s --no-edit
- Pre-commit Failures: Run pre-commit run --all-files to check everything
- Service-Oriented Design: Components implement the service.Service interface (see internal/service/service.go)
- Dependency Injection: Services composed at startup in cmd/kepler/main.go
- Single Writer, Multiple Readers: Power monitor updates atomically; exporters read snapshots via the PowerDataProvider interface
- Interface-Based Abstractions: Hardware, resources, and exporters use interfaces
- Graceful Shutdown: All services handle context cancellation properly
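The service and graceful-shutdown patterns above can be sketched roughly as follows. The `Runner` interface here is a hypothetical stand-in written for this example; the real contract lives in internal/service/service.go and should be read before relying on any method names. The sketch shows a component that runs until its context is cancelled and then returns promptly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Runner is a hypothetical service contract for illustration only;
// it is not copied from Kepler's service package.
type Runner interface {
	Name() string
	Run(ctx context.Context) error
}

// ticker is a toy service that does periodic work until cancelled.
type ticker struct {
	interval time.Duration
	ticks    int // touched only by the goroutine executing Run
}

func (t *ticker) Name() string { return "ticker" }

// Run loops until ctx is cancelled, then returns the context error.
func (t *ticker) Run(ctx context.Context) error {
	tk := time.NewTicker(t.interval)
	defer tk.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-tk.C:
			t.ticks++
		}
	}
}

// runUntilCancelled starts r, cancels it after d, and reports whether
// r stopped because of that cancellation within one second.
func runUntilCancelled(r Runner, d time.Duration) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	done := make(chan error, 1)
	go func() { done <- r.Run(ctx) }()
	time.Sleep(d)
	cancel()
	select {
	case err := <-done:
		return errors.Is(err, context.Canceled)
	case <-time.After(time.Second):
		return false
	}
}

// demoShutdown wires a concrete service behind the interface.
func demoShutdown() bool {
	var svc Runner = &ticker{interval: time.Millisecond}
	return runUntilCancelled(svc, 20*time.Millisecond)
}

func main() {
	fmt.Println("stopped cleanly:", demoShutdown()) // prints "stopped cleanly: true"
}
```

Because the caller depends only on the interface, a test can substitute a fake implementation without touching the wiring code.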
- Hierarchical: CLI flags override YAML files, which override defaults
- Dev Options: Config keys prefixed with dev.* are not exposed as CLI flags
- Validation: All configs validated at startup; fail fast on errors
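The precedence and fail-fast rules above can be illustrated with a small layering sketch. `Settings`, `Overrides`, and `resolve` are hypothetical names and fields chosen for this example; Kepler's actual config package may be structured quite differently.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Settings is a hypothetical resolved configuration.
type Settings struct {
	LogLevel string
	Interval time.Duration
}

// Overrides is one configuration layer; a nil field means "not set here".
type Overrides struct {
	LogLevel *string
	Interval *time.Duration
}

// apply copies every field that the layer sets onto base.
func apply(base Settings, o Overrides) Settings {
	if o.LogLevel != nil {
		base.LogLevel = *o.LogLevel
	}
	if o.Interval != nil {
		base.Interval = *o.Interval
	}
	return base
}

// resolve layers defaults < file < flags, then validates the result so
// that a bad configuration is rejected at startup.
func resolve(defaults Settings, file, flags Overrides) (Settings, error) {
	s := apply(apply(defaults, file), flags)
	if s.Interval <= 0 {
		return Settings{}, errors.New("interval must be positive")
	}
	switch s.LogLevel {
	case "debug", "info", "warn", "error":
	default:
		return Settings{}, fmt.Errorf("unknown log level %q", s.LogLevel)
	}
	return s, nil
}

// ptr returns a pointer to v, which keeps layer literals short.
func ptr[T any](v T) *T { return &v }

// demoResolve sets the log level in both file and flags, and the
// interval only in the file.
func demoResolve() (Settings, error) {
	defaults := Settings{LogLevel: "info", Interval: 5 * time.Second}
	file := Overrides{LogLevel: ptr("warn"), Interval: ptr(3 * time.Second)}
	flags := Overrides{LogLevel: ptr("debug")}
	return resolve(defaults, file, flags)
}

func main() {
	s, err := demoResolve()
	fmt.Println(s.LogLevel, s.Interval, err) // prints "debug 3s <nil>"
}
```

Using pointer fields distinguishes "not set" from a zero value, which is one common way to keep a lower layer's value when a higher layer is silent.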
- Logging: log/slog (structured logging, stdlib)
- Metrics: prometheus/client_golang
- Kubernetes Client: k8s.io/client-go
- Service Management: oklog/run
- CLI Parsing: alecthomas/kingpin/v2
- Testing: stretchr/testify
- Concurrency: golang.org/x/sync (singleflight)
When making design decisions, follow these architectural principles
(reference: docs/developer/design/architecture/principles.md):
- Fair Power Allocation - Track terminated workloads to prevent unfair attribution
- Data Consistency & Mathematical Integrity - Maintain atomic snapshots; validate energy conservation
- Computation-Presentation Separation - Separate data models (Monitor) from export formats (Exporters) via the PowerDataProvider interface
- Data Freshness Guarantee - Configurable staleness threshold (default 10s); automatic refresh
- Deterministic Processing - Thread-safe, race-free operations with immutable snapshots
- Prefer Package Reuse - Use battle-tested libraries over custom implementations
- Configurable Collection & Exposure - Users control which metrics to collect/expose
- Implementation Abstraction - Interface-based design for flexibility
- Simple Configuration - Hierarchical config: CLI flags > YAML files > Defaults
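As one possible reading of the atomic-snapshot and data-freshness principles, the sketch below publishes immutable snapshots through an atomic pointer and refreshes them only when they are older than a staleness threshold. `Monitor`, `Snapshot`, and the injected `read` and `now` functions are hypothetical and exist only to make the example deterministic; the mutex stands in for whatever deduplication the real monitor uses.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Snapshot is a hypothetical immutable reading; it is never modified
// after it has been published.
type Snapshot struct {
	Timestamp  time.Time
	NodeJoules float64
}

// Monitor publishes snapshots atomically. Readers never see a partly
// written value because they only ever load a complete pointer.
type Monitor struct {
	current      atomic.Pointer[Snapshot]
	mu           sync.Mutex // serializes refreshes (one writer at a time)
	maxStaleness time.Duration
	read         func() float64   // assumed hardware read, injected
	now          func() time.Time // injected clock
}

// fresh reports whether s exists and is within the staleness threshold.
func (m *Monitor) fresh(s *Snapshot) bool {
	return s != nil && m.now().Sub(s.Timestamp) <= m.maxStaleness
}

// Snapshot returns the current snapshot, refreshing it first if it is
// missing or stale.
func (m *Monitor) Snapshot() *Snapshot {
	if s := m.current.Load(); m.fresh(s) {
		return s
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	if s := m.current.Load(); m.fresh(s) {
		return s // another caller refreshed while we waited
	}
	s := &Snapshot{Timestamp: m.now(), NodeJoules: m.read()}
	m.current.Store(s)
	return s
}

// demoFreshness returns how many hardware reads three calls caused.
func demoFreshness() int {
	clock := time.Unix(0, 0)
	reads := 0
	m := &Monitor{
		maxStaleness: 10 * time.Second,
		now:          func() time.Time { return clock },
		read:         func() float64 { reads++; return float64(reads) * 100 },
	}
	m.Snapshot() // nothing published yet: triggers a read
	clock = clock.Add(5 * time.Second)
	m.Snapshot() // 5s old: reused
	clock = clock.Add(6 * time.Second)
	m.Snapshot() // 11s old: refreshed
	return reads
}

func main() {
	fmt.Println(demoFreshness()) // prints 2
}
```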
- Error Handling: Always handle errors explicitly; use structured logging for context
- Idiomatic Go: Follow Effective Go
- Security: Validate inputs; avoid injection vulnerabilities (command, SQL, XSS)
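The guidelines above can be combined in a short sketch: validate untrusted input, wrap failures with a sentinel error so callers can test for them, and log with structured fields. `parseIntervalSeconds`, `ErrInvalidInterval`, and the accepted range are hypothetical choices for this example, not Kepler configuration rules.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"os"
	"strconv"
)

// ErrInvalidInterval is a hypothetical sentinel that callers can match
// with errors.Is.
var ErrInvalidInterval = errors.New("invalid interval")

// parseIntervalSeconds validates untrusted input and wraps every
// failure with the sentinel so that context is preserved.
func parseIntervalSeconds(raw string) (int, error) {
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("%w: %q is not an integer", ErrInvalidInterval, raw)
	}
	if n < 1 || n > 3600 {
		return 0, fmt.Errorf("%w: %d is outside [1, 3600]", ErrInvalidInterval, n)
	}
	return n, nil
}

// isInvalid reports whether raw is rejected with ErrInvalidInterval.
func isInvalid(raw string) bool {
	_, err := parseIntervalSeconds(raw)
	return errors.Is(err, ErrInvalidInterval)
}

func main() {
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	for _, raw := range []string{"5", "abc", "0"} {
		n, err := parseIntervalSeconds(raw)
		if err != nil {
			// Structured fields keep the input and the error queryable.
			logger.Error("rejecting interval", "input", raw, "error", err)
			continue
		}
		logger.Info("accepted interval", "seconds", n)
	}
}
```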
For significant changes, use the template at docs/developer/proposal/EP_TEMPLATE.md.
Required sections: Problem Statement, Goals/Non-Goals, Detailed Design, Testing Plan,
Migration Strategy.
- Do not invent facts about the codebase — verify by reading the code before making claims or changes.
- Do not over-engineer; implement exactly what is asked, nothing more.
- If requirements are unclear, ask the user rather than guessing.
- If a test fails unexpectedly, report the failure. Do not modify the test to make it pass without understanding why it failed.
- If you cannot find a file, interface, or dependency, ask rather than creating new ones.
- If you are unsure whether a change is architectural, treat it as requiring approval.
- Run make help to see all available Makefile targets.
- Architecture: docs/developer/design/architecture/
- Contributing: CONTRIBUTING.md
- Installation: docs/user/installation.md
- Configuration: docs/user/configuration.md
- Metrics catalog: docs/user/metrics.md (auto-generated)
- Enhancement Proposals: docs/developer/proposal/EP_TEMPLATE.md
- Governance: GOVERNANCE.md
- Security: SECURITY.md
- Issues: github.com/sustainable-computing-io/kepler/issues