
@eldios (Collaborator) commented Oct 17, 2025

Summary

This PR adds optional Grafana Alloy support to enable centralized observability for validators by collecting and forwarding metrics, logs, and traces to a remote monitoring infrastructure.

Changes

Docker Compose

  • Optional Alloy monitoring via docker-compose.alloy.yml override file

    • Opt-in: docker-compose -f docker-compose.yml -f docker-compose.alloy.yml up -d
    • Default behavior unchanged: validators run without monitoring
  • Alloy configuration (docker/alloy-config.river)

    • Metrics: Scrapes Prometheus metrics from proxy and shard services (port 21100)
    • Logs: Discovers and streams Docker container logs
    • Traces: OTLP receiver on ports 4317 (gRPC) and 4318 (HTTP)
    • Remote forwarding: Optionally forwards to central Prometheus, Loki, and Tempo
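The alloy service is defined only in the override file, so every docker-compose command that touches it (up, ps, logs) needs both -f flags. Below is a minimal sketch for keeping those flags in one place. It assumes a hypothetical ALLOY_ENABLED toggle that this PR does not define.

```shell
#!/usr/bin/env bash
# Sketch: build the docker-compose -f arguments in one place so that every
# command (up, ps, logs) sees the alloy service from the override file.
# ALLOY_ENABLED is a hypothetical toggle; it is not defined by this PR.
compose_files() {
  local files=(-f docker-compose.yml)
  if [ "${ALLOY_ENABLED:-false}" = "true" ]; then
    files+=(-f docker-compose.alloy.yml)
  fi
  # Print all arguments on one line, separated by spaces.
  printf '%s\n' "${files[*]}"
}

# Usage (file names contain no spaces, so word splitting is safe here):
#   export ALLOY_ENABLED=true
#   docker-compose $(compose_files) up -d
#   docker-compose $(compose_files) logs alloy
```

ALLOY_ENABLED is exported on its own line on purpose. In `ALLOY_ENABLED=true docker-compose $(compose_files) up -d`, the command substitution is expanded before the prefix assignment takes effect, so the override file would be silently dropped.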

Kubernetes Helm Chart

  • Alloy dependency in Chart.yaml (version 1.3.1)

  • Alloy config template (alloy-config.river.tpl)

    • Kubernetes pod/service discovery
    • Scrapes metrics from linera-proxy and linera-shard pods
    • Collects pod logs via Kubernetes API
    • Deployed as a DaemonSet for distributed collection
  • Configuration in values-local.yaml.gotmpl

    • Controlled by LINERA_HELMFILE_SET_ALLOY_ENABLED env var (default: false)
    • Environment-based credentials
    • Cluster/validator labels for multi-cluster visibility
    • Resource limits (CPU: 500m, Memory: 512Mi)
  • Updated README with monitoring reference

Configuration

Docker Compose

Basic (local metrics only):

docker-compose -f docker-compose.yml -f docker-compose.alloy.yml up -d

With remote endpoints:

# Prometheus (OTLP format)
export PROMETHEUS_OTLP_URL="https://your-prometheus/otlp"
export PROMETHEUS_OTLP_USER="username"
export PROMETHEUS_OTLP_PASS="password"

# Loki (logs)
export LOKI_PUSH_URL="https://your-loki/loki/api/v1/push"
export LOKI_PUSH_USER="username"
export LOKI_PUSH_PASS="password"

# Tempo (traces)
export TEMPO_OTLP_URL="https://your-tempo/otlp"
export TEMPO_OTLP_USER="username"
export TEMPO_OTLP_PASS="password"

# Validator identification
export HOSTNAME="validator-01"

docker-compose -f docker-compose.yml -f docker-compose.alloy.yml up -d
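A partially configured backend is easy to miss, for example a URL exported without its credentials. Below is a minimal pre-flight sketch. It assumes, without confirmation from the Alloy config, that each remote endpoint needs its URL, USER and PASS set together. check_backend is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight check, assuming (not confirmed by this PR) that each
# remote backend needs its URL, USER and PASS variables set together.
# check_backend is a hypothetical helper name.
check_backend() {
  local prefix=$1 count=0 name var
  for name in URL USER PASS; do
    var="${prefix}_${name}"
    # ${!var} is bash indirect expansion: the value of the variable named $var.
    if [ -n "${!var:-}" ]; then
      count=$((count + 1))
    fi
  done
  # All three unset (backend disabled) or all three set is fine; anything else is not.
  if [ "$count" -ne 0 ] && [ "$count" -ne 3 ]; then
    echo "error: set all or none of ${prefix}_URL, ${prefix}_USER, ${prefix}_PASS" >&2
    return 1
  fi
  return 0
}

# Usage, before docker-compose ... up -d:
#   for p in PROMETHEUS_OTLP LOKI_PUSH TEMPO_OTLP; do check_backend "$p" || exit 1; done
```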

Kubernetes

# Enable Alloy
export LINERA_HELMFILE_SET_ALLOY_ENABLED=true

# Identification
export LINERA_HELMFILE_SET_CLUSTER_NAME="production-gke"
export LINERA_HELMFILE_SET_VALIDATOR_NAME="validator-01"

# Optional: Remote endpoints (same format as Docker)
export PROMETHEUS_OTLP_URL="https://..."
export PROMETHEUS_OTLP_USER="username"
export PROMETHEUS_OTLP_PASS="password"
# ... (Loki and Tempo)

# Deploy
lineractl deploy ...
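The same kind of check can run before a Kubernetes deploy. This sketch assumes that the cluster and validator labels should be mandatory whenever Alloy is enabled, and that only the literal value true enables it. alloy_preflight is a hypothetical helper and is not part of lineractl.

```shell
#!/usr/bin/env bash
# Sketch: refuse to deploy with Alloy enabled but without the labels that tell
# validators apart. Treats an unset LINERA_HELMFILE_SET_ALLOY_ENABLED as false,
# matching the documented default. alloy_preflight is a hypothetical helper.
alloy_preflight() {
  if [ "${LINERA_HELMFILE_SET_ALLOY_ENABLED:-false}" != "true" ]; then
    return 0  # Alloy disabled: nothing to check.
  fi
  local missing=0 var
  for var in LINERA_HELMFILE_SET_CLUSTER_NAME LINERA_HELMFILE_SET_VALIDATOR_NAME; do
    if [ -z "${!var:-}" ]; then
      echo "error: $var must be set when Alloy is enabled" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: alloy_preflight && lineractl deploy ...
```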

Key Features

  • Fully optional: No change to default behavior; validators work without monitoring
  • Centralized observability: Send data to a single monitoring stack
  • Standards-based: OpenTelemetry Protocol (OTLP) and Prometheus remote write
  • Secure: TLS with certificate verification, basic auth support
  • Auto-discovery: Docker containers and Kubernetes pods automatically detected
  • Configurable: All endpoints and credentials via environment variables
  • Distributed collection: Kubernetes DaemonSet scales across nodes
  • Multi-validator visibility: Monitor the entire validator fleet

Architecture

Docker Compose: The Alloy container scrapes metrics from the services and collects container logs via the Docker socket

Kubernetes: An Alloy DaemonSet (one pod per node) discovers validator pods through the Kubernetes API and collects their metrics and logs

Verification

Docker Compose

# Check Alloy status
docker-compose -f docker-compose.yml -f docker-compose.alloy.yml ps alloy

# View logs
docker-compose -f docker-compose.yml -f docker-compose.alloy.yml logs alloy

# Test metrics endpoint
curl http://localhost:12345/metrics
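Alloy may need a few seconds to start listening after `up -d`, so a single curl can fail spuriously. Below is a sketch of a retrying probe against the same endpoint. wait_for_alloy, its defaults, and the 5-second per-request timeout are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch: poll Alloy's HTTP endpoint until it answers, instead of running curl
# once right after "up -d". wait_for_alloy and its arguments are hypothetical.
#   $1  URL to probe (default http://localhost:12345/metrics)
#   $2  number of attempts (default 30)
#   $3  seconds to wait between attempts (default 2)
wait_for_alloy() {
  local url=${1:-http://localhost:12345/metrics}
  local attempts=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= attempts; i++)); do
    # -f: treat HTTP errors as failures; -s: silent; -o: discard the body.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    # Do not sleep after the final attempt.
    if (( i < attempts )); then sleep "$delay"; fi
  done
  echo "error: $url not reachable after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_alloy && echo "Alloy is up"
```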

Kubernetes

# Check DaemonSet
kubectl get daemonset alloy -n <namespace>

# Check pods
kubectl get pods -l app.kubernetes.io/name=alloy -n <namespace>

# View logs
kubectl logs -l app.kubernetes.io/name=alloy -n <namespace> --tail=100
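`kubectl get daemonset` shows the pod counts, but a script needs a yes/no answer. This sketch compares the DaemonSet's `status.desiredNumberScheduled` and `status.numberReady` fields. It assumes the DaemonSet is named alloy, as in the commands above. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the Alloy DaemonSet has a ready pod on every node it
# targets. Assumes the DaemonSet is named "alloy"; helper names are hypothetical.

# Pure check: at least one pod is scheduled and all scheduled pods are ready.
ds_fully_ready() {
  local desired=$1 ready=$2
  [ "$desired" -gt 0 ] && [ "$ready" -eq "$desired" ]
}

# Reads the two status fields with kubectl and applies the check above.
# Missing or empty fields (or a failed kubectl call) count as 0.
alloy_daemonset_ready() {
  local ns=$1 desired ready
  read -r desired ready < <(kubectl get daemonset alloy -n "$ns" \
    -o jsonpath='{.status.desiredNumberScheduled} {.status.numberReady}')
  ds_fully_ready "${desired:-0}" "${ready:-0}"
}

# Usage: alloy_daemonset_ready <namespace> && echo "all Alloy pods ready"
```

When blocking until the rollout finishes is acceptable, `kubectl rollout status daemonset/alloy -n <namespace>` is the built-in alternative.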

Files Changed

  • docker/docker-compose.alloy.yml - Optional Alloy override
  • docker/alloy-config.river - Alloy configuration for Docker
  • kubernetes/linera-validator/Chart.yaml - Alloy dependency
  • kubernetes/linera-validator/Chart.lock - Updated lock file
  • kubernetes/linera-validator/charts/alloy-1.3.1.tgz - Alloy Helm chart
  • kubernetes/linera-validator/alloy-config.river.tpl - Alloy config template
  • kubernetes/linera-validator/values-local.yaml.gotmpl - Alloy configuration
  • kubernetes/linera-validator/README.md - Added monitoring reference

Benefits

  • Internal validators can send telemetry to Linera's central monitoring platform
  • External partners can optionally enable monitoring to share telemetry
  • No impact on validators that don't need or want monitoring
  • Reduces local resource usage when using central storage
  • Enables fleet-wide monitoring and alerting

Future Work

  • Add OpenTelemetry instrumentation to linera-proxy and linera-shard for distributed tracing
  • Configure alerting rules for critical validator events
  • Create custom dashboards for validator-specific metrics

🤖 Generated with Claude Code

Co-Authored-By: Claude [email protected]

eldios self-assigned this Oct 17, 2025
eldios changed the title from "Add Grafana Alloy for centralized observability (metrics, logs, traces)" to "[WIP] Add Grafana Alloy for centralized observability (metrics, logs, traces)" Oct 17, 2025
eldios requested a review from ndr-ds October 20, 2025 13:55
eldios changed the title from "[WIP] Add Grafana Alloy for centralized observability (metrics, logs, traces)" to "Add Grafana Alloy for centralized observability (metrics, logs, traces)" Oct 20, 2025