317 Add Observability Stack for this project #319

Merged
sthanikan2000 merged 30 commits into main from 317-metrics
Dec 2, 2025

@ginaxu1 ginaxu1 commented Nov 17, 2025

This is only part 1 of #317; subsequent PRs will connect the services to this observability stack.

Summary

This PR introduces a centralized observability stack at the repository root (observability/) that provides unified metrics collection and visualization for all Go services using Prometheus and Grafana. It is modeled after https://github.com/LSFLK/superapp-mobile/tree/main/observability

Why this is needed:

  • No centralized view of system health across services
  • Observability setup was nested in exchange/, less discoverable
  • Missing metrics for root-level services (api-server-go, audit-service)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Other (please describe):

Changes Made

Created

  • observability/docker-compose.yml - Prometheus + Grafana with data persistence
  • observability/prometheus/prometheus.yml - All 5 service targets with labels
  • observability/README.md - Complete documentation (metrics catalog, queries, troubleshooting)
  • observability/grafana/dashboards/go-services-metrics.json - Pre-configured dashboard

Modified

  • Moved exchange/monitoring/ → observability/ (root level)
  • Enhanced docker-compose.yml: Added volumes, retention (30d), updated container names
  • Enhanced prometheus.yml: Added global/service labels, organized by location, added api-server-go & audit-service
  • Updated docs: exchange/orchestration-engine-go/OBSERVABILITY.md, exchange/pkg/monitoring/README.md

Service Coverage

All 5 Go services configured: orchestration-engine, consent-engine, policy-decision-point, api-server-go, audit-service

Testing

  • I have tested this change locally
  • All existing tests pass

Validation Results:

  • Docker Compose config validates
  • All 5 service targets configured in Prometheus
  • Grafana provisioning files present
  • Dashboard JSON valid
  • All documentation paths updated

Note: api-server-go and audit-service are in Prometheus config but may show DOWN until they expose /metrics endpoints.
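
For reference, a target flips from DOWN to UP once it answers GET /metrics with the Prometheus text exposition format. The following is a minimal, standard-library-only sketch of such an endpoint. It does not use exchange/pkg/monitoring (whose API is not shown in this PR), and the names renderCounter, counted, and demo_http_requests_total are hypothetical, chosen only for illustration. The program starts a throwaway local test server, sends a few requests, and prints what a scrape would return.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// requestCount is a process-wide counter of handled requests.
var requestCount atomic.Uint64

// renderCounter formats one counter in the Prometheus text exposition format.
func renderCounter(name, help string, value uint64) string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s counter\n%s %d\n", name, help, name, name, value)
}

// counted wraps a handler and increments requestCount on every call.
func counted(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		requestCount.Add(1)
		next.ServeHTTP(w, r)
	})
}

// metricsHandler serves the current counter value for Prometheus to scrape.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	// The metric name is hypothetical, not one from this project's catalog.
	fmt.Fprint(w, renderCounter("demo_http_requests_total", "Total HTTP requests handled.", requestCount.Load()))
}

func main() {
	mux := http.NewServeMux()
	mux.Handle("/", counted(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "ok")
	})))
	mux.HandleFunc("/metrics", metricsHandler)

	// A temporary local server stands in for a real service.
	srv := httptest.NewServer(mux)
	defer srv.Close()

	// Generate some traffic so the counter is non-zero.
	for i := 0; i < 3; i++ {
		resp, err := http.Get(srv.URL + "/")
		if err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()
	}

	// Scrape the endpoint the way Prometheus would.
	resp, err := http.Get(srv.URL + "/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(body))
}
```

In a real service the handler would be mounted on the service's own router and port instead of a test server, and a metrics library would normally replace the hand-written formatting.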

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have checked that there are no merge conflicts

Related Issues

This PR addresses the need for centralized observability across all services in the OpenDIF MVP project.

Quick Start

cd observability && docker compose up -d

Access:

Service Metrics: All 5 services are scraped at /metrics (ports: 3000, 3001, 4000, 8081, 8082); api-server-go and audit-service do not expose the endpoint yet.
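
A quick way to see which of these endpoints currently respond, without opening the Prometheus targets page, is to probe each port directly. The sketch below assumes the services run on localhost, which this PR does not state, and the helper names metricsURL and isUp are hypothetical. The port-to-service mapping is not given in the description, so the output lists ports only.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// ports are the /metrics ports listed in the PR description.
var ports = []int{3000, 3001, 4000, 8081, 8082}

// metricsURL builds the scrape URL for a host and port.
func metricsURL(host string, port int) string {
	return fmt.Sprintf("http://%s:%d/metrics", host, port)
}

// isUp reports whether a GET on url returns HTTP 200 within the client's timeout.
func isUp(client *http.Client, url string) bool {
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	for _, p := range ports {
		url := metricsURL("localhost", p) // assumption: services run locally
		state := "DOWN"
		if isUp(client, url) {
			state = "UP"
		}
		fmt.Printf("%-4s %s\n", state, url)
	}
}
```

Services that have not yet added a /metrics endpoint are reported as DOWN here, matching what Prometheus would show for the same targets.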

Additional Notes

Metrics: All services using exchange/pkg/monitoring expose HTTP, external-call, workflow, business-event, cache, and Go runtime metrics.

Service Status:

  • ✅ Instrumented: orchestration-engine, consent-engine, policy-decision-point
  • ⚠️ In config: api-server-go, audit-service (need /metrics endpoint)

Deployment Notes

Local: cd observability && docker compose up -d (data persists in volumes)

Production: The current setup is for local development only. For production, enable authentication, secure the network, set up HA, and consider managed services.

Migration: exchange/monitoring/ → observability/; container names: oe-* → opendif-*. No breaking changes.

@ginaxu1 ginaxu1 changed the title 317 metrics 317 Add Performance Metrics & Monitoring Nov 17, 2025

@ginaxu1 ginaxu1 marked this pull request as draft November 19, 2025 05:22
@ginaxu1 ginaxu1 marked this pull request as ready for review November 21, 2025 21:37
@ginaxu1 ginaxu1 force-pushed the 317-metrics branch 3 times, most recently from d48d922 to f1e297b November 21, 2025 22:16
@ginaxu1 ginaxu1 changed the title 317 Add Performance Metrics & Monitoring 317 Add Performance Metrics & Monitoring to OE Nov 21, 2025
@ginaxu1 ginaxu1 force-pushed the 317-metrics branch 3 times, most recently from 4972bf4 to 2e551ab November 25, 2025 01:32

@ginaxu1 ginaxu1 changed the title 317 Add Performance Metrics & Monitoring to OE 317 Add Observability Stack for this project Nov 27, 2025
@ginaxu1 ginaxu1 requested a review from sthanikan2000 December 2, 2025 06:08

@sthanikan2000 sthanikan2000 left a comment

LGFM!

@sthanikan2000 sthanikan2000 merged commit 74c5697 into main Dec 2, 2025
2 checks passed
@sthanikan2000 sthanikan2000 deleted the 317-metrics branch December 2, 2025 06:38
@sthanikan2000 sthanikan2000 restored the 317-metrics branch December 5, 2025 07:08
@sthanikan2000 sthanikan2000 deleted the 317-metrics branch December 5, 2025 07:12
sthanikan2000 pushed a commit that referenced this pull request Jan 13, 2026
* add OTEL-aware telemetry core that now registers HTTP runtime metrics

* Refactor to exchange/pkg/monitoring

* Add metrics to api-server-go

* Add metrics to PDP and CE too

* Add README for pkg/monitoring

* Address PR comments

* Update README

* Address PR comments

* add OTEL-aware telemetry core that now registers HTTP runtime metrics

* Refactor to exchange/pkg/monitoring

* Add metrics to PDP and CE too

* Address PR comments

* add monitoring for other services

* Fix Build fail

* Fix OE Unit tests

* Remove unneeded files

* Fix merge conflicts w main

* Rewrite observability/

* revert changes to services, will add in next PR

* more clean up, make PR smaller and focused only on adding observability/ folder

* remove pkg/monitoring

* Update observability/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update observability/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update observability/docker-compose.yml

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update observability/README.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address remaining comments

* Address comments

* Update README for observability, reference superapp format

* Fix merge conflict with main:

* revert portal-backend/main.go to main version

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>