feat(monitor): Go SDK dashboard generator — ADR-007 Step 2a#8216

Open
abhay1999 wants to merge 2 commits into jaegertracing:main from abhay1999:feat/grafana-sdk-generator-step2a
@abhay1999
Contributor

What this PR does

Implements ADR-007 Step 2a: introduces a standalone Go generator (monitoring/jaeger-mixin/generate/) that uses grafana-foundation-sdk/go to produce dashboard-for-grafana-v2.json.

All 10 panels are now native timeseries (React-based), replacing the deprecated Angular graph panels emitted by the existing grafana-builder/Jsonnet toolchain.


Changes

| File | Purpose |
| --- | --- |
| monitoring/jaeger-mixin/generate/main.go | Go generator: translates all 5 rows / 10 panels |
| monitoring/jaeger-mixin/generate/go.mod | Standalone module (grafana-foundation-sdk/go v0.0.12) |
| monitoring/jaeger-mixin/dashboard-for-grafana-v2.json | Generated output, checked in for reviewability |
| docker-compose/monitor/docker-compose.yml | Mounts v2 dashboard alongside existing one for side-by-side comparison |
| .codecov.yml | Excludes generate/ (build tool, not production code) |

Panels translated (Jsonnet → Go SDK)

| Row | Panel | Unit |
| --- | --- | --- |
| Collector - Ingestion | Span Ingest Rate | ops/s (stacked) |
| Collector - Ingestion | % Spans Refused | percentunit (0–1) |
| Collector - Export | Span Export Rate | ops/s (stacked) |
| Collector - Export | Export Success Rate % | percent (0–100) |
| Storage | Storage Request Rate | ops/s (stacked) |
| Storage | Storage Latency - P99 | seconds |
| Query | Query Request Rate | ops/s (stacked) |
| Query | Query Latency - P99 | seconds |
| System | CPU Usage | percentunit |
| System | Memory RSS | bytes |

How to regenerate

cd monitoring/jaeger-mixin/generate
go run . > ../dashboard-for-grafana-v2.json

Test plan

Both dashboards mounted side-by-side via docker-compose/monitor/:

docker compose -f docker-compose/monitor/docker-compose.yml up
# open http://localhost:3000

Screenshots from live validation against microsim traffic below.

Old dashboard (Jsonnet / Angular graph panels):

New dashboard (Go SDK / native timeseries panels):

Same queries, same data — zero Angular panels.


Relates to: #5833
ADR: docs/adr/007-grafana-dashboards-modernization.md

@abhay1999 abhay1999 requested a review from a team as a code owner March 21, 2026 06:36
Copilot AI review requested due to automatic review settings March 21, 2026 06:36
@dosubot dosubot bot added the enhancement label Mar 21, 2026
@abhay1999
Contributor Author

Dashboard Screenshots (live validation)

Old dashboard — Jsonnet / Angular graph panels (existing):
image

New dashboard — Go SDK / native timeseries panels (this PR):
image

Contributor

Copilot AI left a comment

Pull request overview

Adds a standalone Go-based generator for the Jaeger Grafana dashboard (ADR-007 Step 2a), switching the emitted panels to native Grafana timeseries panels and checking the generated v2 dashboard JSON into the repo for review and side-by-side comparison.

Changes:

  • Introduce monitoring/jaeger-mixin/generate/ as a standalone Go module using grafana-foundation-sdk/go to generate the v2 dashboard JSON.
  • Add dashboard-for-grafana-v2.json (generated) and mount it in the monitoring docker-compose setup alongside the existing dashboard.
  • Exclude the generator module directory from Codecov coverage reporting.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| monitoring/jaeger-mixin/generate/main.go | Implements the Go generator that builds the v2 dashboard (rows/panels/PromQL). |
| monitoring/jaeger-mixin/generate/go.mod | Adds a standalone Go module for the generator with grafana foundation SDK dependency. |
| monitoring/jaeger-mixin/generate/go.sum | Records dependency checksums for the generator module. |
| monitoring/jaeger-mixin/dashboard-for-grafana-v2.json | Adds the generated Grafana v2 dashboard JSON for review/provisioning. |
| docker-compose/monitor/docker-compose.yml | Mounts the v2 dashboard into Grafana provisioning for side-by-side comparison. |
| .codecov.yml | Excludes the generator directory from coverage reporting. |

Editable().
Refresh("30s").
Time("now-1h", "now").
Timezone(common.TimeZoneBrowser).

Copilot AI Mar 21, 2026

The v2 generator doesn't configure any templating variables or annotations. The existing Jaeger dashboard includes a Prometheus datasource selector via templating, which helps users with multiple datasources (and matches other dashboards in this repo that emit templating.list / annotations.list). Consider adding templating/annotations configuration so the generated JSON conforms to the usual dashboard schema and retains the datasource selector.

Suggested change
- Timezone(common.TimeZoneBrowser).
+ Timezone(common.TimeZoneBrowser).
+ // Configure templating so the dashboard includes a Prometheus datasource
+ // selector, matching the original Jaeger dashboard and other mixin dashboards.
+ Templating(
+     dashboard.NewTemplatingBuilder().
+         WithDatasourceTemplate(
+             "datasource", // variable name
+             "Prometheus", // label shown in the UI
+             "prometheus", // query to list Prometheus datasources
+         ),
+ ).
+ // Configure default annotations so the dashboard emits annotations.list.
+ Annotations(
+     dashboard.NewAnnotationsBuilder().
+         WithBuiltInAnnotationsAndAlerts(),
+ ).

Comment on lines +31 to +44
{
  "type": "timeseries",
  "targets": [
    {
      "expr": "sum(rate(otelcol_receiver_refused_spans_total[1m])) or vector(0)",
      "legendFormat": "error"
    },
    {
      "expr": "sum(rate(otelcol_receiver_accepted_spans_total[1m]))",
      "legendFormat": "success"
    }
  ],
  "title": "Span Ingest Rate",
  "transparent": false,

Copilot AI Mar 21, 2026

The timeseries panels in this dashboard JSON do not include an id field at all. Other provisioned dashboards in this repo include an id for every panel, and Grafana relies on these IDs for panel-level operations (links, repeats, edits). Please ensure each panel gets a unique id (ideally stable across regenerations) and regenerate the output.
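
A check like this could be run against the generated file to catch the problem early. It is a minimal sketch using only the standard library; `checkPanelIDs` is a hypothetical helper, and it assumes rows and panels all sit in the top-level `panels` array (no panels nested inside collapsed rows):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// panel mirrors only the fields needed for the check. ID is a pointer so a
// missing "id" key can be told apart from an explicit 0.
type panel struct {
	ID    *int   `json:"id"`
	Type  string `json:"type"`
	Title string `json:"title"`
}

// checkPanelIDs returns one message per panel whose id is missing or
// already used by an earlier panel.
func checkPanelIDs(raw []byte) ([]string, error) {
	var doc struct {
		Panels []panel `json:"panels"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	var problems []string
	seen := map[int]string{}
	for _, p := range doc.Panels {
		if p.ID == nil {
			problems = append(problems, fmt.Sprintf("%q (%s): missing id", p.Title, p.Type))
			continue
		}
		if prev, dup := seen[*p.ID]; dup {
			problems = append(problems, fmt.Sprintf("%q: id %d already used by %q", p.Title, *p.ID, prev))
			continue
		}
		seen[*p.ID] = p.Title
	}
	return problems, nil
}

func main() {
	// Hypothetical input reproducing the two symptoms: rows sharing id 0
	// and a timeseries panel with no id at all.
	raw := []byte(`{"panels":[
		{"type":"row","title":"Collector - Ingestion","id":0},
		{"type":"timeseries","title":"Span Ingest Rate"},
		{"type":"row","title":"Collector - Export","id":0}
	]}`)
	problems, err := checkPanelIDs(raw)
	if err != nil {
		panic(err)
	}
	for _, p := range problems {
		fmt.Println(p)
	}
}
```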

Comment on lines +34 to +68
func buildDashboard() (dashboard.Dashboard, error) {
	builder := dashboard.NewDashboardBuilder("Jaeger (v2)").
		Uid("jaeger-v2").
		Tags([]string{"jaeger"}).
		Editable().
		Refresh("30s").
		Time("now-1h", "now").
		Timezone(common.TimeZoneBrowser).

		// ── Row 1: Collector - Ingestion ───────────────────────────────────────
		WithRow(dashboard.NewRowBuilder("Collector - Ingestion")).
		WithPanel(spanIngestRatePanel()).
		WithPanel(spansRefusedPctPanel()).

		// ── Row 2: Collector - Export ──────────────────────────────────────────
		WithRow(dashboard.NewRowBuilder("Collector - Export")).
		WithPanel(spanExportRatePanel()).
		WithPanel(exportSuccessRatePanel()).

		// ── Row 3: Storage ─────────────────────────────────────────────────────
		WithRow(dashboard.NewRowBuilder("Storage")).
		WithPanel(storageRequestRatePanel()).
		WithPanel(storageLatencyP99Panel()).

		// ── Row 4: Query ───────────────────────────────────────────────────────
		WithRow(dashboard.NewRowBuilder("Query")).
		WithPanel(queryRequestRatePanel()).
		WithPanel(queryLatencyP99Panel()).

		// ── Row 5: System ──────────────────────────────────────────────────────
		WithRow(dashboard.NewRowBuilder("System")).
		WithPanel(cpuUsagePanel()).
		WithPanel(memoryRSSPanel())

	return builder.Build()

Copilot AI Mar 21, 2026

The generator currently relies on SDK defaults for panel IDs, which results in dashboard-for-grafana-v2.json having duplicate row IDs and missing panel IDs. Please set explicit, unique IDs for every row and panel (preferably deterministic so diffs stay stable across regenerations) before marshaling the dashboard.
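
One simple way to get deterministic IDs is to number rows and panels in declaration order. The sketch below assumes that approach; `item`, `layout`, and `assignIDs` are hypothetical names used for illustration and are independent of the SDK:

```go
package main

import "fmt"

// item is a hypothetical stand-in for a row or panel in layout order.
type item struct {
	Kind  string // "row" or "panel"
	Title string
	ID    int
}

// layout lists the rows and panels of this PR in the order they are declared.
func layout() []item {
	rows := []struct {
		title  string
		panels []string
	}{
		{"Collector - Ingestion", []string{"Span Ingest Rate", "% Spans Refused"}},
		{"Collector - Export", []string{"Span Export Rate", "Export Success Rate %"}},
		{"Storage", []string{"Storage Request Rate", "Storage Latency - P99"}},
		{"Query", []string{"Query Request Rate", "Query Latency - P99"}},
		{"System", []string{"CPU Usage", "Memory RSS"}},
	}
	var items []item
	for _, r := range rows {
		items = append(items, item{Kind: "row", Title: r.title})
		for _, p := range r.panels {
			items = append(items, item{Kind: "panel", Title: p})
		}
	}
	return items
}

// assignIDs numbers rows and panels 1..N in declaration order. Because the
// order is fixed in source, regenerating produces the same IDs every time.
func assignIDs(items []item) []item {
	out := make([]item, len(items))
	for i, it := range items {
		it.ID = i + 1
		out[i] = it
	}
	return out
}

func main() {
	for _, it := range assignIDs(layout()) {
		fmt.Printf("%2d  %-5s  %s\n", it.ID, it.Kind, it.Title)
	}
}
```

With five rows and ten panels this numbering yields IDs 1 to 15, with `% Spans Refused` at 3 and `Export Success Rate %` at 6. The trade-off is that inserting a panel in the middle renumbers everything after it; hard-coding the ID at each call site avoids that at the cost of manual bookkeeping.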

Add monitoring/jaeger-mixin/generate/ — a standalone Go module that
produces dashboard-for-grafana-v2.json using grafana-foundation-sdk/go.
All 10 panels are native timeseries (React-based), replacing the
deprecated Angular graph panels emitted by grafana-builder/Jsonnet.

Mount the v2 dashboard alongside the existing one in the SPM docker-compose
stack for side-by-side comparison before the Jsonnet cutover (Step 2b).

- monitoring/jaeger-mixin/generate/main.go: dashboard definition in Go
- monitoring/jaeger-mixin/generate/go.mod: standalone module (grafana-foundation-sdk/go v0.0.12)
- monitoring/jaeger-mixin/dashboard-for-grafana-v2.json: generated output
- docker-compose/monitor/docker-compose.yml: add second volume mount for v2 dashboard
- .codecov.yml: exclude generate/ (build tool, not production code)

Relates to: jaegertracing#5833

Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com>
@abhay1999 abhay1999 force-pushed the feat/grafana-sdk-generator-step2a branch from 4373e6e to ba58120 Compare March 21, 2026 06:42
@codecov

codecov bot commented Mar 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.61%. Comparing base (4afa357) to head (cf0b273).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8216   +/-   ##
=======================================
  Coverage   95.61%   95.61%           
=======================================
  Files         319      319           
  Lines       16793    16793           
=======================================
  Hits        16056    16056           
  Misses        582      582           
  Partials      155      155           
| Flag | Coverage Δ |
| --- | --- |
| badger_direct | 9.05% <ø> (ø) |
| badger_e2e | 1.04% <ø> (ø) |
| cassandra-4.x-direct-manual | 13.25% <ø> (ø) |
| cassandra-4.x-e2e-auto | 1.03% <ø> (ø) |
| cassandra-4.x-e2e-manual | 1.03% <ø> (ø) |
| cassandra-5.x-direct-manual | 13.25% <ø> (ø) |
| cassandra-5.x-e2e-auto | 1.03% <ø> (ø) |
| cassandra-5.x-e2e-manual | 1.03% <ø> (ø) |
| clickhouse | 1.16% <ø> (ø) |
| elasticsearch-6.x-direct | 16.83% <ø> (ø) |
| elasticsearch-7.x-direct | 16.86% <ø> (ø) |
| elasticsearch-8.x-direct | 17.01% <ø> (ø) |
| elasticsearch-8.x-e2e | 1.04% <ø> (ø) |
| elasticsearch-9.x-e2e | 1.04% <ø> (ø) |
| grpc_direct | 7.79% <ø> (ø) |
| grpc_e2e | 1.04% <ø> (ø) |
| kafka-3.x-v2 | 1.04% <ø> (ø) |
| memory_v2 | 1.04% <ø> (ø) |
| opensearch-1.x-direct | 16.91% <ø> (ø) |
| opensearch-2.x-direct | 16.91% <ø> (ø) |
| opensearch-2.x-e2e | 1.04% <ø> (ø) |
| opensearch-3.x-e2e | 1.04% <ø> (ø) |
| query | 1.04% <ø> (ø) |
| tailsampling-processor | 0.52% <ø> (ø) |
| unittests | 94.30% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.

@yurishkuro
Member

please address all comments

@github-actions github-actions bot added the waiting-for-author PR is waiting for author to respond to maintainer's comments label Mar 21, 2026
Copilot AI review requested due to automatic review settings March 21, 2026 08:36
@github-actions github-actions bot removed the waiting-for-author PR is waiting for author to respond to maintainer's comments label Mar 21, 2026
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 5 comments.


Comment on lines +84 to +88
	return timeseries.NewPanelBuilder().
		Id(id).
		Title(title).
		Span(12).
		Height(8).

Copilot AI Mar 21, 2026

The shared panel helpers don’t set a lower y-axis bound or null-as-zero handling. The v1 dashboard sets min: 0 and nullPointMode: "null as zero" on all panels; without similar settings, the v2 visuals can diverge (negative axes, gaps instead of zeros). Consider adding Min(0) (or the SDK equivalent) and configuring null handling in these helper builders so all panels inherit it.

}

func spansRefusedPctPanel() *timeseries.PanelBuilder {
	return stackedPanel(3, "% Spans Refused").

Copilot AI Mar 21, 2026

spansRefusedPctPanel uses stackedPanel(...), enabling stacking for a percent-of-total visualization. Stacking percentages across series can exceed 100% and is usually misleading; it also contradicts the stackedPanel doc comment (“Use for rate/count panels”). Consider switching this to timeseriesPanel(...) (no stacking) and, if desired, keep the unit/max settings.

Suggested change
- return stackedPanel(3, "% Spans Refused").
+ return timeseriesPanel(3, "% Spans Refused").
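
The arithmetic behind the concern is easy to reproduce. In a stacked chart the top edge at each timestamp is the sum of all series, so two ratios that are each valid on their own can add up to more than 100%. A minimal sketch with made-up values (`stackedTop` is a hypothetical helper):

```go
package main

import "fmt"

// stackedTop returns the height of the top edge of a stacked chart at one
// timestamp: the sum of all series values at that point.
func stackedTop(values []float64) float64 {
	total := 0.0
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	// Hypothetical refusal ratios (0-1) for two receivers at one timestamp.
	refused := []float64{0.6, 0.7}
	top := stackedTop(refused)
	fmt.Printf("stacked top: %.0f%%, exceeds 100%%: %v\n", top*100, top > 1)
}
```

The same reasoning applies to any ratio or percentile panel: stacking is only meaningful when the series are additive parts of a whole, such as per-receiver rates.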

}

func exportSuccessRatePanel() *timeseries.PanelBuilder {
	return stackedPanel(6, "Export Success Rate %").

Copilot AI Mar 21, 2026

exportSuccessRatePanel uses stackedPanel(...), which enables stacking for a percentage metric. For success-rate percentages, stacking multiple exporters can produce values >100% and make the panel hard to interpret. Consider using timeseriesPanel(...) here (no stacking) and keep the percent unit/max.

Suggested change
- return stackedPanel(6, "Export Success Rate %").
+ return timeseriesPanel(6, "Export Success Rate %").

Comment on lines +41 to +45
Editable().
Refresh("30s").
Time("now-1h", "now").
Timezone(common.TimeZoneBrowser).
// Prometheus datasource selector — matches the original Jaeger dashboard

Copilot AI Mar 21, 2026

This generator changes dashboard-wide defaults compared to the existing dashboard: refresh is set to 30s (v1 uses 10s) and timezone is set to browser (v1 uses utc). If the goal is a like-for-like translation for side-by-side comparison, consider matching the existing refresh interval/timezone (or document why these defaults intentionally changed).
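
If a like-for-like translation is the goal, a small comparison of the dashboard-wide fields could flag this kind of drift automatically. The sketch below uses only the standard library; `diffSettings` is a hypothetical helper, and the sample values are the ones named in this comment:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// settings holds the two dashboard-wide fields the review comment mentions.
type settings struct {
	Refresh  string `json:"refresh"`
	Timezone string `json:"timezone"`
}

// diffSettings reports each field whose value differs between v1 and v2.
func diffSettings(v1, v2 []byte) ([]string, error) {
	var a, b settings
	if err := json.Unmarshal(v1, &a); err != nil {
		return nil, fmt.Errorf("v1: %w", err)
	}
	if err := json.Unmarshal(v2, &b); err != nil {
		return nil, fmt.Errorf("v2: %w", err)
	}
	var diffs []string
	if a.Refresh != b.Refresh {
		diffs = append(diffs, fmt.Sprintf("refresh: %q -> %q", a.Refresh, b.Refresh))
	}
	if a.Timezone != b.Timezone {
		diffs = append(diffs, fmt.Sprintf("timezone: %q -> %q", a.Timezone, b.Timezone))
	}
	return diffs, nil
}

func main() {
	v1 := []byte(`{"refresh":"10s","timezone":"utc"}`)
	v2 := []byte(`{"refresh":"30s","timezone":"browser"}`)
	diffs, err := diffSettings(v1, v2)
	if err != nil {
		panic(err)
	}
	for _, d := range diffs {
		fmt.Println(d)
	}
}
```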

func promTarget(expr, legend string) *prometheus.DataqueryBuilder {
	return prometheus.NewDataqueryBuilder().
		Expr(expr).
		LegendFormat(legend)

Copilot AI Mar 21, 2026

The dashboard defines a datasource templating variable, but the Prometheus targets/panels produced by this generator don’t appear to bind to it (the generated v2 JSON has no datasource field on panels/targets, unlike v1 which sets "datasource": "$datasource"). This likely means changing the Data Source variable in Grafana won’t affect these panels. Consider setting the datasource reference on each panel (or on each Prometheus query target, depending on the Foundation SDK API) to use the ${datasource} variable.

Suggested change
- LegendFormat(legend)
+ LegendFormat(legend).
+ Datasource("${datasource}")
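
Independently of which SDK call ends up setting it, the binding can be verified on the generated JSON. The following sketch (standard library only; `unboundPanels` is a hypothetical helper, and it assumes a flat top-level `panels` array) lists panels whose query targets carry no datasource on either the target or the panel:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// target and panel keep the datasource as raw JSON because dashboards may
// encode it either as a string ("$datasource") or as an object.
type target struct {
	Expr       string          `json:"expr"`
	Datasource json.RawMessage `json:"datasource"`
}

type panel struct {
	Title      string          `json:"title"`
	Datasource json.RawMessage `json:"datasource"`
	Targets    []target        `json:"targets"`
}

// isSet reports whether a datasource field is present and not JSON null.
func isSet(raw json.RawMessage) bool {
	return len(raw) > 0 && string(raw) != "null"
}

// unboundPanels lists panels that have at least one query target with no
// datasource on either the target or the enclosing panel.
func unboundPanels(raw []byte) ([]string, error) {
	var doc struct {
		Panels []panel `json:"panels"`
	}
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	var out []string
	for _, p := range doc.Panels {
		if isSet(p.Datasource) {
			continue
		}
		for _, t := range p.Targets {
			if !isSet(t.Datasource) {
				out = append(out, p.Title)
				break
			}
		}
	}
	return out, nil
}

func main() {
	// Hypothetical input: the first panel is bound, the second is not.
	raw := []byte(`{"panels":[
		{"title":"Span Ingest Rate","targets":[{"expr":"up","datasource":"$datasource"}]},
		{"title":"CPU Usage","targets":[{"expr":"up"}]}
	]}`)
	titles, err := unboundPanels(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(titles)
}
```

Note that the exact argument type the SDK's datasource setter expects is not confirmed here; the suggestion above passes a string, which may need adjusting to the SDK's actual signature.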

- Add Prometheus datasource template variable so the dashboard exposes
  a datasource selector matching the original Jaeger dashboard
- Assign unique stable IDs (1-15) to all rows and panels; previously
  rows had id=0 and timeseries panels had no id field
- Fix stacking: P99 latency panels (Storage, Query) and single-metric
  panels (CPU Usage, Memory RSS) no longer use stacking mode — stacking
  percentile or single-series data produces misleading visualisations
- Regenerate dashboard-for-grafana-v2.json from updated generator

Relates to: jaegertracing#5833

Signed-off-by: abhay1999 <abhaychaurasiya19@gmail.com>
@abhay1999 abhay1999 force-pushed the feat/grafana-sdk-generator-step2a branch from b5f43d1 to cf0b273 Compare March 21, 2026 08:44
@abhay1999
Contributor Author

@yurishkuro DCO failure is fixed (missing Signed-off-by on the review-fix commit — amended and force pushed). All other CI checks were passing. Ready for re-review when you get a chance.

@yurishkuro yurishkuro added the changelog:experimental Change to an experimental part of the code label Mar 21, 2026
@yurishkuro
Member

please keep addressing/responding to bot comments

@github-actions github-actions bot added the waiting-for-author PR is waiting for author to respond to maintainer's comments label Mar 21, 2026
Labels

changelog:experimental Change to an experimental part of the code enhancement waiting-for-author PR is waiting for author to respond to maintainer's comments
