Add review-latency metric (PR review time, controlled for size)#1

Merged
snopoke merged 17 commits into main from feature/review-latency-metric
Apr 26, 2026

Conversation


@snopoke snopoke commented Apr 26, 2026

Summary

Adds a new review-latency metric that measures merged_at − COALESCE(ready_for_review_at, opened_at) per PR, bucketed by changed_files (XS=1, S=2-3, M=4-9, L+=10+), reported as median (chart) + p90 (JSON) per repo/week/bucket.

The existing lead-time metric (first_commit_at → merged_at) conflates dev work with review wait, and a 5-line typo PR is compared against a 1000-line refactor. review-latency answers a different question: once a PR is open and ready for review, how long does it take to land, given how big it is?

What shipped

  • DB schema — 4 new nullable columns on pull_requests (additions, deletions, changed_files, ready_for_review_at). Idempotent ALTER-TABLE migration handles legacy DBs in place via PRAGMA table_info.
  • GitHub fetch — unknown-PR branch now makes 2 new API calls (/pulls/{n} for size, /issues/{n}/timeline for the ready_for_review event). Known PRs still skip everything (cache hit unchanged).
  • Metric — m_review_latency(conn, since) in src/dora/metrics.py, registered in the METRICS dict. Bucketing + aggregation in Python (matches m_lead_time's shape).
  • Dashboard — 5th panel (Review latency) with a 4-line Chart.js series, one line per bucket. Missing buckets in a week render as gaps, not zeros.
  • Docs — README gains a metrics-table row, a coverage-ramps-forward callout, and the new schema columns in the data-model block.
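The idempotent migration from the first bullet can be sketched like this; the function name and exact SQL types are assumptions, but the PRAGMA table_info check-before-ALTER pattern is the one described above.

```python
import sqlite3

# Columns added by this PR (all nullable, per the description).
NEW_COLUMNS = {
    "additions": "INTEGER",
    "deletions": "INTEGER",
    "changed_files": "INTEGER",
    "ready_for_review_at": "TEXT",
}

def migrate_pull_requests(conn: sqlite3.Connection) -> None:
    """Add any missing columns in place. Safe to run repeatedly:
    columns already present are skipped, so legacy and fresh DBs
    both converge on the same schema."""
    existing = {row[1] for row in conn.execute("PRAGMA table_info(pull_requests)")}
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE pull_requests ADD COLUMN {name} {sql_type}")
    conn.commit()
```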

Coverage caveat

Metric coverage ramps forward: legacy DB rows stay NULL on the new columns and are excluded from the metric. Only PRs merged after this upgrade contribute. README documents this; a --rebuild flag for forced backfill is parked as future work.
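One way the metric query can enforce this exclusion is a NULL filter on changed_files; this SQL is a hedged sketch of that shape, not necessarily the query the PR ships.

```python
import sqlite3

# Legacy rows (migrated in place but never backfilled) have NULL
# changed_files, so the WHERE clause drops them from the metric.
QUERY = """
SELECT changed_files,
       (julianday(merged_at)
        - julianday(COALESCE(ready_for_review_at, opened_at))) * 24
       AS review_hours
FROM pull_requests
WHERE merged_at IS NOT NULL
  AND changed_files IS NOT NULL  -- excludes pre-upgrade rows
"""
```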

API cost

Cold-cache: 3 API calls per new PR (commits + pull detail + timeline) — was 1. Hot-cache (known PRs): unchanged at 0 per-PR calls. README's cold-cache cost note updated accordingly.

Spec & plan

  • Spec: `docs/superpowers/specs/2026-04-25-review-latency-metric-design.md`
  • Plan: `docs/superpowers/plans/2026-04-25-review-latency-metric.md`

Test plan

  • `uv run pytest` — 67 tests pass (was 57 before; +10 from new metric / migration / fetch / boundary tests)
  • Fresh DB: `dora pull` populates the 4 new columns on new PRs
  • Legacy DB (pre-migration schema): ALTER runs, columns appear, no data loss
  • Re-running `dora pull` on a hot cache: known PRs skip the new endpoints (verified via the progress "reused from cache" line)
  • `dora report --metric review-latency --format json --weeks 500` against a real DB: parseable output with the bucketed shape
  • Dashboard at `/?url=fixtures/sample.json`: 5th panel renders with 4 distinct lines, repo + date-range filters work, no console errors

Known follow-ups (parked)

  • Promote panel colors from hardcoded hex to CSS variables (dark-mode parity with the other charts)
  • `--rebuild` flag for backfilling legacy rows
  • Surface `additions + deletions` in the dashboard once the lockfile/generated-file noise question has a config story

🤖 Generated with Claude Code

snopoke and others added 17 commits April 25, 2026 21:09

Captures the brainstorm outcome: a size-bucketed review-cycle latency
metric using opened/ready-for-review → merged window. Pins the bucket
boundaries (1 / 2-3 / 4-9 / 10+ changed files), the schema additions,
the GitHub fetch impact (timeline + per-PR calls on cold cache), and
the dashboard treatment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

12-task TDD plan covering: schema migration with ALTER-TABLE-on-legacy
support, upsert_pr extension, fixture seed update with size + draft
data, m_review_latency in metrics.py, GitHub timeline helper, fetch_prs
extension, CLI smoke test, dashboard panel + renderer, sample.json
demo data, and README updates.

Also corrects the spec's claim about an "existing collapsible Weekly
metrics (raw, sortable)" detail table — that section was sketched in
the original 2026-04-24 design but never built. Spec now reflects that
p90 lives in the JSON output; the chart shows median only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Code reviewer caught: the seed comment had 44h, but PR 2's
review_window = merged − ready_for_review_at = 34h (not 44h, which
would be merged − opened_at, ignoring the COALESCE).

Also fixes the same wrong number in the plan's Task 4 expected-values
table.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
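The 34h-vs-44h fix above is easy to reproduce with hypothetical timestamps chosen to match the commit's numbers (the real fixture values may differ): a draft window of 10h means merged − opened is 44h, but the COALESCE picks ready_for_review_at, giving 34h.

```python
from datetime import datetime, timezone

# Hypothetical timestamps reproducing the commit's arithmetic.
opened = datetime(2025, 10, 1, 0, 0, tzinfo=timezone.utc)
ready = datetime(2025, 10, 1, 10, 0, tzinfo=timezone.utc)   # 10h in draft
merged = datetime(2025, 10, 2, 20, 0, tzinfo=timezone.utc)

start = ready or opened  # COALESCE(ready_for_review_at, opened_at)
review_hours = (merged - start).total_seconds() / 3600   # 34.0 (correct)
naive_hours = (merged - opened).total_seconds() / 3600   # 44.0 (the seed's bug)
```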
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Regression caught after Task 6 ran the full suite — Task 4 added
review-latency to the METRICS registry but missed updating the
hardcoded set in test_run_report_returns_all_metrics_by_default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reviewer caught two issues in the original test:
- `all(...)` was vacuously true on empty data, hiding the fact that
  --weeks 12 from today (April 2026) excluded the October 2025 fixture
  rows entirely. Bumped to --weeks 500 so the fixture window is covered,
  and added an explicit non-empty assertion to surface this class of bug.
- Missing timeout=15 on subprocess.run, inconsistent with the existing
  test_report_via_python_m_module pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
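The vacuous-truth failure mode described above is worth seeing in isolation; this is a minimal demonstration, not the project's test code, and check_rows is a hypothetical name.

```python
# all(...) over an empty iterable is vacuously True, so a window bug
# that filters out every fixture row still "passes" the original test.
rows = []  # e.g. --weeks 12 from April 2026 missing the Oct 2025 fixtures
assert all(r["median_hours"] >= 0 for r in rows)  # passes despite no data!

def check_rows(rows):
    """Hardened shape: an explicit non-empty assertion surfaces the bug
    before any per-row checks get a chance to pass vacuously."""
    assert rows, "expected at least one bucketed row"
    assert all(r["median_hours"] >= 0 for r in rows)
```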

Adds REVIEW_LATENCY_BUCKETS constant, renderReviewLatencyChart function,
and wires it into renderForRepo. Panel shows "No data yet" until Task 10
lands sample fixture data.

Tightening pass after the branch-level review:

- Move BUCKETS / _assign_bucket below SUCCESS_DEPLOY_STATUSES so the
  module-level constants are grouped together before any helpers.
- Drop the dead `hours is None` guard in m_review_latency: opened_at is
  NOT NULL in the schema and the SQL filters merged_at IS NOT NULL, so
  the COALESCE-wrapped subtraction can never yield None. The hours < 0
  guard stays (defensive against clock-skew).
- Add parametrized boundary test for _assign_bucket so off-by-one
  regressions get caught (previously only interior bucket values were
  exercised).
- README data model block now lists the four new pull_requests columns.
- README cold-cache cost note now mentions 3 API calls per new PR
  (commits + pull detail + timeline) for the size + draft fields.
- Spec future-work list gains an entry for promoting the dashboard's
  per-bucket colors from hardcoded hex to CSS variables (so the panel
  picks up dark-mode adaptation alongside the other charts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@snopoke snopoke merged commit 01c692b into main Apr 26, 2026
3 checks passed