feat(otel): ClickHouse-first OTEL metrics storage with full-fidelity decode #158
Open
dviejokfs wants to merge 24 commits into
…decode
Add a native ClickHouse metrics path alongside the existing TimescaleDB one. When TEMPS_CLICKHOUSE_* is configured, OTLP metrics are decoded at full fidelity (temporality, monotonicity, explicit/exponential histograms, summaries, exemplars, typed labels) and stored in a native `metrics` MergeTree; query_metrics/list_metric_names and the anomaly-detector helpers run natively against it. The service_metrics alerting bridge and the default (no-ClickHouse) TimescaleDB path are unchanged.
Proven end-to-end against a live ClickHouse testcontainer: store -> query -> read-back round-trips with fidelity (4 CH storage tests + 1 decode->store fidelity test + 263 lib unit tests + 7 e2e + 15 timescale, all green).
Correctness fixes uncovered by the live integration tests:
- ReplacingMergeTree ORDER BY now includes a MATERIALIZED label fingerprint (sipHash64 of the sorted label set) and a full-precision timestamp. Previously, distinct label-series sharing a coarse per-second timestamp silently collapsed into one row (data loss across series).
- The bucket expression uses toInt64(toUnixTimestamp(...))*1000; the prior toUnixTimestamp64Milli() over a DateTime was an illegal-type error.
- Aggregates wrap the value in assumeNotNull so Nullable(Float64) results match the f64 row read (RowBinary type-width mismatch otherwise).
Test infrastructure: fix the ClickHouse testcontainer wait strategy (HTTP /ping via the http_wait_plain feature) and credentials so the integration tests actually execute instead of silently skipping.
Frontend: scaffold a metrics explorer page (mirrors Traces) wired to the generated SDK; the richer Phase C query params and the Observe metric kind await an SDK regen against a running server.
Deferred follow-ups: rate() delta-vs-cumulative handling, histogram quantile reconstruction, live-CH coverage for exp-histogram/summary/exemplar columns, and TimescaleDB parity for the new fidelity.
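A minimal sketch of why the label fingerprint fixes the collapse, assuming labels arrive as a string map: sorting the label set before hashing makes the fingerprint independent of the order the exporter sent labels in, and putting it in the sort key keeps `GET` and `POST` series apart even at the same timestamp. Rust's std `DefaultHasher` (SipHash-1-3) stands in for ClickHouse's `sipHash64` here, so the values are not byte-compatible; `label_fingerprint`, `sort_key`, and `labels` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Illustrative stand-in for the MATERIALIZED label-fingerprint column.
/// A BTreeMap iterates in key order, so the hasher always sees the label set
/// sorted, whatever order the labels were inserted in.
pub fn label_fingerprint(labels: &BTreeMap<String, String>) -> u64 {
    // DefaultHasher::new() uses fixed keys, so the value is stable within a build.
    let mut hasher = DefaultHasher::new();
    for (key, value) in labels {
        key.hash(&mut hasher);
        value.hash(&mut hasher);
    }
    hasher.finish()
}

/// Illustrative sort key: (metric name, label fingerprint, full-precision timestamp).
/// Two points only collide when all three match.
pub fn sort_key(metric: &str, labels: &BTreeMap<String, String>, ts_nanos: i64) -> (String, u64, i64) {
    (metric.to_string(), label_fingerprint(labels), ts_nanos)
}

/// Build a label map from (key, value) pairs; pair order does not matter.
pub fn labels(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}
```

Without the fingerprint, both series would share the key (metric, timestamp) and ReplacingMergeTree would keep only one of them after a merge.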
- rate() now honours aggregation temporality: DELTA series sum their per-interval increments while CUMULATIVE counters use the within-bucket (max - min). Previously every series used max-min, which undercounts the rate of delta-temporality counters.
- query_metrics now populates histogram_summary with the explicit bucket layout (bounds + element-wise-summed bucket_counts) alongside count/sum/min/max, so histogram metrics return their real distribution instead of only a misleading synthetic mean. This enables correct quantile reconstruction from the buckets.
Both behaviours are covered by new live-ClickHouse integration tests (query_metrics_rate_respects_temporality, histogram_summary_aggregates_buckets).
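A minimal sketch of the per-bucket increase that a temporality-aware `rate()` divides by the bucket width, assuming the points of one series in one bucket are already collected. `Temporality`, `bucket_increase`, and `bucket_rate` are hypothetical names, not the crate's API, and counter resets inside a bucket are not handled here.

```rust
/// Aggregation temporality as carried on each OTLP sum point.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Temporality {
    Delta,
    Cumulative,
}

/// Increase of one counter series within one bucket.
/// DELTA points are per-interval increments, so they are summed.
/// CUMULATIVE points are running totals, so the increase is max - min.
pub fn bucket_increase(temporality: Temporality, values: &[f64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    match temporality {
        Temporality::Delta => values.iter().sum(),
        Temporality::Cumulative => {
            let max = values.iter().cloned().fold(f64::MIN, f64::max);
            let min = values.iter().cloned().fold(f64::MAX, f64::min);
            max - min
        }
    }
}

/// Per-second rate over a bucket that is `bucket_secs` seconds wide.
pub fn bucket_rate(temporality: Temporality, values: &[f64], bucket_secs: f64) -> f64 {
    bucket_increase(temporality, values) / bucket_secs
}
```

Three DELTA points of 5 each describe 15 increments, but max - min over them is 0, which is exactly the undercount the commit fixes.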
…plar columns
Adds a decode->store->raw-read test that inserts a metric carrying the nested Array(Tuple(...)) columns (exponential-histogram bucket counts, summary quantiles, and exemplars with trace/span ids) into a live ClickHouse container and asserts they survive. These RowBinary nested-tuple codepaths were previously only unit-tested at the row-mapping tier, never against a real ClickHouse server.
…e-count)
histogram_summary previously summed raw histogram_count / bucket_counts across all rows in a window. For CUMULATIVE histograms re-exported multiple times (the OTLP default), each export is a running total, so summing them multiplied the counts by the number of exports; a live demo showed count=300 for 50 observations exported 6 times.
Compute histogram_summary from a per-series sub-aggregation: each series (attributes_hash) is collapsed first (CUMULATIVE -> latest snapshot via argMax/max; DELTA or unspecified -> sum across the window), then summed across series up to the requested grouping granularity, matched back to the scalar rows by (bucket_ms, series_values). Scalar/quantile aggregations are unchanged.
Covered by a new live-ClickHouse test: cumulative re-exports collapse to per-series latest, then sum across series.
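A minimal sketch of the per-series collapse, reduced to the count field: a CUMULATIVE series keeps only its latest snapshot (the analogue of the argMax described above), DELTA or unspecified series are summed, and only then are series added together. `HistPoint` and `window_count` are hypothetical; the real query does this in ClickHouse SQL and also collapses bucket_counts element-wise.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Temporality {
    Unspecified,
    Delta,
    Cumulative,
}

/// One exported histogram point, trimmed to the fields this sketch needs.
#[derive(Clone, Debug)]
pub struct HistPoint {
    pub attributes_hash: u64,
    pub timestamp_ms: i64,
    pub temporality: Temporality,
    pub count: u64,
}

/// Collapse each series first, then sum across series.
/// CUMULATIVE: keep the latest snapshot (an argMax-by-timestamp analogue).
/// DELTA or unspecified: sum every point in the window.
pub fn window_count(points: &[HistPoint]) -> u64 {
    // Per series: (latest timestamp seen, collapsed count).
    let mut per_series: HashMap<u64, (i64, u64)> = HashMap::new();
    for p in points {
        let entry = per_series.entry(p.attributes_hash).or_insert((i64::MIN, 0));
        match p.temporality {
            Temporality::Cumulative => {
                if p.timestamp_ms >= entry.0 {
                    *entry = (p.timestamp_ms, p.count);
                }
            }
            Temporality::Delta | Temporality::Unspecified => {
                entry.0 = entry.0.max(p.timestamp_ms);
                entry.1 += p.count;
            }
        }
    }
    per_series.values().map(|&(_, count)| count).sum()
}
```

Six cumulative exports that each report 50 observations naively sum to 300; collapsing per series first yields 50, matching the demo described above.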
Regenerate the SDK against the Phase C backend (typed aggregation / metric_type / label_filters params; histogram_summary, quantiles, series_key on MetricBucket).
MetricsExplorer:
- Default view now shows ALL metrics as an overview grid (one mini chart per metric with its latest value); click a card to drill into the detailed view.
- Send the real aggregation + label_filters params. For histogram metrics, percentiles are computed client-side from the histogram_summary buckets (the backend scalar quantile runs over the synthetic mean).
- Add a histogram Distribution panel (count/mean/p50-p99 + per-bucket bars).
- Fix an infinite refetch loop: end_time used new Date() every render, changing the query key each render; the time bounds are now memoized.
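One common way to turn an explicit bucket layout into a percentile is linear interpolation inside the bucket that contains the target rank, as Prometheus' `histogram_quantile` does. The sketch below assumes OTLP's explicit-bucket convention (`counts` has one more entry than `bounds`, the last being the overflow bucket). The explorer's actual client-side code is TypeScript and may interpolate differently; `histogram_quantile` here is a hypothetical Rust helper.

```rust
/// Estimate quantile `q` (0..=1) from an explicit-bucket histogram.
/// `bounds` holds n ascending upper bounds; `counts` holds n + 1 entries,
/// the last being the overflow bucket (bounds[n-1], +inf).
pub fn histogram_quantile(q: f64, bounds: &[f64], counts: &[u64]) -> Option<f64> {
    if counts.len() != bounds.len() + 1 || !(0.0..=1.0).contains(&q) {
        return None;
    }
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return None;
    }
    let rank = q * total as f64;
    let mut cumulative = 0.0;
    for (i, &c) in counts.iter().enumerate() {
        let prev = cumulative;
        cumulative += c as f64;
        if c > 0 && cumulative >= rank {
            if i == bounds.len() {
                // Overflow bucket has no upper edge: clamp to the last finite bound.
                return bounds.last().copied();
            }
            let upper = bounds[i];
            // First bucket has no lower edge: use 0 for a positive bound, else the bound.
            let lower = if i == 0 { upper.min(0.0) } else { bounds[i - 1] };
            // Position of the rank inside this bucket, assuming uniform spread.
            let fraction = (rank - prev) / c as f64;
            return Some(lower + (upper - lower) * fraction);
        }
    }
    bounds.last().copied()
}
```

With bounds [10, 50, 100] and counts [10, 30, 50, 10], the median rank (50) falls 10 observations into the 50..100 bucket, giving 50 + 50 · 10/50 = 60.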
Per-project saved dashboards persisted in Postgres (metric_dashboards table) as a typed JSON layout of sections -> metric tiles. Full CRUD under /api/otel/dashboards following Handler->Service->Data:
- temps-entities: metric_dashboards entity + migration.
- temps-otel: MetricDashboardService (CRUD, typed DashboardLayout/Section/Tile, aggregation + size/length-bounds validation), handlers (utoipa, permission_guard OtelRead/OtelWrite, audit-logged writes), routes + OtelApiDoc registration, DashboardNotFound -> 404.
- get/update/delete are scoped by project_id (defense-in-depth against cross-tenant IDOR: a mismatched project_id returns 404, never another project's dashboard). A corrupt stored layout is logged, not silently swallowed.
- 8 service tests (CRUD + validation + pagination cap).
Frontend: Dashboards list, dashboard view (sections of metric-chart tiles, reusing the metrics-explorer chart + client-side histogram percentiles, with memoized time bounds), and a builder (add/rename sections, add tiles by metric + aggregation, save). SDK regenerated.
Verified end-to-end against a running server + ClickHouse: CRUD round-trips the layout losslessly, IDOR is blocked (404 on mismatched project_id), and the UI renders saved dashboards with live charts.
Follow-ups: domain-prefix the nested layout schema names (OtelDashboardLayout etc.) and add handler-layer 401/403 tests.
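The layout validation can be pictured as a plain walk over the typed tree that rejects a layout before it is persisted. The limits and the aggregation allowlist below are illustrative assumptions (the commit only says aggregation plus size/length bounds are checked), and the structs are trimmed to the fields the checks need.

```rust
// Illustrative limits: the PR does not state the real bounds.
const MAX_SECTIONS: usize = 20;
const MAX_TILES_PER_SECTION: usize = 24;
const MAX_TITLE_CHARS: usize = 200;
// Assumed allowlist, borrowed from the aggregations query_metrics supports.
const ALLOWED_AGGREGATIONS: &[&str] = &["avg", "sum", "min", "max", "count", "rate", "quantile"];

pub struct Tile {
    pub metric_name: String,
    pub aggregation: String,
}

pub struct Section {
    pub title: String,
    pub tiles: Vec<Tile>,
}

pub struct DashboardLayout {
    pub sections: Vec<Section>,
}

#[derive(Debug, PartialEq)]
pub enum LayoutError {
    TooManySections(usize),
    TitleTooLong { section: usize },
    TooManyTiles { section: usize, count: usize },
    EmptyMetricName { section: usize, tile: usize },
    UnknownAggregation(String),
}

/// Reject oversized or malformed layouts before they are persisted.
pub fn validate_layout(layout: &DashboardLayout) -> Result<(), LayoutError> {
    if layout.sections.len() > MAX_SECTIONS {
        return Err(LayoutError::TooManySections(layout.sections.len()));
    }
    for (si, section) in layout.sections.iter().enumerate() {
        if section.title.chars().count() > MAX_TITLE_CHARS {
            return Err(LayoutError::TitleTooLong { section: si });
        }
        if section.tiles.len() > MAX_TILES_PER_SECTION {
            return Err(LayoutError::TooManyTiles { section: si, count: section.tiles.len() });
        }
        for (ti, tile) in section.tiles.iter().enumerate() {
            if tile.metric_name.trim().is_empty() {
                return Err(LayoutError::EmptyMetricName { section: si, tile: ti });
            }
            if !ALLOWED_AGGREGATIONS.contains(&tile.aggregation.as_str()) {
                return Err(LayoutError::UnknownAggregation(tile.aggregation.clone()));
            }
        }
    }
    Ok(())
}
```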
Drop the mx-auto/max-w container constraint on the metrics explorer, dashboards list, dashboard view, and builder so they use the full content width (e.g. ~1600px instead of a centered 1152px on wide screens) — more columns in the metric overview grid and more room for tile charts.
Collapse the separate "OTel Metrics" and "Dashboards" sidebar entries into a single Metrics surface with a route-backed segmented control:
- Explore (index, /metrics): the all-metrics overview + per-metric drill-in.
- Dashboards (/metrics/dashboards/*): the saved-dashboards list/view/builder, nested unchanged so their relative navigation is preserved.
/dashboards/* now redirects to /metrics/dashboards. One nav entry; explore and curate live in the same place.
Add first-class threshold alerting on OpenTelemetry metrics. Alert rules attach directly to a metric (not to a dashboard), so the metric is the source of truth and any surface (explorer, dashboards) merely displays them.
Backend (temps-otel):
- metric_alert_rules entity + migration (project-scoped: name, metric_name, aggregation, comparator, threshold, window/for-duration, severity, enabled, last_state/last_value for firing-state tracking).
- MetricAlertService: project-scoped CRUD with IDOR-safe by-id access (get/update/delete 404 on project mismatch), threshold finiteness + bounds validation, paginated list.
- MetricAlertEvaluator: background tokio-interval evaluator. Queries the latest closed bucket (limit 2 to skip the in-progress one), derives the rule value per aggregation (incl. client-side histogram_quantile for percentile rules), runs a for-duration state machine, and fires/resolves through the existing temps_monitoring AlarmService so alerts reuse configured notification channels. No-data ticks preserve prior state.
- CRUD handlers under /otel/alerts with audit logging.
Frontend (web):
- Alerts tab in the unified Metrics surface (Explore | Dashboards | Alerts).
- MetricAlerts list with firing-state badge + one-line rule summary, MetricAlertForm create/edit, AlertsRouter.
- Explorer overlays a rule's threshold as a reference line on the metric chart (critical=poor tone, else warn).
- SDK regen for the new /otel/alerts endpoints.
Verified end-to-end against a live ClickHouse-backed server: CRUD, IDOR (get/delete 404 on wrong project), threshold validation, and a breaching rule transitioning unknown→firing with the alarm actually fired. 18 unit tests pass; clippy -D warnings clean.
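A minimal sketch of a for-duration state machine with the no-data rule described above: a breach must persist for `for_secs` before the rule fires, a non-breaching value resolves it, and a no-data tick changes nothing. `RuleTracker`, `Observation`, and `Transition` are hypothetical names, and whether the real evaluator has an intermediate pending state is not stated in the PR.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AlertState {
    Unknown,
    Ok,
    Firing,
}

/// What one evaluation tick observed.
pub enum Observation {
    NoData,
    Value { breaching: bool },
}

/// Transition handed to the alarm layer, if any.
#[derive(Debug, PartialEq)]
pub enum Transition {
    Fire,
    Resolve,
}

pub struct RuleTracker {
    pub state: AlertState,
    pub for_secs: i64,
    breach_start: Option<i64>,
}

impl RuleTracker {
    pub fn new(for_secs: i64) -> Self {
        RuleTracker { state: AlertState::Unknown, for_secs, breach_start: None }
    }

    /// Advance the rule by one tick at unix time `now` (seconds).
    pub fn tick(&mut self, now: i64, obs: Observation) -> Option<Transition> {
        match obs {
            // No data: keep the prior state and the breach timer untouched.
            Observation::NoData => None,
            Observation::Value { breaching: true } => {
                // Remember when the current breach started.
                let start = *self.breach_start.get_or_insert(now);
                if self.state != AlertState::Firing && now - start >= self.for_secs {
                    self.state = AlertState::Firing;
                    Some(Transition::Fire)
                } else {
                    None
                }
            }
            Observation::Value { breaching: false } => {
                self.breach_start = None;
                let was_firing = self.state == AlertState::Firing;
                self.state = AlertState::Ok;
                if was_firing { Some(Transition::Resolve) } else { None }
            }
        }
    }
}
```

In this sketch a no-data gap neither resets nor advances the breach timer; a stricter design could reset it instead.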
…ules
Reshape metric_alert_rules so future detector families (anomaly, EWMA,
forecast, outlier, Watchdog-style auto-watch) ship as code-only changes —
never another migration. Done now, before the table merges, so there is no
backfill or transition dance.
Schema: replace the static-only `comparator`/`threshold` columns with a
coarse `detection_kind` string discriminator + a typed-in-Rust, jsonb
`detection_config`. The cross-cutting eval envelope (metric, aggregation,
window, for-duration, severity, enabled, last_* state) stays typed columns;
only the detector-specific knobs move into the blob. Folded directly into the
in-flight create migration (zero rows, no ALTER).
detection_config is a serde internally-tagged enum (`DetectionConfig`) in the
new `temps_otel::detectors` module — copied verbatim from the sanctioned
`ProviderConfig`/`revenue_integrations.config` precedent: NO
`#[schema(discriminator)]` (a compile error with serde(tag) in utoipa 5.5.0).
The raw `serde_json::Value` lives only on the sea-orm column; every service
and DTO layer is fully typed, so the generated SDK is a usable TS
discriminated union — `(StaticParams & { kind: 'static' }) | …` — not `any`.
Today only `static` is evaluable; anomaly/forecast/outlier/auto_watch are
typed, schema-present stubs that validate() rejects at create time. Enabling
each later = a new validate arm + evaluator branch + openapi-ts regen.
`detection_kind` is a plain string (not a PG enum) so new kinds need no
ALTER TYPE either.
Evaluator now decodes the typed detector and branches on it (static =
`Comparator::breaches`); the bad-input surface moves to serde (unknown kind /
bad comparator / missing threshold -> 422 at deserialize) which is stronger
than the old string allowlists. Frontend maps the static form to/from
detection_config and the explorer/list narrow on `kind`.
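A dependency-free sketch of the typed detector shape and the static branch. In the PR, `DetectionConfig` is a serde internally-tagged enum; the derives are omitted here, the comparator variant names are assumptions, and the non-static kinds are shown as bare stubs that `validate()` rejects, matching the state of this commit.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Comparator {
    Gt,
    Gte,
    Lt,
    Lte,
}

impl Comparator {
    /// True when `value` breaches `threshold` under this comparator.
    pub fn breaches(self, value: f64, threshold: f64) -> bool {
        match self {
            Comparator::Gt => value > threshold,
            Comparator::Gte => value >= threshold,
            Comparator::Lt => value < threshold,
            Comparator::Lte => value <= threshold,
        }
    }
}

#[derive(Clone, Debug)]
pub struct StaticParams {
    pub comparator: Comparator,
    pub threshold: f64,
}

/// Typed detector config (serde tagging omitted; stored as jsonb in the PR).
#[derive(Clone, Debug)]
pub enum DetectionConfig {
    Static(StaticParams),
    // Typed stubs at this commit: present in the schema, rejected by validate().
    Anomaly,
    Forecast,
    Outlier,
    AutoWatch,
}

impl DetectionConfig {
    /// Value written to the coarse `detection_kind` string column.
    pub fn kind(&self) -> &'static str {
        match self {
            DetectionConfig::Static(_) => "static",
            DetectionConfig::Anomaly => "anomaly",
            DetectionConfig::Forecast => "forecast",
            DetectionConfig::Outlier => "outlier",
            DetectionConfig::AutoWatch => "auto_watch",
        }
    }

    /// Create-time validation: only static rules with a finite threshold pass.
    pub fn validate(&self) -> Result<(), String> {
        match self {
            DetectionConfig::Static(p) if p.threshold.is_finite() => Ok(()),
            DetectionConfig::Static(_) => Err("threshold must be finite".to_string()),
            other => Err(format!("detection kind '{}' is not yet supported", other.kind())),
        }
    }
}
```

Enabling a new kind then means a new `validate` arm and a new evaluator branch, with no column change.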
Verified live (ClickHouse-backed): create/get round-trip the typed config
through jsonb; anomaly -> 400 (not yet supported); bad input -> 422; the
static evaluator still fires. 25 unit tests pass; clippy -D warnings clean;
SDK regenerated; frontend typechecks.
Make the `anomaly` detector evaluable end to end — it was a typed,
creation-rejected stub. A rule now learns a baseline band from history and
fires when the current value deviates from it, reusing the same for-duration
state machine and AlarmService as static threshold rules.
Detection math (pure, unit-tested) in `detectors`:
- robust_band = centre at the median, scale MAD·1.4826 (a σ-consistent scale).
- anomaly_breaches = direction-aware (above/below/both) z-score test, with a
MIN_BAND_SCALE floor so a flat baseline can't divide by zero.
- season_cell buckets a timestamp into none/hourly/daily/weekly cells.
- validate() now accepts anomaly (robust/basic); agile/ewma and bad
hyperparameters (deviations≤0, pct∉(0,1], lookback∉1..=90) are rejected.
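A minimal sketch of the robust band and the direction-aware test, assuming the baseline is a plain slice of aggregated values. The 1.4826 factor makes the MAD a consistent estimator of σ for normally distributed data; the value of `MIN_BAND_SCALE` below is a placeholder, since the commit does not give it, and the function signatures are illustrative.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Direction {
    Above,
    Below,
    Both,
}

/// Placeholder floor for the band scale (the real constant is not stated).
pub const MIN_BAND_SCALE: f64 = 1e-9;

fn median(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// (centre, scale) = (median, 1.4826 * MAD), or None for an empty baseline.
pub fn robust_band(samples: &[f64]) -> Option<(f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let centre = median(&sorted);
    // Absolute deviations from the median, sorted so their median is the MAD.
    let mut dev: Vec<f64> = sorted.iter().map(|x| (x - centre).abs()).collect();
    dev.sort_by(|a, b| a.total_cmp(b));
    Some((centre, 1.4826 * median(&dev)))
}

/// Is `value` more than `deviations` scales away from `centre`
/// in the watched direction?
pub fn anomaly_breaches(value: f64, centre: f64, scale: f64, deviations: f64, direction: Direction) -> bool {
    // The floor keeps a flat (zero-MAD) baseline from dividing by zero.
    let z = (value - centre) / scale.max(MIN_BAND_SCALE);
    match direction {
        Direction::Above => z > deviations,
        Direction::Below => z < -deviations,
        Direction::Both => z.abs() > deviations,
    }
}
```

For the baseline [90, 95, 100, 105, 110], the median is 100 and the MAD is 5, so the scale is about 7.41 and a value of 130 sits about 4σ above the centre.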
Evaluator branch (`metric_alert_evaluator`):
- Baseline fetched via the SAME query_metrics aggregation path as the scored
point (so counter-rate / histogram-percentile compare like-for-like — NOT
get_metric_baseline, which bypasses aggregation), cached per rule for 1h.
- Seasonal-cell filter with cold-start fallback to the global band; an
insufficient (<8 samples) or degenerate (flat) baseline PRESERVES state
rather than firing — no spurious alerts on thin history.
- fire() refactored to a detector-agnostic FireDetails (static vs anomaly
message/metadata, e.g. "820ms is 4.2σ from the baseline 210 ± 90").
- run_cycle prunes breach-timer + baseline caches for disabled/deleted rules
(also fixes a pre-existing breach_start leak).
Latent bug fixed (affected static AND anomaly): translate_bucket_interval only
accepted space-separated forms, so the evaluator's `format!("{}s", secs)`
("300s") silently fell back to INTERVAL 1 HOUR — every windowed query was
coarsened to hourly regardless of window_secs. Now also parses the compact
"300s"/"5m"/"1h"/"2d"/"1w" form.
Frontend: the alert form authors anomaly rules — a Detection selector swaps
the static comparator/threshold for algorithm / sensitivity (σ) / direction /
seasonality; the list summary reads the typed config.
Verified live (ClickHouse-backed): anomaly create accepted (was 400);
insufficient baseline preserves state; a normal value stays ok (no false
positive); an injected spike (100000 vs a ~100±15 band) transitions to firing
and raises an alarm. 31 unit tests pass; clippy -D warnings clean; frontend
typechecks; the form renders anomaly fields.
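The interval fix from this commit can be sketched as one parser that accepts both the spaced and the compact forms and maps them to a ClickHouse `INTERVAL` literal. The signature and the `Option` return are assumptions: returning `None` instead of defaulting to one hour keeps a malformed interval from silently coarsening queries, which is how the original bug stayed hidden.

```rust
/// Map "300s" / "5m" / "1h" / "2d" / "1w", or spaced forms such as
/// "5 minute", to a ClickHouse INTERVAL literal. Unknown input yields None.
pub fn translate_bucket_interval(input: &str) -> Option<String> {
    let s = input.trim().to_ascii_lowercase();
    // Split the leading digits from the unit suffix (with or without a space).
    let digits_end = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (num, unit) = s.split_at(digits_end);
    let n: u64 = num.parse().ok()?;
    if n == 0 {
        return None;
    }
    let unit = match unit.trim() {
        "s" | "sec" | "second" | "seconds" => "SECOND",
        "m" | "min" | "minute" | "minutes" => "MINUTE",
        "h" | "hour" | "hours" => "HOUR",
        "d" | "day" | "days" => "DAY",
        "w" | "week" | "weeks" => "WEEK",
        _ => return None,
    };
    Some(format!("INTERVAL {} {}", n, unit))
}
```

The evaluator's `format!("{}s", secs)` output, e.g. "300s", then maps to `INTERVAL 300 SECOND` instead of falling through to the hourly default.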
Anomaly detection was exposed but not explained. Three fixes:
- History/eligibility banner: when a metric is picked for an anomaly rule, the form checks how much history it has and warns if it's under ~14 days, spelling out that the rule will sit at "unknown" and not alert until a baseline can be built (the silent-inert trap). The "Unknown" badge now also carries a tooltip explaining the same.
- Sensitivity presets: the raw σ number is replaced with High/Medium/Low presets (2/3/4σ); the exact σ stays available under "Custom".
- Advanced disclosure: algorithm + seasonality (sensible defaults most users won't touch) move behind an "Advanced" details block, leaving Sensitivity + Direction as the two primary knobs.
…tion)
Editing an alert rule showed empty Aggregation + Detection selects and fell back to the static fields, regardless of the saved rule. Root cause: the form was created with placeholder defaults while the rule loaded, then updated via react-hook-form's `values` prop, and that reset drops Radix Select values that *change* during it (same-value selects like severity were unaffected, masking the bug).
Fix: load the rule in a thin parent, then mount the form body once (keyed on the rule id) with the resolved `defaultValues` from the start, so no Select is reset post-mount.
Verified: editing the anomaly rule now restores aggregation (max), detector (anomaly), sensitivity, direction, seasonality.
Add a read-only preview/backtest so an anomaly rule is legible before you save
it (and tunable after). The single highest-value affordance for a feature that
otherwise fails silently.
Backend:
- Extract a shared `BandModel` in `detectors` (per-seasonal-cell robust bands +
global fallback, built once and queried per timestamp). The evaluator's
`anomaly_eval` now uses it too, so the preview can never diverge from what
production would actually do.
- New `services::anomaly_preview` + `POST /otel/alerts/preview`: replays a
metric over a range against the band and returns per-bucket
{value, lower, upper, breaching} + breach_count + baseline sufficiency,
through the SAME query_metrics aggregation path as the evaluator.
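A minimal sketch of a `BandModel`-style structure: bands are built once per seasonal cell, and a lookup falls back to the global band when the timestamp's cell has too few samples. The cell definitions (minute of hour, hour of day, hour of week, all UTC) and all names are assumptions; the threshold of 8 samples mirrors the insufficient-baseline rule of the anomaly evaluator.

```rust
use std::collections::HashMap;

/// Cells (and the global pool) with fewer samples than this are not trusted.
pub const MIN_SAMPLES: usize = 8;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Seasonality {
    None,
    Hourly,
    Daily,
    Weekly,
}

/// Map a unix timestamp (seconds, UTC) to a seasonal cell id (assumed definitions).
pub fn season_cell(ts: i64, seasonality: Seasonality) -> i64 {
    match seasonality {
        Seasonality::None => 0,
        Seasonality::Hourly => ts.rem_euclid(3_600) / 60,      // minute of the hour
        Seasonality::Daily => ts.rem_euclid(86_400) / 3_600,   // hour of the day
        Seasonality::Weekly => ts.rem_euclid(604_800) / 3_600, // hour of the week
    }
}

fn median(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n % 2 == 1 { sorted[n / 2] } else { (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0 }
}

/// (centre, scale) = (median, 1.4826 * MAD) of a non-empty sample.
fn robust(mut values: Vec<f64>) -> (f64, f64) {
    values.sort_by(|a, b| a.total_cmp(b));
    let centre = median(&values);
    let mut dev: Vec<f64> = values.iter().map(|v| (v - centre).abs()).collect();
    dev.sort_by(|a, b| a.total_cmp(b));
    (centre, 1.4826 * median(&dev))
}

pub struct BandModel {
    seasonality: Seasonality,
    cells: HashMap<i64, (f64, f64)>,
    global: Option<(f64, f64)>,
}

impl BandModel {
    /// Build every band once from (timestamp, value) history.
    pub fn build(history: &[(i64, f64)], seasonality: Seasonality) -> Self {
        let mut by_cell: HashMap<i64, Vec<f64>> = HashMap::new();
        for &(ts, v) in history {
            by_cell.entry(season_cell(ts, seasonality)).or_default().push(v);
        }
        let cells = by_cell
            .into_iter()
            .filter(|(_, vs)| vs.len() >= MIN_SAMPLES)
            .map(|(cell, vs)| (cell, robust(vs)))
            .collect();
        let all: Vec<f64> = history.iter().map(|&(_, v)| v).collect();
        let global = if all.len() >= MIN_SAMPLES { Some(robust(all)) } else { None };
        BandModel { seasonality, cells, global }
    }

    /// Band for a timestamp: its own cell if populated, else the global band,
    /// else None (insufficient baseline, so the caller should preserve state).
    pub fn band_at(&self, ts: i64) -> Option<(f64, f64)> {
        self.cells.get(&season_cell(ts, self.seasonality)).copied().or(self.global)
    }

    /// Expected range [lower, upper] at `deviations` robust sigmas.
    pub fn range_at(&self, ts: i64, deviations: f64) -> Option<(f64, f64)> {
        self.band_at(ts).map(|(c, s)| (c - deviations * s, c + deviations * s))
    }
}
```

Because both the evaluator and the preview would query the same built model, their per-timestamp bands cannot drift apart.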
Frontend:
- `AnomalyBacktest`: for an anomaly rule, calls the endpoint (debounced) and
shows "would have fired N× in the last 7 days" plus a chart of the value
against its expected band with breach markers, or an explicit
"not enough history" state when the baseline is thin.
- SDK regen for the new endpoint.
Verified live: backtest of the seeded anomaly rule reports 2 breaches over 7d
with band [55.5, 144.5] (median 100 ± 3σ, where σ = 1.4826·MAD); UI renders the count + chart.
31+ unit tests pass (incl. new BandModel test); clippy clean.
For an enabled anomaly rule on the charted metric, backtest its band over the visible range (the same preview endpoint the form uses) and shade the expected [lower, upper] region behind the line via a recharts ReferenceArea. Only shown when the explorer's aggregation matches the rule's (so the band sits on the same scale as the line) and the baseline is sufficient. ThresholdLineChart gains an additive `bands` prop; existing threshold lines are unchanged.
Turn the metrics pages from a neutral data browser into a problem surface: the system finds what's wrong and the user reads the answer.
- Health header pinned above the tabs (Metrics.tsx): triaged status (firing alerts + active anomalies, worst-first) with Alert/Warn/No-data/OK status dots and a firing count on the Alerts tab. Honest coverage states: "all systems healthy" vs "nothing is being watched yet" vs "couldn't load"; never false-green when nothing is monitored.
- Status dots + toned line on overview cards (MetricsExplorer) and dashboard tiles (MetricTile): a firing metric reds out of a wall of green instead of looking healthy. The join is keyed on (metric_name, aggregation), so a tile only reds for a rule that targets the series it shows. Redundant encoding (dot + tone + chip), never hue alone.
- Severity sort: the overview grid and the alerts list float worst-first (alert → warn → no-data → ok), so the 24-tile cap and the alert list stop hiding the broken thing.
- Shared alert-status model (one cached listAlerts fetch) reused by all three.
Fixes two real token bugs found in the review:
- Added --success/--warning theme tokens (+ @theme mapping); badge.tsx used bg-success/bg-warning with no token, so the "OK"/healthy badge rendered with no background.
- AnomalyBacktest used hsl(var(--primary)), but the tokens are oklch(), so the band/line/breach dots painted transparent. Use bare var(--chart-1)/var(--destructive) like the working chart.
Frontend-only; typechecks clean.
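The worst-first ordering amounts to a status enum whose declaration order is the severity order, applied before the 24-tile cap. A hypothetical sketch (the real code is TypeScript; `Status`, `Card`, `triage`, and the tie-break by metric name are assumptions):

```rust
/// Declared worst-first, so the derived Ord sorts alert -> warn -> no-data -> ok.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Status {
    Alert,
    Warn,
    NoData,
    Ok,
}

pub struct Card {
    pub metric_name: String,
    pub status: Status,
}

/// Sort worst-first (ties broken by name), then apply the tile cap,
/// so a broken metric is never pushed past the cap by healthier ones.
pub fn triage(mut cards: Vec<Card>, cap: usize) -> Vec<Card> {
    cards.sort_by(|a, b| {
        a.status
            .cmp(&b.status)
            .then_with(|| a.metric_name.cmp(&b.metric_name))
    });
    cards.truncate(cap);
    cards
}
```

Sorting before truncating is the point: truncating an unsorted list is how the cap used to hide the broken metric.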
"Did a deploy cause this?" is the first triage question, and Temps owns the deploy pipeline — a structural edge over Datadog. Overlay deploy events that fall inside the chart's visible window as distinct (purple, dashed) vertical markers, snapped to the nearest bucket (the categorical x-axis can't take a raw timestamp), labelled with the short commit hash. Scoped to the selected environment; timestamps normalised (sec or ms). ThresholdLineChart gains an additive `markers` prop (vertical ReferenceLine); existing lines unchanged.
…(Tier 2)
When a metric looks wrong the next question is "what else moved in this same
window?" — and Temps owns metrics, deploys, traces, and errors, so the answer
shouldn't require re-pivoting four tools by hand. Under the detail chart, a
"related signals" strip:
- Frames itself by live state: "This metric is firing — see what else
changed" when a rule on this (metric, aggregation) is firing, else a
neutral "What changed in this window". Reuses the cached listAlerts via
useAlertStatus — no extra fetch.
- Leads with the deploy answer ("1 deploy landed here — marked on the chart
above", or "No deploys in this window — rules out a release"), the literal
thing the chart's deploy markers visualise.
- Deep-links to Traces and Errors pre-scoped to the SAME window: Traces gets
range+env (which it already honours); Errors learns to read `?range=` so
the jump actually lands on that window (it widens the metrics-only 6h to
24h rather than ignore the intent). Plus a "Live view" jump to /observe.
Honesty: every link carries params the target page genuinely applies, and the
strip only states what it knows (deploys) — it never bluffs a trace/error
count it didn't fetch. Verified live: firing CPU-anomaly drill-in shows the
firing header + "1 deploy", and /errors?range=1h lands on "the last hour".
All curated-lucide icons (Network/Bug/Eye/Rocket/ArrowUpRight) confirmed in
the runtime bundle — the subset excludes ListTree/Telescope.
A long project name in the breadcrumb switcher (and long crumbs on deep paths like errors/<long-title>) wrapped the header to two lines because shadcn's BreadcrumbList is flex-wrap + break-words. Force the list to flex-nowrap with a min-w-0 ancestor chain, truncate + responsively cap every crumb (switcher label, intermediate links, and the current-page crumb — not just the switcher, or removing the wrap escape-hatch would overflow the terminal crumb under the action cluster), and clip at the breadcrumb boundary (overflow-hidden) with the right-hand action cluster pinned shrink-0. Verified by measurement: with every crumb forced to a 393px label, the breadcrumb stays 1 line and the header 64px at both 1280px and 375px, each crumb ellipsizes, and the breadcrumb never overlaps the action cluster.
Make "is anything on fire here?" answerable at a glance. A dashboard's status is the worst alert-rule status across the metrics its tiles plot, derived from the same cached listAlerts the per-tile dots already use, so there is no extra fetch and the signals can't disagree. The dashboards list shows a pulsing status dot + "N firing" per row; the dashboard view shows a firing badge in the header and a per-section count. "All clear" appears only when tiles are actually watched, and nothing appears when no rule covers the dashboard (no vanity green).
Hardened per adversarial review:
- rollupStatus counts DISTINCT firing rules (Set of rule id), not tiles, so three tiles plotting one firing metric read "1 firing", not "3" (the name-fallback would otherwise map all three to the same rule).
- ruleStatus + the firing/gathering lists now treat a disabled rule as not-firing: the backend freezes a disabled rule's last_state, so without this a monitor switched off mid-firing flashed a false red alarm. Fixed at the source, so the alerts/health surfaces benefit too.
- The section "N firing" carries a severity title (no color-only meaning).
Verified live: toggling the rule's `enabled` flips the dashboard between "1 firing" and "All clear" while last_state stays frozen-firing.
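A minimal sketch of the dashboard rollup with both hardening rules: firing is counted over distinct rule ids, and a disabled rule still counts as coverage (so the dashboard reads "All clear") but never as firing. The names are hypothetical, and matching is done on (metric_name, aggregation) only; the name-fallback mentioned above is left out.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LastState {
    Unknown,
    Ok,
    Firing,
}

pub struct Rule {
    pub id: u64,
    pub metric_name: String,
    pub aggregation: String,
    pub enabled: bool,
    pub last_state: LastState,
}

pub struct TileRef {
    pub metric_name: String,
    pub aggregation: String,
}

#[derive(Debug, PartialEq)]
pub enum Rollup {
    /// No rule covers any tile: show nothing (no vanity green).
    Unwatched,
    AllClear,
    /// Number of DISTINCT firing rules, not tiles.
    Firing(usize),
}

pub fn rollup_status(tiles: &[TileRef], rules: &[Rule]) -> Rollup {
    let mut watched = false;
    let mut firing: HashSet<u64> = HashSet::new();
    for tile in tiles {
        let matching = rules
            .iter()
            .filter(|r| r.metric_name == tile.metric_name && r.aggregation == tile.aggregation);
        for rule in matching {
            watched = true;
            // A disabled rule keeps a frozen last_state; never read it as firing.
            if rule.enabled && rule.last_state == LastState::Firing {
                firing.insert(rule.id);
            }
        }
    }
    if !firing.is_empty() {
        Rollup::Firing(firing.len())
    } else if watched {
        Rollup::AllClear
    } else {
        Rollup::Unwatched
    }
}
```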
The project "Observe" sidebar group mixed OpenTelemetry signals with operational monitoring and carried a legacy "Metrics" (resource monitoring) entry that duplicated the OTel Metrics page and added nothing over it. Split it:
OpenTelemetry: Observe · Traces · AI Traces · Metrics · Error Tracking
Monitoring: Uptime · Request Logs · AI Crawlers
- "OTel Metrics" → "Metrics" (it's the only metrics surface now); "All events" → "Observe".
- Removed the legacy project Metrics: dropped the nav entry, the `monitoring` route, and the ProjectMonitoring page component (used nowhere else).
- Command palette: repointed its dead "Metrics" entry to /metrics and added "Observe", so removing the route leaves no broken command.
The PR's Changelog Check requires CHANGELOG.md to carry an [Unreleased] entry. Document the OTel metrics feature set: explorer, dashboards, alert rules with anomaly detection + backtest, deploy markers, cross-signal links, Datadog-style firing status, the OpenTelemetry/Monitoring nav grouping, the one-line header fix, the disabled-rule firing fix, and the legacy Metrics page removal.
Filtering traces by service threw ClickHouse Code 184 ILLEGAL_AGGREGATION: the trace-summary SELECT aliases `argMax(service_name, …) AS service_name`, which shadows the raw column, so an unqualified `service_name = ?` in WHERE resolved to the aggregate alias. Qualify it as `spans.service_name` so it binds the per-span column. Fixes both query_trace_summaries (Traces) and query_genai_trace_summaries (AI Traces); the count mirrors are qualified too to keep their filter SQL byte-identical. The space in the filter value was not the cause: verified against live ClickHouse with "Observability Starter".
An orange line told you a metric was anomalous but the chart didn't show why. Datadog-style: overlay the detector's time-varying expected-range band and mark the points that left it.
- ThresholdLineChart gains an optional `bandSeries` (LineChart → ComposedChart): two stacked Areas draw the [lower, upper] band behind the line; a stroke-less Line with a custom dot marks only the breaching points in red (recharts' Scatter plots null points, so it can't mark a sparse subset). Existing usages (tiles, web vitals) are unchanged.
- Drill-in: backtest the rule's detector with the DISPLAYED aggregation (not the rule's) so the band always tracks the visible line and shows even when you're viewing a different aggregation than the rule alerts on. Per-bucket band values are merged onto the chart points (nearest-timestamp), replacing the old flat, aggregation-gated median band that almost never showed.
Verified live: anomtest.cpu drill-in shows the expected band + a red breach dot at the spike; non-anomaly metrics and the web-vitals charts render unchanged.
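Both the deploy markers and the per-bucket band merge have to place an arbitrary timestamp on a categorical x-axis, which comes down to picking the nearest bucket. A hypothetical sketch: `to_millis` treats values below 10^12 as seconds (10^12 ms is September 2001), an assumed heuristic for the sec-or-ms normalisation, and out-of-window timestamps are dropped rather than clamped.

```rust
/// Normalise a timestamp to milliseconds; values below 10^12 are assumed
/// to be seconds (an assumed heuristic for sec-or-ms input).
pub fn to_millis(ts: i64) -> i64 {
    if ts.abs() < 1_000_000_000_000 {
        ts * 1_000
    } else {
        ts
    }
}

/// Index of the bucket nearest to `ts` on a categorical x-axis.
/// `buckets_ms` holds ascending bucket starts; `bucket_ms` is the bucket width.
/// Returns None when `ts` falls outside the visible window.
pub fn nearest_bucket(ts: i64, buckets_ms: &[i64], bucket_ms: i64) -> Option<usize> {
    let t = to_millis(ts);
    let first = *buckets_ms.first()?;
    let last = *buckets_ms.last()?;
    if t < first || t >= last + bucket_ms {
        return None;
    }
    // Index of the first bucket starting after t; at least 1 because t >= first.
    let after = buckets_ms.partition_point(|&b| b <= t);
    let before = after - 1;
    if after < buckets_ms.len() && buckets_ms[after] - t < t - buckets_ms[before] {
        Some(after)
    } else {
        Some(before) // ties go to the earlier bucket
    }
}
```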
Summary
Adds a ClickHouse-first, full-fidelity OpenTelemetry metrics path to temps-otel, alongside the existing TimescaleDB one. When `TEMPS_CLICKHOUSE_*` is configured, OTLP metrics are decoded losslessly and stored in a native `metrics` ReplacingMergeTree; queries, `list_metric_names`, and the anomaly-detector helpers run natively against ClickHouse. The default (no-ClickHouse) install and the `service_metrics` alerting bridge are unchanged.
Context: the premise was "we support traces/spans but not metrics", but in fact metrics were already wired on TimescaleDB, just flattened. This PR is the ClickHouse-first leg (CH is far better suited to high-cardinality metric labels + native quantiles), with TimescaleDB parity deferred.
What's included
- `MetricPoint` carries temporality, `is_monotonic`, `start_time`, exponential-histogram buckets, summary quantiles, exemplars (trace/span), flags, description, and typed labels. Synthetic `value=sum/count` is retained for histograms (the anomaly detector reads it).
- `migrations/clickhouse/0003_metrics.sql` (`metrics` table) + native `store_metrics`, replacing the delegate-to-Timescale stubs. `ChMetricRow` field order is guarded against the RowBinary positional-serialization landmine by a DDL-parsing unit test.
- `query_metrics` (time-bucket, label filters, group-by, avg/sum/min/max/count/rate/quantile), parameterized + allowlisted (no injection surface). Store-neutral DTOs are frozen so TimescaleDB can later satisfy the same contract.
- `Metrics.tsx`/`MetricsExplorer.tsx` mirroring Traces, wired to the generated SDK (no hand-rolled fetch), routed under the project + a sidebar entry.
Verification (live ClickHouse, not skipped)
- `cargo test -p temps-otel`: 263 lib + 4 CH-storage + 1 decode→store fidelity + 7 e2e + 15 TimescaleDB, all pass. The ClickHouse tests genuinely start a container and assert store→query→read-back of temporality, `is_monotonic`, histogram buckets, labels, and multi-series grouping.
- `cargo check --lib -p temps-otel -p temps-migrations` is clean.
Bugs found by the live tests + adversarial review (all fixed)
- `ReplacingMergeTree` ORDER BY excluded labels and was second-resolution, so distinct label-series (e.g. `http.method=GET` vs `POST`) silently collapsed into one row. Fixed via a `MATERIALIZED` label-hash + full `timestamp` in ORDER BY.
- The bucket expression applied `toUnixTimestamp64Milli(toStartOfInterval(...))` to a DateTime (not DateTime64), an illegal-type error. Fixed to `toInt64(toUnixTimestamp(...))*1000`.
- `min`/`max`/`sum`/`quantile` over `Nullable(Float64)` mismatched the f64 row read; wrapped in `assumeNotNull`.
- The ClickHouse testcontainer wait (`message_on_stdout`) never matched and the tests silently skipped (false green); switched to an HTTP `/ping` wait (`http_wait_plain`) + a non-empty CH password.
Deferred (tracked follow-ups, not in this PR)
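- `rate()` does not branch delta-vs-cumulative (treats all as max−min).
- Histogram quantiles run over the synthetic mean (`histogram_summary=None`), which is misleading; there is no caller warning yet.
- The exp-histogram/summary/exemplar `Array(Tuple(...))` columns are only unit-tested, never inserted into live ClickHouse.
- The frontend's richer Phase C query params await an SDK regen against a running server; the Observe `metric` kind is off until then.
Note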
A pre-existing sibling has the same silent-skip testcontainer wait bug: `temps-analytics-events/src/services/clickhouse_backend.rs`. Its ClickHouse integration tests have also never actually run. Out of scope here; fix tracked separately.
🤖 Generated with Claude Code