A robot replanning through a world whose costs drift faces a question classical
planners never answer: how good is my current route, given that most of the
map is stale? CERT-FLOW answers it every round, with a proof: a
high-probability certificate LB ≤ OPT ≤ UB on the optimal route cost, built
from age-weighted non-exchangeable conformal prediction over drift-adjusted
observation residuals, and it spends paid sensing exactly where the
certificate says the gap shrinks fastest.
| classical replanning (D* Lite, AD*) | exchangeable conformal (CIA) | CERT-FLOW | |
|---|---|---|---|
| stale map | silently trusts it | coverage collapses (0.95 → 0.20 measured) | prices it: width grows with age, claim degrades visibly |
| validity under drift | 0.02–0.59 measured | gap-dependent | 0.95–1.00, every condition ever run |
| sensing | none / heuristic | none | certificate-directed (oracle-level regret) |
| static regime | fast | tight | proof-gated preprocessing: ns–µs queries that self-expire |
- Coverage ≥ claimed confidence on every condition ever run: 17 synthetic regimes, off-model worlds, and two real cities (METR-LA, PEMS-BAY) at up to 49% drift-model violation rates.
- Route quality: exactly optimal on known maps (≡ Dijkstra, plus the certificate); travel-regret −0.12 ≈ a clairvoyant oracle in unknown drifting terrain; 2–3× lower regret than freshness/uncertainty/random sensing at equal budget.
- Speed: 3.7 ms p50 / 12 ms p95 per fully-certified round at 60×60 (one CPU core). Certificate-gated preprocessing answers static queries in 269–394 ns (cost) / 8.7 µs (path), at or below published static-SOTA, and at road scale absorbs cost changes in 0.015–0.34 ms vs ~1 s for CRP-style recustomization, exact under ±20% perturbation.
- Theory T1–T7: coverage (observable + latent), a certifiability threshold (gap ε is sustainable iff sensing rate beats drift, both directions), a √L sum-aware upper certificate with a measured selection-bias hazard and its gate, an impossibility theorem (no uniform lower bound can beat Bonferroni by more than log factors, so the certificate's asymmetry is optimal), decision-uniform validity, and a churn-measured floor.
- Honest negatives, kept: the corridor-memory speed hypothesis failed (documented), a predictor's regime claim was downgraded after its test, and the maze negative-control shows exactly where route-critical sensing cannot help. Every known limitation and its disposition: docs/results/limitations.md.
pip install "certflow[fast]" # "fast" = numba (needed to reproduce the speed numbers)
python - <<'PY'
from certflow import CertPlanner, PlannerConfig
from certflow.drift import grid_world
world = grid_world(6, 6, seed=0, kind="bounded", rho=0.02, noise_scale=0.05)
planner = CertPlanner(world, (0, 0), (5, 5),
PlannerConfig(epsilon=5.0, alpha_prime=0.2))
for _ in range(150):
cert, sensed = planner.round()
print(f"[{cert.lb:.2f}, {cert.ub:.2f}] @ confidence {cert.confidence:.2f}, "
f"gap {cert.gap:.2f}")
PYTo develop or reproduce the paper numbers, work from a clone:
git clone https://github.com/Archerkattri/CERT-FLOW && cd CERT-FLOW
python -m venv cert_env && source cert_env/bin/activate
pip install -e ".[dev,fast,realworld]" h5py
pytest # full suite: 227 with datasets; data-dependent tests skip cleanly without data/Every quantitative claim traces to a script; the core sweep runs in ~100 s on
a multicore machine (CERTFLOW_WORKERS=N parallelizes seeds bit-identically).
| Result | Script | Documented in |
|---|---|---|
| Tier-0 coverage (17 conditions, provable + strict modes) | scripts/run_tier0.py |
docs/results/tier0-coverage.md |
| CERT vs Gaussian (path level) | scripts/run_tier0_baselines.py |
docs/results/tier0-coverage.md |
| Edge-level audit (Gaussian break) | scripts/run_gaussian_break.py |
docs/results/gaussian-break.md |
| Incremental repair latency (T3) | scripts/run_tier1_latency.py |
docs/results/tier1-latency.md |
| Ablations (κ churn, pre-widening) | scripts/run_ablations.py |
docs/results/ablations.md |
| Travel regret, unknown terrain | scripts/run_tier2.py |
docs/results/tier2-regret.md |
| Real traffic (METR-LA / PEMS-BAY) | scripts/run_metr_la.py [--pems-bay] |
docs/results/metr-la.md |
| MovingAI maps + maze negative control | scripts/run_movingai.py |
docs/results/movingai.md |
| External algorithms (AD*, VOI, TASP-degenerate) | scripts/run_extern_baselines.py |
docs/results/extern-baselines.md |
| CIA exchangeability collapse | scripts/run_cia_comparison.py |
docs/results/cia-comparison.md |
| E-Graphs + networkx anchors | scripts/run_repeated_queries.py |
docs/results/extern-baselines.md |
| Lifelong missions (memory vs memoryless) | scripts/run_lifelong.py |
docs/results/lifelong.md |
| Feature regimes (predictor, decision-uniform) | scripts/run_feature_regimes.py |
docs/results/feature-regimes.md |
| Scale + engine benchmarks | scripts/run_scale.py |
docs/results/scale.md |
| Road networks (DIMACS NY/FLA, ALT) | scripts/run_roadnet.py |
docs/results/published-speed-comparison.md |
| Certified Contraction Hierarchies | scripts/run_ch.py |
docs/results/published-speed-comparison.md |
All scripts accept --quick. Real-data runs need data/ (sources and loaders
in data/README.md; ~230 MB total, links inside).
- Score every paid observation with a drift-adjusted residual; weight by age (data-independent geometric weights; exchangeability is not assumed).
- Price each edge as
ĉ ± (λq + ρ·age): the conformal quantile pays for noise, the drift term pays for staleness. - Bound the optimum from both sides with two incremental searches (optimistic ℓ, conservative u) over a flat-array engine (numba kernels).
- Claim
LB ≤ OPT ≤ UBat an honestly-annealed confidence: weak claims during warm-up instead of silence; the claim visibly decays as the map ages. - Sense the edge that shrinks the certified gap fastest (route-critical, churn-aware); certification is a rate, not a state (T2′).
- When the certificate proves the map tight, that proof licenses preprocessing: an all-pairs oracle or certified Contraction Hierarchy answering in ns–µs, revoked the instant drift exceeds tolerance.
src/certflow/
types.py contracts (World, EdgeBelief, Certificate)
conformal.py weighted non-exchangeable quantiles, Δ_stale, ACI, blocks
cert.py the planner: certify → sense → repair loop, gates, annealing
sensing.py gap-shrink selection + baseline policies
fastgraph.py flat-array CSR engine (numba D* Lite, Dijkstra kernels)
snapshot.py certificate-gated all-pairs oracle (ns queries)
ch.py certified Contraction Hierarchies (231 µs on 264k-node NY)
roadnet.py DIMACS road graphs + exact ALT on landmark lower-bounds
drift.py / realworld.py / movingai.py synthetic, traffic-replay, game maps
episodes.py / harness.py / baselines.py runners, seeds, parametric strawman
docs/results/ one markdown per experiment: numbers, anomalies, verdicts
docs/specs/ design spec; docs/theory/ working notes
Paper: CERT: Certified Route Planning under Drifting Costs (preprint forthcoming; the citation entry will be updated once the DOI is live).
@software{attri2026certflow,
author = {Attri, Krishi},
title = {{CERT-FLOW}: Certified Route Planning under Drifting Costs},
year = {2026},
doi = {10.5281/zenodo.20633151},
url = {https://github.com/Archerkattri/CERT-FLOW}
}The DOI above is the concept DOI (always resolves to the latest archived version); CITATION.cff carries the same metadata in machine-readable form.
Paper preprint (extended version): DOI pending engrXiv moderation; this section will carry it once posted.
MIT, see LICENSE.
