Verification and debugging framework for AI agents.
Stop agents from acting on stale assumptions, making unsupported claims, or silently switching hypotheses mid-investigation. A workspace of TypeScript packages (understanding-gate, evidence ledger, claim gate, hypothesis tracker, runtime reality checker, debug playbook engine, readme-first resolver, domain router, grounding-wrapper, grounding-sdk, review-claim-gate, MCP server) that an agent harness wires into its session and tool-call lifecycle.
Most agent tooling helps a model talk about a problem.
agent-groundingmakes it prove what it has actually checked, what it has only assumed, and what it has ruled out, before the next destructive command runs.
Agents reach the stack two ways: a PreToolUse gate that blocks destructive commands until claims are grounded, and access surfaces (MCP, SDK, CLI) over a shared evidence ledger.
flowchart LR
subgraph clients["Agent harness / MCP clients"]
direction TB
cc["Claude Code"]
oc["OpenCode"]
hn["harness"]
end
subgraph gate["Pre-execution gate · PreToolUse"]
direction TB
ug["understanding-gate<br/>report approved before destructive tools"]
rrc["runtime-reality-checker<br/>blocks compose / systemctl / kill on drift"]
end
subgraph core["Verification core"]
direction TB
cg["claim-gate"]
ht["hypothesis-tracker"]
helpers["grounding-wrapper · domain-router<br/>readme-first-resolver<br/>debug-playbook-engine · review-claim-gate"]
el[("evidence-ledger<br/>~/.evidence-ledger/ledger.db")]
end
subgraph surfaces["Access surfaces"]
direction TB
mcp["grounding-mcp · JSON-RPC<br/>ledger_add · ledger_summary · claim_evaluate"]
sdk["grounding-sdk<br/>verify / track / validate"]
cli["evidence-ledger CLI"]
end
clients --> gate
clients --> surfaces
gate --> core
surfaces --> core
cg --> el
ht --> el
helpers --> el
git clone https://github.com/LanNguyenSi/agent-grounding && cd agent-grounding
npm install && npm run build
# Run the demo against a scratch session so it doesn't pollute the default ledger
LEDGER="node packages/evidence-ledger/dist/cli.js"
$LEDGER clear --session readme-demo # no-op on first run
$LEDGER fact "process is not running" \
--source "ps aux | grep clawd-monitor" \
--confidence high \
--session readme-demo
$LEDGER hypothesis "OOM killer terminated the process" \
--source "dmesg output" \
--confidence medium \
--session readme-demo
$LEDGER show --session readme-demoevidence-ledger is the headline package: every fact carries a source, every hypothesis lives separately from facts, rejected hypotheses stay visible, unknowns are acknowledged. The CLI is one of three surfaces; there's also a typed library API (@lannguyensi/evidence-ledger) and a JSON-RPC server (grounding-mcp) that any MCP client can call. Entries land in ~/.evidence-ledger/ledger.db; per-session isolation keeps demo data out of your real debugging sessions.
✓ Fact recorded:
✓ [#26] process is not running (ps aux | grep clawd-monitor) HIGH
? Hypothesis added:
? [#27] OOM killer terminated the process (dmesg output) MED
📋 Evidence Ledger — session: readme-demo
2 entries total
✓ FACTS (1)
✓ [#26] process is not running (ps aux | grep clawd-monitor) HIGH
? HYPOTHESES (1)
? [#27] OOM killer terminated the process (dmesg output) MED
(Entry IDs autoincrement globally across sessions, so your numbers will differ.)
Same data via ledger export --session readme-demo produces structured JSON for hand-off to another agent or a human. Same data via grounding-mcp's ledger_summary verb is what harness explain --trace and harness audit consume to replay policy decisions; see the harness integration for the wiring.
Every package is published under the @lannguyensi/ scope and installable directly:
# Library APIs (claim-gate, evidence-ledger, grounding-wrapper, and
# review-claim-gate ship both a library and a CLI; the CLI install
# below exposes the bin, this install just adds the importable API)
npm install @lannguyensi/claim-gate
npm install @lannguyensi/evidence-ledger
npm install @lannguyensi/grounding-sdk
npm install @lannguyensi/grounding-wrapper
npm install @lannguyensi/hypothesis-tracker
npm install @lannguyensi/review-claim-gate
npm install @lannguyensi/runtime-reality-checker
# CLIs (install globally to expose the bin)
npm install -g @lannguyensi/claim-gate # → claim-gate
npm install -g @lannguyensi/debug-playbook-engine # → debug-playbook
npm install -g @lannguyensi/domain-router # → domain-router
npm install -g @lannguyensi/evidence-ledger # → ledger
npm install -g @lannguyensi/grounding-wrapper # → grounding-wrapper
npm install -g @lannguyensi/readme-first-resolver # → readme-first
npm install -g @lannguyensi/review-claim-gate # → review-claim-gate
npm install -g @lannguyensi/understanding-gate # → understanding-gate
# MCP server (install globally or invoke via npx)
npm install -g @lannguyensi/grounding-mcp # → grounding-mcpThe git clone workflow above is for hacking on the monorepo itself; downstream consumers install only what they need from npm.
| If you want to... | Read |
|---|---|
| Track facts / hypotheses / rejected ideas / unknowns during a debugging session | packages/evidence-ledger |
| Block strong claims until evidence backs them | packages/claim-gate |
| Manage competing hypotheses and require evidence to switch between them | packages/hypothesis-tracker |
| Compare actual runtime state against documentation | packages/runtime-reality-checker |
| Guide an agent through a domain-specific diagnostic sequence | packages/debug-playbook-engine |
| Force an agent to read primary docs before any analysis | packages/readme-first-resolver |
| Route a keyword to the right repos / components / docs scope | packages/domain-router |
Use a single ergonomic facade (verify / track / validate) over the stack |
packages/grounding-sdk |
Gate merge_approval on tests + checklist + evidence-ledger entry |
packages/review-claim-gate |
| Ask agents to produce an Understanding Report before acting | packages/understanding-gate |
| Wire the stack into an MCP-speaking client (Claude Code, Codex, OpenCode) | packages/grounding-mcp |
| Plan the agent entry path (sequence, guardrails, phases) for downstream enforcement | packages/grounding-wrapper |
| Package | Description |
|---|---|
| understanding-gate | Asks agents to produce an Understanding Report before acting. Phase 2 (enforcement) shipped: prompt-hook gate, structured report parsing + persistence, and PreToolUse blocking of destructive tools until the report is approved. Published as @lannguyensi/understanding-gate |
| Package | Description |
|---|---|
| runtime-reality-checker | Compares actual runtime state against documentation |
| claim-gate | Blocks strong claims without verified evidence |
| hypothesis-tracker | Tracks competing hypotheses, requires evidence to switch |
| Package | Description |
|---|---|
| debug-playbook-engine | Guides agents through domain-specific diagnostic sequences |
| evidence-ledger | Structured evidence tracking during debugging |
| grounding-wrapper | Plans grounding sessions (recommended tool sequence, guardrails, phases). Pure planner; enforcement is external. |
| readme-first-resolver | Forces agents to read primary docs before any analysis |
| domain-router | Routes keywords to correct repos, components and docs scope |
| Package | Description |
|---|---|
| grounding-sdk | verify/track/validate, ergonomic in-process facade over the stack |
| review-claim-gate | merge_approval gate for PR-review subagents, fails closed unless tests pass, the checklist is complete, and ≥1 evidence-ledger entry exists |
| Package | Description |
|---|---|
| grounding-mcp | JSON-RPC MCP server that exposes ledger_add / ledger_summary / claim_evaluate_from_session to any MCP-speaking client |
AI agents are good at generating plausible explanations. They're bad at verifying them. This framework enforces discipline:
- Don't assume: check runtime state before diagnosing.
- Don't claim: gate strong assertions behind evidence.
- Don't forget: track all hypotheses, don't silently drop them.
- Don't skip steps: follow diagnostic playbooks in order.
- Don't guess scope: route to the correct domain first.
The motivating incident lives in an internal logbook: an agent investigated two agent-grounding tasks against a checkout that was 16 commits behind origin, declared both "stale" because the relevant directories didn't exist locally, and only caught the drift hours later when a third task forced a fresh git pull. Two corrections had to be walked back. The check that would have caught it (git fetch && git status before any structural claim) is exactly what runtime-reality-checker + claim-gate enforce, given a runtime that consults them.
Experimental, functional tools with tests, APIs may evolve. Each package has its own README with install + usage; this top-level README is a routing index. Build/contribution notes live in CONTRIBUTING.md.
agent-grounding is the Validate stage of the Project OS Human-Agent Dev Lifecycle:
- agent-planforge plans
- agent-tasks coordinates
- agent-grounding verifies
- agent-preflight gates pushes
- harness declares + enforces the policy boundary that calls into all of the above