EdgeF-4/veritas

VERITAS

Deterministic runtime verification for autonomous agents. When an agent says it did something — sent the email, wrote the file, issued the refund — VERITAS checks whether it actually happened by re-querying the real system, and returns PASS / FAIL / INCONCLUSIVE in plain code. No model grades the result. The thing being judged is never the judge.

Agents that evaluate their own work tend to conclude they succeeded — even when they produced nothing, produced garbage, or reused a stale result. VERITAS is the independent check that sits between an agent's "done" and anyone believing it.

What makes it trustworthy

  • Deterministic floor. The pass/fail verdict is computed from the contract, the run context, and the raw provider response — with a fixed predicate set, no language model. An optional refutation layer can suggest checks you missed, but it is a separate package that physically cannot change a verdict.
  • Ground truth, not a guess. VERITAS re-queries the external state the task should have changed (mailbox, calendar, Drive, payments, a database) and asserts against what it finds.
  • It proves the result belongs to this run. The hard part of verification is not "does an email exist" — it is "does this email belong to this run, and not a look-alike (from yesterday, or from a concurrent run)." A top-confidence (EXACT) pass requires an unforgeable, run-namespaced marker the agent had to stamp into something it actually created — so it cannot be earned by reporting an artifact the agent did not cause. A merely agent-recorded id guarded by a server timestamp proves the artifact is in-window, not that this run caused it, so it sits below the auto-pass floor (BOUNDED) unless the operator explicitly opts in for an id-isolated deployment. Everywhere causation cannot be established, VERITAS fails closed.
  • Evidence for every check. Never a bare verdict — the raw state, the candidates it considered, and each assertion's expected-vs-actual, in a reproducible, tamper-evident bundle.
  • Honest about its limits. A claim is only verified deterministically when a recoverable correlation key exists. UI-only flows, third-party acceptance, and fuzzy outcomes are reported as out of scope for the floor — never quietly passed.
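The correlation rule described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the names `Candidate`, `correlate`, `decide`, and `allow_bounded` are hypothetical, and so is the marker format. The choice to report a BOUNDED match without operator opt-in as INCONCLUSIVE, and a missing correlation as FAIL, is an assumption about how "fails closed" maps onto the three verdicts.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    INCONCLUSIVE = "INCONCLUSIVE"


class Correlation(Enum):
    NONE = 0     # nothing ties the artifact to this run
    BOUNDED = 1  # agent-recorded id, guarded only by a server timestamp
    EXACT = 2    # run-namespaced marker found inside the artifact itself


@dataclass(frozen=True)
class Candidate:
    body: str                # raw content returned by the provider
    recorded_by_agent: bool  # the agent reported this artifact's id
    in_window: bool          # server timestamp falls inside the run window


def correlate(candidate: Candidate, run_marker: str) -> Correlation:
    # An empty marker must never match, otherwise every body would "contain" it.
    if run_marker and run_marker in candidate.body:
        return Correlation.EXACT
    if candidate.recorded_by_agent and candidate.in_window:
        return Correlation.BOUNDED
    return Correlation.NONE


def decide(candidates, run_marker: str, allow_bounded: bool = False) -> Verdict:
    # Plain code only: no model call anywhere on this path.
    levels = [correlate(c, run_marker) for c in candidates]
    best = max(levels, key=lambda level: level.value, default=Correlation.NONE)
    if best is Correlation.EXACT:
        return Verdict.PASS
    if best is Correlation.BOUNDED:
        # Below the auto-pass floor unless the operator explicitly opts in.
        return Verdict.PASS if allow_bounded else Verdict.INCONCLUSIVE
    return Verdict.FAIL  # assumed mapping: no established causation fails closed
```

A real floor would also evaluate the contract's assertions against the matched artifact; the sketch covers only the question of whether the artifact belongs to the run.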

The 60-second demo

```
# An agent claims it wrote today's brief to Drive and recorded what it did.
veritas verify --contract VERIFY.md --run ./run-record.json
✗ FAIL  daily-brief
  ✗ brief-doc-exists        the recorded file id resolves to a document created 2026-05-28,
                            outside today's window (correlation guard: VERIFIER_OBSERVED createdTime)
  – evidence               .veritas/runs/2026-05-30T08-02-11Z.json
```

The agent reported success and recorded a document id, but the id pointed at a brief created two days before the run. VERITAS caught it because the document's own server timestamp falls outside the run window, and an agent-asserted id alone is never enough to pass.
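The timestamp guard in this demo amounts to a window comparison on a value the verifier read from the provider itself. The sketch below assumes an ISO 8601 `createdTime` string and a UTC run window. The function name `created_in_window`, the window bounds, and the example clock times are hypothetical, and treating an unparseable or timezone-less timestamp as out of window is an assumption about how the guard fails closed.

```python
from datetime import datetime, timezone


def created_in_window(created_time: str, window_start: datetime, window_end: datetime) -> bool:
    """Return True only if the provider-reported timestamp is provably inside the window."""
    try:
        # Replace a trailing "Z" so the parse also works on Python versions before 3.11.
        created = datetime.fromisoformat(created_time.replace("Z", "+00:00"))
    except ValueError:
        return False  # unparseable timestamp: fail closed
    if created.tzinfo is None:
        return False  # no timezone, so the instant is ambiguous: fail closed
    return window_start <= created <= window_end


# Hypothetical run window covering the day of the run, in UTC.
start = datetime(2026, 5, 30, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2026, 5, 30, 23, 59, 59, tzinfo=timezone.utc)

stale = created_in_window("2026-05-28T07:58:40Z", start, end)  # older document
fresh = created_in_window("2026-05-30T08:01:55Z", start, end)  # created during the run window
```

The comparison uses the timestamp the verifier fetched, never one the agent supplied, which is what the `VERIFIER_OBSERVED` label in the output above appears to indicate.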

Status

Early development. Architecture and decisions are public and reviewable:

  • ARCHITECTURE.md — the full design.
  • docs/adr/ — the load-bearing decisions (MADR format).
  • benchmark/ — the corpus that scores the verifier (the credibility anchor).

Design principles (non-negotiable)

  1. The verdict floor never calls a model. Enforced as a package boundary + a CI dependency-graph check, not a convention.
  2. Fail closed. Ambiguity, missing keys, and transport errors never become a silent PASS.
  3. Additive and non-gating. VERITAS drops onto an existing agent or cron job without owning its loop.
  4. Dependency-light core. The verdict path is standard-library only.

License

Apache 2.0 — see LICENSE.
