Refuse the unsafe step, cap the runaway cost.
A Nozick-grounded gate for LLM agents. Wraps any (prompt) -> completion function with two layered checks: a Truth-Tracking Gate (sensitivity + adherence + semantic-entropy proxy) and a Token Budget Contract (per-call by default, or per-agent by sharing a single BudgetLedger).
subjunctor is a gate, not an agent. You write the agent; subjunctor decides whether each candidate step should pass, refuse, escalate, or trip the budget contract (budget_exceed).
```sh
npm install subjunctor
```

Requires Node >=20. The LLM SDK is not bundled; you inject your own CompletionFn.
```ts
import { gate, openaiAdapter } from "subjunctor";
import OpenAI from "openai";

const client = new OpenAI();
const completion = openaiAdapter({ client, model: "gpt-4o-mini" });

const verdict = await gate(
  // The candidate step to be judged.
  {
    id: "step-42",
    description: "Generate a SQL query against the analytics warehouse.",
    prompt: "Write a SQL query that lists top-10 customers by revenue.",
    answer: "SELECT customer_id, SUM(revenue) ... LIMIT 10",
    tokens: { input: 280, output: 90 },
  },
  // Gate options: completion function, budget contract, thresholds.
  {
    completion,
    budget: {
      soft_limit_tokens: 50_000,
      hard_limit_tokens: 100_000,
      mode: "hard_kill",
      on_soft_warn: (r) => console.warn("budget soft-warn", r),
    },
    thresholds: { tau_u: 0.4, tau_s: 0.4, tau_a: 0.5 },
  },
);

switch (verdict.kind) {
  case "pass": /* execute the candidate */ break;
  case "refuse": /* drop it, log reason */ break;
  case "escalate": /* surface to a human */ break;
  case "budget_exceed": /* stop the agent */ break;
}
```

For a reusable bound configuration:
```ts
import { defineGate } from "subjunctor";

const guardedStep = defineGate({
  completion,
  thresholds: { tau_u: 0.4, tau_s: 0.4, tau_a: 0.5 },
});

const verdict = await guardedStep(candidate);
```

For multi-step CoT answers, you can swap the uncertainty estimator from the default consistency sampler to a process reward verifier. It splits the answer into reasoning steps, asks your LLM to critique each step (ThinkPRM; Khalifa et al. 2025, arXiv:2504.16828), parses a SCORE: <0..1> line per step, and aggregates with a weakest-link rule by default, so a single bad step is enough to refuse:
```ts
import { gate } from "subjunctor";

const verdict = await gate(candidate, {
  completion, // any CompletionFn; host ThinkPRM weights wherever
  estimator: "process_reward_verifier",
  prm_options: {
    aggregator: "min", // "min" | "mean" | "last"
    max_steps: 16,
    parse_failure_score: 0, // default in v0.3.1+; explicit here for clarity
    max_step_chars: 4000, // caps per-step text embedded in critique prompt
  },
  thresholds: { tau_u: 0.4, tau_s: 0.4, tau_a: 0.5 },
});
```

The verifier is SDK-neutral: point completion at a ThinkPRM model on HF Inference, vLLM, OpenAI, Anthropic, or anything else that returns text. Override step_splitter or critique_template if your reasoning trace uses a non-default delimiter or you want a domain-specific critique prompt.
Each step issues one additional LLM call (capped by max_steps, default 16). These calls are not debited from BudgetLedger in v0.3.x — set hard_limit_calls defensively on the ledger when enabling this estimator. The verifier returns value = 1 - aggregated_score on the same [0, 1] scale as the other estimators, but PRM correctness signals typically concentrate near 0 or 1; a tighter tau_u (≈ 0.3–0.5) usually matches PRM semantics better than the default 0.6.
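
The mapping from per-step scores to the verifier's uncertainty value can be sketched as follows. This is a minimal illustration, not the library's implementation: the function names are hypothetical, and returning 0 for an empty step list is an assumption made here to keep the sketch fail-closed.

```typescript
// Hypothetical helper names; only the "min" | "mean" | "last" options and the
// value = 1 - aggregated_score rule come from this README.
type Aggregator = "min" | "mean" | "last";

function aggregateStepScores(scores: number[], aggregator: Aggregator = "min"): number {
  if (scores.length === 0) {
    // Assumption: with no steps to judge, treat the answer as unverified.
    return 0;
  }
  switch (aggregator) {
    case "min": // weakest-link: one bad step dominates
      return Math.min(...scores);
    case "mean":
      return scores.reduce((sum, s) => sum + s, 0) / scores.length;
    case "last": // only the final step's score counts
      return scores[scores.length - 1] ?? 0;
  }
}

// Uncertainty on the same [0, 1] scale as the other estimators.
function uncertaintyFromScores(scores: number[], aggregator: Aggregator = "min"): number {
  return 1 - aggregateStepScores(scores, aggregator);
}
```

With step scores of 0.9, 0.2 and 0.8, the weakest-link rule yields an aggregated score of 0.2 and an uncertainty of 0.8, which exceeds a tau_u of 0.4 and would therefore be refused at Stage 1.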
Parsing & injection defenses (v0.3.1) — the SCORE: line is parsed with a line-anchored regex (^...$, multiline) and the last anchored match is used, so a step containing the substring "SCORE: 1.0" inside prose cannot spoof the verdict and an LLM that self-corrects (e.g. "Initial SCORE: 0.9 ... Final SCORE: 0.2") is interpreted on its final score. Scientific-notation scores (1.5e-2) are now parsed correctly. If parsing still fails, the per-step score defaults to 0 (fail-closed); pass parse_failure_score: 0.5 to restore v0.3.0 neutral semantics. Each step's text is truncated at max_step_chars (default 4000) before being embedded in the critique prompt, and the embedded answer is capped at 4 × max_step_chars, to bound prompt-bloat amplification under adversarial input.
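
A rough sketch of the parsing rules described above, under stated assumptions: the regex, the function names and the out-of-range check are illustrative and are not taken from the library's source. It shows a line-anchored multiline match, use of the last anchored match, scientific-notation support, the fail-closed default, and per-step truncation.

```typescript
// Illustrative pattern: "SCORE:" must start the line and the number must end it.
// [ \t]* is used instead of \s* so the match cannot spill across line breaks.
const SCORE_LINE = /^SCORE:[ \t]*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)[ \t]*\r?$/gm;

function parseStepScore(critique: string, parseFailureScore = 0): number {
  const re = new RegExp(SCORE_LINE.source, "gm"); // fresh regex, fresh lastIndex
  let last: string | null = null;
  let m: RegExpExecArray | null;
  while ((m = re.exec(critique)) !== null) {
    last = m[1] ?? null; // keep overwriting so the final anchored match wins
  }
  if (last === null) return parseFailureScore; // fail closed by default
  const value = Number(last);
  // Assumption: a score outside [0, 1] is treated as a parse failure.
  if (!Number.isFinite(value) || value < 0 || value > 1) return parseFailureScore;
  return value;
}

// Caps the step text before it is embedded in a critique prompt.
function truncateStep(step: string, maxStepChars = 4000): string {
  return step.slice(0, maxStepChars);
}
```

A critique whose prose mentions "SCORE: 1.0" mid-sentence but ends with its own `SCORE: 0.2` line parses as 0.2, because only whole-line matches are considered and the last one is kept.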
By default, the budget field constructs a fresh BudgetLedger per gate() call. To cap cumulative cost across an agent's whole loop, build one ledger and pass it via the ledger option:
```ts
import { gate, BudgetLedger } from "subjunctor";

// One ledger shared by every gate() call in the loop.
const ledger = new BudgetLedger({
  config: {
    soft_limit_tokens: 200_000,
    hard_limit_tokens: 500_000,
    hard_limit_calls: 1_000,
    mode: "hard_kill",
  },
});

for (const candidate of agent.steps()) {
  const v = await gate(candidate, { completion, ledger });
  if (v.kind === "budget_exceed") break;
}
```

thresholds defaults to { tau_u: 0.6, tau_s: 0.5, tau_a: 0.5 } (conservative). The earlier examples use 0.4 / 0.4 / 0.5, which is illustrative only: the lower tau_u refuses on uncertainty more readily, while the lower tau_s is more permissive on sensitivity. Tune for your domain.
- Not a benchmark or scoring submission. No empirical hallucination-detection score is claimed.
- Not an agent-driver framework that runs itself. It is a gate around your agent loop.
- Not a hallucination eliminator. Production-noise behavior of the entropy proxy is unverified at scale.
- Not a substitute for human review. escalate exists for a reason.
The name subjunctor references Nozick's subjunctive conditional (Nozick 1981, Philosophical Explanations, Ch. 3) — the counterfactual form behind his sensitivity condition for knowledge.
| subjunctor concept | Nozick concept |
|---|---|
| Budget hard limit | Side constraint (result-independent prohibition) |
| SensitivityProbe | Sensitivity ("if it weren't true, I wouldn't believe it") |
| AdherenceProbe | Adherence ("in nearby possible worlds, I still believe it") |
| ConsistencySampler | Semantic-entropy proxy (Farquhar et al. 2024) |
| ProcessRewardVerifier | Step-level truth-tracking via ThinkPRM (Khalifa et al. 2025) |
| Verdict decision tree | Truth-tracking necessary conditions, ordered |
| EntitlementSlot (interface) | Entitlement theory (acquisition / transfer / rectification); v0.2.x impl |
Each component has an independent rationale in epistemology (Nozick's truth-tracking, entitlement theory) and is not marketing veneer over a single hallucination detector.
- Counterfactual rewriting is performed by the same LLM that produced the candidate. The sensitivity probe therefore shares the model's blind spots. Independent-model probing is a v0.2.x roadmap item.
- ConsistencySampler uses token-Jaccard distance as a cheap stand-in for true entailment-clustering semantic entropy. The Farquhar et al. (2024) Nature result reports AUROC ~0.79 on QA benchmarks; production-noise behavior at scale is not characterized here. An embedding/NLI backend is on the v0.2.x roadmap.
- The evaluation in eval/REPORT.md is a circular self-validation. Mock CompletionFns were written by the same author who wrote the gate. It demonstrates decision-tree behaviour, not field accuracy.
- Budget Contract assumes accurate token counts. When candidate.tokens is missing, the ledger falls back to local tiktoken counting, which can diverge slightly from provider-side accounting.
- v0.1.x extension slots are interface-only. EntitlementSlot, ReplaySlot, MetaSlot exist as TypeScript declarations so downstream code can already type against them; implementations land in v0.2.x.
- Probe / estimator LLM calls are not counted against BudgetLedger in v0.1.x–v0.3.x. Only candidate.tokens recorded by the caller is debited. Each gate() call issues several internal LLM calls (consistency samples + sensitivity probes + adherence probes; the process_reward_verifier estimator additionally issues up to max_steps step-critique calls per gate()). Operators handling adversarial input streams MUST set a defensive hard_limit_calls AND an upstream rate-limit. Internal-probe budgeting is planned for a later minor release.
- token_logprob_entropy requires a CompletionFn that returns logprobs. Anthropic's Messages API does not. In default (strict: true) mode, the estimator throws when logprobs are missing, and the gate then fails closed to escalate. With strict: false it returns value=1.0 (maximum uncertainty) so the decision tree still does not silently fall open.
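
To make the token-Jaccard stand-in concrete, here is a minimal sketch of a Jaccard distance over token sets, averaged across sampled completions. The tokenizer (lowercase, split on non-word characters) and the mean-pairwise aggregation are assumptions for illustration; the README does not specify how ConsistencySampler tokenizes or combines distances.

```typescript
// Assumed tokenizer: lowercase, split on runs of non-word characters.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((tok) => tok.length > 0));
}

// Jaccard distance = 1 - |A ∩ B| / |A ∪ B| over the two token sets.
function jaccardDistance(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  if (ta.size === 0 && tb.size === 0) return 0; // two empty texts are identical
  let intersection = 0;
  ta.forEach((tok) => {
    if (tb.has(tok)) intersection += 1;
  });
  const union = ta.size + tb.size - intersection;
  return 1 - intersection / union;
}

// Assumed aggregation: mean distance over all unordered sample pairs.
function meanPairwiseDistance(samples: string[]): number {
  if (samples.length < 2) return 0;
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < samples.length; i++) {
    for (let j = i + 1; j < samples.length; j++) {
      total += jaccardDistance(samples[i] ?? "", samples[j] ?? "");
      pairs += 1;
    }
  }
  return total / pairs;
}
```

Because the measure is purely lexical, two paraphrases with the same meaning can score as distant, and two contradictory answers that share most of their wording can score as close. That is the gap an entailment-based backend would address.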
See eval/REPORT.md. Eight scenarios exercise each verdict path (pass / refuse / escalate / budget_exceed) plus a production-noise stability check. The report explicitly discloses the circular-test caveat above. The process_reward_verifier path (added in v0.3.0, hardened in v0.3.1) is not covered by eval/REPORT.md; its behaviour is exercised by test/estimator.process-reward-verifier.test.ts only.
The decision-tree evaluates checks in this priority:
```text
Stage 0: budget hard_kill ?     -> budget_exceed
Stage 1: uncertainty > tau_u ?  -> refuse (high_uncertainty)
Stage 2: sensitivity < tau_s ?  -> refuse (low_sensitivity)
Stage 3: adherence < tau_a ?    -> escalate (low_adherence)
Stage 4:                        -> pass
```
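
The priority order above can be written as a small pure function. The sketch below is illustrative only: the type shapes and the name `decideSketch` are hypothetical and may differ from the library's actual decide() signature; the stage order, comparison directions and default thresholds are the ones stated in this README.

```typescript
// Hypothetical types; the real subjunctor exports may be shaped differently.
type Verdict =
  | { kind: "pass" }
  | { kind: "refuse"; reason: "high_uncertainty" | "low_sensitivity" }
  | { kind: "escalate"; reason: "low_adherence" }
  | { kind: "budget_exceed" };

interface Scores {
  budgetTripped: boolean; // Stage 0 input: has the hard limit been hit?
  uncertainty: number; // in [0, 1], higher is worse
  sensitivity: number; // in [0, 1], higher is better
  adherence: number; // in [0, 1], higher is better
}

interface Thresholds {
  tau_u: number;
  tau_s: number;
  tau_a: number;
}

// Defaults as documented in this README.
const DEFAULT_THRESHOLDS: Thresholds = { tau_u: 0.6, tau_s: 0.5, tau_a: 0.5 };

function decideSketch(s: Scores, t: Thresholds = DEFAULT_THRESHOLDS): Verdict {
  if (s.budgetTripped) return { kind: "budget_exceed" }; // Stage 0
  if (s.uncertainty > t.tau_u) return { kind: "refuse", reason: "high_uncertainty" }; // Stage 1
  if (s.sensitivity < t.tau_s) return { kind: "refuse", reason: "low_sensitivity" }; // Stage 2
  if (s.adherence < t.tau_a) return { kind: "escalate", reason: "low_adherence" }; // Stage 3
  return { kind: "pass" }; // Stage 4
}
```

Since earlier stages win, a candidate with both high uncertainty and low adherence is refused rather than escalated. An uncertainty of 0.5 passes Stage 1 under the default tau_u of 0.6 but is refused when tau_u is set to 0.4.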
The control-flow in gate() interleaves these checks with LLM calls:
- Record candidate.tokens to the ledger.
- If the post-record state would trip hard_limit, short-circuit immediately to Stage 0; no probe LLM calls are made.
- Otherwise run the uncertainty estimator, then sensitivity, then adherence probes (each issues 1+ LLM call).
- After every probe stage, re-check the ledger; if it has tipped over (e.g. via external accounting hooked by the caller into the same ledger), short-circuit to Stage 0.
- Hand all scores to the pure-function decide(), which applies the decision-tree order above.
Truth-tracking is fail-closed: any internal error — including invalid BudgetConfig, missing completion, or a throwing on_event sink — becomes escalate. See test/gate.test.ts for the contract.
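
The control flow and the fail-closed rule can be sketched together as below. This is an illustration under assumptions, not the library's gate(): `LedgerLike`, `Probes`, `gateFlowSketch` and the `"internal_error"` reason string are hypothetical names, and the probes are plain async functions standing in for the real LLM-backed stages.

```typescript
// Hypothetical shapes for illustration; not the subjunctor API.
type GateVerdict =
  | { kind: "pass" | "budget_exceed" }
  | { kind: "refuse" | "escalate"; reason: string };

interface LedgerLike {
  record(tokens: number): void;
  hardLimitTripped(): boolean;
}

interface Probes {
  uncertainty(): Promise<number>;
  sensitivity(): Promise<number>;
  adherence(): Promise<number>;
}

interface Taus {
  tau_u: number;
  tau_s: number;
  tau_a: number;
}

async function gateFlowSketch(
  candidateTokens: number,
  ledger: LedgerLike,
  probes: Probes,
  t: Taus,
): Promise<GateVerdict> {
  try {
    // Record the candidate's tokens, then short-circuit before any probe call.
    ledger.record(candidateTokens);
    if (ledger.hardLimitTripped()) return { kind: "budget_exceed" };

    // Run each probe stage in order, re-checking the ledger after each one.
    const u = await probes.uncertainty();
    if (ledger.hardLimitTripped()) return { kind: "budget_exceed" };
    const s = await probes.sensitivity();
    if (ledger.hardLimitTripped()) return { kind: "budget_exceed" };
    const a = await probes.adherence();
    if (ledger.hardLimitTripped()) return { kind: "budget_exceed" };

    // Pure decision over the collected scores, in the documented stage order.
    if (u > t.tau_u) return { kind: "refuse", reason: "high_uncertainty" };
    if (s < t.tau_s) return { kind: "refuse", reason: "low_sensitivity" };
    if (a < t.tau_a) return { kind: "escalate", reason: "low_adherence" };
    return { kind: "pass" };
  } catch {
    // Fail closed: any internal error becomes escalate, never pass.
    return { kind: "escalate", reason: "internal_error" };
  }
}
```

Wrapping the whole body in one try/catch means a throwing probe or ledger can only ever produce escalate, so an internal fault never lets a candidate through.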
```sh
subjunctor analyze trace.jsonl        # streams a JSONL event log; summarises verdicts
subjunctor budget budget-config.json  # replays a usage trace against a budget contract
subjunctor --version
```

```sh
npm install
npm test          # vitest, 80 tests
npm run coverage  # v8 coverage; threshold 75%
npm run build     # tsc -p tsconfig.build.json -> ./dist (+chmod +x dist/cli.js)
npm run eval      # regenerate eval/REPORT.{json,md}
```

See SECURITY.md for the private-advisory disclosure process and known v0.1.x in-scope items (including the probe-amplification budget gap above).
MIT. See LICENSE. Copyright 2026 hinanohart.
- Nozick, R. (1981). Philosophical Explanations, Ch. 3 — "Knowledge and Skepticism." Harvard University Press.
- Nozick, R. (1974). Anarchy, State, and Utopia — entitlement theory (acquisition / transfer / rectification).
- Farquhar, S., Kossen, J., Kuhn, L., & Gal, Y. (2024). "Detecting hallucinations in large language models using semantic entropy." Nature.
- Khalifa, M., Agarwal, R., Logeswaran, L., Kim, J., Peng, H., Lee, M., Lee, H., & Wang, L. (2025). "Process Reward Models That Think." arXiv:2504.16828.
- Miller, M. S. Object Capabilities. erights.org. (Reference only; subjunctor does not implement ocap patterns in v0.1.x.)