subjunctor

Refuse the unsafe step, cap the runaway cost.

A Nozick-grounded gate for LLM agents. Wraps any (prompt) -> completion function with two layered checks: a Truth-Tracking Gate (sensitivity + adherence + semantic-entropy proxy) and a Token Budget Contract (per-call by default, or per-agent by sharing a single BudgetLedger).

subjunctor is a gate, not an agent. You write the agent; subjunctor decides whether each candidate step should pass, refuse, escalate, or trip the budget contract (budget_exceed).

Install

npm install subjunctor

Requires Node >=20. The LLM SDK is not bundled — you inject your own CompletionFn.

Usage

import { gate, openaiAdapter } from "subjunctor";
import OpenAI from "openai";

const client = new OpenAI();
const completion = openaiAdapter({ client, model: "gpt-4o-mini" });

const verdict = await gate(
  {
    id: "step-42",
    description: "Generate a SQL query against the analytics warehouse.",
    prompt: "Write a SQL query that lists top-10 customers by revenue.",
    answer: "SELECT customer_id, SUM(revenue) ... LIMIT 10",
    tokens: { input: 280, output: 90 },
  },
  {
    completion,
    budget: {
      soft_limit_tokens: 50_000,
      hard_limit_tokens: 100_000,
      mode: "hard_kill",
      on_soft_warn: (r) => console.warn("budget soft-warn", r),
    },
    thresholds: { tau_u: 0.4, tau_s: 0.4, tau_a: 0.5 },
  },
);

switch (verdict.kind) {
  case "pass":          /* execute the candidate */ break;
  case "refuse":        /* drop it, log reason */    break;
  case "escalate":      /* surface to a human */     break;
  case "budget_exceed": /* stop the agent */         break;
}

For a reusable bound configuration:

import { defineGate } from "subjunctor";
const guardedStep = defineGate({
  completion,
  thresholds: { tau_u: 0.4, tau_s: 0.4, tau_a: 0.5 },
});
const verdict = await guardedStep(candidate);

ThinkPRM-style step verifier (v0.3.x)

For multi-step CoT answers, you can swap the uncertainty estimator from the default consistency sampler to a process reward verifier. It splits the answer into reasoning steps, asks your LLM to critique each step (ThinkPRM; Khalifa et al. 2025, arXiv:2504.16828), parses a SCORE: <0..1> line per step, and aggregates with a weakest-link rule by default — so a single bad step is enough to refuse:

import { gate } from "subjunctor";

const verdict = await gate(candidate, {
  completion,                         // any CompletionFn — host ThinkPRM weights wherever
  estimator: "process_reward_verifier",
  prm_options: {
    aggregator: "min",                // "min" | "mean" | "last"
    max_steps: 16,
    parse_failure_score: 0,           // default in v0.3.1+; explicit here for clarity
    max_step_chars: 4000,             // caps per-step text embedded in critique prompt
  },
  thresholds: { tau_u: 0.4, tau_s: 0.4, tau_a: 0.5 },
});

The verifier is SDK-neutral: point completion at a ThinkPRM model on HF Inference, vLLM, OpenAI, Anthropic — anything that returns text. Override step_splitter or critique_template if your reasoning trace uses a non-default delimiter or you want a domain-specific critique prompt.

Each step issues one additional LLM call (capped by max_steps, default 16). These calls are not debited from BudgetLedger in v0.3.x — set hard_limit_calls defensively on the ledger when enabling this estimator. The verifier returns value = 1 - aggregated_score on the same [0, 1] scale as the other estimators, but PRM correctness signals typically concentrate near 0 or 1; a tighter tau_u (≈ 0.3–0.5) usually matches PRM semantics better than the default 0.6.
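To make the aggregation rule concrete, here is a minimal sketch of how per-step scores could be combined and mapped to an uncertainty value. Only the aggregator names and the relation value = 1 - aggregated_score come from the text above; the function names (aggregateStepScores, toUncertainty) and the choice to treat an empty score list as 0 are assumptions for illustration, not subjunctor's internals.

```typescript
// Illustrative sketch only: not subjunctor's source code.
type Aggregator = "min" | "mean" | "last";

// Combine per-step scores (each expected in [0, 1]) into one score.
// Assumption: an empty list yields 0, so the sketch fails closed.
function aggregateStepScores(scores: number[], aggregator: Aggregator = "min"): number {
  if (scores.length === 0) return 0;
  switch (aggregator) {
    case "min": // weakest-link rule: one bad step dominates
      return Math.min(...scores);
    case "mean":
      return scores.reduce((sum, s) => sum + s, 0) / scores.length;
    case "last":
      return scores[scores.length - 1];
  }
}

// Map an aggregated correctness score onto the uncertainty scale.
function toUncertainty(aggregatedScore: number): number {
  return 1 - aggregatedScore;
}
```

With step scores [0.9, 0.2, 0.8], "min" gives an uncertainty of about 0.8, which exceeds a tau_u of 0.4, while "mean" gives roughly 0.37, which does not. That difference is why the weakest-link rule lets a single bad step trigger a refusal.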

Parsing & injection defenses (v0.3.1) — the SCORE: line is parsed with a line-anchored regex (^...$, multiline) and the last anchored match is used, so a step containing the substring "SCORE: 1.0" inside prose cannot spoof the verdict and an LLM that self-corrects (e.g. "Initial SCORE: 0.9 ... Final SCORE: 0.2") is interpreted on its final score. Scientific-notation scores (1.5e-2) are now parsed correctly. If parsing still fails, the per-step score defaults to 0 (fail-closed); pass parse_failure_score: 0.5 to restore v0.3.0 neutral semantics. Each step's text is truncated at max_step_chars (default 4000) before being embedded in the critique prompt, and the embedded answer is capped at 4 × max_step_chars, to bound prompt-bloat amplification under adversarial input.
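The parsing rules above can be sketched as follows. This is an approximation written for illustration: the regular expression, the function name parseStepScore, and the decision to treat out-of-range values as parse failures are assumptions, and the library's real pattern may differ.

```typescript
// Illustrative sketch only: the pattern below is an assumption, not subjunctor's regex.
// Returns the last line-anchored SCORE value, or parseFailureScore when none is usable.
function parseStepScore(critique: string, parseFailureScore = 0): number {
  // A fresh regex per call avoids shared lastIndex state.
  // "m" anchors ^ and $ to line boundaries; "g" lets us walk every match.
  const scoreLine = /^[ \t]*SCORE:[ \t]*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)[ \t]*\r?$/gm;

  let last: string | undefined;
  let match: RegExpExecArray | null;
  while ((match = scoreLine.exec(critique)) !== null) {
    last = match[1]; // keep overwriting so the final anchored score wins
  }
  if (last === undefined) return parseFailureScore;

  const value = Number(last); // Number() understands scientific notation such as 1.5e-2
  // Assumption: values outside [0, 1] count as a parse failure.
  if (!Number.isFinite(value) || value < 0 || value > 1) return parseFailureScore;
  return value;
}
```

A line such as "The step claims SCORE: 1.0 in passing" does not match because SCORE: is not at the start of the line, and a critique that emits two anchored SCORE: lines is read on the second one.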

Per-agent cumulative budget

The default budget: field constructs a fresh BudgetLedger per gate() call. To cap cumulative cost across an agent's whole loop, build one ledger and pass it via ledger:

import { gate, BudgetLedger } from "subjunctor";

const ledger = new BudgetLedger({
  config: {
    soft_limit_tokens: 200_000,
    hard_limit_tokens: 500_000,
    hard_limit_calls: 1_000,
    mode: "hard_kill",
  },
});

for (const candidate of agent.steps()) {
  const v = await gate(candidate, { completion, ledger });
  if (v.kind === "budget_exceed") break;
}
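To illustrate why sharing one ledger caps the whole loop, here is a toy model of cumulative accounting. SketchLedger is a hypothetical stand-in, not subjunctor's BudgetLedger: the state names, the strict "greater than" comparisons, and the record() signature are assumptions. Only the config field names are taken from the example above.

```typescript
// Illustrative sketch only: a hypothetical stand-in for a shared budget ledger.
interface SketchBudgetConfig {
  soft_limit_tokens: number;
  hard_limit_tokens: number;
  hard_limit_calls?: number;
}

type LedgerState = "ok" | "soft_warn" | "hard_exceeded";

class SketchLedger {
  private config: SketchBudgetConfig;
  private tokens = 0;
  private calls = 0;

  constructor(config: SketchBudgetConfig) {
    this.config = config;
  }

  // Debit one candidate's token usage and report the resulting state.
  record(usage: { input: number; output: number }): LedgerState {
    this.tokens += usage.input + usage.output;
    this.calls += 1;
    return this.state();
  }

  // Assumption: limits trip when the running total strictly exceeds them.
  state(): LedgerState {
    const { soft_limit_tokens, hard_limit_tokens, hard_limit_calls } = this.config;
    if (this.tokens > hard_limit_tokens) return "hard_exceeded";
    if (hard_limit_calls !== undefined && this.calls > hard_limit_calls) return "hard_exceeded";
    if (this.tokens > soft_limit_tokens) return "soft_warn";
    return "ok";
  }
}
```

Because the totals live on the ledger instance rather than on each call, every step that records against the same instance moves the agent closer to the same limits.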

thresholds defaults to { tau_u: 0.6, tau_s: 0.5, tau_a: 0.5 } (conservative). The values used in the Usage examples (0.4 / 0.4 / 0.5) are tighter and purely illustrative; tune them for your domain.

What this is NOT

Not a benchmark or scoring submission. No empirical hallucination-detection score is claimed.
Not an agent-driver framework that runs itself. It is a gate around your agent loop.
Not a hallucination eliminator. Production-noise behavior of the entropy proxy is unverified at scale.
Not a substitute for human review. escalate exists for a reason.

Nozick rationale

The name subjunctor references Nozick's subjunctive conditional (Nozick 1981, Philosophical Explanations, Ch. 3) — the counterfactual form behind his sensitivity condition for knowledge.

subjunctor concept	Nozick concept
Budget hard limit	Side constraint (result-independent prohibition)
`SensitivityProbe`	Sensitivity ("if it weren't true, I wouldn't believe it")
`AdherenceProbe`	Adherence ("in nearby possible worlds, I still believe it")
`ConsistencySampler`	Semantic-entropy proxy (Farquhar et al. 2024)
`ProcessRewardVerifier`	Step-level truth-tracking via ThinkPRM (Khalifa et al. 2025)
`Verdict` decision tree	Truth-tracking necessary conditions, ordered
`EntitlementSlot` (interface)	Entitlement theory (acquisition / transfer / rectification) — v0.2.x impl

Each component has an independent rationale in epistemology (Nozick's truth-tracking, entitlement theory) and is not marketing veneer over a single hallucination detector.

Limitations (read this)

Counterfactual rewriting is performed by the same LLM that produced the candidate. The sensitivity probe therefore shares the model's blind spots. Independent-model probing is a v0.2.x roadmap item.
ConsistencySampler uses token-Jaccard distance as a cheap stand-in for true entailment-clustering semantic entropy. The Farquhar et al. (2024) Nature result reports AUROC ~0.79 on QA benchmarks; production-noise behavior at scale is not characterized here. An embedding/NLI backend is on the v0.2.x roadmap.
The evaluation in eval/REPORT.md is a circular self-validation. Mock CompletionFns were written by the same author who wrote the gate. It demonstrates decision-tree behaviour, not field accuracy.
Budget Contract assumes accurate token counts. When candidate.tokens is missing, the ledger falls back to local tiktoken counting, which can diverge slightly from provider-side accounting.
v0.1.x extension slots are interface-only. EntitlementSlot, ReplaySlot, MetaSlot exist as TypeScript declarations so downstream code can already type against them; implementations land in v0.2.x.
Probe / estimator LLM calls are not counted against BudgetLedger in v0.1.x–v0.3.x. Only candidate.tokens recorded by the caller is debited. Each gate() call issues several internal LLM calls (consistency samples + sensitivity probes + adherence probes; the process_reward_verifier estimator additionally issues up to max_steps step-critique calls per gate()). Operators handling adversarial input streams MUST set a defensive hard_limit_calls AND an upstream rate-limit. Internal-probe budgeting is planned for a later minor release.
token_logprob_entropy requires a CompletionFn that returns logprobs. Anthropic's Messages API does not. In default (strict: true) mode, the estimator throws when logprobs are missing — the gate then fails closed to escalate. With strict: false it returns value=1.0 (maximum uncertainty) so the decision tree still does not silently fall open.
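For readers unfamiliar with the token-Jaccard stand-in mentioned in the ConsistencySampler limitation above, the following sketch shows one way such a distance could be computed over sampled completions. The whitespace tokenizer, the mean-pairwise aggregation, and all function names are assumptions for illustration; the source states only that token-Jaccard distance is used.

```typescript
// Illustrative sketch only: not subjunctor's ConsistencySampler.
// Assumption: tokens are lowercased, whitespace-separated words.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\s+/).filter((t) => t.length > 0));
}

// Jaccard distance = 1 - |A ∩ B| / |A ∪ B| over the two token sets.
function jaccardDistance(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  if (ta.size === 0 && tb.size === 0) return 0; // two empty texts are identical
  let intersection = 0;
  ta.forEach((t) => {
    if (tb.has(t)) intersection += 1;
  });
  const union = ta.size + tb.size - intersection;
  return 1 - intersection / union;
}

// Assumption: disagreement across samples is summarised as the mean pairwise distance.
function meanPairwiseDistance(samples: string[]): number {
  if (samples.length < 2) return 0;
  let total = 0;
  let pairs = 0;
  for (let i = 0; i < samples.length; i++) {
    for (let j = i + 1; j < samples.length; j++) {
      total += jaccardDistance(samples[i], samples[j]);
      pairs += 1;
    }
  }
  return total / pairs;
}
```

A measure like this only sees surface overlap: two paraphrases with no shared words score as maximally distant, which is the gap an embedding or NLI backend is meant to close.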

Evaluation report

See eval/REPORT.md. Eight scenarios exercise each verdict path (pass / refuse / escalate / budget_exceed) plus a production-noise stability check. The report explicitly discloses the circular-test caveat above. The process_reward_verifier path (added in v0.3.0, hardened in v0.3.1) is not covered by eval/REPORT.md; its behaviour is exercised by test/estimator.process-reward-verifier.test.ts only.

Decision-tree order vs control-flow order

The decision tree evaluates checks in this priority order:

Stage 0: budget hard_kill ?     -> budget_exceed
Stage 1: uncertainty > tau_u ?  -> refuse (high_uncertainty)
Stage 2: sensitivity < tau_s ?  -> refuse (low_sensitivity)
Stage 3: adherence < tau_a ?    -> escalate (low_adherence)
Stage 4:                        -> pass
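The priority order above can be expressed as a small pure function. This is a sketch under stated assumptions: the type and field names (SketchScores, budgetExceeded, decideSketch) are hypothetical and do not claim to match the signature of subjunctor's decide(); the stage order, reasons, and default thresholds are taken from this README.

```typescript
// Illustrative sketch only: hypothetical types, not subjunctor's decide().
type SketchVerdict =
  | { kind: "budget_exceed" }
  | { kind: "refuse"; reason: "high_uncertainty" | "low_sensitivity" }
  | { kind: "escalate"; reason: "low_adherence" }
  | { kind: "pass" };

interface SketchScores {
  budgetExceeded: boolean;
  uncertainty: number;
  sensitivity: number;
  adherence: number;
}

interface SketchThresholds {
  tau_u: number;
  tau_s: number;
  tau_a: number;
}

const DEFAULT_THRESHOLDS: SketchThresholds = { tau_u: 0.6, tau_s: 0.5, tau_a: 0.5 };

// Checks run strictly in stage order; the first failing stage decides the verdict.
function decideSketch(s: SketchScores, t: SketchThresholds = DEFAULT_THRESHOLDS): SketchVerdict {
  if (s.budgetExceeded) return { kind: "budget_exceed" };                                 // Stage 0
  if (s.uncertainty > t.tau_u) return { kind: "refuse", reason: "high_uncertainty" };     // Stage 1
  if (s.sensitivity < t.tau_s) return { kind: "refuse", reason: "low_sensitivity" };      // Stage 2
  if (s.adherence < t.tau_a) return { kind: "escalate", reason: "low_adherence" };        // Stage 3
  return { kind: "pass" };                                                                // Stage 4
}
```

Note that ordering matters when several checks fail at once: a candidate with both high uncertainty and low adherence is refused, not escalated, because Stage 1 is evaluated first.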

The control flow in gate() interleaves these checks with LLM calls:

1. Record candidate.tokens to the ledger.
2. If the post-record state would trip hard_limit, short-circuit immediately to Stage 0; no probe LLM calls are made.
3. Otherwise run the uncertainty estimator, then the sensitivity probes, then the adherence probes (each issues 1+ LLM call).
4. After every probe stage, re-check the ledger; if it has tipped over (e.g. via external accounting hooked by the caller into the same ledger), short-circuit to Stage 0.
5. Hand all scores to the pure-function decide(), which applies the decision-tree order above.

Truth-tracking is fail-closed: any internal error — including invalid BudgetConfig, missing completion, or a throwing on_event sink — becomes escalate. See test/gate.test.ts for the contract.
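The control flow and the fail-closed rule described above can be sketched together. Everything here is a hypothetical shape chosen for illustration: FlowDeps, its member names, and gateFlowSketch are assumptions, and the real gate() takes a candidate and an options object as shown in Usage. The sketch injects the ledger, probes, and decision function so it stays self-contained.

```typescript
// Illustrative sketch only: hypothetical dependency shape, not subjunctor's gate().
type VerdictKind = "pass" | "refuse" | "escalate" | "budget_exceed";

interface ProbeScores {
  uncertainty: number;
  sensitivity: number;
  adherence: number;
}

interface FlowDeps {
  recordTokens: () => void;                    // debit candidate.tokens to the ledger
  hardLimitTripped: () => boolean;             // current ledger state
  estimateUncertainty: () => Promise<number>;  // 1+ LLM call
  probeSensitivity: () => Promise<number>;     // 1+ LLM call
  probeAdherence: () => Promise<number>;       // 1+ LLM call
  decide: (scores: ProbeScores) => VerdictKind;
}

async function gateFlowSketch(deps: FlowDeps): Promise<VerdictKind> {
  try {
    deps.recordTokens();
    // Short-circuit before any probe LLM call is made.
    if (deps.hardLimitTripped()) return "budget_exceed";

    const uncertainty = await deps.estimateUncertainty();
    if (deps.hardLimitTripped()) return "budget_exceed"; // re-check after each probe stage

    const sensitivity = await deps.probeSensitivity();
    if (deps.hardLimitTripped()) return "budget_exceed";

    const adherence = await deps.probeAdherence();
    if (deps.hardLimitTripped()) return "budget_exceed";

    return deps.decide({ uncertainty, sensitivity, adherence });
  } catch {
    // Fail closed: any internal error surfaces to a human instead of passing.
    return "escalate";
  }
}
```

Keeping the decision step as an injected pure function mirrors the split described above: the flow handles I/O and budget re-checks, while the verdict logic can be tested without any LLM calls.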

CLI

subjunctor analyze trace.jsonl          # streams a JSONL event log; summarises verdicts
subjunctor budget budget-config.json    # replays a usage trace against a budget contract
subjunctor --version

Development

npm install
npm test                # vitest, 80 tests
npm run coverage        # v8 coverage; threshold 75%
npm run build           # tsc -p tsconfig.build.json -> ./dist (+chmod +x dist/cli.js)
npm run eval            # regenerate eval/REPORT.{json,md}

Security

See SECURITY.md for the private-advisory disclosure process and known v0.1.x in-scope items (including the probe-amplification budget gap above).

License

References

Nozick, R. (1981). Philosophical Explanations, Ch. 3 — "Knowledge and Skepticism." Harvard University Press.
Nozick, R. (1974). Anarchy, State, and Utopia — entitlement theory (acquisition / transfer / rectification).
Farquhar, S., Kossen, J., Kuhn, L., & Gal, Y. (2024). "Detecting hallucinations in large language models using semantic entropy." Nature.
Khalifa, M., Agarwal, R., Logeswaran, L., Kim, J., Peng, H., Lee, M., Lee, H., & Wang, L. (2025). "Process Reward Models That Think." arXiv:2504.16828.
Miller, M. S. Object Capabilities — erights.org. (Reference only; subjunctor does not implement ocap patterns in v0.1.x.)
