subjunctor

Refuse the unsafe step, cap the runaway cost.

A Nozick-grounded gate for LLM agents. Wraps any (prompt) -> completion function with two layered checks: a Truth-Tracking Gate (sensitivity + adherence + semantic-entropy proxy) and a Token Budget Contract (per-call by default, or per-agent by sharing a single BudgetLedger).

subjunctor is a gate, not an agent. You write the agent; subjunctor decides whether each candidate step should pass, refuse, escalate, or trip the budget contract (budget_exceed).

Install

npm install subjunctor

Requires Node >=20. The LLM SDK is not bundled — you inject your own CompletionFn.

Usage

import { gate, openaiAdapter } from "subjunctor";
import OpenAI from "openai";

const client = new OpenAI();
const completion = openaiAdapter({ client, model: "gpt-4o-mini" });

const verdict = await gate(
  {
    id: "step-42",
    description: "Generate a SQL query against the analytics warehouse.",
    prompt: "Write a SQL query that lists top-10 customers by revenue.",
    answer: "SELECT customer_id, SUM(revenue) ... LIMIT 10",
    tokens: { input: 280, output: 90 },
  },
  {
    completion,
    budget: {
      soft_limit_tokens: 50_000,
      hard_limit_tokens: 100_000,
      mode: "hard_kill",
      on_soft_warn: (r) => console.warn("budget soft-warn", r),
    },
    thresholds: { tau_u: 0.4, tau_s: 0.4, tau_a: 0.5 },
  },
);

switch (verdict.kind) {
  case "pass":          /* execute the candidate */ break;
  case "refuse":        /* drop it, log reason */    break;
  case "escalate":      /* surface to a human */     break;
  case "budget_exceed": /* stop the agent */         break;
}

For a reusable bound configuration:

import { defineGate } from "subjunctor";
const guardedStep = defineGate({
  completion,
  thresholds: { tau_u: 0.4, tau_s: 0.4, tau_a: 0.5 },
});
const verdict = await guardedStep(candidate);

ThinkPRM-style step verifier (v0.3.x)

For multi-step CoT answers, you can swap the uncertainty estimator from the default consistency sampler to a process reward verifier. It splits the answer into reasoning steps, asks your LLM to critique each step (ThinkPRM; Khalifa et al. 2025, arXiv:2504.16828), parses a SCORE: <0..1> line per step, and aggregates with a weakest-link rule by default — so a single bad step is enough to refuse:

import { gate } from "subjunctor";

const verdict = await gate(candidate, {
  completion,                         // any CompletionFn — host ThinkPRM weights wherever
  estimator: "process_reward_verifier",
  prm_options: {
    aggregator: "min",                // "min" | "mean" | "last"
    max_steps: 16,
    parse_failure_score: 0,           // default in v0.3.1+; explicit here for clarity
    max_step_chars: 4000,             // caps per-step text embedded in critique prompt
  },
  thresholds: { tau_u: 0.4, tau_s: 0.4, tau_a: 0.5 },
});

The verifier is SDK-neutral: point completion at a ThinkPRM model on HF Inference, vLLM, OpenAI, Anthropic — anything that returns text. Override step_splitter or critique_template if your reasoning trace uses a non-default delimiter or you want a domain-specific critique prompt.

Each step issues one additional LLM call (capped by max_steps, default 16). These calls are not debited from BudgetLedger in v0.3.x — set hard_limit_calls defensively on the ledger when enabling this estimator. The verifier returns value = 1 - aggregated_score on the same [0, 1] scale as the other estimators, but PRM correctness signals typically concentrate near 0 or 1; a tighter tau_u (≈ 0.3–0.5) usually matches PRM semantics better than the default 0.6.
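To make the aggregation rule concrete, here is a minimal sketch of how per-step scores could be reduced to an uncertainty value. The function names `aggregateScores` and `prmUncertainty` are hypothetical and are not part of subjunctor's API. Only the three aggregator names and the relation value = 1 - aggregated_score come from the text above. Returning 0 for an empty step list is an assumption.

```typescript
// Hypothetical helpers for illustration; not subjunctor's actual implementation.
type Aggregator = "min" | "mean" | "last";

function aggregateScores(scores: number[], aggregator: Aggregator = "min"): number {
  // Assumption: with no steps to score, fail closed with a score of 0.
  if (scores.length === 0) return 0;
  switch (aggregator) {
    case "min": // weakest link: one bad step dominates
      return Math.min(...scores);
    case "mean":
      return scores.reduce((sum, s) => sum + s, 0) / scores.length;
    case "last": // only the final step's score counts
      return scores[scores.length - 1] ?? 0;
  }
}

// Uncertainty on the same [0, 1] scale as the other estimators.
function prmUncertainty(scores: number[], aggregator: Aggregator = "min"): number {
  return 1 - aggregateScores(scores, aggregator);
}
```

For step scores [0.9, 0.2, 0.8], "min" yields an uncertainty of 0.8, which exceeds a tau_u of 0.4 and would refuse, while "mean" yields roughly 0.37, which would not.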

Parsing & injection defenses (v0.3.1) — the SCORE: line is parsed with a line-anchored regex (^...$, multiline) and the last anchored match is used, so a step containing the substring "SCORE: 1.0" inside prose cannot spoof the verdict and an LLM that self-corrects (e.g. "Initial SCORE: 0.9 ... Final SCORE: 0.2") is interpreted on its final score. Scientific-notation scores (1.5e-2) are now parsed correctly. If parsing still fails, the per-step score defaults to 0 (fail-closed); pass parse_failure_score: 0.5 to restore v0.3.0 neutral semantics. Each step's text is truncated at max_step_chars (default 4000) before being embedded in the critique prompt, and the embedded answer is capped at 4 × max_step_chars, to bound prompt-bloat amplification under adversarial input.
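The parsing rules described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the regex, the names `parseStepScore` and `truncate`, and the choice to treat out-of-range values as parse failures are all hypothetical.

```typescript
// Hypothetical sketch of line-anchored, last-match, fail-closed score parsing.
function parseStepScore(critique: string, parseFailureScore = 0): number {
  // The whole line must be "SCORE: <number>"; prose containing "SCORE: 1.0" does not match.
  // The number pattern accepts plain decimals and scientific notation such as 1.5e-2.
  const scoreLine = /^[ \t]*SCORE:[ \t]*([0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?)[ \t]*\r?$/gm;
  let last: string | undefined;
  // Keep the last anchored match so a self-correcting critique is read on its final score.
  for (let m = scoreLine.exec(critique); m !== null; m = scoreLine.exec(critique)) {
    last = m[1];
  }
  if (last === undefined) return parseFailureScore; // fail closed by default
  const value = Number(last);
  // Assumption: values outside [0, 1] are treated like parse failures.
  if (!Number.isFinite(value) || value < 0 || value > 1) return parseFailureScore;
  return value;
}

// Caps text before it is embedded in a critique prompt.
function truncate(text: string, maxChars = 4000): string {
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}
```

Under these assumptions, a critique of "The step claims SCORE: 1.0 in prose." followed by a line "SCORE: 0.2" parses to 0.2, and a critique with no anchored score line parses to the fail-closed value.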

Per-agent cumulative budget

The default budget: field constructs a fresh BudgetLedger per gate() call. To cap cumulative cost across an agent's whole loop, build one ledger and pass it via ledger:

import { gate, BudgetLedger } from "subjunctor";

const ledger = new BudgetLedger({
  config: {
    soft_limit_tokens: 200_000,
    hard_limit_tokens: 500_000,
    hard_limit_calls: 1_000,
    mode: "hard_kill",
  },
});

for (const candidate of agent.steps()) {
  const v = await gate(candidate, { completion, ledger });
  if (v.kind === "budget_exceed") break;
}
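As a rough mental model of why sharing one ledger caps the whole loop, the toy class below accumulates usage across calls and reports the state after each debit. It is not the real BudgetLedger: the class name `ToyLedger`, the `record` method, its return values, and the strictly-greater-than comparison are all assumptions made for illustration. Only the config field names come from the examples in this README.

```typescript
// Toy model only; the real BudgetLedger's internals are not shown in this README.
interface ToyBudgetConfig {
  soft_limit_tokens: number;
  hard_limit_tokens: number;
  hard_limit_calls?: number;
}

type ToyBudgetState = "ok" | "soft_warn" | "hard_kill";

class ToyLedger {
  private readonly config: ToyBudgetConfig;
  private tokens = 0;
  private calls = 0;

  constructor(config: ToyBudgetConfig) {
    this.config = config;
  }

  // Debit one step and report the cumulative state after recording it.
  record(usage: { input: number; output: number }): ToyBudgetState {
    this.tokens += usage.input + usage.output;
    this.calls += 1;
    const { soft_limit_tokens, hard_limit_tokens, hard_limit_calls } = this.config;
    // Assumption: a limit trips when the running total strictly exceeds it.
    if (this.tokens > hard_limit_tokens) return "hard_kill";
    if (hard_limit_calls !== undefined && this.calls > hard_limit_calls) return "hard_kill";
    if (this.tokens > soft_limit_tokens) return "soft_warn";
    return "ok";
  }
}
```

Because the totals live on the instance, three steps of 370 tokens each against a soft limit of 500 and a hard limit of 1,000 would move through "ok", "soft_warn", and "hard_kill" in this model, whereas a fresh ledger per call would report "ok" every time.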

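thresholds defaults to { tau_u: 0.6, tau_s: 0.5, tau_a: 0.5 } (conservative). The values in the Usage example (0.4 / 0.4 / 0.5) are illustrative: the lower tau_u refuses at a lower uncertainty, while the lower tau_s tolerates a lower sensitivity score. Tune both for your domain.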

What this is NOT

  • Not a benchmark or scoring submission. No empirical hallucination-detection score is claimed.
  • Not an agent-driver framework that runs itself. It is a gate around your agent loop.
  • Not a hallucination eliminator. Production-noise behavior of the entropy proxy is unverified at scale.
  • Not a substitute for human review. escalate exists for a reason.

Nozick rationale

The name subjunctor references Nozick's subjunctive conditional (Nozick 1981, Philosophical Explanations, Ch. 3) — the counterfactual form behind his sensitivity condition for knowledge.

| subjunctor concept | Nozick concept |
| --- | --- |
| Budget hard limit | Side constraint (result-independent prohibition) |
| SensitivityProbe | Sensitivity ("if it weren't true, I wouldn't believe it") |
| AdherenceProbe | Adherence ("in nearby possible worlds, I still believe it") |
| ConsistencySampler | Semantic-entropy proxy (Farquhar et al. 2024) |
| ProcessRewardVerifier | Step-level truth-tracking via ThinkPRM (Khalifa et al. 2025) |
| Verdict decision tree | Truth-tracking necessary conditions, ordered |
| EntitlementSlot (interface) | Entitlement theory (acquisition / transfer / rectification); v0.2.x impl |

Each component has an independent rationale in epistemology (Nozick's truth-tracking, entitlement theory) and is not marketing veneer over a single hallucination detector.

Limitations (read this)

  1. Counterfactual rewriting is performed by the same LLM that produced the candidate. The sensitivity probe therefore shares the model's blind spots. Independent-model probing is a v0.2.x roadmap item.
  2. ConsistencySampler uses token-Jaccard distance as a cheap stand-in for true entailment-clustering semantic entropy. The Farquhar et al. (2024) Nature result reports AUROC ~0.79 on QA benchmarks; production-noise behavior at scale is not characterized here. An embedding/NLI backend is on the v0.2.x roadmap.
  3. The evaluation in eval/REPORT.md is a circular self-validation. Mock CompletionFns were written by the same author who wrote the gate. It demonstrates decision-tree behaviour, not field accuracy.
  4. Budget Contract assumes accurate token counts. When candidate.tokens is missing, the ledger falls back to local tiktoken counting, which can diverge slightly from provider-side accounting.
  5. v0.1.x extension slots are interface-only. EntitlementSlot, ReplaySlot, MetaSlot exist as TypeScript declarations so downstream code can already type against them; implementations land in v0.2.x.
  6. Probe / estimator LLM calls are not counted against BudgetLedger in v0.1.x–v0.3.x. Only candidate.tokens recorded by the caller is debited. Each gate() call issues several internal LLM calls (consistency samples + sensitivity probes + adherence probes; the process_reward_verifier estimator additionally issues up to max_steps step-critique calls per gate()). Operators handling adversarial input streams MUST set a defensive hard_limit_calls AND an upstream rate-limit. Internal-probe budgeting is planned for a later minor release.
  7. token_logprob_entropy requires a CompletionFn that returns logprobs. Anthropic's Messages API does not. In default (strict: true) mode, the estimator throws when logprobs are missing — the gate then fails closed to escalate. With strict: false it returns value=1.0 (maximum uncertainty) so the decision tree still does not silently fall open.
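To illustrate what limitation 2 means by a token-Jaccard stand-in, the sketch below computes a mean pairwise Jaccard distance over sampled completions. It is one plausible reading, not the ConsistencySampler's actual code: the function names, the whitespace tokenizer, the lowercasing, and the mean-over-pairs aggregation are all assumptions.

```typescript
// Hypothetical sketch of a token-Jaccard disagreement score; not subjunctor's code.
function tokenSet(text: string): Set<string> {
  // Assumption: lowercase, whitespace-delimited tokens.
  return new Set(text.toLowerCase().split(/\s+/).filter((t) => t.length > 0));
}

function jaccardDistance(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0; // two empty outputs agree
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection += 1;
  });
  const union = a.size + b.size - intersection;
  return 1 - intersection / union;
}

// Mean distance over all unordered pairs of samples, in [0, 1].
function meanPairwiseDistance(samples: string[]): number {
  if (samples.length < 2) return 0; // assumption: a single sample shows no disagreement
  const sets = samples.map(tokenSet);
  let total = 0;
  let pairs = 0;
  sets.forEach((a, i) => {
    sets.slice(i + 1).forEach((b) => {
      total += jaccardDistance(a, b);
      pairs += 1;
    });
  });
  return total / pairs;
}
```

The limitation is visible in the sketch itself: two paraphrases with no shared tokens score a distance of 1 even if they mean the same thing, which is the gap an entailment-clustering backend would close.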

Evaluation report

See eval/REPORT.md. Eight scenarios exercise each verdict path (pass / refuse / escalate / budget_exceed) plus a production-noise stability check. The report explicitly discloses the circular-test caveat above. The process_reward_verifier path (added in v0.3.0, hardened in v0.3.1) is not covered by eval/REPORT.md; its behaviour is exercised by test/estimator.process-reward-verifier.test.ts only.

Decision-tree order vs control-flow order

The decision-tree evaluates checks in this priority:

Stage 0: budget hard_kill ?     -> budget_exceed
Stage 1: uncertainty > tau_u ?  -> refuse (high_uncertainty)
Stage 2: sensitivity < tau_s ?  -> refuse (low_sensitivity)
Stage 3: adherence < tau_a ?    -> escalate (low_adherence)
Stage 4:                        -> pass
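The stage ordering above can be written as a small pure function. This is a sketch under assumptions: the name `decideSketch`, the input shape, and the `reason` field are hypothetical. Only the `kind` values, the reason labels, the comparisons, and the default thresholds are taken from this README.

```typescript
// Sketch only: names and shapes other than `kind` and the tau_* thresholds are assumptions.
type VerdictSketch =
  | { kind: "budget_exceed" }
  | { kind: "refuse"; reason: "high_uncertainty" | "low_sensitivity" }
  | { kind: "escalate"; reason: "low_adherence" }
  | { kind: "pass" };

interface ScoresSketch {
  budgetTripped: boolean;
  uncertainty: number;
  sensitivity: number;
  adherence: number;
}

interface Thresholds {
  tau_u: number;
  tau_s: number;
  tau_a: number;
}

const DEFAULT_THRESHOLDS: Thresholds = { tau_u: 0.6, tau_s: 0.5, tau_a: 0.5 };

function decideSketch(s: ScoresSketch, t: Thresholds = DEFAULT_THRESHOLDS): VerdictSketch {
  if (s.budgetTripped) return { kind: "budget_exceed" };                               // Stage 0
  if (s.uncertainty > t.tau_u) return { kind: "refuse", reason: "high_uncertainty" };  // Stage 1
  if (s.sensitivity < t.tau_s) return { kind: "refuse", reason: "low_sensitivity" };   // Stage 2
  if (s.adherence < t.tau_a) return { kind: "escalate", reason: "low_adherence" };     // Stage 3
  return { kind: "pass" };                                                             // Stage 4
}
```

Because earlier stages return first, a candidate that is both highly uncertain and low on adherence is refused rather than escalated.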

The control-flow in gate() interleaves these checks with LLM calls:

  1. Record candidate.tokens to the ledger.
  2. If the post-record state would trip hard_limit, short-circuit immediately to Stage 0 — no probe LLM calls are made.
  3. Otherwise run the uncertainty estimator, then sensitivity, then adherence probes (each issues 1+ LLM call).
  4. After every probe stage, re-check the ledger; if it has tipped over (e.g. via external accounting hooked by the caller into the same ledger), short-circuit to Stage 0.
  5. Hand all scores to the pure-function decide(), which applies the decision-tree order above.

Truth-tracking is fail-closed: any internal error — including invalid BudgetConfig, missing completion, or a throwing on_event sink — becomes escalate. See test/gate.test.ts for the contract.
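The interleaving of ledger checks with probe stages, together with the fail-closed rule, can be sketched as below. This is a simplified model, not gate() itself: the name `gateFlowSketch` and the `FlowDeps` shape are hypothetical, and the stages are synchronous for brevity, whereas the real probes are LLM calls that would be awaited.

```typescript
// Simplified, synchronous sketch of the control flow; all names here are hypothetical.
type Kind = "pass" | "refuse" | "escalate" | "budget_exceed";

interface FlowDeps {
  recordTokens: () => void;            // step 1: debit candidate.tokens
  hardLimitTripped: () => boolean;     // ledger check used in steps 2 and 4
  stages: Array<() => number>;         // step 3: uncertainty, sensitivity, adherence, in order
  decide: (scores: number[]) => Kind;  // step 5: pure decision function
}

function gateFlowSketch(deps: FlowDeps): Kind {
  try {
    deps.recordTokens();
    // Short-circuit before any probe runs if the recorded usage already trips the hard limit.
    if (deps.hardLimitTripped()) return "budget_exceed";
    const scores: number[] = [];
    for (const stage of deps.stages) {
      scores.push(stage());
      // Re-check after every stage in case external accounting moved the ledger.
      if (deps.hardLimitTripped()) return "budget_exceed";
    }
    return deps.decide(scores);
  } catch {
    // Fail closed: any internal error surfaces as escalate rather than pass.
    return "escalate";
  }
}
```

In this model a throwing stage, a throwing ledger, or a throwing decide function all produce "escalate", and no stage runs when the ledger is already over its hard limit.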

CLI

subjunctor analyze trace.jsonl          # streams a JSONL event log; summarises verdicts
subjunctor budget budget-config.json    # replays a usage trace against a budget contract
subjunctor --version

Development

npm install
npm test                # vitest, 80 tests
npm run coverage        # v8 coverage; threshold 75%
npm run build           # tsc -p tsconfig.build.json -> ./dist (+chmod +x dist/cli.js)
npm run eval            # regenerate eval/REPORT.{json,md}

Security

See SECURITY.md for the private-advisory disclosure process and known v0.1.x in-scope items (including the probe-amplification budget gap above).

License

MIT. See LICENSE. Copyright 2026 hinanohart.

References

  • Nozick, R. (1981). Philosophical Explanations, Ch. 3 — "Knowledge and Skepticism." Harvard University Press.
  • Nozick, R. (1974). Anarchy, State, and Utopia — entitlement theory (acquisition / transfer / rectification).
  • Farquhar, S., Kossen, J., Kuhn, L., & Gal, Y. (2024). "Detecting hallucinations in large language models using semantic entropy." Nature.
  • Khalifa, M., Agarwal, R., Logeswaran, L., Kim, J., Peng, H., Lee, M., Lee, H., & Wang, L. (2025). "Process Reward Models That Think." arXiv:2504.16828.
  • Miller, M. S. Object Capabilities. erights.org. (Reference only; subjunctor does not implement ocap patterns in v0.1.x.)
