LLM reliability research from Hassana Labs.
Three tools, one goal: catch models that know but don't use.
Ask Claude to count the r's in "strawberry." It writes "s-t-r-a-w-b-e-r-r-y," identifies each r, gets to 3. Then outputs "2."
The model didn't lack information. The answer was right there—in text it generated moments earlier. The computation worked. The routing failed.
This toolkit detects those failures mathematically.
```shell
pip install pythea
```

```shell
python -m strawberry.factual_recall \
  --question "Which US senators from Minnesota graduated from Princeton" \
  --out report.json
```

What it catches:
- RAG that retrieves but doesn't read
- Chain-of-thought that cites steps it ignored
- Self-verification that validates without checking
- Citation confabulation (decorative sources)
How: Scrub the cited evidence, measure confidence change. No change? The model was confabulating.
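The scrub-and-measure idea can be sketched in a few lines. This is a minimal illustration, not the toolkit's implementation: `get_confidence` is a hypothetical stand-in for whatever scoring call returns the model's confidence in its answer for a given prompt.

```python
# Sketch of the scrub-and-measure test: compare confidence with and
# without the cited evidence. `get_confidence` is a hypothetical helper.

def is_confabulating(get_confidence, question, evidence, threshold=0.05):
    """Flag a citation as decorative if scrubbing the cited evidence
    barely moves the model's confidence."""
    with_evidence = get_confidence(f"{evidence}\n\n{question}")
    without_evidence = get_confidence(question)
    # A genuine reader loses confidence when the evidence disappears;
    # a confabulator's confidence stays put.
    return abs(with_evidence - without_evidence) < threshold

# Toy usage: a fake model that never reads its input.
ignorer = lambda prompt: 0.9  # always confident, evidence or not
print(is_confabulating(ignorer, "Who wrote X?", "X was written by A."))
# True — the citation was decorative
```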
Integrations: Strawberry ships with an MCP server (`strawberry.mcp_server`) that can be used from Claude Code or OpenAI Codex (Codex CLI / IDE extension) to run `detect_hallucination` and `audit_trace_budget` as tools.
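For example, in Claude Code the server can be registered with the `claude mcp add` command. The server module name comes from this repo; treat the exact invocation as an assumption and check your CLI version:

```shell
# Register Strawberry's MCP server with Claude Code (assumed invocation;
# verify against `claude mcp --help` in your installation)
claude mcp add strawberry -- python -m strawberry.mcp_server
```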
Agent workflows (Codex skills): This repo includes repo-scoped Codex skills under `.codex/skills/`:

- `rca-fix-agent`: evidence-first debugging agent implemented as a composition of `planning-agent` + `execution-agent` (reproduce → evidence → hypotheses → Strawberry-verified ROOT_CAUSE → plan-of-record → execute patches with Strawberry-gated acceptance → run a mandatory final verification test → iterate).
- `proof-attack-agent`: a focused, anti-repeat brute-force "proof gap attack" loop (usually for a stuck Lean goal). It uses Strawberry verification as a heartbeat after every micro-plan update and maintains an Attempt Ledger so you don't retry the same dead ends.
- `proof-repair-agent`: plan-driven, evidence-first proof repair / proof synthesis agent for LaTeX plus a formal backstop (Lean/Coq/...). It uses `planning-agent` to produce a plan-of-record, uses `proof-attack-agent` to explore stuck gaps, checks for theorem drift, and iterates until theorem Y is machine-checked.
- `planning-agent`: evidence-first hierarchical planning agent that does local environment forensics plus web lookup, then Strawberry-verifies each plan step against an explicit "what does success look like?" spec.
- `execution-agent`: Strawberry-gated plan execution agent that ships code patches, accepting each step only when "this step has succeeded" is provably true (tests/output/diffs), and (for full runs) requires passing the plan's mandatory final verification test.
Lightweight client for the Thea Mini Reasoning API.
```python
from pythea import TheaClient

with TheaClient(base_url="https://...") as client:
    resp = client.unified_answer(
        question="What is 2+2?",
        backend="aoai-pool",
        m=6,
    )
    print(resp.get("answer"))
```

Model-agnostic permutation-mixture evaluation via Bernoulli first-token logprob probes.
```python
from pythea.offline import qmv

res = qmv.evaluate_permutation_family(
    probe=probe,
    parts=parts,
    cfg=qmv.PermutationEvalConfig(m=6, num_bands=2, seed=0),
)
print(res.q_bar, res.q_lo, res.js_bound)
```

```shell
git clone https://github.com/leochlon/pythea.git
cd pythea
pip install -e .
```

Extras:

```shell
pip install -e ".[dev]"      # tests
pip install -e ".[offline]"  # tiktoken for logit bias
pip install -e ".[vllm]"     # local inference
```

```
pythea/
├── strawberry/    # Procedural hallucination toolkit
│   ├── README.md
│   └── src/
├── src/pythea/    # Thea client + QMV probing
├── docs/          # Detailed documentation
├── examples/
├── tests/
└── benchmarks/
```
```bibtex
@article{hassanalabs2026procedural,
  title={An Information-Theoretic and Causal Theory of Procedural Hallucinations},
  author={{Hassana Labs}},
  journal={arXiv preprint},
  year={2026}
}
```

MIT — see LICENSE.md