Evidence-first coding agent for Claude Code CLI. Port of burkeholland/anvil from GitHub Copilot CLI.
Your coding agent should prove its work. This one does.
/anvil verifies every change before you see it β builds, tests, lints, runs IDE diagnostics, then has other AI models (Claude, GPT, Gemini, Ollama) try to break it. Every check is INSERTed into a SQLite ledger; the final Evidence Bundle is a SELECT, not prose. If the INSERT didn't happen, the verification didn't happen.
- Adversarial review. Every change gets attacked by 1 (Medium task) or 3 (Large / π΄ task) different models in parallel. Claude runs as a Task subagent; GPT / Gemini / Ollama run via Bash scripts. They disagree with each other. That's the point.
- SQL verification ledger. Every build, test, lint, diagnostics, and reviewer verdict lives in
anvil_checks. The Evidence Bundle isSELECT ... ORDER BY phaseβ the agent can't fake it. - Baseline snapshots. Before touching anything, anvil records your project's state. After, it diffs. Regressions are caught before you see the code.
- Pushback. Anvil is a senior engineer, not an order taker. If your request introduces tech debt, duplicates existing code, or has a dangerous edge case, it tells you before writing a single line.
- Session memory. A PostToolUse hook tracks every file anvil edits. Next session, if you touch that file again, anvil recalls past failures and accounts for them.
- Git autopilot. Checks git state, stashes uncommitted work, creates a branch, commits the result with a clean rollback command.
- Context7 docs. Bundled MCP server fetches up-to-date library docs so anvil doesn't hallucinate APIs.
- Caveman mode (optional). Compress
/anvil's prose ~75% via the separatecavemanskill β structured artifacts (Evidence Bundle, pushback, code, commits) stay verbatim. Off by default; see Caveman output mode below.
- Claude Code CLI (2.1.x or newer; plugin support).
- Python 3.9+ on your
PATHβ used for the ledger and external reviewer scripts (stdlib only, nopip install). - Git (obviously).
- Node.js (only needed for the bundled Context7 MCP server, launched via
npx). - Optional reviewer credentials (see Reviewer setup below). Anvil gracefully degrades if a reviewer isn't configured.
In any Claude Code session:
/plugin marketplace add allut/claude-anvil
/plugin install claude-anvil@allut-claude-anvil
git clone https://github.com/allut/claude-anvil.git
claude --plugin-dir ./claude-anvilInitialize the ledger once (it'll auto-init on first /anvil run, but this lets you verify the install):
python scripts/anvil-ledger.py init
# -> anvil ledger ready at ~/.claude-anvil/anvil.dbRun /anvil-setup from any Claude Code session. The wizard will:
- Ask which reviewers to enable (Claude / Gemini / Ollama / OpenAI-compatible).
- Collect API keys, validate each with a small live request, and write them to
~/.claude-anvil/config.json(chmod0600on POSIX). - Pick the Claude reviewer's model (
sonnet/haiku/opus). - Offer to
ollama pullif Ollama is enabled and the model isn't pulled yet. - Optionally create bare shortcuts so you can type
/anvilinstead of/claude-anvil:anvil(see below).
You can re-run /anvil-setup any time to reconfigure, or /anvil-setup reset to wipe and start over.
The wizard's final step offers to write ~/.claude/commands/anvil.md and ~/.claude/commands/anvil-setup.md, which expose the commands without the plugin namespace prefix. After accepting, you can use /anvil and /anvil-setup directly in any conversation.
Re-running /anvil-setup refreshes the shortcuts when the plugin updates. To create them manually:
python "${CLAUDE_PLUGIN_ROOT}/scripts/anvil-config.py" create-shortcuts "${CLAUDE_PLUGIN_ROOT}"| Reviewer | Cost | Get a key |
|---|---|---|
| Claude (Task subagent) | Covered by your Claude Code plan | Already installed β wizard picks the model (sonnet default, haiku faster/cheaper, opus deepest) |
| Ollama (local) | Free | Install from ollama.com; the wizard offers to ollama pull your chosen model |
| Gemini (Google AI Studio) | Free tier | aistudio.google.com/apikey |
| OpenAI-compatible | Paid for openai.com GPT-5; free via OpenRouter / Groq | The wizard offers OpenAI / OpenRouter / Groq presets, or a custom endpoint |
Default roster after wizard: Medium = first enabled reviewer; Large = up to 3 enabled reviewers (priority claude > gemini > openai > ollama).
Power users: any ANVIL_* env var still wins over config.json, so the legacy .env-based flow keeps working β see .env.example for the full list. The dispatcher (anvil-review.py) reads env first, then config.json, then defaults.
Provider quirks handled automatically:
- GPT-5 and o1/o3: the
temperaturefield is omitted (those models reject anything but1). - OpenRouter / Groq:
HTTP-RefererandX-Titleattribution headers are sent unconditionally. If a free-tier model rejectsresponse_format: {type: json_object}, the wizard's "JSON mode = off" toggle (orANVIL_OPENAI_JSON_MODE=off) makes anvil fall back to extracting JSON from the reply body.
A commit gate blocks git commit during an active /anvil session when the ledger is missing evidence in any of the three phases (baseline, after, review). Commits outside an anvil session are unaffected.
Anvil can route its free-form prose through the caveman skill, which compresses chatter ~75% while keeping full technical accuracy. Anvil's structured artifacts stay verbatim β the Evidence Bundle and its tables, pushback callouts (
Prerequisite:
cavemanis a separate skill, not bundled with claude-anvil. Install it to~/.claude/skills/caveman/first. If it's absent, your chosen level is still saved but/anvilfalls back to normal prose until the skill is installed β enabling it is harmless either way.
Two dialects, three intensities each:
| Dialect | Levels | What it sounds like |
|---|---|---|
| Standard | lite / full / ultra |
Caveman English (full is the default intensity) |
| WΓ©nyΓ‘n ζθ¨ | wenyan-lite / wenyan-full / wenyan-ultra |
Classical-Chinese caveman |
Enable it either way:
- Via the wizard:
/anvil-setupasks "Compress /anvil's prose with caveman mode?" and, if you opt in, the intensity. - Via env var:
ANVIL_CAVEMAN_LEVEL=ultra(or any level above;off/none/disableddisables it). The env var wins overconfig.json, like every otherANVIL_*override.
Default (unset) = off. Check the active level with python scripts/anvil-config.py caveman (prints the level or off).
From any Claude Code session inside the project you want to edit:
/anvil fix the off-by-one in getNthItem(), plus a regression test
What happens next, on a Medium task:
- Boost β your request is rewritten into a precise spec (hidden unless the intent changed).
- Git hygiene β checks for uncommitted changes / being on
main; pushes back if anything looks risky. - Recall β queries the SQLite ledger for past anvil runs that touched the same files, so surprises from previous sessions get surfaced.
- Survey β greps the codebase for existing helpers you should extend instead of re-inventing.
- Baseline β runs the project's build/tests/diagnostics and INSERTs the results as
phase='baseline'. - Implement β edits the code.
- The Forge β reruns build/tests/diagnostics (as
phase='after'), then stages the diff and unleashes one reviewer (Claude by default). The reviewer's JSON verdict is INSERTed asphase='review', check_name='review-claude'. - Evidence Bundle β rendered from
SELECT * FROM anvil_checks WHERE task_id = .... Any regression, any reviewer finding, shows up. - Commit β auto-commits with a one-line rollback.
Large tasks (multi-file, auth/crypto/payments, π΄ files) additionally show a plan step you approve via AskUserQuestion, run three reviewers in parallel, and include an operational-readiness check (secrets, observability, graceful degradation).
Small tasks (typos, renames, config tweaks) skip the ledger β just Quick Verify + optional commit.
# All checks for a task
python scripts/anvil-ledger.py select-bundle fix-login-crash
# Has a file been touched by past anvil runs?
python scripts/anvil-ledger.py recall auth.ts
# Did anything break on this file in the past?
python scripts/anvil-ledger.py recall-issues auth.ts
# List learned facts (build commands, codebase conventions, etc.)
python scripts/anvil-ledger.py memory-listThe raw DB is at ~/.claude-anvil/anvil.db. Open it with any SQLite browser.
-
EPERM: operation not permitted, rename β¦ allut-claude-anvil β allut-claude-anvil.bak(Windows) β Claude Code backs up the old plugin directory before updating it. Windows blocks the rename if any process holds a handle to a file inside that directory (common culprits: Windows Defender real-time scanning, Windows Explorer, or Claude Code itself). Fix:- Exit Claude Code completely.
- Double-click
plugin/scripts/fix-windows-plugin.bat(or run it from any terminal β no PowerShell execution policy change required). It removes the stale plugin directories and prints a confirmation. - Reopen Claude Code and run
/plugin marketplace add allut/claude-anvilagain.
To prevent recurrence, add
%USERPROFILE%\.claude\pluginsto Windows Defender's exclusion list (Windows Security β Virus & threat protection β Manage settings β Exclusions). -
anvil-ledger.py: schema missingβ runpython scripts/anvil-ledger.py initfrom the plugin directory so the script can resolvesql/schema.sql. -
"Ollama unreachable" β is the daemon running?
curl http://localhost:11434/api/tags. -
Gemini 429 / quota exceeded β the default
gemini-2.5-flashgives 1,500 req/day free. If you're still hitting limits, re-run/anvil-setupand generate a new API key at https://aistudio.google.com/apikey, or switch to a different reviewer. -
Reviewer returned "concern: no parseable verdict" β the model ignored the JSON-only instruction. Happens occasionally with smaller/older models;
anvil-review.pyattempts to extract the first JSON object from the reply and falls back toconcernif that fails. -
PostToolUse hook isn't tracking edits β the hook only runs while an anvil session is open (
sessions.ended_at IS NULL). If/anvilcrashed partway through, runpython scripts/anvil-ledger.py end-session <id> "crashed"to close it.
The Setup Wizard stores API keys in the OS credential store when available β no plaintext lands in config.json:
| Platform | Storage |
|---|---|
| macOS | macOS Keychain (security CLI); config.json holds the marker "keychain" |
| Windows | DPAPI-encrypted Base64 inline in config.json; unreadable on any other machine or user account |
| Linux | libsecret via secret-tool; config.json holds the marker "keychain" |
| Fallback | Plaintext in config.json (chmod 0600) if no keychain backend is detected |
Run python scripts/anvil-config.py keychain-status to see which backend is active and where each key is stored.
The legacy .env file (if you use it) is also gitignored. Double-check your shell history and process listing regardless of storage tier.
Keys vs. diffs: API keys are sent only to their issuing service (Gemini key β Google, OpenAI key β OpenAI) as authentication credentials β they are not shared with other providers. What is sent to each enabled provider is the staged code diff for review; don't point GPT/Gemini at diffs you can't share with a third party.
The Anvil Loop prompt is adapted from burkeholland/anvil (MIT). The Copilot CLI primitives (ask_user, store_memory, session_store, Copilot multi-vendor subagents) were translated to their Claude Code equivalents.
MIT. See LICENSE.