Skip to content

fix(subagents): pin model + strip plugins/MCP/thinking on every spawn#52

Merged
amitpaz1 merged 1 commit into
mainfrom
fix/cheap-subagents
May 9, 2026
Merged

fix(subagents): pin model + strip plugins/MCP/thinking on every spawn#52
amitpaz1 merged 1 commit into
mainfrom
fix/cheap-subagents

Conversation

@amitpaz1
Copy link
Copy Markdown
Collaborator

@amitpaz1 amitpaz1 commented May 9, 2026

Summary

  • The capture-extract / dream / graph_extraction subagents spawn claude -p with no --model, no --settings, and no --strict-mcp-config, so each spawn inherits the user's full Claude Code stack — default model (often Opus), alwaysThinkingEnabled, effortLevel, every enabled plugin, and every MCP server (including lore itself, recursively).
  • Real-world cost on a near-empty extraction: ~$0.35 / spawn with ~46k cache-creation tokens — most of it spent rebuilding a system prompt the subagent never used. With ~580 captured sessions on a single user's machine that adds up to roughly $200 of background spend.
  • New lore.subagent_config module materializes a minimal MCP config (lore-only or empty) plus a settings override (no plugins, no thinking, low effort) under ~/.lore/subagent/ and returns the flag set per role. All three spawn sites now append --model, --strict-mcp-config, --mcp-config, --settings.
  • Defaults: capture & graph_extraction → Haiku 4.5; dream → Sonnet 4.6 (multi-step reflection, runs at most once / 24h). Overridable via LORE_SUBAGENT_MODEL, LORE_DREAM_MODEL, LORE_GRAPH_MODEL.

Result

End-to-end smoke test on a one-shot extraction:

Before After Change
total_cost_usd $0.35468 $0.01251 ~28× cheaper
cache_creation_input_tokens 46,494 7,676 ~6× smaller
model claude-opus-4-7 claude-haiku-4-5-20251001

Test plan

  • tests/test_subagent_config.py — 12 new tests covering model defaults, env-var override chain, MCP config bodies (lore-only vs empty), settings body, claude_flags() shape, lazy materialization, idempotent rewrite.
  • tests/services/test_graph_extraction.py::TestSpawnClaudeArgs — extended to assert --model, --strict-mcp-config, --mcp-config, --settings are present (regression guard against silently re-inheriting the user's stack).
  • Full unit suite: pytest tests/ --ignore=tests/integration → 2746 passed, 1183 skipped, 0 failed.
  • ruff check clean on all touched files.
  • Live smoke test: claude -p accepts the flag set, model override took effect, cost numbers above.

🤖 Generated with Claude Code

The capture-extract / dream / graph_extraction subagents were spawning
`claude -p` with no `--model`, no `--settings`, and no `--strict-mcp-config`.
That meant each spawn inherited the user's full Claude Code stack:
default model (Opus), `alwaysThinkingEnabled`, `effortLevel`, every
enabled plugin, and every MCP server (including lore itself,
recursively). Real cost on a near-empty extraction was ~$0.35 / spawn
with 46k cache-creation tokens — most of it spent rebuilding a system
prompt the subagent never used.

This adds `lore.subagent_config`, which materializes a minimal MCP
config (lore-only or empty) plus a settings override (no plugins,
no thinking, low effort) under `~/.lore/subagent/` and returns the
flag set for each role. Capture and graph_extraction default to Haiku
4.5; dream defaults to Sonnet 4.6 because it does multi-step
reflection (and runs at most once per 24h). Overridable via
`LORE_SUBAGENT_MODEL`, `LORE_DREAM_MODEL`, `LORE_GRAPH_MODEL`.

Smoke-tested end-to-end: a one-shot extraction now costs ~$0.012
(28x cheaper) with cache-creation down from 46k to 7.7k tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@amitpaz1 amitpaz1 merged commit d5dfb72 into main May 9, 2026
6 checks passed
amitpaz1 added a commit that referenced this pull request May 9, 2026
…ions (#53)

Each ``claude -p`` capture-extract / dream / graph_extraction subagent
is itself a Claude Code session. The user's PostToolUse / Stop /
SessionEnd hooks therefore fire for the subagent's *own* tool uses.
Without a guard, this cascades:

  1. Active session does N tool uses → PostToolUse hook → spawns
     capture-extract subagent.
  2. Subagent's own tool uses (Read, Bash, Grep …) re-fire PostToolUse,
     which writes to *its* session_id buffer.jsonl.
  3. After ``LORE_CAPTURE_N`` entries, the hook spawns *another*
     capture-extract for the subagent — and so on.

Observed in production for one user: ~700 spawns/hour, ~$34/h on
Haiku, 685 distinct sessions hit in 60 minutes despite only one
interactive ``claude`` process running. ``--strict-mcp-config`` from
PR #52 isolates MCP but does not isolate hooks.

Two layers of defense:

  1. ``subagent_config.settings_body()`` now writes ``hooks`` as
     empty arrays. ``--settings`` overrides the user's hooks for
     the subagent's session.
  2. New ``SubagentConfig.env_overrides()`` returns
     ``LORE_AUTO_SAVE=false`` and ``LORE_DREAM_AUTO=false``. The
     hook scripts honor these as master kill switches and exit 0
     immediately. All three spawn sites now pass
     ``env={**os.environ, **cfg.env_overrides()}`` so the guard
     survives any caching of ``settings.json`` by the running
     parent Claude Code process.

3 new tests: hooks-empty in materialized settings, env_overrides
shape, env_overrides parity across roles. Existing
TestSpawnClaudeArgs extended to assert recursion-guard env vars
are passed to subprocess.Popen.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant