Claude Code has no spending ceiling. TokenXRay adds one.
I spent $104 in a single Claude Code session. Then I audited all 686 of mine and found that 10% of sessions burned 89% of the money — $12,560 out of $14,000+ total. The culprit: context that grows quadratically, cache creation fees nobody mentions, and tool results that ride in context forever.
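Why quadratic? Each turn resends the full conversation history as input, so even if every turn adds a constant number of new tokens, cumulative input tokens grow with the square of the turn count. A toy model (the 1K tokens/turn figure is illustrative, not from the audit):

```python
def total_input_tokens(turns: int, tokens_per_turn: int = 1000) -> int:
    """Cumulative input tokens when the full history is resent each turn."""
    context = 0
    total = 0
    for _ in range(turns):
        context += tokens_per_turn  # history grows linearly per turn...
        total += context            # ...so cumulative input grows quadratically
    return total

# Doubling the turn count roughly quadruples total input tokens:
print(f"{total_input_tokens(50):,}")   # 1,275,000
print(f"{total_input_tokens(100):,}")  # 5,050,000
```

This is the mechanism behind the segment breakdown further down: long sessions don't cost a little more, they cost disproportionately more.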
*(Screenshot: CLI — overview · session drill-down · diagnose)*

*(Screenshot: status line — live cost + hints in every Claude Code session)*
```shell
pip install tokenxray
tokenxray --install-hook --confirm   # one-time setup, then forget about it
```

Zero dependencies. Pure Python stdlib. Python 3.9+.
TokenXRay has three layers: a status line always visible at the bottom of Claude Code, hooks that run automatically after every tool call, and a CLI you run when you want to review spending patterns.
No other Claude Code tool will forcibly halt a runaway session. TokenXRay can. When hard_stop is enabled in config, the hook exits with code 2 after every tool call once a ceiling is crossed — Claude Code cannot continue until you start a fresh session.
Enable it once:
```json
{"hard_stop": true, "hard_stop_turns": 120, "hard_stop_cost": 10}
```

When the ceiling is hit, every tool call fires this in the conversation:

```
[TokenXRay] HARD STOP — cost limit ($10.43/$10) reached. Session blocked.
Please wrap up and start a fresh session.
Checkpoint was auto-saved earlier. Disable with: hard_stop=false in ~/.tokenxray/config.json
```
Claude Code halts, and so does the spending. Off by default — enable it only if you want a hard ceiling. The advisory split warning fires regardless.
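A minimal sketch of the ceiling check the hook performs (assumed logic, not TokenXRay's actual source; in the real hook, a positive result means printing the HARD STOP message and exiting with code 2 so Claude Code cannot continue):

```python
def check_hard_stop(session_cost: float, turns: int, config: dict) -> bool:
    """True once either ceiling is crossed and hard_stop is enabled."""
    if not config.get("hard_stop", False):
        return False
    return (session_cost >= config.get("hard_stop_cost", 50)
            or turns >= config.get("hard_stop_turns", 120))

config = {"hard_stop": True, "hard_stop_turns": 120, "hard_stop_cost": 10}
if check_hard_stop(session_cost=10.43, turns=95, config=config):
    # The real hook would print the HARD STOP message and sys.exit(2) here.
    print("[TokenXRay] HARD STOP — cost limit ($10.43/$10) reached.")
```

Exit code 2 is what makes this a hard stop rather than a suggestion: Claude Code treats it as a blocking failure after every subsequent tool call.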
The hook fires after every tool call. When you cross a cost threshold, you see it immediately in the conversation — no checking required:
```
[TokenXRay] $3.14 spent (crossed $3) — Sonnet 4.6, 28 turns, ctx 42K, ~$0.11/turn
```
Default thresholds: $1, $3, $5, $10, $25, $50. Fully configurable. Alerts fire once per threshold per session and don't repeat.
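The fire-once behavior reduces to a small piece of state: remember which thresholds have already fired, report only newly crossed ones. A sketch (assumed logic, not the actual implementation):

```python
def crossed_thresholds(cost: float, thresholds: list, already_fired: set) -> list:
    """Return thresholds newly crossed at this cost; mark them as fired."""
    new = [t for t in thresholds if cost >= t and t not in already_fired]
    already_fired.update(new)
    return new

fired = set()
print(crossed_thresholds(1.20, [1, 3, 5, 10, 25, 50], fired))  # [1]
print(crossed_thresholds(3.14, [1, 3, 5, 10, 25, 50], fired))  # [3]
print(crossed_thresholds(3.50, [1, 3, 5, 10, 25, 50], fired))  # []
```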
After --install-hook, a persistent status line appears at the bottom of every Claude Code session showing live session health:
```
Opus │ $5.60 │ ▸▸ │ T42 │ ~$0.13/t │ ctx 38% │ ~43 left │ 27m
```
| Field | Meaning |
|---|---|
| `Opus` | Current model |
| `$5.60` | Total session cost (color-coded: green < $1, yellow < $5, red > $15) |
| `▸▸` | Spend velocity — more arrows = burning faster per turn |
| `T42` | Turn count |
| `~$0.13/t` | Average cost per turn |
| `ctx 38%` | Context window usage (color-coded: green < 50%, yellow < 75%, red > 90%) |
| `~43 left` | Estimated turns remaining before context fills |
| `27m` | Session duration |
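The `~N left` estimate can be approximated from numbers the status line already tracks. One plausible formula (a sketch; TokenXRay's actual estimator may differ): divide the remaining context window by the average context growth per turn so far.

```python
def turns_remaining(ctx_used: int, ctx_limit: int, turns: int) -> int:
    """Estimate turns left before the context window fills."""
    if turns == 0 or ctx_used == 0:
        return 0
    growth_per_turn = ctx_used / turns          # avg tokens added per turn
    return int((ctx_limit - ctx_used) / growth_per_turn)

# 38% of a 200K window used after 42 turns (window size is an assumption):
print(turns_remaining(ctx_used=76_000, ctx_limit=200_000, turns=42))  # 68
```

Because context growth accelerates late in a session (bigger history, bigger tool results), a linear estimate like this is optimistic, which is one reason the hints fire well before 100%.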
Actionable hints — when a trigger fires, the status line switches from full metrics to a focused action line:
```
Opus │ $5.60 │ ctx 87% │ 🔥 ctx 87% — split now or lose context
```
Hints are prioritized — only the highest-priority action shows:
| Priority | Trigger | Hint |
|---|---|---|
| P1 | Rate limit < 3 requests left | ⚠ N req left — pause or hit rate limit |
| P2 | Context > 85% | 🔥 ctx X% — run: tokenxray --checkpoint · new session |
| P3 | Context > 60% | ⚠ ctx X% — new session soon · saves ~60% tokens |
| P4 | Opus + cost > $3 | → /model sonnet — same task, 5x cheaper |
| P5 | 80+ turns + cost > $2 | → run: tokenxray --checkpoint · new session · say: read checkpoint.md |
| P6 | 30+ turns, >$0.10/turn | Trajectory alert — at this rate, session will hit $X |
Disable hints (keep metrics): set "statusline_hints": false in ~/.tokenxray/config.json.
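The priority logic amounts to checking triggers in order and returning the first match. A condensed sketch (thresholds copied from the table above; the real hint strings and rate-limit plumbing are simplified):

```python
def pick_hint(ctx_pct, cost, turns, model, req_left, per_turn):
    """Return the highest-priority hint, or None if no trigger fires."""
    if req_left is not None and req_left < 3:                      # P1
        return f"⚠ {req_left} req left — pause or hit rate limit"
    if ctx_pct > 85:                                               # P2
        return f"🔥 ctx {ctx_pct}% — run: tokenxray --checkpoint · new session"
    if ctx_pct > 60:                                               # P3
        return f"⚠ ctx {ctx_pct}% — new session soon · saves ~60% tokens"
    if model == "opus" and cost > 3:                               # P4
        return "→ /model sonnet — same task, 5x cheaper"
    if turns >= 80 and cost > 2:                                   # P5
        return "→ run: tokenxray --checkpoint · new session"
    if turns >= 30 and per_turn > 0.10:                            # P6
        return "Trajectory alert"
    return None

# ctx 87% outranks the Opus nudge even though both conditions hold:
print(pick_hint(87, 5.60, 42, "opus", None, 0.13))
```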
After --install-hook, every Claude Code session gets three hooks that run silently in the background:
- **Cost hook** — tracks your running cost after every tool use. Shows a status line every 10 turns. Alerts when you cross $1/$3/$5/$10/$25/$50. At 60 turns or $5, auto-saves your session state to `.claude/checkpoint.md`. If you're on Opus, nudges you once to consider switching to Sonnet (5x cheaper for most tasks). At 30+ turns with high cost velocity (>$0.10/turn), fires a trajectory alert: "At this rate, session will hit $X — consider a split soon."
- **Resume hook** — when you start a new session, shows the last 3 sessions for the current project (date, turns, cost, model) from `~/.tokenxray/history.jsonl`, detects any pending checkpoint, and prints where to pick up. Claude does not read the checkpoint automatically — tell it to: "read .claude/checkpoint.md". One-shot: fires once per session, then gets out of the way.
- **Subagent hook** — fires before each `Agent` tool call. Full warning on first call per session, brief reminder every 5 calls. Agents spawn new context windows at full cost — this nudges you to consider whether delegation is worth it.
Your daily workflow:
```
Open Claude Code → status line shows live metrics at bottom
        ↓
Checkpoint detected? Stats shown in conversation
        ↓
Tell Claude: "read .claude/checkpoint.md" → context restored
        ↓
Work normally → cost hook + status line track silently
        ↓
Hit 60 turns or $5 → checkpoint auto-saved
        ↓
Status line hint fires? Take the action or keep going
        ↓
Start fresh → next session picks up where you left off
```
You never run tokenxray during a session. The hooks and status line handle it.
```
[TokenXRay] Opus — turn 40, $12.50 total, ~$0.31/turn, ctx 85K
[TokenXRay] You're on Opus ($15/MTok input) — Sonnet costs 5x less and handles most coding tasks well.
[TokenXRay] Consider switching: /model claude-sonnet-4-6
[TokenXRay] Consider splitting this session! (60 turns, $5.20, ctx 90K)
[TokenXRay] Auto-checkpoint saved to .claude/checkpoint.md
```
Run these when you want to understand your spending patterns and change habits:
```shell
tokenxray                    # Overview — where your money goes
tokenxray --diagnose         # Specific recommendations
tokenxray --doctor           # Health check — binary, hooks, DB, pricing freshness
tokenxray --session <id>     # Deep dive into one session
tokenxray --dashboard        # Interactive HTML charts
tokenxray --projects         # Cost by project
tokenxray --mcp              # MCP tool audit — find dead-weight servers
tokenxray --memory-impact    # crossmem hit-rate — which saved memories did the agent actually use?
tokenxray --rules            # Generate a CLAUDE.md from your behavioral patterns
```

```
TokenXRay - Session Overview
----------------------------------------------------------------------
686 sessions    53,952 total turns    $14,044 total cost

Segment Breakdown:
  1-10 turns:  388 sessions   avg $0.19   total $74       ░░░░░░░░░░   <1%
  11-30:       110 sessions   avg $3.18   total $350      ░░░░░░░░░░    2%
  31-100:      117 sessions   avg $9.08   total $1,062    █░░░░░░░░░    8%
  100+:         71 sessions   avg $177    total $12,560   ████████░░   89%
```
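If you want to reproduce the segment breakdown from your own data, the aggregation is straightforward: bucket sessions by turn count, then count and sum per bucket. A minimal sketch with made-up numbers:

```python
def segment_breakdown(sessions):
    """sessions: iterable of (turns, cost) → {segment: (count, total_cost)}."""
    buckets = {"1-10": [], "11-30": [], "31-100": [], "100+": []}
    for turns, cost in sessions:
        if turns <= 10:
            buckets["1-10"].append(cost)
        elif turns <= 30:
            buckets["11-30"].append(cost)
        elif turns <= 100:
            buckets["31-100"].append(cost)
        else:
            buckets["100+"].append(cost)
    return {k: (len(v), sum(v)) for k, v in buckets.items()}

sessions = [(5, 0.20), (25, 3.00), (80, 9.00), (150, 180.00), (200, 170.00)]
print(segment_breakdown(sessions))
# The minority of long sessions dominates total cost, as in the real data.
```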
The retrospective analysis is the most valuable part. After a few --diagnose runs, you start naturally scoping sessions better — "I'll do the refactor, then start fresh for tests." That's where the real savings come from.
Customize hook thresholds in ~/.tokenxray/config.json:
```json
{
  "split_turns": 60,
  "split_cost": 5,
  "alert_thresholds": [1, 3, 5, 10, 25, 50],
  "status_interval": 10,
  "debug_log": false,
  "hard_stop": false,
  "hard_stop_turns": 120,
  "hard_stop_cost": 50,
  "opus_nudge": true,
  "opus_nudge_turn": 20,
  "opus_nudge_cost": 5.0,
  "subagent_warn": true,
  "subagent_warn_interval": 5,
  "statusline_hints": true
}
```

By default, hooks only print user-facing messages in conversation. If you need deeper diagnostics, enable internal hook logging:

```json
{"debug_log": true}
```

This writes hook events to `~/.tokenxray/debug.log` (cost/resume/subagent hooks). Useful when a warning seems missing and you want to verify whether the hook fired, skipped, or hit an exception-safe path.

```shell
tail -f ~/.tokenxray/debug.log
```

Set `"debug_log": false` when done.
| Tool | Source | Notes |
|---|---|---|
| Claude Code | `~/.claude/projects/**/*.jsonl` | Full token breakdown: input, output, cache read, cache create |
| Gemini CLI | `~/.gemini/tmp/*/chats/session-*.json` | Input, output, cached, thinking tokens |
| GitHub Copilot | VS Code workspace storage | Estimated from message lengths (token events are ephemeral) |
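For Claude Code, per-message usage fields live inside each JSONL line. A sketch of the aggregation, with the schema (`message.usage` and its token field names) treated as an assumption about the log format rather than a documented contract:

```python
import json

def sum_usage(jsonl_lines):
    """Sum token counts across the usage blocks found in session log lines."""
    totals = {"input": 0, "output": 0, "cache_read": 0, "cache_create": 0}
    for line in jsonl_lines:
        usage = json.loads(line).get("message", {}).get("usage")
        if not usage:
            continue  # not every line carries a usage block
        totals["input"] += usage.get("input_tokens", 0)
        totals["output"] += usage.get("output_tokens", 0)
        totals["cache_read"] += usage.get("cache_read_input_tokens", 0)
        totals["cache_create"] += usage.get("cache_creation_input_tokens", 0)
    return totals

# A typical shape: tiny fresh input, large cache read riding along every turn.
line = json.dumps({"message": {"usage": {
    "input_tokens": 12, "output_tokens": 300,
    "cache_read_input_tokens": 40_000, "cache_creation_input_tokens": 2_000}}})
print(sum_usage([line]))
```

Note how the cache-read and cache-creation counts dwarf the fresh input: that imbalance is exactly the "cache creation fees nobody mentions" from the opening paragraph.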
```shell
tokenxray --source claude    # Claude only
tokenxray --source gemini    # Gemini only
tokenxray --source copilot   # Copilot only
tokenxray --source all       # Everything (default)
```

Your data stays local. TokenXRay reads files on your machine. Nothing is sent anywhere.
| Flag | Description |
|---|---|
| `--top N` | Show top N sessions (default: 15) |
| `--path <dir>` | Custom path to session logs directory |
| `--no-color` | Disable colored output |
| `--baseline` / `--compare` | Save baseline, compare after changing habits |
| `--export csv` | Export sessions to CSV |
| `--checkpoint` | Manually extract session state |
| `--doctor` | Health check — hook install, DB, binary, pricing freshness |
| `--mcp` | MCP tool audit — dead-weight servers, unused tools, schema cost estimate |
| `--mcp --enumerate-tools` | Spawn each configured MCP server and get exact tool counts |
| `--memory-impact` | crossmem hit-rate analysis — which injected memories the agent actually cited |
| `--rules` | Behavioral fingerprinting — generate a CLAUDE.md from your session patterns |
| `--rules-dry-run` | Preview generated CLAUDE.md without writing it |
| `--crossmem-impact` | Estimate token savings attributable to crossmem context injection |
| `--install-hook --confirm` | (Re-)install hooks; detects stale versions and prompts upgrade |
Every MCP server you configure globally loads its full tool schema into Claude's context on every session start — roughly 185 tokens per tool. At 84 tools across a few servers, that's ~15K tokens per session, silently added even when you never call a single MCP tool.
```shell
tokenxray --mcp                     # Audit from call history
tokenxray --mcp --enumerate-tools   # Also spawn servers to get exact tool counts
```

Output shows per-server call rates, never-called tools, and a dead-weight estimate: sessions that loaded schemas but called zero MCP tools. The fix is usually moving from global `~/.claude.json` config to project-level `.mcp.json` so servers only load where you actually use them.
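The arithmetic behind the ~15K figure, plus what that overhead costs at Opus input rates (the $15/MTok price appears earlier in this README; treating the schema tokens as uncached input is my assumption, so consider this an upper bound):

```python
TOKENS_PER_TOOL = 185           # rough schema size per MCP tool, per the audit
tools = 84                      # tools across a few globally configured servers

per_session = TOKENS_PER_TOOL * tools
print(per_session)              # 15540 tokens added to every session's context

opus_input_rate = 15 / 1_000_000        # $/token for Opus input
cost_per_session = per_session * opus_input_rate
print(round(cost_per_session, 3))       # ~$0.233 per session
print(round(cost_per_session * 686, 2)) # ≈ $160 across all 686 sessions
```

Even at a fraction of that rate with caching, it's pure overhead in every session that never calls an MCP tool.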
Read the detailed analysis: I Spent $104 in a Single AI Coding Session. Then I Found Where $8,000 Disappeared.
MIT

