Real token benchmarks for AI coding agents. No estimates. No bias.
Measures token consumption by running identical tasks and reading official client logs via CodexBar.
brew install --cask codexbar
pip install collar
bash run.sh "Write a Python script that reads CSV sales data and outputs a grouped report"
python3 parse.py| Agent | Measurement |
|---|---|
| collar (DAG-first) | dag insights |
| Claude Code | Official usage logs |
| OpenAI Codex | Official usage logs |
Collar saves 7,655 tokens (35%) vs Claude Code on identical task.