One command to measure Collar token savings.
dogbench

Output:
╔══════════════════════════════════════════╗
║ dogbench — token benchmark ║
╚══════════════════════════════════════════╝
┌──────────────────────┬──────────┬───────────┬──────────┐
│ Agent │ Input │ Output │ Cost │
├──────────────────────┼──────────┼───────────┼──────────┤
│ collar (DAG-first) │ 1,234 │ 567 │ $0.0072 │
│ Claude Code │ 3,456 │ 1,200 │ $0.0284 │
│ OpenAI Codex │ 2,890 │ 980 │ $0.0221 │
└──────────────────────┴──────────┴───────────┴──────────┘
✓ collar saved 2,855 tokens (61%) vs Claude Code
That's $0.0212 saved on this task alone.
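
The summary line can be checked by hand from the table. The sketch below assumes the savings figure compares total tokens (input plus output) between two agents; the function names are illustrative and are not part of dogbench.

```python
# Figures copied from the sample table above.
usage = {
    "collar (DAG-first)": {"input": 1234, "output": 567, "cost": 0.0072},
    "Claude Code": {"input": 3456, "output": 1200, "cost": 0.0284},
    "OpenAI Codex": {"input": 2890, "output": 980, "cost": 0.0221},
}


def total_tokens(row):
    """Total tokens for one agent: input plus output."""
    return row["input"] + row["output"]


def savings(baseline, candidate):
    """Return (tokens saved, fraction saved, dollars saved) for candidate vs baseline."""
    base_total = total_tokens(baseline)
    saved = base_total - total_tokens(candidate)
    dollars = round(baseline["cost"] - candidate["cost"], 4)
    return saved, saved / base_total, dollars


tokens_saved, fraction, dollars = savings(
    usage["Claude Code"], usage["collar (DAG-first)"]
)
print(f"{tokens_saved:,} tokens ({fraction:.0%}), ${dollars:.4f}")
```

The percentage is taken relative to the baseline agent's total, so swapping the two arguments changes both its sign and its denominator.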
# Clone (or pull if already cloned)
git clone https://github.com/specdog/dogbench.git 2>/dev/null || (cd dogbench && git pull origin feat/init)
cd dogbench
# Install (use collar's venv if you have it)
pip install -e . 2>/dev/null || ~/collar/.venv/bin/pip install -e .
# Dotdog (skip if already installed)
which dotdog || npm install -g dotdog

No conflicts. Works on fresh installs and existing setups.
Requires: collar, and optionally Claude Code / Codex CLI for comparison.
cd ~/dogbench && git pull origin main && ~/collar/.venv/bin/pip install -e . && ./dogbench --all --json

| Setup | Command |
|---|---|
| Collar only | ./dogbench --json |
| Collar + dotdog | ./dogbench --dag --json |
| Payload pack/unpack | ./dogbench --bench pack --json |
| Token savings delta | ./dogbench --bench savings --json |
# Same harness, different engines — proves DAG savings are architectural
dogbench --bench savings --model gpt-5
dogbench --bench savings --model claude-sonnet-4
# Same engine, different harnesses — raw collar vs compact Collar/DAG
dogbench --compare claude codex --model claude-sonnet-4
# Collar + DeepSeek vs Collar + Codex (same harness, different backends)
collar chat -q "your task" --model deepseek-v4-pro
mv ~/.dag/sessions/ ~/.dag/sessions_backup
collar chat -q "your task" --model gpt-5
# Compare the two session directories

| Flag | Format | Use |
|---|---|---|
| (none) | Text table | Human readable |
| --json | JSON | CI/CD, scripts |
| --dag | .dag file | Collar native reading |
JSON output includes per-agent token counts, wall time, and cost in USD.
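
For scripting, the JSON can be loaded and compared in a few lines. The exact schema is not documented here, so the sketch below invents one: the `agents` list and the `name`, `input_tokens`, `output_tokens`, and `cost_usd` fields are all assumptions, and wall time is left out because the sample table shows no value for it. Adjust the keys to whatever `./dogbench --json` actually emits.

```python
import json

# Hypothetical payload: every field name is an assumption, not a documented schema.
# The numbers are the ones from the sample table earlier in this README.
sample = """
{"agents": [
  {"name": "collar", "input_tokens": 1234, "output_tokens": 567, "cost_usd": 0.0072},
  {"name": "claude-code", "input_tokens": 3456, "output_tokens": 1200, "cost_usd": 0.0284},
  {"name": "codex", "input_tokens": 2890, "output_tokens": 980, "cost_usd": 0.0221}
]}
"""


def totals_by_agent(payload):
    """Map each agent name to its total (input + output) token count."""
    return {
        agent["name"]: agent["input_tokens"] + agent["output_tokens"]
        for agent in payload["agents"]
    }


def cheapest_agent(payload):
    """Return the name of the agent with the lowest reported cost."""
    return min(payload["agents"], key=lambda agent: agent["cost_usd"])["name"]


payload = json.loads(sample)
totals = totals_by_agent(payload)
print(totals)
print(cheapest_agent(payload))
```

In CI, the same two helpers could back a simple gate, for example failing the job when collar is no longer the cheapest agent.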
- Detects which agents are installed
- Runs the same task through each
- Reads official usage logs (CodexBar, collar sessions, official CLIs)
- Outputs a clean comparison table with savings
No estimates. No bias. Your own machine, your own tokens.
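
The detection step in the list above can be approximated with a PATH lookup. This is a minimal sketch, not dogbench's actual code, and the executable names are assumptions chosen for illustration.

```python
import shutil


def installed_agents(candidates):
    """Return the display names whose executable is found on PATH."""
    return [name for name, exe in candidates.items() if shutil.which(exe)]


# Executable names are illustrative assumptions; check your own installs.
candidates = {
    "collar": "collar",
    "Claude Code": "claude",
    "OpenAI Codex": "codex",
}
print(installed_agents(candidates))
```

Agents that are not found are simply skipped, which matches the "Collar only" setup where no comparison CLI is installed.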
Edit agents.yaml:
agents:
  my-agent:
    name: "My Agent"
    command: 'my-agent-cli "{prompt}"'
    log_type: jsonl
    log_path: "~/path/to/usage/*.jsonl"
    log_key: "usage.total_tokens"

PRs welcome. Any CLI tool that accepts a prompt and writes usage logs can be benchmarked.
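
To make the `log_type`, `log_path`, and `log_key` fields concrete, here is one way such an entry could be interpreted: expand the glob, read each file as JSON Lines, and follow the dotted key into every record. This is a sketch under that assumption and may differ from how dogbench reads logs; `get_path` and `sum_usage` are hypothetical helper names.

```python
import glob
import json
import os


def get_path(record, dotted_key):
    """Follow a dotted key such as "usage.total_tokens" through nested dicts."""
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def sum_usage(log_path, log_key):
    """Sum the numeric value at log_key over every JSONL line matching log_path."""
    total = 0
    for path in glob.glob(os.path.expanduser(log_path)):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                value = get_path(json.loads(line), log_key)
                if isinstance(value, (int, float)):
                    total += value
    return total
```

With the example entry above, the call would be `sum_usage("~/path/to/usage/*.jsonl", "usage.total_tokens")`. Records that lack the key are ignored rather than treated as errors.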
# Run benchmark
./dogbench --json > results.json
# Submit via PR (preferred)
git checkout -b results/yourname
git add results.json
git commit -m "results: add benchmark data"
git push origin results/yourname
# Open a PR at https://github.com/specdog/dogbench
# Or post to an issue
open https://github.com/specdog/dogbench/issues/new

# Quick test (collar only, no comparison)
./dogbench "hi"
# Full test (all installed agents)
./dogbench --all --json