dogbench

One command to measure Collar token savings.

dogbench

Output:

╔══════════════════════════════════════════╗
║        dogbench — token benchmark        ║
╚══════════════════════════════════════════╝

  ┌──────────────────────┬──────────┬───────────┬──────────┐
  │ Agent                │  Input   │  Output   │   Cost   │
  ├──────────────────────┼──────────┼───────────┼──────────┤
  │ collar (DAG-first)   │    1,234 │       567 │ $0.0072  │
  │ Claude Code          │    3,456 │     1,200 │ $0.0284  │
  │ OpenAI Codex         │    2,890 │       980 │ $0.0221  │
  └──────────────────────┴──────────┴───────────┴──────────┘

  ✓ collar saved 2,855 tokens (61%) vs Claude Code
    That's $0.0212 saved on this task alone.
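
The summary line can be reproduced by hand from the table. The sketch below assumes the percentage is saved tokens divided by the baseline agent's total (input plus output); that formula and the `savings` helper are illustrative guesses, not taken from dogbench's source.

```python
def savings(baseline, candidate):
    """Compare two (input_tokens, output_tokens, cost_usd) tuples.

    Assumption: savings are measured against the baseline's total tokens.
    """
    base_tokens = baseline[0] + baseline[1]
    cand_tokens = candidate[0] + candidate[1]
    saved_tokens = base_tokens - cand_tokens
    percent = 100 * saved_tokens / base_tokens
    saved_cost = round(baseline[2] - candidate[2], 4)
    return saved_tokens, percent, saved_cost


# Figures copied from the sample table above.
claude_code = (3456, 1200, 0.0284)
collar = (1234, 567, 0.0072)

tokens, percent, cost = savings(claude_code, collar)
print(f"saved {tokens:,} tokens ({percent:.0f}%), ${cost:.4f}")
```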

Install

# Clone (or pull if already cloned)
git clone https://github.com/specdog/dogbench.git 2>/dev/null || (cd dogbench && git pull origin feat/init)
cd dogbench

# Install (use collar's venv if you have it)
pip install -e . 2>/dev/null || ~/collar/.venv/bin/pip install -e .

# Dotdog (skip if already installed)
which dotdog || npm install -g dotdog

No conflicts. Works on fresh installs and existing setups.

Requires: collar, and optionally Claude Code / Codex CLI for comparison.

One-Liner

cd ~/dogbench && git pull origin main && ~/collar/.venv/bin/pip install -e . && ./dogbench --all --json

Usage

| Setup | Command |
| --- | --- |
| Collar only | `./dogbench --json` |
| Collar + dotdog | `./dogbench --dag --json` |
| Payload pack/unpack | `./dogbench --bench pack --json` |
| Token savings delta | `./dogbench --bench savings --json` |

Comparison Scenarios

# Same harness, different engines — proves DAG savings are architectural
dogbench --bench savings --model gpt-5
dogbench --bench savings --model claude-sonnet-4

# Same engine, different harnesses — raw collar vs compact Collar/DAG
dogbench --compare claude codex --model claude-sonnet-4

# Collar + DeepSeek vs Collar + Codex (same harness, different backends)
collar chat -q "your task" --model deepseek-v4-pro
mv ~/.dag/sessions/ ~/.dag/sessions_backup
collar chat -q "your task" --model gpt-5
# Compare the two session directories

Output Formats

| Flag | Format | Use |
| --- | --- | --- |
| (none) | Text table | Human-readable |
| `--json` | JSON | CI/CD, scripts |
| `--dag` | .dag file | Read natively by Collar |

JSON output includes per-agent token counts, wall time, and cost in USD.
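
For scripting against the JSON output, something like the following may be a starting point. The schema here (an `agents` list with `name`, `input_tokens`, `output_tokens`, `wall_time_s`, and `cost_usd`) is hypothetical; check a real `results.json` for the actual field names before relying on it.

```python
import json

# Hypothetical report shape; the real dogbench schema may differ.
SAMPLE = """
{
  "agents": [
    {"name": "collar", "input_tokens": 1234, "output_tokens": 567,
     "wall_time_s": 4.2, "cost_usd": 0.0072},
    {"name": "Claude Code", "input_tokens": 3456, "output_tokens": 1200,
     "wall_time_s": 6.8, "cost_usd": 0.0284}
  ]
}
"""


def cheapest_agent(report):
    """Return the agent entry with the lowest cost in USD."""
    return min(report["agents"], key=lambda agent: agent["cost_usd"])


def total_tokens(agent):
    """Sum input and output tokens for one agent entry."""
    return agent["input_tokens"] + agent["output_tokens"]


report = json.loads(SAMPLE)
best = cheapest_agent(report)
print(best["name"], total_tokens(best), best["cost_usd"])
```

In CI, `SAMPLE` would be replaced by the contents of the file written with `./dogbench --json > results.json`.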

How It Works

1. Detects which agents are installed
2. Runs the same task through each
3. Reads official usage logs (CodexBar, collar sessions, official CLIs)
4. Outputs a clean comparison table with savings

No estimates. No bias. Your own machine, your own tokens.
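
The detection step can be pictured as a lookup on `PATH`. This is a minimal sketch, not dogbench's actual detection logic; the binary names `claude` and `codex` and the `detect_agents` helper are assumptions made for illustration.

```python
import shutil

# Display name -> executable name. The executable names are assumptions.
CANDIDATES = {
    "collar": "collar",
    "Claude Code": "claude",
    "OpenAI Codex": "codex",
}


def detect_agents(candidates=CANDIDATES, which=shutil.which):
    """Return display names of agents whose executable is found on PATH.

    `which` is injectable so the function can be tested without the
    real CLIs installed.
    """
    return [name for name, binary in candidates.items() if which(binary) is not None]


print(detect_agents())
```

Agents that are not installed are simply skipped, which matches the note that Claude Code and Codex CLI are optional.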

Add Your Own Agent

Edit agents.yaml:

agents:
  my-agent:
    name: "My Agent"
    command: 'my-agent-cli "{prompt}"'
    log_type: jsonl
    log_path: "~/path/to/usage/*.jsonl"
    log_key: "usage.total_tokens"

PRs welcome. Any CLI tool that accepts a prompt and writes usage logs can be benchmarked.
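
To see what a `log_key` such as `usage.total_tokens` implies for your log format, the sketch below resolves a dotted key against each JSONL record and sums the result. It assumes dotted keys walk nested objects and that records missing the key are ignored; the helper names are hypothetical, and dogbench's own parser may behave differently.

```python
import json


def lookup(record, dotted_key):
    """Follow a dotted path like 'usage.total_tokens' through nested dicts.

    Returns None when any part of the path is missing.
    """
    value = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def sum_tokens(jsonl_lines, log_key):
    """Sum the numeric value at log_key across all JSONL records."""
    total = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = lookup(json.loads(line), log_key)
        if isinstance(value, (int, float)):
            total += value
    return total


sample_log = [
    '{"usage": {"total_tokens": 120}}',
    '{"usage": {"total_tokens": 80}}',
    '{"event": "start"}',
]
print(sum_tokens(sample_log, "usage.total_tokens"))  # prints 200
```

If your agent writes tokens under a different path, only `log_key` in `agents.yaml` should need to change.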

Share Your Results

# Run benchmark
./dogbench --json > results.json

# Submit via PR (preferred)
git checkout -b results/yourname
git add results.json
git commit -m "results: add benchmark data"
git push origin results/yourname
# Open a PR at https://github.com/specdog/dogbench

# Or post to an issue
open https://github.com/specdog/dogbench/issues/new

Verify

# Quick test (collar only, no comparison)
./dogbench "hi"
# Full test (all installed agents)
./dogbench --all --json
