
specdog/agent-bench


agent-bench

Real token benchmarks for AI coding agents. No estimates. No bias.

Measures token consumption by running the same task through each agent and reading the resulting token counts from the official client logs via CodexBar.

Quick Start

brew install --cask codexbar
pip install collar
bash run.sh "Write a Python script that reads CSV sales data and outputs a grouped report"
python3 parse.py
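The README does not describe the log format that parse.py reads or how it aggregates the counts. Below is a minimal sketch of that kind of aggregation step. It assumes each log is JSON Lines, where some records carry a "usage" object with integer "input_tokens" and "output_tokens" fields. Those field names and the helper sum_usage are hypothetical illustrations, not parse.py's actual implementation or CodexBar's real log schema.

```python
import json

# Hypothetical field names: assumes usage records look like
# {"usage": {"input_tokens": N, "output_tokens": M}}. Real client logs may differ.
TOKEN_FIELDS = ("input_tokens", "output_tokens")


def sum_usage(lines):
    """Sum token counts across JSON Lines records, skipping records without usage."""
    totals = {field: 0 for field in TOKEN_FIELDS}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # ignore non-object records
        usage = record.get("usage") or {}
        for field in TOKEN_FIELDS:
            totals[field] += int(usage.get(field, 0))
    totals["total"] = sum(totals[field] for field in TOKEN_FIELDS)
    return totals


def sum_usage_file(path):
    """Apply sum_usage to a log file on disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as fh:
        return sum_usage(fh)


if __name__ == "__main__":
    sample = [
        '{"usage": {"input_tokens": 1200, "output_tokens": 300}}',
        '{"type": "tool_result"}',
        '{"usage": {"input_tokens": 800, "output_tokens": 150}}',
    ]
    print(sum_usage(sample))
    # → {'input_tokens': 2000, 'output_tokens': 450, 'total': 2450}
```

Running the same summation over each agent's log for one task gives the per-agent totals that a comparison like the one under Sample Output is built from.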

Supported Agents

| Agent | Measurement |
| --- | --- |
| collar (DAG-first) | dag insights |
| Claude Code | Official usage logs |
| OpenAI Codex | Official usage logs |

Sample Output

Collar saves 7,655 tokens (35%) vs Claude Code on an identical task.
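The sample line reports a difference and a percentage but not the absolute totals. Because the percentage is taken relative to Claude Code's total, a 35% saving of 7,655 tokens implies a Claude Code baseline of roughly 21,900 tokens (7,655 / 0.35). That figure is approximate, since the percentage is rounded. The sketch below shows this arithmetic. The function name savings and the two totals are hypothetical, chosen only so that the difference matches the sample. They are not measured results.

```python
def savings(baseline_tokens: int, candidate_tokens: int) -> tuple[int, float]:
    """Return (tokens saved, percent saved relative to the baseline)."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    saved = baseline_tokens - candidate_tokens
    return saved, 100.0 * saved / baseline_tokens


if __name__ == "__main__":
    # Hypothetical totals: Claude Code as the baseline, collar as the candidate.
    saved, pct = savings(baseline_tokens=21_800, candidate_tokens=14_145)
    print(f"Collar saves {saved:,} tokens ({pct:.0f}%) vs Claude Code")
    # → Collar saves 7,655 tokens (35%) vs Claude Code
```

Dividing by the baseline rather than by the candidate keeps the percentage bounded by 100% and matches the "saves X% vs Claude Code" phrasing.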

About

Real token benchmarks for AI coding agents (collar vs Claude Code vs Codex), using CodexBar and official client logs. No estimates.
