Benchmark Zoo

Benchmark Zoo is a collection of sample benchmarks and output parsers covering as many benchmark frameworks as we could find. We implemented the same simple dummy benchmark in 42 different perf test frameworks, load testers, and even unit testing frameworks.

The outputs of each benchmarking tool are stored in tests/data.

Finally, we used those outputs to generate a parser for each of the benchmark frameworks. If you need to extract benchmark results from a log file or from a separate artifact file, the goal is that this repository always has the parser you need.

And you don't even need to tell benchzoo which parser to use. Hand benchzoo.sniff(content) any benchmark output — all 42 supported frameworks are autodetected from content alone, via a four-tier signature matcher (JSON top-level keys → XML root element → CSV header row → distinctive text substring):

asv, benchmark-ips, benchmark-js, benchmarkdotnet, benchmarktools-jl, cargo-bench, catch2, clickbench, custom-csv, custom-json, dotnet-test, gatling, go-test-bench, google-benchmark, hey, hyperfine, jmeter, jmh, junit-go, junit-standard, k6, lighthouse, locust, memtier, mitata, mocha, perf-stat, pgbench, phpbench, playwright, pytest-benchmark, redis-benchmark, sysbench, time, tinybench, vegeta, vitest-bench, wrk, wrk2.

junit-standard covers jest-junit, Maven Surefire / vanilla Java JUnit, CTest and Catch2's junit reporter — all four emit structurally indistinguishable <testsuite> XML, and a single shared parser reads testcase name + time verbatim. gotestsum's junit XML has the same shape but is distinguished by its <property name="go.version" ...> fingerprint, which routes to junit-go's Test-prefix-stripping parser.

Hard invariant: sniff never returns a wrong framework. When content is genuinely ambiguous it returns None and you fall through to explicit selection or the LLM parsers below.
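To make the tier order and the "None when ambiguous" rule concrete, here is a rough sketch of a four-tier matcher. The signature tables are illustrative guesses chosen for the example and are not benchzoo's actual signatures; `sniff_sketch` is a hypothetical name, not the library function.

```python
import csv
import json
import xml.etree.ElementTree as ET

# Illustrative signatures only (assumptions, not benchzoo's real table).
JSON_KEYS = {
    "google-benchmark": {"context", "benchmarks"},
    "hyperfine": {"results"},
}
XML_ROOTS = {"junit-standard": {"testsuite", "testsuites"}}
CSV_HEADERS = {"hyperfine": {"command", "mean", "stddev"}}
TEXT_MARKERS = {"pgbench": "transaction type:", "go-test-bench": "goos:"}


def _unique(candidates):
    # Exactly one match wins; zero or several means "don't guess".
    return candidates[0] if len(candidates) == 1 else None


def sniff_sketch(content):
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")
    stripped = content.strip()
    if not stripped:
        return None

    # Tier 1: JSON top-level keys.
    try:
        doc = json.loads(stripped)
    except ValueError:
        doc = None
    if isinstance(doc, dict):
        keys = set(doc)
        return _unique([n for n, sig in JSON_KEYS.items() if sig <= keys])

    # Tier 2: XML root element.
    if stripped.startswith("<"):
        try:
            root = ET.fromstring(stripped)
        except ET.ParseError:
            root = None
        if root is not None:
            return _unique([n for n, tags in XML_ROOTS.items() if root.tag in tags])

    # Tier 3: CSV header row.
    header = {cell.strip().lower() for cell in next(csv.reader([stripped.splitlines()[0]]))}
    csv_hits = [n for n, cols in CSV_HEADERS.items() if cols <= header]
    if csv_hits:
        return _unique(csv_hits)

    # Tier 4: distinctive text substring.
    return _unique([n for n, marker in TEXT_MARKERS.items() if marker in content])
```

The design choice worth noting is that every tier funnels through `_unique`: two signatures matching the same input yields None instead of an arbitrary pick.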

As a catch-all, there are also two LLM-backed parsers that extract benchmark results from arbitrary input by prompting an LLM with benchzoo's output schema plus a few worked examples from the corpus:

llm_anthropic: calls the Anthropic API directly. Highest accuracy; requires ANTHROPIC_API_KEY and pip install anthropic.
llm_local: calls a local Ollama instance. Default model is qwen2.5-coder:3b; also works with other small coder- or instruction-tuned models (Llama 3.2, Phi-3.5, Gemma 2). Offline; temperature=0 makes output more repeatable.

Both are experimental and non-deterministic by nature. Not appropriate for production change-detection pipelines; great for triage, one-off proprietary formats, or bootstrapping a new deterministic parser.
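As a rough idea of what "schema plus a few worked examples" can look like, the sketch below assembles a few-shot prompt as plain text. The real prompts used by llm_anthropic and llm_local are not shown in this README; `build_prompt`, its wording, and its arguments are assumptions made for illustration, and the schema text is passed in by the caller.

```python
import json


def build_prompt(schema_text, examples, content):
    """Assemble a few-shot extraction prompt (hypothetical helper).

    schema_text: description of the output schema, supplied by the caller.
    examples:    (raw_output, parsed_results) pairs, e.g. taken from a corpus.
    content:     the unknown benchmark output to extract results from.
    """
    parts = [
        "Extract benchmark results as JSON matching this schema:",
        schema_text,
        "",
    ]
    for raw, parsed in examples:
        parts += ["Input:", raw, "Output:", json.dumps(parsed), ""]
    # The final pair is left open so the model completes the output.
    parts += ["Input:", content, "Output:"]
    return "\n".join(parts)
```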

Status

42 frameworks end-to-end (sample benchmark + workflow + real CI-captured fixture + parser + ground-truth tests); 370+ passing parser tests. 32 frameworks parse plain stdout / CI-log output (the *_text parsers), so results printed to a job log with no uploaded artifact still ingest. The library installs with pip install -e . and the corpus runs on every push.
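To illustrate what parsing results out of a job log involves, here is a minimal sketch for `go test -bench` lines. It is not the go_bench_text parser: the function name and the returned keys are hypothetical, and it assumes each log line may carry a timestamp prefix before the benchmark line.

```python
import re

# Matches e.g. "BenchmarkSleep-8    1    2150123456 ns/op" anywhere in a line,
# so a leading CI timestamp does not get in the way.
BENCH_LINE = re.compile(r"(Benchmark\S+)\s+(\d+)\s+([0-9.]+)\s+ns/op")


def parse_go_bench_log(log_text):
    """Collect benchmark lines from a noisy log (hypothetical helper)."""
    results = []
    for line in log_text.splitlines():
        match = BENCH_LINE.search(line)
        if match:
            name, iterations, ns_per_op = match.groups()
            results.append(
                {
                    "name": name,  # keeps the "-8" GOMAXPROCS suffix as printed
                    "iterations": int(iterations),
                    "ns_per_op": float(ns_per_op),
                }
            )
    return results
```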

Pre-1.0: parser API may still move; not yet on PyPI.

Install and use

pip install -e .

import benchzoo

# Read the captured benchmark output; the context manager closes the file.
with open("output.json") as f:
    content = f.read()

# 1. Explicit — best when you have out-of-band info (e.g. a CI
#    artifact name, a config setting) that tells you which
#    framework produced `content`.
results = benchzoo.find_parser("hyperfine", "json").parse(content)

# 2. Sniff-based — best-effort content detection. Returns the
#    framework name (string) or None. Never returns a wrong name:
#    when the input is ambiguous, it's None and the caller decides
#    what to do next.
framework = benchzoo.sniff(content)      # e.g. "hyperfine" or None
if framework:
    results = benchzoo.find_parser(framework).parse(content)
else:
    # 3. Catch-all — hand off to the LLM fallback parsers for
    #    formats benchzoo doesn't have a dedicated parser for.
    from benchzoo.parsers import llm_local
    results = llm_local.parse(content)

# Browse the registry directly if you're wiring up your own
# dispatcher.
for framework, formats in benchzoo.PARSERS.items():
    print(framework, list(formats))

Every parser has the same contract: parse(content: bytes | str) -> list[dict], returning benchzoo's common output schema described in docs/design.md.
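For a sense of what that contract means in practice, here is a toy parser with the same signature. It reads hyperfine's JSON export (a top-level "results" list whose entries carry "command" and "mean" in seconds). The returned keys ("name", "value", "unit") are placeholders for this sketch; the real schema is the one in docs/design.md.

```python
import json
from typing import Dict, List, Union


def parse_hyperfine_json_sketch(content: Union[bytes, str]) -> List[Dict]:
    """Toy parser following the parse(content) -> list[dict] shape."""
    # Accept both bytes and str, as the contract requires.
    text = content.decode("utf-8") if isinstance(content, bytes) else content
    doc = json.loads(text)
    return [
        # Placeholder keys; see docs/design.md for the actual schema.
        {"name": run["command"], "value": run["mean"], "unit": "s"}
        for run in doc.get("results", [])
    ]
```

Because every parser accepts both bytes and str, callers can hand over a file read in either mode without converting first.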

Running the tests

pip install -e ".[dev]"
pytest

LLM parser tests skip by default; set BENCHZOO_RUN_LLM_ANTHROPIC=1 (+ ANTHROPIC_API_KEY) or BENCHZOO_RUN_LLM_LOCAL=1 to enable them.

Supported frameworks

Dedicated benchmark libraries

Language	Framework	Parser(s)
Rust	criterion	`criterion_estimates`, `criterion_bencher`
Rust	cargo bench (libtest)	`cargo_bench_libtest` (shares `criterion_bencher`)
C++	Google Benchmark	`google_benchmark_json`, `google_benchmark_csv`
C++	Catch2	`catch2_xml`, `junit_standard`, `catch2_text`
Java/JVM	JMH	`jmh_json`, `jmh_csv`, `jmh_text`
C# / .NET	BenchmarkDotNet	`benchmarkdotnet_json`, `benchmarkdotnet_csv`, `benchmarkdotnet_text`
Go	`go test -bench`	`go_bench_text`, `go_bench_json`
Python	pytest-benchmark	`pytest_benchmark_json`, `junit_pytest`, `pytest_benchmark_text`
Python	asv (airspeed velocity)	`asv`, `asv_text`
JS	benchmark.js	`benchmark_js`
JS	tinybench	`tinybench`
JS	mitata	`mitata`, `mitata_text`
JS/TS	vitest bench	`vitest_bench`, `vitest_bench_text`
Julia	BenchmarkTools.jl	`benchmarktools_jl`, `benchmarktools_jl_text`
PHP	PHPBench	`phpbench_xml`, `phpbench_text`
Ruby	benchmark-ips	`benchmark_ips`, `benchmark_ips_text`

Load / HTTP testing

Framework	Parser(s)
k6	`k6_summary`, `k6_ndjson`
wrk	`wrk`
wrk2	`wrk2`
hey	`hey`
vegeta	`vegeta_json`
Locust	`locust_csv`, `locust_text`
JMeter	`jmeter_csv`, `jmeter_text`
Gatling	`gatling_log`

Databases

Framework	Parser(s)
pgbench	`pgbench`
sysbench	`sysbench`
redis-benchmark	`redis_benchmark_csv`, `redis_benchmark_text`
memtier_benchmark	`memtier_json`, `memtier_text`
ClickBench	`clickbench`

Frontend

Framework	Parser(s)
Lighthouse	`lighthouse`

Unit-test runners (as timing source)

Framework	Parser(s)
mocha (--reporter json)	`mocha_json`
Jest (jest-junit)	`junit_standard`, `junit_jest_text`
go test (gotestsum)	`junit_go`, `junit_go_text`
JUnit 5 / Maven Surefire	`junit_standard`
dotnet test (TRX)	`dotnet_test_trx`
CTest (--output-junit)	`junit_standard`, `ctest_text`
Playwright	`playwright_json`

Generic / escape hatches

Framework	Parser(s)
hyperfine	`hyperfine_json`, `hyperfine_csv`, `hyperfine_text`
Unix time (bash + GNU)	`time_builtin`, `time_gnu`
perf stat	`perf_stat_text`
custom JSON	`custom_bigger_is_better`, `custom_smaller_is_better`
custom CSV	`custom_csv`

Optional: LLM fallback parsers

benchzoo.parsers.llm_anthropic and benchzoo.parsers.llm_local (see intro above) cover formats that don't match any of the built-in parsers. Their evaluation harness lives at tests/parsers/test_llm.py — set BENCHZOO_RUN_LLM_ANTHROPIC=1 (+ ANTHROPIC_API_KEY) or BENCHZOO_RUN_LLM_LOCAL=1 to run it against the real fixture corpus.

Adding a framework

The process is documented in docs/workflow-conventions.md and the Definition of Done enumerates the six surfaces every framework must ship.

The short version: implement the canonical sample benchmark (four fixed tests, the simplest of which sleeps for exactly 2.15 seconds) and put it under frameworks/<category>/<name>/. Add a .github/workflows/<name>.yml that captures the output as an artifact. Write the parser against the real captured fixture (grep for 2.15 in the output; that is test 1's wall time), then add ground-truth tests.
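A minimal sketch of test 1 and of the "grep for 2.15" check, in Python. The other three canonical tests are not described here, so they are left out; `sleep_benchmark` and `lines_mentioning` are hypothetical helper names.

```python
import time

SLEEP_SECONDS = 2.15  # test 1's fixed wall time, as stated above


def sleep_benchmark(seconds=SLEEP_SECONDS):
    """Sleep for a fixed duration and return the measured wall time."""
    start = time.perf_counter()
    time.sleep(seconds)
    return time.perf_counter() - start


def lines_mentioning(output, needle="2.15"):
    """Return the lines of a captured fixture that contain the needle."""
    return [line for line in output.splitlines() if needle in line]
```

The odd, fixed duration is what makes the fixture easy to read: any line in the captured output that mentions 2.15 is very likely reporting test 1, which tells you where the framework prints its timings and in which unit.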

To iterate on a parser without round-tripping through GitHub CI, run the workflow locally via nektos/act. See docs/act.md for the install, image selection, and full invocation recipe.
