ADE-Bench (Analytics and Data Engineering Benchmark) is a benchmarking framework for evaluating AI agents on data analyst tasks. It's modeled after terminal-bench but specialized for dbt and SQL workflows.
- Containers: Include dbt + database (DuckDB/SQLite/PostgreSQL) instead of general dev environments
- Tests: SQL queries validate results instead of pytest
- Tasks: Focus on data transformations, aggregations, and analytics
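To picture the SQL-driven validation style, here is a minimal sketch (not ADE-Bench's actual harness code; the table, query, and expected rows are invented for illustration) of a test that runs a query against a task database and compares the rows to an expected result:

```python
import sqlite3

# Hypothetical validation query and expected rows, standing in for a
# task's tests/ query and expected/ results.
TEST_SQL = "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
EXPECTED = [("EU", 150), ("US", 250)]

def run_sql_test(conn: sqlite3.Connection, sql: str, expected: list) -> bool:
    """Return True when the query result matches the expected rows exactly."""
    actual = conn.execute(sql).fetchall()
    return actual == expected

# Tiny in-memory fixture standing in for a task database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("US", 100), ("US", 150), ("EU", 150)])
print(run_sql_test(conn, TEST_SQL, EXPECTED))  # True
```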
- `harness.py`: Main orchestrator for running benchmarks
- `trial_handler.py`: Manages individual task execution
- `sql_parser.py`: Validates task results using SQL queries
- `docker_compose_manager.py`: Handles dbt/database containers
```
ade-bench/
├── ade_bench/        # Core Python package
├── tasks/            # Individual task definitions
├── docker/base/      # Base Docker images
├── shared/defaults/  # Default configurations
├── experiments/      # Benchmark results
└── datasets/         # Dataset configurations
```
Each task contains:
- `task.yaml`: Metadata and configuration
- `dbt_project/`: dbt project files
- `tests/`: SQL validation queries
- `expected/`: Expected query results
- `solution.sh`: Reference solution
- `Dockerfile`: Container setup (optional; defaults are used if omitted)
```
uv run wizard

# With sage agent
uv run scripts_python/run_harness.py --agent sage --task-ids task1 task2
```

- `--agent`: Agent type (`sage`, `claude`, `codex`, `gemini`, etc.)
- `--model`: LLM model for AI agents
- `--dataset-config`: YAML file defining the task collection
- `--n-concurrent-trials`: Parallel execution (default: 4)
- `--no-rebuild`: Skip Docker rebuilds
- `--cleanup`: Remove Docker resources after the run
- `--plugin-set`: Plugin set names from `experiment_sets/plugin-sets.yaml` (space-separated). Controls skills, MCP servers, and allowed tools. Defaults to `none` (no plugins).
Creating Tasks:

- Use `uv run wizard` or manually create the structure
- Add seed data to `data/`
- Write dbt models in `dbt_project/models/`
- Create SQL tests in `tests/`
- Define expected results in `expected/`

Testing:

- Run with the sage agent first to validate the task
- Check logs in `experiments/[run_id]/`
- Verify SQL test results

Adding Agents:

- Extend the `BaseAgent` class
- Register it in `NamedAgentFactory`
- Implement the `run()` method
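The extend/register/implement pattern might look roughly like this (a sketch with stand-in classes, since the real `ade_bench` signatures for `BaseAgent` and `NamedAgentFactory` aren't shown here):

```python
from abc import ABC, abstractmethod

# Stand-in for ade_bench's BaseAgent; the real class and run()
# signature may differ.
class BaseAgent(ABC):
    @abstractmethod
    def run(self, task_dir: str) -> str:
        """Execute the task and return a result summary."""

# Stand-in registry playing the role of NamedAgentFactory.
AGENT_REGISTRY: dict = {}

def register(name: str):
    """Decorator that records an agent class under a name."""
    def wrap(cls):
        AGENT_REGISTRY[name] = cls
        return cls
    return wrap

@register("echo")
class EchoAgent(BaseAgent):
    def run(self, task_dir: str) -> str:
        return f"ran task in {task_dir}"

agent = AGENT_REGISTRY["echo"]()
print(agent.run("tasks/example"))  # ran task in tasks/example
```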
This project supports git worktrees for parallel development:
```
git worktree add .worktrees/feature-name -b feature/feature-name
```

Database files: Large database files (`.duckdb`, `.parquet`) are gitignored and exist only in the main repository at `shared/databases/`. The code automatically resolves these from the main repo even when running in a worktree.
Other shared resources: Config files (shared/config/), scripts (shared/scripts/), migrations (shared/migrations/), and projects (shared/projects/) are worktree-aware and can be modified per worktree.
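One way the main-repo resolution could work (an illustrative sketch, not ADE-Bench's actual code) relies on the fact that a linked worktree's `.git` is a file pointing back at the main repository's `.git` directory:

```python
from pathlib import Path

def main_repo_root(repo_dir: Path) -> Path:
    """Resolve the main repository root from any checkout.

    In a linked worktree, `.git` is a file containing a line like
    `gitdir: /repo/.git/worktrees/feature-name`; in the main
    repository it is a directory. (Hypothetical helper.)
    """
    git_path = repo_dir / ".git"
    if git_path.is_dir():
        return repo_dir  # already in the main repository
    gitdir = git_path.read_text().split("gitdir:", 1)[1].strip()
    # /repo/.git/worktrees/<name> -> /repo
    return Path(gitdir).parents[2]

def shared_database(repo_dir: Path, name: str) -> Path:
    """Locate a shared DuckDB file relative to the main repo root."""
    return main_repo_root(repo_dir) / "shared" / "databases" / "duckdb" / f"{name}.duckdb"
```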
Base images provided:
- `Dockerfile.duckdb-dbt`: For DuckDB-based tasks
- `Dockerfile.snowflake-dbt`: For Snowflake-based tasks
- `Dockerfile.snowflake-dbtf`: For Snowflake-based tasks
Default docker-compose files handle:
- Container networking
- Health checks (for PostgreSQL)
- Volume mounting for dbt projects and data
Database support:
- DuckDB: File-based, great for analytical workloads
- SQLite: File-based, lightweight and simple
- PostgreSQL: Server-based, full-featured RDBMS
- Snowflake: Cloud-based data warehouse with dbt integration
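For the file-based DuckDB case, a dbt profile stays minimal; a sketch (the profile name and database path are illustrative, not taken from this repo):

```yaml
ade_bench:
  target: dev
  outputs:
    dev:
      type: duckdb                 # dbt-duckdb adapter
      path: data/analytics.duckdb  # file-based database
      threads: 4
```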
Tasks can now use shared databases to avoid duplicating large datasets:
```
shared/databases/
├── duckdb/       # DuckDB database files
├── sqlite/       # SQLite database files
├── postgres/     # PostgreSQL initialization scripts
├── snowflake/    # Snowflake initialization scripts
└── catalog.yaml  # Database metadata
```
In `task.yaml`:

```yaml
database:
  source: shared   # Use shared database
  name: shopify    # Database name (without extension)
  type: duckdb     # Database type
```

For local databases (default behavior):

```yaml
database:
  source: local    # Or omit the database config entirely
  path: data/      # Local data directory
```

Note: Shared databases are always copied into containers to prevent corruption; the original shared database files are never modified by tasks.
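The copy-on-use behavior can be pictured as follows (an illustrative sketch, not the actual container-setup code; `stage_shared_database` is a hypothetical helper):

```python
import shutil
from pathlib import Path

def stage_shared_database(shared_db: Path, workspace: Path) -> Path:
    """Copy a shared database into a task workspace so the task
    mutates only its private copy; the file under shared/databases/
    is never touched. (Hypothetical helper.)"""
    workspace.mkdir(parents=True, exist_ok=True)
    staged = workspace / shared_db.name
    shutil.copy2(shared_db, staged)  # preserves metadata; source is read-only to us
    return staged
```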
```python
from pathlib import Path

from ade_bench.database import DatabasePoolManager

# Initialize manager
pool = DatabasePoolManager()

# Register a new database
pool.register_database(
    db_path=Path("my_data.duckdb"),
    description="E-commerce dataset",
    tables=["orders", "products", "customers"],
)

# List available databases
for db in pool.list_databases():
    print(f"{db.name} ({db.type.value}): {db.description}")
```
- Create example tasks demonstrating:
  - Basic aggregations
  - Window functions
  - CTEs and complex transformations
  - dbt tests and documentation
- Implement AI agents:
  - Port the terminus agent from terminal-bench
  - Add support for Claude, GPT-4, etc.
- Enhance testing:
  - Support for comparing DataFrames
  - Tolerance for floating-point comparisons
  - Performance benchmarks
- Add features:
  - S3 upload for results
  - Database storage for tracking
  - Visualization dashboard
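The floating-point tolerance idea could look like this (a stdlib-only sketch; the eventual implementation may compare DataFrames instead of row tuples):

```python
import math

def rows_match(actual, expected, rel_tol=1e-9, abs_tol=1e-6):
    """Compare two row lists cell by cell, allowing tolerance on float cells."""
    if len(actual) != len(expected):
        return False
    for row_a, row_e in zip(actual, expected):
        if len(row_a) != len(row_e):
            return False
        for a, e in zip(row_a, row_e):
            if isinstance(a, float) or isinstance(e, float):
                # Tolerant comparison for floats (e.g. SUM/AVG round-off).
                if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
                    return False
            elif a != e:
                return False
    return True

print(rows_match([("US", 0.1 + 0.2)], [("US", 0.3)]))  # True
```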