TabArena is a living benchmarking system designed to make benchmarking tabular machine learning models reliable. TabArena implements best practices to ensure methods are represented at their peak potential, including cross-validated ensembles, strong hyperparameter search spaces contributed by the method authors, early stopping, model refitting, parallel bagging, memory usage estimation, and more.
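To make the cross-validated ensembling concrete, here is a generic sketch of the technique with scikit-learn (a simplified illustration, not TabArena's actual implementation): each fold's model produces out-of-fold validation predictions, and the test prediction is the average over the fold ("child") models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

def fit_cv_ensemble(X_train, y_train, X_test, n_splits=8):
    """Fit one model per CV fold and bag their test predictions (binary case)."""
    oof = np.zeros(len(X_train))       # out-of-fold validation predictions
    test_pred = np.zeros(len(X_test))  # averaged ("bagged") test predictions
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, val_idx in kf.split(X_train):
        model = RandomForestClassifier(random_state=0)
        model.fit(X_train[train_idx], y_train[train_idx])
        oof[val_idx] = model.predict_proba(X_train[val_idx])[:, 1]
        test_pred += model.predict_proba(X_test)[:, 1] / n_splits
    return oof, test_pred
```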
TabArena currently consists of:
- 51 manually curated tabular datasets representing real-world tabular data tasks.
- 9 to 30 evaluated splits per dataset.
- 16 tabular machine learning methods, including 3 tabular foundation models.
- 25,000,000 trained models across the benchmark, with all validation and test predictions cached to enable tuning and post-hoc ensembling analysis (see the sketch after this list).
- A live TabArena leaderboard showcasing the results.
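Cached validation predictions are what make post-hoc ensembling analysis cheap: ensembles can be built and evaluated without retraining any model. As a minimal sketch of one such technique, Caruana-style greedy ensemble selection (illustrative only, assuming probabilistic predictions scored with squared error; not TabArena's exact implementation):

```python
import numpy as np

def greedy_ensemble_selection(val_preds, y_val, n_iters=25):
    """Greedily add configs (with replacement) to minimize validation error.

    val_preds: dict mapping config name -> cached validation predictions.
    Returns the selected config names; repeat counts act as ensemble weights.
    """
    selected = []
    ensemble_sum = np.zeros_like(y_val, dtype=float)
    for _ in range(n_iters):
        best_name, best_err = None, np.inf
        for name, pred in val_preds.items():
            # Error of the ensemble if this config were added next.
            err = np.mean(((ensemble_sum + pred) / (len(selected) + 1) - y_val) ** 2)
            if err < best_err:
                best_name, best_err = name, err
        selected.append(best_name)
        ensemble_sum += val_preds[best_name]
    return selected
```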
Please refer to our example scripts for using TabArena.
Please refer to our dataset curation repository to learn more about or contribute data!
To locally reproduce individual configurations and compare with the TabArena results of those configurations, refer to examples/tabarena/run_quickstart_tabarena.py.
To locally reproduce all tables and figures in the paper from the raw results data, run examples/tabarena/run_generate_paper_figures.py.
To locally generate the latest results leaderboard, run examples/tabarena/run_generate_main_leaderboard.py.
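For example (assuming the scripts are run from the repository root after installation):

```bash
python examples/tabarena/run_quickstart_tabarena.py
python examples/tabarena/run_generate_paper_figures.py
python examples/tabarena/run_generate_main_leaderboard.py
```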
The TabArena code is currently being polished. Detailed documentation for TabArena will be available soon.
To install TabArena, ensure you are using Python 3.9-3.12, then run one of the following setups.
If you don't intend to fit models, this is the simplest installation:

```bash
git clone https://github.com/autogluon/tabrepo.git
pip install -e tabrepo/
```
If you intend to fit models, this is required:

```bash
git clone https://github.com/autogluon/tabrepo.git
pip install -e tabrepo/[benchmark]
# use GIT_LFS_SKIP_SMUDGE=1 in front of the command if installing TabDPT fails due to a broken LFS/pip setup
# GIT_LFS_SKIP_SMUDGE=1 uv pip install -e tabrepo/[benchmark]
```
With this installation, you will have the latest version of AutoGluon in editable form:

```bash
git clone https://github.com/autogluon/autogluon.git
./autogluon/full_install.sh
git clone https://github.com/autogluon/tabrepo.git
pip install -e tabrepo/[benchmark]
```
Alternatively, to set up the environment with uv:

```bash
git clone [email protected]:autogluon/tabrepo.git
cd tabrepo
pip install uv
uv init -p 3.11
uv sync
uv pip install -e ".[benchmark]"
```
To run the minimal benchmarking example with a custom model:

```bash
git clone [email protected]:TabArena/tabarena_benchmarking_examples.git
cd tabarena_benchmarking_examples/tabarena_minimal_example/custom_tabarena_model
python run_tabarena_lite.py
```
Artifacts will by default be downloaded into ~/.cache/tabarena/. You can change this by specifying the environment variable TABARENA_CACHE.
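For example, to cache artifacts on a different disk (the path below is illustrative):

```bash
export TABARENA_CACHE=/data/tabarena_cache
```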
The types of artifacts are:
- Raw data -> The original results that are used to derive all other artifacts. Contains per-child test predictions from the bagged models, along with detailed metadata and system information absent from the processed results. Very large, often 100 GB per method type.
- Processed data -> The minimal information needed for simulating HPO, portfolios, and generating the leaderboard. Often 10 GB per method type.
- Results -> Pandas DataFrames of the results for each config and HPO setting on each task. Contains information such as test error, validation error, train time, and inference time. Generated from processed data. Used to generate leaderboards. Very small, often under 1 MB per method type.
- Leaderboards -> Aggregated metrics comparing methods. Contains information such as Elo, win-rate, average rank, and improvability. Generated from a list of results files. Under 1 MB for all methods.
- Figures & Plots -> Generated from results and leaderboards.
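As a rough illustration of working with a results artifact (the file name below is hypothetical; see the inspection scripts referenced next for the actual layout), the per-task results can be explored with pandas, e.g. to compute each method's average rank:

```python
from pathlib import Path

import pandas as pd

# Hypothetical file name; the actual layout under ~/.cache/tabarena/ may differ.
path = Path("~/.cache/tabarena/results/my_method_results.parquet").expanduser()
results = pd.read_parquet(path)

# Assuming columns such as "method", "task", and "test_error" exist
# (per the artifact description above): rank methods within each task,
# then average the ranks across tasks.
results["rank"] = results.groupby("task")["test_error"].rank()
print(results.groupby("method")["rank"].mean().sort_values())
```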
To inspect raw data, refer to examples/tabarena/inspect_raw_data.py.
To inspect processed data, refer to examples/tabarena/inspect_processed_data.py.
Instructions TBD
If you use TabArena in a scientific publication, we would appreciate a reference to the following paper:
TabArena: A Living Benchmark for Machine Learning on Tabular Data, Nick Erickson, Lennart Purucker, Andrej Tschalzev, David Holzmüller, Prateek Mutalik Desai, David Salinas, Frank Hutter, preprint, 2025.
Link to publication: arXiv
Bibtex entry:

```bibtex
@article{erickson2025tabarena,
  title={TabArena: A Living Benchmark for Machine Learning on Tabular Data},
  author={Nick Erickson and Lennart Purucker and Andrej Tschalzev and David Holzmüller and Prateek Mutalik Desai and David Salinas and Frank Hutter},
  year={2025},
  journal={arXiv preprint arXiv:2506.16791},
  url={https://arxiv.org/abs/2506.16791},
}
```
TabArena was built upon and now replaces TabRepo. To see details about TabRepo, the portfolio simulation repository, refer to tabrepo.md.