Merged
44 commits
2696a49
use inspect-ai to evaluate aime25 and gsm8k
NathanHB Oct 7, 2025
578d530
revert file
NathanHB Oct 7, 2025
21fa870
working for 3 tasks
NathanHB Oct 7, 2025
27b2af1
parallel evals of tasks
NathanHB Oct 7, 2025
b9a610d
adds gpqa diamond to inspect
NathanHB Oct 8, 2025
25c1128
move tasks to individual files
NathanHB Oct 13, 2025
0d42edf
move tasks to individual files
NathanHB Oct 13, 2025
6cc3c04
enable extended tasks as well
NathanHB Oct 13, 2025
4c38951
run precomit hook
NathanHB Oct 13, 2025
d2fd5e1
fix mkqa
NathanHB Oct 13, 2025
2ddb0f9
chaange extended suite to lighteval
NathanHB Oct 13, 2025
ee97122
chaange extended suite to lighteval
NathanHB Oct 14, 2025
e2c8e22
add metdata to tasks
NathanHB Oct 14, 2025
c980ddb
add metdata to tasks
NathanHB Oct 14, 2025
57fe390
remove license notice and put docstring on top of file
NathanHB Oct 14, 2025
ee081f2
homogenize tags
NathanHB Oct 14, 2025
1ed1602
add docstring for all multilingual tasks
NathanHB Oct 14, 2025
f4b0e27
add docstring for all multilingual tasks
NathanHB Oct 14, 2025
81d9e4e
add name and dataset to metadata
NathanHB Oct 15, 2025
b734532
use TASKS_TABLE for multilingual tasks
NathanHB Oct 15, 2025
c3911fc
use TASKS_TABLE for default tasks
NathanHB Oct 15, 2025
e439f70
use TASKS_TABLE for default tasks
NathanHB Oct 15, 2025
6447ee7
loads all tasks correclty
NathanHB Oct 15, 2025
88754bf
move community tasks to default tasks and update doc
NathanHB Oct 16, 2025
5445f5c
move community tasks to default tasks and update doc
NathanHB Oct 16, 2025
f53bd76
Merge remote-tracking branch 'origin/main' into nathan-reorg-tasks
NathanHB Oct 16, 2025
6a0c615
revert uneeded changes
NathanHB Oct 16, 2025
1435e38
fix doc build
NathanHB Oct 16, 2025
15f41f2
fix doc build
NathanHB Oct 16, 2025
74e5c0f
remove custom tasks and let user decide if loading multilingual tasks
NathanHB Oct 16, 2025
aad136c
load-tasks multilingual fix
NathanHB Oct 16, 2025
242bc43
update doc
NathanHB Oct 16, 2025
6806bf8
remove uneeded file
NathanHB Oct 16, 2025
e94fa59
update readme
NathanHB Oct 16, 2025
8800d1a
update readme
NathanHB Oct 16, 2025
970f33b
update readme
NathanHB Oct 16, 2025
b8c26dc
fix test
NathanHB Oct 16, 2025
764de72
add back the custom tasks
NathanHB Oct 17, 2025
a326ea8
add back the custom tasks
NathanHB Oct 17, 2025
81081cd
fix tasks
NathanHB Oct 17, 2025
74b40f6
fix tasks
NathanHB Oct 17, 2025
083fb1b
fix tasks
NathanHB Oct 17, 2025
2dab2bf
fix tests
NathanHB Oct 17, 2025
57ca0e5
fix tests
NathanHB Oct 17, 2025
10 changes: 8 additions & 2 deletions README.md
@@ -25,6 +25,9 @@
<a href="https://huggingface.co/docs/lighteval/main/en/index" target="_blank">
<img alt="Documentation" src="https://img.shields.io/badge/Documentation-4F4F4F?style=for-the-badge&logo=readthedocs&logoColor=white" />
</a>
<a href="https://huggingface.co/spaces/SaylorTwift/benchmark_finder" target="_blank">
<img alt="Open Benchmark Index" src="https://img.shields.io/badge/Open%20Benchmark%20Index-4F4F4F?style=for-the-badge&logo=huggingface&logoColor=white" />
</a>
</p>

---
@@ -39,7 +42,10 @@ sample-by-sample results* to debug and see how your models stack-up.

## Available Tasks

Lighteval supports **7,000+ evaluation tasks** across multiple domains and languages. Here's an overview of some *popular benchmarks*:
Lighteval supports **1000+ evaluation tasks** across multiple domains and
languages. Use [this
space](https://huggingface.co/spaces/SaylorTwift/benchmark_finder) to find what
you need, or, here's an overview of some *popular benchmarks*:


### 📚 **Knowledge**
@@ -62,7 +68,7 @@ Lighteval supports **7,000+ evaluation tasks** across multiple domains and langu

### 🌍 **Multilingual Evaluation**
- **Cross-lingual**: XTREME, Flores200 (200 languages), XCOPA, XQuAD
- **Language-specific**:
- **Language-specific**:
- **Arabic**: ArabicMMLU
- **Filipino**: FilBench
- **French**: IFEval-fr, GPQA-fr, BAC-fr
114 changes: 0 additions & 114 deletions community_tasks/_template.py

This file was deleted.

61 changes: 0 additions & 61 deletions community_tasks/aimo_evals.py

This file was deleted.

87 changes: 0 additions & 87 deletions community_tasks/oz_evals.py

This file was deleted.

2 changes: 0 additions & 2 deletions community_tasks/slr_bench_requirements.txt

This file was deleted.

38 changes: 9 additions & 29 deletions docs/source/adding-a-custom-task.mdx
@@ -2,37 +2,17 @@

Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.

## Task Categories

Before creating a custom task, consider which category it belongs to:

### Core Evaluations
Core evaluations are evaluations that only require standard logic in their
metrics and processing, and that we will add to our test suite to ensure non-regression through time. They already see high usage in the community.

### Extended Evaluations
Extended evaluations are evaluations that require custom logic in their
metrics (complex normalization, an LLM as a judge, etc.), that we added to
facilitate the life of users. They already see high usage in the community.

### Community Evaluations
Community evaluations are submissions by the community of new tasks.

A popular community evaluation can move to become an extended or core evaluation over time.

> [!TIP]
> You can find examples of custom tasks in the [community_tasks](https://github.com/huggingface/lighteval/tree/main/community_tasks) directory.

## Step-by-Step Creation of a Custom Task
## Step-by-Step Creation of a Task

> [!WARNING]
> To contribute your custom task to the Lighteval repository, you would first need
> To contribute your task to the Lighteval repository, you would first need
> to install the required dev dependencies by running `pip install -e .[dev]`
> and then run `pre-commit install` to install the pre-commit hooks.

### Step 1: Create the Task File

First, create a Python file under the `community_tasks` directory.
First, create a Python file or directory under the `src/lighteval/tasks/tasks` directory.
A directory is helpful if you need to split your task across multiple files; just make sure one of the files is named `main.py`.
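
As an illustration of the layout rule above, here is a minimal sketch of how a single-file task and a directory task containing `main.py` could both be resolved to one entry module. This is not Lighteval's actual loader: the helper name `resolve_task_entry` and its error messages are hypothetical, and only the `main.py` convention comes from the text.

```python
from pathlib import Path


def resolve_task_entry(path: Path) -> Path:
    """Return the Python file that defines a task (hypothetical helper).

    A single-file task is its own entry point. A directory task is
    expected to contain a file named `main.py`, which is used instead.
    """
    if path.is_file() and path.suffix == ".py":
        return path
    if path.is_dir():
        entry = path / "main.py"
        if entry.is_file():
            return entry
        raise FileNotFoundError(f"{path} has no main.py")
    raise FileNotFoundError(f"{path} is not a Python file or a directory")
```

Under this sketch, a task split across several files still exposes a single, predictable module to whatever discovers the tasks.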

### Step 2: Define the Prompt Function

@@ -135,12 +115,12 @@ class CustomSubsetTask(LightevalTaskConfig):
evaluation_splits=["test"],
few_shots_split="train",
few_shots_select="random_sampling_from_train",
suite=["community"],
suite=["lighteval"],
generation_size=256,
stop_sequence=["\n", "Question:"],
)

SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
SUBSET_TASKS = [CustomSubsetTask(name=f"task:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
```

### Step 5: Add Tasks to the Table
@@ -169,7 +149,7 @@ Once your file is created, you can run the evaluation with the following command
```bash
lighteval accelerate \
"model_name=HuggingFaceH4/zephyr-7b-beta" \
"community|{custom_task}|{fewshots}" \
"lighteval|{task}|{fewshots}" \
--custom-tasks {path_to_your_custom_task_file}
```

@@ -179,12 +159,12 @@ lighteval accelerate \
# Run a custom task with zero-shot evaluation
lighteval accelerate \
"model_name=openai-community/gpt2" \
"community|myothertask|0" \
"lighteval|myothertask|0" \
--custom-tasks community_tasks/my_custom_task.py

# Run a custom task with few-shot evaluation
lighteval accelerate \
"model_name=openai-community/gpt2" \
"community|myothertask|3" \
"lighteval|myothertask|3" \
--custom-tasks community_tasks/my_custom_task.py
```
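
The task argument in the commands above follows a `suite|task|fewshots` pattern, for example `lighteval|myothertask|3`. The sketch below shows one way to build and split such strings when scripting several runs. The helpers `build_task_spec` and `parse_task_spec` are hypothetical and are not part of the Lighteval CLI or API.

```python
def build_task_spec(suite: str, task: str, fewshots: int) -> str:
    """Join the three fields into a `suite|task|fewshots` string (hypothetical helper)."""
    if fewshots < 0:
        raise ValueError("fewshots must be non-negative")
    return f"{suite}|{task}|{fewshots}"


def parse_task_spec(spec: str) -> tuple[str, str, int]:
    """Split a `suite|task|fewshots` string back into its fields (hypothetical helper)."""
    suite, task, fewshots = spec.split("|")
    return suite, task, int(fewshots)


if __name__ == "__main__":
    # Zero-shot and three-shot variants of the same task, as in the commands above.
    for shots in (0, 3):
        print(build_task_spec("lighteval", "myothertask", shots))
```

Subset tasks named like `task:{subset}` pass through unchanged, since only `|` is treated as a separator.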