huggingface · NathanHB · Oct 29, 2025 · Oct 7, 2025
diff --git a/README.md b/README.md
@@ -25,6 +25,9 @@
   <a href="https://huggingface.co/docs/lighteval/main/en/index" target="_blank">
     <img alt="Documentation" src="https://img.shields.io/badge/Documentation-4F4F4F?style=for-the-badge&logo=readthedocs&logoColor=white" />
   </a>
+  <a href="https://huggingface.co/spaces/SaylorTwift/benchmark_finder" target="_blank">
+    <img alt="Open Benchmark Index" src="https://img.shields.io/badge/Open%20Benchmark%20Index-4F4F4F?style=for-the-badge&logo=huggingface&logoColor=white" />
+  </a>
 </p>
 
 ---
@@ -39,7 +42,10 @@ sample-by-sample results* to debug and see how your models stack-up.
 
 ## Available Tasks
 
-Lighteval supports **7,000+ evaluation tasks** across multiple domains and languages. Here's an overview of some *popular benchmarks*:
+Lighteval supports **1,000+ evaluation tasks** across multiple domains and
+languages. Use [this
+space](https://huggingface.co/spaces/SaylorTwift/benchmark_finder) to find what
+you need, or see the overview of some *popular benchmarks* below:
 
 
 ### 📚 **Knowledge**
@@ -62,7 +68,7 @@ Lighteval supports **7,000+ evaluation tasks** across multiple domains and langu
 
 ### 🌍 **Multilingual Evaluation**
 - **Cross-lingual**: XTREME, Flores200 (200 languages), XCOPA, XQuAD
-- **Language-specific**: 
+- **Language-specific**:
   - **Arabic**: ArabicMMLU
   - **Filipino**: FilBench
   - **French**: IFEval-fr, GPQA-fr, BAC-fr

diff --git a/community_tasks/_template.py b/community_tasks/_template.py
diff --git a/community_tasks/aimo_evals.py b/community_tasks/aimo_evals.py
diff --git a/community_tasks/oz_evals.py b/community_tasks/oz_evals.py
diff --git a/community_tasks/slr_bench_requirements.txt b/community_tasks/slr_bench_requirements.txt
diff --git a/docs/source/adding-a-custom-task.mdx b/docs/source/adding-a-custom-task.mdx
@@ -2,37 +2,17 @@
 
 Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.
 
-## Task Categories
-
-Before creating a custom task, consider which category it belongs to:
-
-### Core Evaluations
-Core evaluations are evaluations that only require standard logic in their
-metrics and processing, and that we will add to our test suite to ensure non-regression through time. They already see high usage in the community.
-
-### Extended Evaluations
-Extended evaluations are evaluations that require custom logic in their
-metrics (complex normalization, an LLM as a judge, etc.), that we added to
-facilitate the life of users. They already see high usage in the community.
-
-### Community Evaluations
-Community evaluations are submissions by the community of new tasks.
-
-A popular community evaluation can move to become an extended or core evaluation over time.
-
-> [!TIP]
-> You can find examples of custom tasks in the [community_tasks](https://github.com/huggingface/lighteval/tree/main/community_tasks) directory.
-
-## Step-by-Step Creation of a Custom Task
+## Step-by-Step Creation of a Task
 
 > [!WARNING]
-> To contribute your custom task to the Lighteval repository, you would first need
+> To contribute your task to the Lighteval repository, you would first need
 > to install the required dev dependencies by running `pip install -e .[dev]`
 > and then run `pre-commit install` to install the pre-commit hooks.
 
 ### Step 1: Create the Task File
 
-First, create a Python file under the `community_tasks` directory.
+First, create a Python file or directory under the `src/lighteval/tasks/tasks` directory.
+A directory is helpful if you need to split your task across multiple files; just make sure one of the files is named `main.py`.
 
 ### Step 2: Define the Prompt Function
 
@@ -135,12 +115,12 @@ class CustomSubsetTask(LightevalTaskConfig):
             evaluation_splits=["test"],
             few_shots_split="train",
             few_shots_select="random_sampling_from_train",
-            suite=["community"],
+            suite=["lighteval"],
             generation_size=256,
             stop_sequence=["\n", "Question:"],
         )
 
-SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
+SUBSET_TASKS = [CustomSubsetTask(name=f"task:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
 ```
 
 ### Step 5: Add Tasks to the Table
@@ -169,7 +149,7 @@ Once your file is created, you can run the evaluation with the following command
 ```bash
 lighteval accelerate \
     "model_name=HuggingFaceH4/zephyr-7b-beta" \
-    "community|{custom_task}|{fewshots}" \
+    "lighteval|{task}|{fewshots}" \
     --custom-tasks {path_to_your_custom_task_file}
 ```
 
@@ -179,12 +159,12 @@ lighteval accelerate \
 # Run a custom task with zero-shot evaluation
 lighteval accelerate \
     "model_name=openai-community/gpt2" \
-    "community|myothertask|0" \
+    "lighteval|myothertask|0" \
     --custom-tasks community_tasks/my_custom_task.py
 
 # Run a custom task with few-shot evaluation
 lighteval accelerate \
     "model_name=openai-community/gpt2" \
-    "community|myothertask|3" \
+    "lighteval|myothertask|3" \
     --custom-tasks community_tasks/my_custom_task.py
 ```
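
Reviewer note: the command-line changes in this diff all rewrite the first field of a `suite|task|fewshots` string from `community` to `lighteval`. The following is a minimal Python sketch of that rewrite, intended only to illustrate the format. The names `TaskSpec`, `parse_task_spec`, and `migrate_suite` are hypothetical helpers invented for this note and are not part of the lighteval API; the sketch also assumes the few-shot field is always a plain integer, as in the examples above.

```python
from typing import NamedTuple


class TaskSpec(NamedTuple):
    """Hypothetical container for the three '|'-separated fields."""
    suite: str
    task: str
    fewshots: int


def parse_task_spec(spec: str) -> TaskSpec:
    """Split 'suite|task|fewshots' into its fields (illustrative only)."""
    parts = spec.split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 'suite|task|fewshots', got {spec!r}")
    suite, task, fewshots = parts
    return TaskSpec(suite, task, int(fewshots))


def migrate_suite(spec: str, old: str = "community", new: str = "lighteval") -> str:
    """Replace the suite prefix, leaving the task name and few-shot count untouched."""
    parsed = parse_task_spec(spec)
    suite = new if parsed.suite == old else parsed.suite
    return f"{suite}|{parsed.task}|{parsed.fewshots}"


print(migrate_suite("community|myothertask|3"))  # lighteval|myothertask|3
```

A helper along these lines could be used to update existing scripts that still pass `community|...` task strings; specs that already use another suite are returned unchanged.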