fix: add offline dataset support via LOCAL_DATASET_PATH #807

xjli360 · 2025-06-10T07:14:13Z

This PR implements offline mode support for lighteval, allowing users to run evaluations without internet access.

To use it, set the following environment variable to the local dataset directory:

export LOCAL_DATASET_PATH=/path/to/local/dataset

More details in: 802
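For readers unfamiliar with this kind of switch, here is a minimal sketch of how an environment variable such as LOCAL_DATASET_PATH could redirect dataset lookup to a local directory. It is not the code in this PR: the helper name `resolve_dataset_source` and the `<root>/<repo id>` directory layout are assumptions made purely for illustration.

```python
import os
from pathlib import Path


def resolve_dataset_source(hub_repo: str, env_var: str = "LOCAL_DATASET_PATH") -> str:
    """Return a local directory for the dataset if one exists, else the hub repo id.

    Hypothetical helper: assumes local copies live at <root>/<hub_repo>.
    """
    root = os.environ.get(env_var)
    if root:
        candidate = Path(root) / hub_repo
        # Only switch to the local copy when the directory is actually present.
        if candidate.is_dir():
            return str(candidate)
    # Fall back to the hub identifier, which requires network access or a warm cache.
    return hub_repo
```

The returned string could then be passed to whatever loader the task uses; falling back to the hub id keeps the online behaviour unchanged when the variable is unset.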

xjli360 · 2025-07-07T10:14:34Z

Hi, I'd appreciate it if you could review my changes. I'm happy to make any adjustments or answer any questions.
Thank you for your time and for maintaining this project!

mjpost · 2025-07-15T19:50:54Z

@xjli360 I think this implementation is unnecessary. If you want to do offline evaluation, you can use the following procedure:

1. Run the evaluation somewhere you have internet access. This will download files to ~/.cache/huggingface.
2. Copy that cache directory to your offline server.

I have done this successfully with a custom evaluation script.
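The cache-copy procedure above can be sketched as follows, assuming the copied cache sits at a known path on the offline machine. `HF_HOME`, `HF_HUB_OFFLINE`, and `HF_DATASETS_OFFLINE` are environment variables read by the Hugging Face libraries; the helper name `offline_env` is hypothetical, and this is not necessarily what the custom evaluation script mentioned above does.

```python
import os
from pathlib import Path


def offline_env(cache_dir: str) -> dict:
    """Build environment settings that point Hugging Face libraries at a copied cache.

    Hypothetical helper for illustration only.
    """
    cache = Path(cache_dir).expanduser()
    if not cache.is_dir():
        raise FileNotFoundError(f"cache directory not found: {cache}")
    return {
        "HF_HOME": str(cache),       # root of the copied cache (default: ~/.cache/huggingface)
        "HF_HUB_OFFLINE": "1",       # tell huggingface_hub not to make network calls
        "HF_DATASETS_OFFLINE": "1",  # tell datasets to rely on the local cache only
    }


# Usage (apply before importing datasets or transformers):
# os.environ.update(offline_env("~/.cache/huggingface"))
```

Setting the variables before the libraries are imported matters, because they are typically read at import time.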

xjli360 · 2025-07-17T02:48:39Z

> @xjli360 I think this implementation is unnecessary. If you want to do offline evaluation, you can follow the following procedure:
>
> Run evaluation somewhere you have internet access. This will download files to ~/.cache/huggingface
>
> Copy that cache directory to your offline serve
>
> I have done this successfully with a custom evaluation script.

Thank you for the suggestion, but I believe this implementation is still necessary for the following reasons:

1. In many production or enterprise environments, machines with GPUs are restricted from accessing the internet for security or policy reasons. In such cases, datasets can only be downloaded on CPU-only machines with internet access and then transferred manually to the GPU machines. The suggested approach of running the evaluation "somewhere with internet access" first is not always feasible or convenient in these scenarios.
2. Not all resources required during evaluation are cached under ~/.cache/huggingface. For example, certain benchmarks, such as tinybenchmark, are downloaded at runtime and stored elsewhere. Relying solely on the Hugging Face cache does not guarantee complete offline compatibility.

Therefore, explicit support for an offline dataset path (e.g., LOCAL_DATASET_PATH) provides a more robust and general solution for offline evaluation in constrained environments.

fix: add offline dataset support via LOCAL_DATASET_PATH

8a6a62e

NathanHB mentioned this pull request Jun 17, 2025

[BUG] support direct evaluation of local API and local data sets? #793

Open

xjli360 added 2 commits June 26, 2025 16:49

Merge branch 'main' into main

73c4820

Merge branch 'main' into main

9895efc
