fix: add offline dataset support via LOCAL_DATASET_PATH #807

Open: xjli360 wants to merge 3 commits into main

xjli360 commented Jun 10, 2025

This PR implements offline mode support for lighteval, allowing users to run evaluations without internet access.

To enable offline mode, set the following environment variable:

export LOCAL_DATASET_PATH=/path/to/local/dataset

More details in: 802
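
As a rough illustration of the idea (not the code in this PR), a loader could check the variable and fall back to the Hub when it is unset. The helper name `resolve_dataset_source` and the `<LOCAL_DATASET_PATH>/<hub repo>` directory layout are assumptions made for this sketch only:

```python
import os
from pathlib import Path

ENV_VAR = "LOCAL_DATASET_PATH"  # variable name taken from this PR


def resolve_dataset_source(hub_repo: str) -> str:
    """Return a local directory when LOCAL_DATASET_PATH is set, else the Hub repo id.

    Assumption for illustration: datasets are mirrored under
    <LOCAL_DATASET_PATH>/<hub_repo>, e.g. /data/org/name.
    """
    root = os.environ.get(ENV_VAR)
    if not root:
        # Online mode: let the caller load from the Hub as usual.
        return hub_repo

    candidate = Path(root) / hub_repo
    if not candidate.is_dir():
        # Fail early with a clear message instead of attempting a network call.
        raise FileNotFoundError(
            f"{ENV_VAR} is set to {root!r}, but {candidate} does not exist"
        )
    return str(candidate)
```

The returned string can then be passed wherever the dataset identifier is normally used; failing early keeps a missing local copy from turning into a network timeout on an air-gapped machine.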

xjli360 (Author) commented Jul 7, 2025

Hi, I'd appreciate it if you could review my changes. I'm happy to make any adjustments or answer any questions.
Thank you for your time and for maintaining this project!

mjpost commented Jul 15, 2025

@xjli360 I think this implementation is unnecessary. If you want to do offline evaluation, you can use the following procedure:

  • Run evaluation somewhere you have internet access. This will download files to ~/.cache/huggingface
  • Copy that cache directory to your offline server

I have done this successfully with a custom evaluation script.
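
On the offline machine it can also help to tell the Hugging Face libraries not to attempt network access. A minimal sketch, assuming the cache was copied to the default location; the helper name `offline_env` is hypothetical, while `HF_HOME`, `HF_HUB_OFFLINE`, and `HF_DATASETS_OFFLINE` are environment variables documented by `huggingface_hub` and `datasets`:

```python
import os
from pathlib import Path


def offline_env(cache_dir: str = "~/.cache/huggingface") -> dict[str, str]:
    """Build environment variables pointing the HF libraries at a copied cache.

    cache_dir is wherever the cache from the online machine was copied to.
    """
    return {
        "HF_HOME": str(Path(cache_dir).expanduser()),  # root of the HF cache
        "HF_HUB_OFFLINE": "1",       # huggingface_hub: do not make HTTP calls
        "HF_DATASETS_OFFLINE": "1",  # datasets: use only locally cached data
    }


if __name__ == "__main__":
    # Apply before importing datasets/huggingface_hub, which read these
    # variables when they are imported.
    os.environ.update(offline_env())
```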

xjli360 (Author) commented Jul 17, 2025

> @xjli360 I think this implementation is unnecessary. If you want to do offline evaluation, you can use the following procedure:
>
>   • Run evaluation somewhere you have internet access. This will download files to ~/.cache/huggingface
>   • Copy that cache directory to your offline server
>
> I have done this successfully with a custom evaluation script.

Thank you for the suggestion, but I believe this implementation is still necessary for the following reasons:

  1. In many production or enterprise environments, machines with GPUs are often restricted from accessing the internet for security or policy reasons. In such cases, datasets can only be downloaded on CPU-only machines with internet access and then transferred manually to GPU machines. The proposed approach of relying on evaluation "somewhere with internet access" is not always feasible or convenient in these scenarios.

  2. Additionally, not all resources required during evaluation are cached under ~/.cache/huggingface. For example, certain benchmarks—such as tinybenchmark—are downloaded at runtime and stored elsewhere. Relying solely on the Hugging Face cache does not guarantee complete offline compatibility.

Therefore, having explicit support for an offline dataset path (e.g., LOCAL_DATASET_PATH) provides a more robust and generalizable solution for offline evaluation in constrained environments.
