diff --git a/language/deepseek-r1/README.md b/language/deepseek-r1/README.md
index 2a4d85d7f2..a6c30a6155 100644
--- a/language/deepseek-r1/README.md
+++ b/language/deepseek-r1/README.md
@@ -1,6 +1,6 @@
-# Mlperf Inference DeepSeek Reference Implementation
+# MLPerf Inference DeepSeek Reference Implementation
 
-## Automated command to run the benchmark via MLFlow
+## Automated command to run the benchmark via MLCFlow
 
 Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/deepseek-r1/) for an automated way to run this benchmark across different available implementations and do an end-to-end submission with or without docker.
 
@@ -13,6 +13,22 @@ You can also do pip install mlc-scripts and then use `mlcr` commands for downloa
 - DeepSeek-R1 model is automatically downloaded as part of setup
 - Checkpoint conversion is done transparently when needed.
 
+**Using the MLC R2 Downloader**
+
+Download the model using the MLCommons R2 Downloader:
+
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+    https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+```
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+    -d /path/to/download/directory \
+    https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+```
+
 ## Dataset Download
 
 The dataset is an ensemble of the datasets: AIME, MATH500, gpqa, MMLU-Pro, livecodebench(code_generation_lite). They are covered by the following licenses:
@@ -23,49 +39,40 @@ The dataset is an ensemble of the datasets: AIME, MATH500, gpqa, MMLU-Pro, livec
 - MMLU-Pro: [MIT](https://opensource.org/license/mit)
 - livecodebench(code_generation_lite): [CC](https://creativecommons.org/share-your-work/cclicenses/)
 
-### Preprocessed
-
-**Using MLCFlow Automation**
-
-```
-mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname= -j
-```
+### Preprocessed & Calibration
 
-**Using Native method**
+**Using the MLC R2 Downloader**
 
-You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
+Download the full preprocessed dataset and calibration dataset using the MLCommons R2 Downloader:
 
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+-d ./ https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```
-You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
-```
-rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/datasets/mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl ./ -P
+This will download the full preprocessed dataset file (`mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl`) and the calibration dataset file (`mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl`).
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+    -d /path/to/download/directory \
+    https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```
 
-### Calibration
+### Preprocessed
 
 **Using MLCFlow Automation**
 
 ```
-mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname= -j
+mlcr get,preprocessed,dataset,deepseek-r1,_validation,_mlc,_r2-downloader --outdirname= -j
 ```
 
-**Using Native method**
-
-Download and install Rclone as described in the previous section.
+### Calibration
 
-Then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+**Using MLCFlow Automation**
 
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/datasets/mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl ./ -P
+mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_r2-downloader --outdirname= -j
 ```
 
 ## Docker
@@ -204,7 +211,7 @@ The following table shows which backends support different evaluation and MLPerf
 **Using MLCFlow Automation**
 
 ```
-TBD
+mlcr run,accuracy,mlperf,_dataset_deepseek-r1 --result_dir=
 ```
 
 **Using Native method**
diff --git a/language/llama3.1-8b/README.md b/language/llama3.1-8b/README.md
index 9dd571411b..1ba19c204b 100644
--- a/language/llama3.1-8b/README.md
+++ b/language/llama3.1-8b/README.md
@@ -104,7 +104,7 @@ You need to request for access to [MLCommons](http://llama3-1.mlcommons.org/) an
 **Official Model download using MLCFlow Automation**
 You can download the model automatically via the below command
 ```
-TBD
+mlcr get,ml-model,llama3,_mlc,_8b,_r2-downloader --outdirname= -j
 ```
 
 
@@ -137,59 +137,57 @@ Downloading llama3.1-8b model from Hugging Face will require an [**access token*
 
 ### Preprocessed
 
-You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
-
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
-```
-You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+Download the preprocessed datasets using the MLCommons downloader:
 
 #### Full dataset (datacenter)
 
 **Using MLCFlow Automation**
 
 ```
-mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_rclone --outdirname= -j
+mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_r2-downloader --outdirname= -j
 ```
 
 **Native method**
 
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+    https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-eval.uri
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_eval.json ./ -P
-```
+This will download `cnn_eval.json`.
 
 #### 5000 samples (edge)
 
 **Using MLCFlow Automation**
 
 ```
-mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_rclone --outdirname= -j
+mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_r2-downloader --outdirname= -j
 ```
 
 **Native method**
 
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+    https://inference.mlcommons-storage.org/metadata/llama3-1-8b-sample-cnn-eval-5000.uri
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_eval_5000.json ./ -P
-```
+
+This will download `sample_cnn_eval_5000.json`.
+
 #### Calibration
 
 **Using MLCFlow Automation**
 
 ```
-mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_rclone --outdirname= -j
+mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_r2-downloader --outdirname= -j
 ```
 
 **Native method**
 
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+    https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-dailymail-calibration.uri
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_dailymail_calibration.json ./ -P
-```
-
-You can also download the calibration dataset from the Cloudflare R2 bucket by running the following command:
+This will download `cnn_dailymail_calibration.json`.
 
-```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/cnn_eval.json ./ -P
+To specify a custom download directory for any of these, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+    -d /path/to/download/directory \
+    <metadata-file-url>
 ```
 
diff --git a/speech2text/README.md b/speech2text/README.md
index aae95e3c43..972fb965c2 100644
--- a/speech2text/README.md
+++ b/speech2text/README.md
@@ -102,26 +102,24 @@ VLLM_TARGET_DEVICE=cpu pip install --break-system-packages . --no-build-isolatio
 
 You can download the model automatically via the below command
 ```
-mlcr get,ml-model,whisper,_rclone,_mlc --outdirname= -j
+mlcr get,ml-model,whisper,_r2-downloader,_mlc --outdirname= -j
 ```
 
-**Official Model download using native method**
+**Official Model download using MLC R2 Downloader**
 
-You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
+Download the Whisper model using the MLCommons downloader:
 
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d whisper/model https://inference.mlcommons-storage.org/metadata/whisper-model.uri
 ```
-You can then navigate in the terminal to your desired download directory and run the following command to download the model:
-```
-rclone copy mlc-inference:mlcommons-inference-wg-public/Whisper/model/ ./ -P
+This will download the Whisper model files.
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+    -d /path/to/download/directory \
+    https://inference.mlcommons-storage.org/metadata/whisper-model.uri
 ```
 
 ### External Download (Not recommended for official submission)
@@ -153,16 +151,24 @@ We use dev-clean and dev-other splits, which are approximately 10 hours.
 **Using MLCFlow Automation**
 
 ```
-mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname= -j
+mlcr get,dataset,whisper,_preprocessed,_mlc,_r2-downloader --outdirname= -j
 ```
 
-**Native method**
+**Using MLC R2 Downloader**
 
-Download and install rclone as decribed in the [MLCommons Download section](#mlcommons-download)
+Download the preprocessed dataset using the MLCommons R2 Downloader:
 
-You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d whisper/dataset https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/Whisper/dataset/ ./ -P
+
+This will download the LibriSpeech dataset files.
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+    -d /path/to/download/directory \
+    https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri
 ```
 
 ### Unprocessed
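
All of the R2 Downloader invocations added in this patch follow the same shape: fetch `mlc-r2-downloader.sh`, pass an optional `-d` target directory, and point it at a metadata `.uri` file. The following is a minimal sketch, not part of the READMEs themselves, that chains the two Whisper downloads shown above; `MODEL_DIR` and `DATA_DIR` are illustrative placeholders.

```bash
#!/usr/bin/env bash
# Sketch only: combines the Whisper model and dataset downloads from the
# speech2text README changes above. MODEL_DIR and DATA_DIR are assumed
# placeholders, not paths mandated by the READMEs.
set -euo pipefail

DOWNLOADER_URL="https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh"
MODEL_DIR="${MODEL_DIR:-whisper/model}"
DATA_DIR="${DATA_DIR:-whisper/dataset}"

# Each call takes an optional -d target directory followed by the metadata
# .uri file that tells the downloader which objects to fetch.
bash <(curl -s "$DOWNLOADER_URL") -d "$MODEL_DIR" \
    https://inference.mlcommons-storage.org/metadata/whisper-model.uri
bash <(curl -s "$DOWNLOADER_URL") -d "$DATA_DIR" \
    https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri

# Quick check that the files actually arrived.
ls -lh "$MODEL_DIR" "$DATA_DIR"
```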