Merged
67 changes: 37 additions & 30 deletions language/deepseek-r1/README.md
@@ -1,6 +1,6 @@
# MLPerf Inference DeepSeek Reference Implementation

## Automated command to run the benchmark via MLCFlow

Please see the [new docs site](https://docs.mlcommons.org/inference/benchmarks/language/deepseek-r1/) for an automated way to run this benchmark across the different available implementations and to do an end-to-end submission with or without Docker.

@@ -13,6 +13,22 @@ You can also do pip install mlc-scripts and then use `mlcr` commands for downloa
- DeepSeek-R1 model is automatically downloaded as part of setup
- Checkpoint conversion is done transparently when needed.

**Using the MLC R2 Downloader**

Download the model using the MLCommons R2 Downloader:

```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
```

To specify a custom download directory, use the `-d` flag:
```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-d /path/to/download/directory \
https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
```
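The downloader is safe to re-run, but a small guard can skip it when the files are already in place. Below is a minimal sketch; the `download_if_missing` wrapper and the `./models/deepseek-r1` target are illustrative (not part of the reference implementation), and the real downloader invocation is left commented out so the sketch stays offline:

```bash
# Illustrative wrapper: only download when the target directory is empty.
download_if_missing() {
  dir="$1"; uri="$2"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "already present: $dir"
  else
    echo "would download $uri into $dir"
    # Uncomment for real use:
    # bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d "$dir" "$uri"
  fi
}

download_if_missing ./models/deepseek-r1 \
  https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
```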

## Dataset Download

The dataset is an ensemble of the following datasets: AIME, MATH500, gpqa, MMLU-Pro, and livecodebench (code_generation_lite). They are covered by the following licenses:
@@ -23,49 +39,40 @@
- MMLU-Pro: [MIT](https://opensource.org/license/mit)
- livecodebench(code_generation_lite): [CC](https://creativecommons.org/share-your-work/cclicenses/)

### Preprocessed & Calibration

**Using the MLC R2 Downloader**

Download the full preprocessed dataset and calibration dataset using the MLCommons R2 Downloader:

```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
  -d ./ https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
```

This will download the full preprocessed dataset file (`mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl`) and the calibration dataset file (`mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl`).

To specify a custom download directory, use the `-d` flag:
```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-d /path/to/download/directory \
https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
```
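Once downloaded, the `.pkl` files can be sanity-checked before a benchmark run. A sketch follows; the `check_dataset` helper is ours, and it assumes `python3` is on the PATH. Note that if the pickle references classes from packages that are not installed (e.g. pandas), the load itself will fail, which is also a useful signal:

```bash
# Illustrative helper: report the top-level Python type of a pickle
# file if it exists, otherwise say it is missing.
check_dataset() {
  if [ -f "$1" ]; then
    python3 -c "import pickle; d = pickle.load(open('$1', 'rb')); print(type(d).__name__)"
  else
    echo "dataset not found: $1"
  fi
}

check_dataset mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl
```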

### Preprocessed

**Using MLCFlow Automation**

```
mlcr get,preprocessed,dataset,deepseek-r1,_validation,_mlc,_r2-downloader --outdirname=<path to download> -j
```

### Calibration

**Using MLCFlow Automation**

```
mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_r2-downloader --outdirname=<path to download> -j
```

## Docker
@@ -204,7 +211,7 @@ The following table shows which backends support different evaluation and MLPerf
**Using MLCFlow Automation**

```
mlcr run,accuracy,mlperf,_dataset_deepseek-r1 --result_dir=<Path to directory where files are generated after the benchmark run>
```

**Using Native method**
50 changes: 24 additions & 26 deletions language/llama3.1-8b/README.md
@@ -104,7 +104,7 @@ You need to request for access to [MLCommons](http://llama3-1.mlcommons.org/) an
**Official Model download using MLCFlow Automation**
You can download the model automatically with the command below:
```
mlcr get,ml-model,llama3,_mlc,_8b,_r2-downloader --outdirname=<path to download> -j
```


@@ -137,59 +137,57 @@ Downloading llama3.1-8b model from Hugging Face will require an [**access token*

### Preprocessed

Download the preprocessed datasets using the MLCommons R2 Downloader:

#### Full dataset (datacenter)

**Using MLCFlow Automation**
```
mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_r2-downloader --outdirname=<path to download> -j
```

**Native method**
```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
  https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-eval.uri
```

This will download `cnn_eval.json`.
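A quick way to confirm a JSON dataset file downloaded completely is to parse it and count its top-level entries. A sketch, with an illustrative helper name, assuming `python3` is available:

```bash
# Illustrative helper: parse a JSON file and print the number of
# top-level entries (len() works for both a list and a dict).
count_entries() {
  if [ -f "$1" ]; then
    python3 -c "import json; print('entries:', len(json.load(open('$1'))))"
  else
    echo "missing: $1"
  fi
}

count_entries cnn_eval.json
```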

#### 5000 samples (edge)

**Using MLCFlow Automation**
```
mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_r2-downloader --outdirname=<path to download> -j
```

**Native method**
```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
  https://inference.mlcommons-storage.org/metadata/llama3-1-8b-sample-cnn-eval-5000.uri
```

This will download `sample_cnn_eval_5000.json`.


#### Calibration

**Using MLCFlow Automation**
```
mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_r2-downloader --outdirname=<path to download> -j
```

**Native method**
```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
  https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-dailymail-calibration.uri
```

This will download `cnn_dailymail_calibration.json`.

To specify a custom download directory for any of these, use the `-d` flag:
```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
  -d /path/to/download/directory \
  <URI>
```
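For scripted setups, the dataset URIs above can be driven from one loop. The sketch below only prints each downloader command instead of hitting the network; the `print_download_cmds` function and the `./datasets` target directory are illustrative, and dropping the surrounding `echo` logic would perform the real downloads:

```bash
# Illustrative: emit one downloader command per URI.
DL_SCRIPT=https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh

print_download_cmds() {
  for uri in "$@"; do
    echo "bash <(curl -s $DL_SCRIPT) -d ./datasets $uri"
  done
}

print_download_cmds \
  https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-eval.uri \
  https://inference.mlcommons-storage.org/metadata/llama3-1-8b-sample-cnn-eval-5000.uri \
  https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-dailymail-calibration.uri
```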


44 changes: 25 additions & 19 deletions speech2text/README.md
@@ -102,26 +102,24 @@ VLLM_TARGET_DEVICE=cpu pip install --break-system-packages . --no-build-isolatio

You can download the model automatically with the command below:
```
mlcr get,ml-model,whisper,_r2-downloader,_mlc --outdirname=<path_to_download> -j
```

**Official Model download using MLC R2 Downloader**

Download the Whisper model using the MLCommons R2 Downloader:

```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d whisper/model https://inference.mlcommons-storage.org/metadata/whisper-model.uri
```

This will download the Whisper model files.

To specify a custom download directory, use the `-d` flag:
```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-d /path/to/download/directory \
https://inference.mlcommons-storage.org/metadata/whisper-model.uri
```
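Before pointing the benchmark at the model directory, it can help to verify that the download actually populated it. A minimal sketch; the `model_ready` helper name is ours:

```bash
# Illustrative check: a model directory counts as ready only if it
# exists and contains at least one entry.
model_ready() {
  if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
    echo "model ready: $1"
  else
    echo "model missing: $1 (run the downloader first)"
  fi
}

model_ready whisper/model
```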

### External Download (Not recommended for official submission)
@@ -153,16 +151,24 @@ We use dev-clean and dev-other splits, which are approximately 10 hours.

**Using MLCFlow Automation**
```
mlcr get,dataset,whisper,_preprocessed,_mlc,_r2-downloader --outdirname=<path to download> -j
```

**Using MLC R2 Downloader**

Download the preprocessed dataset using the MLCommons R2 Downloader:

```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d whisper/dataset https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri
```

This will download the LibriSpeech dataset files.

To specify a custom download directory, use the `-d` flag:
```bash
bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-d /path/to/download/directory \
https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri
```

### Unprocessed