Commit c1f697e

anivar (Anivar A Aravind) authored and committed
Update download instructions to use MLCommons R2 downloader with correct URIs
- Remove rclone-based download instructions
- Replace .json URLs with correct .uri files from metadata directory
- Update download commands for DeepSeek-R1, Llama 3.1 8b, and Whisper
- Use new MLCommons downloader infrastructure
- Remove file size information from download instructions
1 parent 14a378e commit c1f697e
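
Every URL introduced by this commit has the same shape: a short name under the `metadata/` path with a `.uri` suffix, passed to the downloader script with an optional `-d` directory. The sketch below only illustrates that pattern. The helper names `metadata_uri` and `download_command` are hypothetical and not part of the MLCommons tooling, and the second helper prints the command rather than running it, so nothing is downloaded.

```bash
#!/usr/bin/env bash
# Sketch only: the helper names are hypothetical, not MLCommons tooling.

# Base location of the .uri metadata files referenced in this commit.
METADATA_BASE="https://inference.mlcommons-storage.org/metadata"
# Downloader script URL, copied from the README commands in this commit.
DOWNLOADER_URL="https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh"

# Build the full metadata URI for a short name such as "whisper-model".
metadata_uri() {
  printf '%s/%s.uri\n' "$METADATA_BASE" "$1"
}

# Print (do not run) the download command for a name ($1)
# and an optional target directory ($2).
download_command() {
  if [ -n "${2:-}" ]; then
    printf 'bash <(curl -s %s) -d %s %s\n' "$DOWNLOADER_URL" "$2" "$(metadata_uri "$1")"
  else
    printf 'bash <(curl -s %s) %s\n' "$DOWNLOADER_URL" "$(metadata_uri "$1")"
  fi
}

# The seven metadata names that appear in this commit.
for name in \
  deepseek-r1-datasets-fp8-eval \
  deepseek-r1-0528 \
  llama3-1-8b-cnn-eval \
  llama3-1-8b-sample-cnn-eval-5000 \
  llama3-1-8b-cnn-dailymail-calibration \
  whisper-model \
  whisper-dataset; do
  metadata_uri "$name"
done
```

Calling `download_command whisper-dataset /path/to/download/directory` prints the same command shape that the updated READMEs show for a custom download directory.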

3 files changed: +17 -17 lines changed

language/deepseek-r1/README.md

Lines changed: 5 additions & 5 deletions
@@ -37,16 +37,16 @@ Download the preprocessed dataset using the MLCommons downloader:
 
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/deepseek-r1%2Fdataset.json
+https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```
 
-This will download the dataset file `mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl` (~163MB).
+This will download the dataset file `mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl`.
 
 To specify a custom download directory, use the `-d` flag:
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
 -d /path/to/download/directory \
-https://inference.mlcommons-storage.org/deepseek-r1%2Fdataset.json
+https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```
 
 ### Calibration
@@ -63,7 +63,7 @@ Download the calibration dataset using the MLCommons downloader:
 
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/deepseek-r1%2Fcalibration.json
+https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
 ```
 
 This will download the calibration dataset file `mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl`.
@@ -72,7 +72,7 @@ To specify a custom download directory, use the `-d` flag:
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
 -d /path/to/download/directory \
-https://inference.mlcommons-storage.org/deepseek-r1%2Fcalibration.json
+https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
 ```
 
 ## Docker

language/llama3.1-8b/README.md

Lines changed: 6 additions & 6 deletions
@@ -149,9 +149,9 @@ mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_rclone --outdirname
 **Native method**
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/llama3.1-8b%2Fcnn-eval.json
+https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-eval.uri
 ```
-This will download `cnn_eval.json` (~267MB).
+This will download `cnn_eval.json`.
 
 #### 5000 samples (edge)
 
@@ -163,9 +163,9 @@ mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_rclone --outdirname=<path
 **Native method**
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/llama3.1-8b%2Fsample-cnn-eval-5000.json
+https://inference.mlcommons-storage.org/metadata/llama3-1-8b-sample-cnn-eval-5000.uri
 ```
-This will download `sample_cnn_eval_5000.json` (~95MB).
+This will download `sample_cnn_eval_5000.json`.
 
 #### Calibration
 
@@ -177,9 +177,9 @@ mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_rclone --outdirname=<path to d
 **Native method**
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/llama3.1-8b%2Fcnn-dailymail-calibration.json
+https://inference.mlcommons-storage.org/metadata/llama3-1-8b-cnn-dailymail-calibration.uri
 ```
-This will download `cnn_dailymail_calibration.json` (~21MB).
+This will download `cnn_dailymail_calibration.json`.
 
 To specify a custom download directory for any of these, use the `-d` flag:
 ```bash

speech2text/README.md

Lines changed: 6 additions & 6 deletions
@@ -111,16 +111,16 @@ Download the Whisper model using the MLCommons downloader:
 
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/whisper%2Fmodel.json
+https://inference.mlcommons-storage.org/metadata/whisper-model.uri
 ```
 
-This will download the Whisper model files (~25GB).
+This will download the Whisper model files.
 
 To specify a custom download directory, use the `-d` flag:
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
 -d /path/to/download/directory \
-https://inference.mlcommons-storage.org/whisper%2Fmodel.json
+https://inference.mlcommons-storage.org/metadata/whisper-model.uri
 ```
 
 ### External Download (Not recommended for official submission)
@@ -161,16 +161,16 @@ Download the preprocessed dataset using the MLCommons downloader:
 
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/whisper%2Fdataset.json
+https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri
 ```
 
-This will download the LibriSpeech dataset files (~4.6GB).
+This will download the LibriSpeech dataset files.
 
 To specify a custom download directory, use the `-d` flag:
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
 -d /path/to/download/directory \
-https://inference.mlcommons-storage.org/whisper%2Fdataset.json
+https://inference.mlcommons-storage.org/metadata/whisper-dataset.uri
 ```
 
 ### Unprocessed
