Commit 14a378e

Remove rclone references and update download instructions for DeepSeek-R1, Llama 3.1 8b, and Whisper
- Replace rclone-based download instructions with new MLCommons downloader infrastructure
- Update DeepSeek-R1, Llama 3.1 8b, and Whisper READMEs to use https://inference.mlcommons-storage.org
- Maintain MLCFlow automation commands alongside native download methods
- Add file size information for each download
- Include -d flag documentation for custom download directories

Fixes #2265
1 parent 6481ff4 commit 14a378e
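
The metadata URIs introduced by this commit appear to share one shape: the storage base URL, then a model prefix and an artifact name joined by a URL-encoded slash (`%2F`), ending in `.json`. A minimal sketch of a helper that builds them is below. The function name `mlc_metadata_uri` and the variable `MLC_STORAGE_BASE` are hypothetical, and the pattern is inferred only from the URIs visible in this diff, not from any downloader documentation.

```shell
# Hypothetical helper: build a metadata URI from a model prefix and an
# artifact name. The pattern is an assumption inferred from this commit.
MLC_STORAGE_BASE="https://inference.mlcommons-storage.org"

mlc_metadata_uri() {
  # $1 = model prefix (e.g. deepseek-r1), $2 = artifact name (e.g. dataset)
  # "%%2F" prints a literal "%2F", the URL-encoded "/" between the two parts.
  printf '%s/%s%%2F%s.json\n' "$MLC_STORAGE_BASE" "$1" "$2"
}

# Reproduce three of the URIs that appear in the updated READMEs.
mlc_metadata_uri deepseek-r1 dataset
mlc_metadata_uri llama3.1-8b cnn-eval
mlc_metadata_uri whisper model
```

The resulting string is what the READMEs pass as the final argument to `mlc-r2-downloader.sh`.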

3 files changed: +64 -53 lines changed

language/deepseek-r1/README.md

Lines changed: 23 additions & 16 deletions
@@ -33,21 +33,20 @@ mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname=<path to downlo
 
 **Using Native method**
 
-You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
+Download the preprocessed dataset using the MLCommons downloader:
 
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+https://inference.mlcommons-storage.org/deepseek-r1%2Fdataset.json
 ```
-You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
 
-```
-rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/datasets/mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl ./ -P
+This will download the dataset file `mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl` (~163MB).
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+-d /path/to/download/directory \
+https://inference.mlcommons-storage.org/deepseek-r1%2Fdataset.json
 ```
 
 ### Calibration
@@ -60,12 +59,20 @@ mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname
 
 **Using Native method**
 
-Download and install Rclone as described in the previous section.
-
-Then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+Download the calibration dataset using the MLCommons downloader:
 
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+https://inference.mlcommons-storage.org/deepseek-r1%2Fcalibration.json
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/deepseek_r1/datasets/mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl ./ -P
+
+This will download the calibration dataset file `mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl`.
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+-d /path/to/download/directory \
+https://inference.mlcommons-storage.org/deepseek-r1%2Fcalibration.json
 ```
 
 ## Docker

language/llama3.1-8b/README.md

Lines changed: 18 additions & 22 deletions
@@ -137,18 +137,7 @@ Downloading llama3.1-8b model from Hugging Face will require an [**access token*
 
 ### Preprocessed
 
-You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
-
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
-```
-You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+Download the preprocessed datasets using the MLCommons downloader:
 
 #### Full dataset (datacenter)
 
@@ -158,9 +147,11 @@ mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_rclone --outdirname
 ```
 
 **Native method**
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+https://inference.mlcommons-storage.org/llama3.1-8b%2Fcnn-eval.json
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_eval.json ./ -P
-```
+This will download `cnn_eval.json` (~267MB).
 
 #### 5000 samples (edge)
 
@@ -170,9 +161,11 @@ mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_rclone --outdirname=<path
 ```
 
 **Native method**
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+https://inference.mlcommons-storage.org/llama3.1-8b%2Fsample-cnn-eval-5000.json
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/sample_cnn_eval_5000.json ./ -P
-```
+This will download `sample_cnn_eval_5000.json` (~95MB).
 
 #### Calibration
 
@@ -182,14 +175,17 @@ mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_rclone --outdirname=<path to d
 ```
 
 **Native method**
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+https://inference.mlcommons-storage.org/llama3.1-8b%2Fcnn-dailymail-calibration.json
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_dailymail_calibration.json ./ -P
-```
+This will download `cnn_dailymail_calibration.json` (~21MB).
 
-You can also download the calibration dataset from the Cloudflare R2 bucket by running the following command:
-
-```
-rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/cnn_eval.json ./ -P
+To specify a custom download directory for any of these, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+-d /path/to/download/directory \
+<URI>
 ```
 
 

speech2text/README.md

Lines changed: 23 additions & 15 deletions
@@ -107,21 +107,20 @@ mlcr get,ml-model,whisper,_rclone,_mlc --outdirname=<path_to_download> -j
 
 **Official Model download using native method**
 
-You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
+Download the Whisper model using the MLCommons downloader:
 
-To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
-To install Rclone on Linux/macOS/BSD systems, run:
-```
-sudo -v ; curl https://rclone.org/install.sh | sudo bash
-```
-Once Rclone is installed, run the following command to authenticate with the bucket:
-```
-rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+https://inference.mlcommons-storage.org/whisper%2Fmodel.json
 ```
-You can then navigate in the terminal to your desired download directory and run the following command to download the model:
 
-```
-rclone copy mlc-inference:mlcommons-inference-wg-public/Whisper/model/ ./ -P
+This will download the Whisper model files (~25GB).
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+-d /path/to/download/directory \
+https://inference.mlcommons-storage.org/whisper%2Fmodel.json
 ```
 
 ### External Download (Not recommended for official submission)
@@ -158,11 +157,20 @@ mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname=<path to downlo
 
 **Native method**
 
-Download and install rclone as decribed in the [MLCommons Download section](#mlcommons-download)
+Download the preprocessed dataset using the MLCommons downloader:
 
-You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+https://inference.mlcommons-storage.org/whisper%2Fdataset.json
 ```
-rclone copy mlc-inference:mlcommons-inference-wg-public/Whisper/dataset/ ./ -P
+
+This will download the LibriSpeech dataset files (~4.6GB).
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+-d /path/to/download/directory \
+https://inference.mlcommons-storage.org/whisper%2Fdataset.json
 ```
 
 ### Unprocessed
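
The updated READMEs run the downloader directly from GitHub through process substitution (`bash <(curl ...)`). For anyone who would rather save the script to a file first, for example to read it before executing or to use a shell without `<(...)`, the following is a rough sketch of an equivalent two-step form. The function name `mlc_fetch` and the `DRY_RUN` switch are hypothetical; only the script URL, the `-d` flag, and the URI argument come from the READMEs, and the `mkdir -p` step is an extra precaution in this sketch, not something the READMEs call for.

```shell
# Hypothetical wrapper: fetch the downloader to a temp file, then run it.
DOWNLOADER_URL="https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh"

mlc_fetch() {
  # $1 = download directory, $2 = metadata URI
  mlc_dest="$1"
  mlc_uri="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command that would run, without touching the network.
    printf 'would run: bash mlc-r2-downloader.sh -d %s %s\n' "$mlc_dest" "$mlc_uri"
    return 0
  fi
  mlc_script="$(mktemp)" || return 1
  # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects
  curl -fsSL "$DOWNLOADER_URL" -o "$mlc_script" || { rm -f "$mlc_script"; return 1; }
  mkdir -p "$mlc_dest"   # precaution; the downloader may already handle this
  bash "$mlc_script" -d "$mlc_dest" "$mlc_uri"
  mlc_status=$?
  rm -f "$mlc_script"
  return "$mlc_status"
}

# Dry run only: shows the command for the Whisper dataset without downloading.
DRY_RUN=1 mlc_fetch ./whisper-data "https://inference.mlcommons-storage.org/whisper%2Fdataset.json"
```

Dropping `DRY_RUN=1` performs the actual download; given the sizes quoted in the READMEs (up to ~25GB for the Whisper model), it is worth checking free disk space in the target directory first.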
