@@ -137,18 +137,7 @@ Downloading llama3.1-8b model from Hugging Face will require an [**access token*

### Preprocessed

- You can use Rclone to download the preprocessed dataset from a Cloudflare R2 bucket.
-
- To run Rclone on Windows, you can download the executable [here](https://rclone.org/install/#windows).
- To install Rclone on Linux/macOS/BSD systems, run:
- ```
- sudo -v ; curl https://rclone.org/install.sh | sudo bash
- ```
- Once Rclone is installed, run the following command to authenticate with the bucket:
- ```
- rclone config create mlc-inference s3 provider=Cloudflare access_key_id=f65ba5eef400db161ea49967de89f47b secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
- ```
- You can then navigate in the terminal to your desired download directory and run the following command to download the dataset:
+ Download the preprocessed datasets using the MLCommons downloader:

#### Full dataset (datacenter)

@@ -158,9 +147,11 @@ mlcr get,dataset,cnndm,_validation,_datacenter,_llama3,_mlc,_rclone --outdirname
```

**Native method**
+ ```bash
+ bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+ https://inference.mlcommons-storage.org/llama3.1-8b%2Fcnn-eval.json
```
- rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_eval.json ./ -P
- ```
+ This will download `cnn_eval.json` (~267MB).

#### 5000 samples (edge)

@@ -170,9 +161,11 @@ mlcr get,dataset,cnndm,_validation,_edge,_llama3,_mlc,_rclone --outdirname=<path
```

**Native method**
+ ```bash
+ bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+ https://inference.mlcommons-storage.org/llama3.1-8b%2Fsample-cnn-eval-5000.json
```
- rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/sample_cnn_eval_5000.json ./ -P
- ```
+ This will download `sample_cnn_eval_5000.json` (~95MB).

#### Calibration

@@ -182,14 +175,17 @@ mlcr get,dataset,cnndm,_calibration,_llama3,_mlc,_rclone --outdirname=<path to d
```

**Native method**
+ ```bash
+ bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+ https://inference.mlcommons-storage.org/llama3.1-8b%2Fcnn-dailymail-calibration.json
```
- rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/datasets/cnn_dailymail_calibration.json ./ -P
- ```
+ This will download `cnn_dailymail_calibration.json` (~21MB).

- You can also download the calibration dataset from the Cloudflare R2 bucket by running the following command:
-
- ```
- rclone copy mlc-inference:mlcommons-inference-wg-public/llama3.1_8b/cnn_eval.json ./ -P
+ To specify a custom download directory for any of these, use the `-d` flag:
+ ```bash
+ bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+ -d /path/to/download/directory \
+ <URI>
```
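The three new `+` download blocks differ only in their URI, and the last block shows that `-d` selects the target directory. Because of that, the commands can be wrapped in one script. What follows is a minimal sketch that assumes the downloader accepts `-d <dir>` before the URI exactly as shown in that block. The script itself, the `DATASET_URIS` array, the `DOWNLOADER` variable and the `fetch_all` helper are hypothetical and are not part of the MLCommons tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not MLCommons tooling): fetch the three llama3.1-8b
# CNN/DailyMail files with the R2 downloader, using the URIs from this diff.
set -euo pipefail

DOWNLOADER="https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh"

# Datacenter eval set, 5000-sample edge set, calibration set (URIs copied verbatim).
DATASET_URIS=(
  "https://inference.mlcommons-storage.org/llama3.1-8b%2Fcnn-eval.json"
  "https://inference.mlcommons-storage.org/llama3.1-8b%2Fsample-cnn-eval-5000.json"
  "https://inference.mlcommons-storage.org/llama3.1-8b%2Fcnn-dailymail-calibration.json"
)

# Run the downloader once per URI, writing into directory "$1"
# (assumes -d behaves as in the README's custom-directory example).
fetch_all() {
  local dest="$1" uri
  mkdir -p "$dest"
  for uri in "${DATASET_URIS[@]}"; do
    bash <(curl -s "$DOWNLOADER") -d "$dest" "$uri"
  done
}

# Download only when a target directory is passed on the command line.
if [[ $# -gt 0 ]]; then
  fetch_all "$1"
fi
```

Run it as `bash fetch_cnndm.sh /path/to/download/directory` (the file name is arbitrary). With no argument, the script only defines the list and exits. The sketch uses an array with a `for` loop instead of `while read` so the downloader's stdin is never the URI list and cannot consume it.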