Commit be09d60

Update README.md to clarify separate datasets and model download commands
1 parent c1f697e commit be09d60

1 file changed: +25 -26 lines changed

language/deepseek-r1/README.md

Lines changed: 25 additions & 26 deletions
@@ -13,6 +13,22 @@ You can also do pip install mlc-scripts and then use `mlcr` commands for downloa
 - DeepSeek-R1 model is automatically downloaded as part of setup
 - Checkpoint conversion is done transparently when needed.

+**Using MLC R2 Downloader**
+
+Download the model using the MLC R2 Downloader:
+
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+```
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+-d /path/to/download/directory \
+https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+```
+
 ## Dataset Download

 The dataset is an ensemble of the datasets: AIME, MATH500, gpqa, MMLU-Pro, livecodebench(code_generation_lite). They are covered by the following licenses:
@@ -23,24 +39,17 @@ The dataset is an ensemble of the datasets: AIME, MATH500, gpqa, MMLU-Pro, livec
 - MMLU-Pro: [MIT](https://opensource.org/license/mit)
 - livecodebench(code_generation_lite): [CC](https://creativecommons.org/share-your-work/cclicenses/)

-### Preprocessed
-
-**Using MLCFlow Automation**
-
-```
-mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname=<path to download> -j
-```
+### Preprocessed & Calibration

-**Using Native method**
+**Using MLC R2 Downloader**

 Download the preprocessed dataset using the MLCommons downloader:

 ```bash
-bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) -d ./ https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```

-This will download the dataset file `mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl`.
+This will download both the full preprocessed dataset (`mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl`) and the calibration dataset (`mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl`).

 To specify a custom download directory, use the `-d` flag:
 ```bash
@@ -49,30 +58,20 @@ bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/he
 https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```

-### Calibration
+### Preprocessed

 **Using MLCFlow Automation**

 ```
-mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname=<path to download> -j
+mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname=<path to download> -j
 ```

-**Using Native method**
+### Calibration

-Download the calibration dataset using the MLCommons downloader:
+**Using MLCFlow Automation**

-```bash
-bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
 ```
-
-This will download the calibration dataset file `mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl`.
-
-To specify a custom download directory, use the `-d` flag:
-```bash
-bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
--d /path/to/download/directory \
-https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname=<path to download> -j
 ```

 ## Docker
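
The updated README states that a single download now fetches both the full preprocessed dataset and the calibration dataset. One way to sanity-check that claim after running the downloader is sketched below. The function name `check_datasets` is hypothetical and not part of the repository. The sketch assumes only the two file names given in the README, and it searches the download directory recursively because the downloader's output layout is not specified here.

```shell
#!/usr/bin/env bash
# check_datasets is a hypothetical helper, not part of mlcommons/inference.
# It reports whether the two dataset files named in the README exist
# anywhere under the given download directory.
check_datasets() {
  local dir="${1:-.}"   # download directory; defaults to the current directory
  local missing=0
  local f
  for f in \
    mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl \
    mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl
  do
    # Search recursively: the exact output layout is an assumption we avoid making.
    if [ -n "$(find "$dir" -type f -name "$f" 2>/dev/null)" ]; then
      echo "found: $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  # Exit status 0 only when both files were found.
  return "$missing"
}
```

For example, after downloading with `-d ./`, running `check_datasets ./` prints one `found:` or `missing:` line per file and returns a non-zero status if either file is absent.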
