Commit d630a96

Clarify separate datasets & model download commands in README.md
1 parent 6283f8f commit d630a96

1 file changed: +27 -27 lines changed

language/deepseek-r1/README.md

Lines changed: 27 additions & 27 deletions
@@ -1,4 +1,4 @@
-# Mlperf Inference DeepSeek Reference Implementation
+# MLPerf Inference DeepSeek Reference Implementation
 
 ## Automated command to run the benchmark via MLFlow
 
@@ -13,6 +13,22 @@ You can also do pip install mlc-scripts and then use `mlcr` commands for downloa
 - DeepSeek-R1 model is automatically downloaded as part of setup
 - Checkpoint conversion is done transparently when needed.
 
+**Using the MLC R2 Downloader**
+
+Download the model using the MLCommons R2 Downloader:
+
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+```
+
+To specify a custom download directory, use the `-d` flag:
+```bash
+bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
+-d /path/to/download/directory \
+https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+```
+
 ## Dataset Download
 
 The dataset is an ensemble of the datasets: AIME, MATH500, gpqa, MMLU-Pro, livecodebench(code_generation_lite). They are covered by the following licenses:
@@ -23,24 +39,18 @@ The dataset is an ensemble of the datasets: AIME, MATH500, gpqa, MMLU-Pro, livec
 - MMLU-Pro: [MIT](https://opensource.org/license/mit)
 - livecodebench(code_generation_lite): [CC](https://creativecommons.org/share-your-work/cclicenses/)
 
-### Preprocessed
-
-**Using MLCFlow Automation**
+### Preprocessed & Calibration
 
-```
-mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname=<path to download> -j
-```
+**Using the MLC R2 Downloader**
 
-**Using Native method**
-
-Download the preprocessed dataset using the MLCommons downloader:
+Download the full preprocessed dataset and calibration dataset using the MLCommons R2 Downloader:
 
 ```bash
 bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
+-d ./ https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```
 
-This will download the dataset file `mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl`.
+This will download the full preprocessed dataset file (`mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl`) and the calibration dataset file (`mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl`).
 
 To specify a custom download directory, use the `-d` flag:
 ```bash
@@ -49,30 +59,20 @@ bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/he
 https://inference.mlcommons-storage.org/metadata/deepseek-r1-datasets-fp8-eval.uri
 ```
 
-### Calibration
+### Preprocessed
 
 **Using MLCFlow Automation**
 
 ```
-mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname=<path to download> -j
+mlcr get,dataset,whisper,_preprocessed,_mlc,_rclone --outdirname=<path to download> -j
 ```
 
-**Using Native method**
+### Calibration
 
-Download the calibration dataset using the MLCommons downloader:
+**Using MLCFlow Automation**
 
-```bash
-bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
-https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
 ```
-
-This will download the calibration dataset file `mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl`.
-
-To specify a custom download directory, use the `-d` flag:
-```bash
-bash <(curl -s https://raw.githubusercontent.com/mlcommons/r2-downloader/refs/heads/main/mlc-r2-downloader.sh) \
--d /path/to/download/directory \
-https://inference.mlcommons-storage.org/metadata/deepseek-r1-0528.uri
+mlcr get,preprocessed,dataset,deepseek-r1,_calibration,_mlc,_rclone --outdirname=<path to download> -j
 ```
 
 ## Docker
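
Since this commit changes the README so that one R2 Downloader command fetches both dataset files, a quick post-download check may be useful. The following is a minimal sketch, not part of the commit: `check_deepseek_datasets` is a hypothetical helper name, and it assumes both `.pkl` files land directly in the directory passed via `-d` (the downloader may instead place them in a subdirectory, in which case the path would need adjusting).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the README or the R2 downloader):
# checks that both dataset files named in the updated README exist in a directory.
# Assumption: the files sit directly under the download directory.
check_deepseek_datasets() {
  local dir="${1:-.}"   # directory to inspect; defaults to the current directory
  local missing=0
  local f
  for f in \
    mlperf_deepseek_r1_dataset_4388_fp8_eval.pkl \
    mlperf_deepseek_r1_calibration_dataset_500_fp8_eval.pkl; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Exit status 0 when both files are present, 1 otherwise.
  return "$missing"
}
```

For example, after running the downloader with `-d ./`, calling `check_deepseek_datasets ./` would return a non-zero status and name any file that was not found.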
