Changes from all commits (143 commits)
9b59d2e
[Automated Commit] Format Codebase
mlcommons-bot Dec 20, 2024
fe51c12
Merge branch 'mlcommons:master' into master
v-shobhit Sep 18, 2025
1f2666c
[Automated Commit] Format Codebase
github-actions[bot] Sep 18, 2025
f9c4f61
initial
v-shobhit Sep 27, 2025
db9d25e
[Automated Commit] Format Codebase
github-actions[bot] Sep 27, 2025
9daa72c
json fixes
v-shobhit Sep 27, 2025
2d0a179
updates, tokenizer
v-shobhit Sep 27, 2025
c8e679d
fix padding
v-shobhit Sep 27, 2025
50453c6
concurrent requests
v-shobhit Sep 27, 2025
e76b68d
increase timeout
v-shobhit Sep 27, 2025
b6d5671
[Automated Commit] Format Codebase
github-actions[bot] Sep 27, 2025
4fd4f56
add refactor changes
v-shobhit Sep 27, 2025
1df0885
[Automated Commit] Format Codebase
github-actions[bot] Sep 27, 2025
eb2f48c
rm truncation
v-shobhit Sep 27, 2025
4f35b8a
rm truncation, wait for server ready
v-shobhit Sep 27, 2025
354cb62
left padding
v-shobhit Sep 27, 2025
75f4307
[Automated Commit] Format Codebase
github-actions[bot] Sep 27, 2025
065bf7c
fixes
v-shobhit Sep 27, 2025
36aa581
add failure check
v-shobhit Sep 27, 2025
c75e629
change opts
v-shobhit Sep 27, 2025
040e986
organize files
v-shobhit Oct 1, 2025
991ff7e
rm submodule
v-shobhit Oct 1, 2025
492847a
add infer stuff
v-shobhit Oct 1, 2025
f3a3282
add harmonize-tokens.py
v-shobhit Oct 1, 2025
8596b13
move things
v-shobhit Oct 1, 2025
1db2f96
[Automated Commit] Format Codebase
github-actions[bot] Oct 1, 2025
8c08778
add README
v-shobhit Oct 1, 2025
168f210
fix name
v-shobhit Oct 1, 2025
13775b1
add commands
v-shobhit Oct 1, 2025
37c646d
update README
v-shobhit Oct 1, 2025
9cd0c61
accepts output_ids and detokenize
v-shobhit Oct 1, 2025
6abf8fa
[Automated Commit] Format Codebase
github-actions[bot] Oct 1, 2025
8fe8712
add plotter
v-shobhit Oct 1, 2025
889db8c
fix sampling
v-shobhit Oct 1, 2025
381cf60
add reasoning effort option
v-shobhit Oct 1, 2025
209e923
[Automated Commit] Format Codebase
github-actions[bot] Oct 1, 2025
fdc518d
doc option
v-shobhit Oct 1, 2025
f73da58
draw input histograms
v-shobhit Oct 1, 2025
4219a77
[Automated Commit] Format Codebase
github-actions[bot] Oct 1, 2025
782f066
updates
v-shobhit Oct 8, 2025
326a5fa
[Automated Commit] Format Codebase
github-actions[bot] Oct 8, 2025
1b99263
add more opts
v-shobhit Oct 8, 2025
579ef1c
move
v-shobhit Oct 13, 2025
ec8d12e
[Automated Commit] Format Codebase
github-actions[bot] Oct 13, 2025
a9de6f4
refactor
v-shobhit Oct 13, 2025
387779c
updates
v-shobhit Oct 13, 2025
bde219f
rename opt
v-shobhit Oct 13, 2025
24ef1e0
updates
v-shobhit Oct 14, 2025
141dd8d
[Automated Commit] Format Codebase
github-actions[bot] Oct 14, 2025
684e27a
add healthbench prompt creation
v-shobhit Oct 15, 2025
f5b04db
[Automated Commit] Format Codebase
github-actions[bot] Oct 15, 2025
d44256e
add healthbench eval
v-shobhit Oct 16, 2025
ff9133b
[Automated Commit] Format Codebase
github-actions[bot] Oct 16, 2025
5f1fd8e
add scripts to fetch datasets
v-shobhit Oct 16, 2025
ca654c8
[Automated Commit] Format Codebase
github-actions[bot] Oct 16, 2025
8c59f03
update requirements
v-shobhit Oct 28, 2025
108ea9d
add setup enroot script
v-shobhit Oct 28, 2025
f302c7c
add changes
v-shobhit Oct 29, 2025
534390c
[Automated Commit] Format Codebase
github-actions[bot] Oct 29, 2025
78bf971
add symlinks to gitmodules
v-shobhit Oct 29, 2025
115f498
add fetch_lcb.py
v-shobhit Oct 29, 2025
9571c11
[Automated Commit] Format Codebase
github-actions[bot] Oct 29, 2025
f8244ee
updates
v-shobhit Nov 4, 2025
531c37c
add pass@k; add spec decode option
v-shobhit Nov 5, 2025
5e86d65
add openai client; add pass@k
v-shobhit Nov 5, 2025
8c02839
[Automated Commit] Format Codebase
github-actions[bot] Nov 5, 2025
be4109e
restrict lcb to v5
v-shobhit Nov 5, 2025
c0b9ef3
[Automated Commit] Format Codebase
github-actions[bot] Nov 5, 2025
dc54e98
lcb optimizations
v-shobhit Nov 5, 2025
41cac5a
remove openai client
v-shobhit Nov 5, 2025
06a5387
[Automated Commit] Format Codebase
github-actions[bot] Nov 5, 2025
d927fb3
rename
v-shobhit Nov 10, 2025
c2fda5e
remove mmlu, healthbench
v-shobhit Nov 10, 2025
c5c389b
add fetch_all.py
v-shobhit Nov 10, 2025
341c750
updates
v-shobhit Nov 10, 2025
ef142a6
add preprocess
v-shobhit Nov 10, 2025
834175f
update README
v-shobhit Nov 10, 2025
e447081
[Automated Commit] Format Codebase
github-actions[bot] Nov 10, 2025
a1e668a
add top-p option
v-shobhit Nov 10, 2025
3db5bc2
add summarize_eval
v-shobhit Nov 10, 2025
3ca2cc1
[Automated Commit] Format Codebase
github-actions[bot] Nov 10, 2025
09cd3f7
add trtllm infer script
v-shobhit Nov 7, 2025
1aad396
fixes
v-shobhit Nov 7, 2025
a6e05f2
add round-robin for multi-dp
v-shobhit Nov 8, 2025
a2936a6
fix timeout issues
v-shobhit Nov 8, 2025
188411c
[Automated Commit] Format Codebase
github-actions[bot] Nov 8, 2025
b324d7d
add anthropic
v-shobhit Nov 11, 2025
f8a9f43
add stuff
v-shobhit Nov 11, 2025
8429244
optimize lcb multi-pass
v-shobhit Nov 11, 2025
d1f2794
[Automated Commit] Format Codebase
github-actions[bot] Nov 11, 2025
ee33969
rm healthbench
v-shobhit Nov 11, 2025
20f8916
[Automated Commit] Format Codebase
github-actions[bot] Nov 11, 2025
1e67e0c
lcb bug fixes
v-shobhit Nov 11, 2025
96c90e1
[Automated Commit] Format Codebase
github-actions[bot] Nov 11, 2025
e6d9c67
omit top-k if 0
v-shobhit Nov 11, 2025
ff892bb
[Automated Commit] Format Codebase
github-actions[bot] Nov 11, 2025
e4043d2
add changes and plotting scripts
v-shobhit Nov 12, 2025
1714c09
[Automated Commit] Format Codebase
github-actions[bot] Nov 12, 2025
19fcc80
add overall number
v-shobhit Nov 12, 2025
6cb7698
[Automated Commit] Format Codebase
github-actions[bot] Nov 12, 2025
b281727
add glob matching
v-shobhit Nov 12, 2025
9a0c45a
rename
v-shobhit Nov 20, 2025
35ea0e4
add pubmed tokenization
v-shobhit Nov 20, 2025
1c73423
updates
v-shobhit Nov 20, 2025
9a9194f
add tentative gpt-oss fields
v-shobhit Nov 20, 2025
1812c04
remove data dir
v-shobhit Nov 20, 2025
16af1bf
create preprocess module
v-shobhit Nov 20, 2025
9b4a84c
move things to archive
v-shobhit Nov 20, 2025
9450d46
[Automated Commit] Format Codebase
github-actions[bot] Nov 20, 2025
319e5f7
rm unused scripts
v-shobhit Nov 20, 2025
4d89b98
rm unused
v-shobhit Nov 20, 2025
be02519
mv things
v-shobhit Nov 20, 2025
18a8444
add mlperf artifacts
v-shobhit Nov 20, 2025
199f476
add mlperf artifacts
v-shobhit Nov 20, 2025
425ce75
[Automated Commit] Format Codebase
github-actions[bot] Nov 20, 2025
7cdc7cb
add utils, gitignore
v-shobhit Nov 20, 2025
ab90695
[Automated Commit] Format Codebase
github-actions[bot] Nov 20, 2025
62e4d47
update README
v-shobhit Nov 20, 2025
292c49d
fix request pool size
v-shobhit Nov 20, 2025
98585b8
[Automated Commit] Format Codebase
github-actions[bot] Nov 20, 2025
21a8034
add setup
v-shobhit Nov 21, 2025
734d8f4
updates
v-shobhit Nov 21, 2025
e3e22b8
server scenario fix; gpt-oss -> gpt-oss-120b
v-shobhit Nov 21, 2025
e40a7da
add fixes
v-shobhit Nov 21, 2025
382fc9e
add accuracy eval script for mlperf
v-shobhit Nov 21, 2025
31f435a
finishing touches
v-shobhit Nov 21, 2025
63592a3
[Automated Commit] Format Codebase
github-actions[bot] Nov 21, 2025
a41f882
refactor mode -> scenario
v-shobhit Nov 21, 2025
b2bc9e0
Merge branch 'mlcommons:master' into gptoss-loadgen
v-shobhit Nov 24, 2025
81f6ca5
add eval_perf script
v-shobhit Nov 26, 2025
f780189
[Automated Commit] Format Codebase
github-actions[bot] Nov 26, 2025
7f47e5e
add pass@k to acc eval
v-shobhit Dec 1, 2025
d3a7b58
add repeats_per_sample option to loadgen
v-shobhit Dec 1, 2025
60976f2
Merge branch 'loadgen-repeat-samples' into gptoss-loadgen
v-shobhit Dec 1, 2025
50051f2
[Automated Commit] Format Codebase
github-actions[bot] Dec 1, 2025
bee73b2
fix harmonize tokens -> text
v-shobhit Dec 2, 2025
5039fd6
[Automated Commit] Format Codebase
github-actions[bot] Dec 2, 2025
db4d290
remove file
v-shobhit Dec 2, 2025
dbb0fd9
fix prompt of summarization
v-shobhit Dec 3, 2025
57c6dae
move stuff to sglang
v-shobhit Dec 3, 2025
da35468
allow use of parquet
v-shobhit Dec 3, 2025
72cd475
[Automated Commit] Format Codebase
github-actions[bot] Dec 3, 2025
724502b
Merge branch 'master' into gptoss-loadgen
anandhu-eng Dec 3, 2025
3 changes: 3 additions & 0 deletions language/gpt-oss/.gitignore
@@ -0,0 +1,3 @@
*venv*
*.pkl
*.csv
141 changes: 141 additions & 0 deletions language/gpt-oss/README.md
@@ -0,0 +1,141 @@
# MLPerf Inference reference implementation for GPT-OSS-120B
This is the reference implementation for GPT-OSS-120B. It is currently a proposal and a work in progress.

## Model and Dataset download

* Model: `openai/gpt-oss-120b`, commit id: [`b5c939d`](https://huggingface.co/openai/gpt-oss-120b/tree/b5c939de8f754692c1647ca79fbf85e8c1e70f8a)
* Dataset: Please request access at [this link](https://drive.google.com/drive/folders/1DCfEXHqe69okrqKbSyV-8VUw413JqpPY?usp=drive_link) - **this is a tentative dataset**
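
To fetch the pinned model revision listed above, one option (an illustrative sketch, not part of the reference scripts) is `huggingface_hub.snapshot_download`:

```python
# Illustrative only: download the pinned gpt-oss-120b revision with huggingface_hub.
# Assumes `pip install huggingface_hub` and sufficient disk space; the reference
# scripts do not mandate this particular method.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="openai/gpt-oss-120b",
    revision="b5c939de8f754692c1647ca79fbf85e8c1e70f8a",  # commit pinned in this README
    local_dir="models/gpt-oss-120b",                      # hypothetical target directory
)
print(f"Model checkout available at: {local_path}")
```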

Datasets are now provided in **Parquet format** (recommended) for better performance and a smaller file size (50% smaller than pickle). Pickle format is still supported for backward compatibility.
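
For a quick sanity check of a downloaded dataset file, either format can be loaded into a pandas DataFrame. A minimal sketch (the actual column layout depends on the dataset release, not on this snippet):

```python
# Minimal sketch: inspect a dataset file in either supported format.
from pathlib import Path

import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Load a Parquet (recommended) or Pickle dataset file."""
    suffix = Path(path).suffix.lower()
    if suffix == ".parquet":
        return pd.read_parquet(path)  # requires pyarrow or fastparquet
    if suffix == ".pkl":
        return pd.read_pickle(path)   # legacy format, still supported
    raise ValueError(f"Unsupported dataset format: {suffix}")


df = load_dataset("/path/to/dataset.parquet")
print(df.columns.tolist())
print(df.head())
```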


## Environment setup
Work on the reference implementation is done using the SGLang containers at [https://hub.docker.com/r/lmsysorg/sglang/tags](https://hub.docker.com/r/lmsysorg/sglang/tags). For enroot setup, a script is provided at [`setup_enroot.sh`](./setup_enroot.sh). All sections below assume this environment is up and running.

Once in the environment, install additional requirements using [`setup.sh`](./setup.sh):
```bash
./setup.sh
```

## Running the reference implementation: SGLang
Use [`./sglang/run_server.sh`](./sglang/run_server.sh) to launch an SGLang server hosting `gpt-oss-120b`.

### Run the server
```bash
./run_server.sh \
    --model_path path/to/gpt-oss-120b/model \
    --dp N \
    --stream_interval 100 \
    --eagle_path optional/path/to/eagle/head
```
The script uses `python3 -m sglang.launch_server` to instantiate the model, with `tp=pp=ep=1` and `dp` as specified.

Next, run a benchmark script that uses the client to send and receive requests.
### Run the inference

**Note:** All scripts now support both Parquet (`.parquet`) and Pickle (`.pkl`) formats for dataset files. Parquet is recommended as it offers:
- 50% smaller file size
- Faster loading times
- Cross-language compatibility
- Type-safe schema preservation

Example usage:
```bash
# first, install loadgen
pip install $(git rev-parse --show-toplevel)/loadgen

# Using Parquet format (recommended)
python3 run_mlperf.py \
    --scenario offline \
    --input-file /path/to/dataset.parquet \
    --accuracy

# Using Pickle format (backward compatible)
python3 run_mlperf.py \
    --scenario offline \
    --input-file /path/to/dataset.pkl \
    --accuracy
```
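
Under the hood, `run_mlperf.py` drives the standard MLPerf LoadGen flow (construct a SUT and QSL, then start the test). The following is a generic sketch of that pattern, not the actual harness; the callback bodies and settings are illustrative:

```python
# Generic MLPerf LoadGen skeleton (illustrative, not the gpt-oss harness).
# Assumes the loadgen pip install above has made mlperf_loadgen importable.
import mlperf_loadgen as lg

NUM_SAMPLES = 8  # hypothetical dataset size for this sketch


def issue_queries(query_samples):
    # Real harness: forward query_samples[i].index to the SGLang backend,
    # then report completions via lg.QuerySamplesComplete(...).
    responses = [lg.QuerySampleResponse(qs.id, 0, 0) for qs in query_samples]
    lg.QuerySamplesComplete(responses)


def flush_queries():
    pass


sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(NUM_SAMPLES, NUM_SAMPLES, lambda idxs: None, lambda idxs: None)

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly  # AccuracyOnly corresponds to --accuracy

lg.StartTest(sut, qsl, settings)
lg.DestroySUT(sut)
lg.DestroyQSL(qsl)
```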

Full command-line options:
```bash
python3 run_mlperf.py --help
usage: run_mlperf.py [-h] [--scenario {offline,server}] --input-file INPUT_FILE [--max-samples MAX_SAMPLES] [--mlperf-conf MLPERF_CONF]
                     [--user-conf USER_CONF] [--accuracy] [--output-dir OUTPUT_DIR] [--backend {sglang}] [--server-url SERVER_URL]
                     [--generation-config GENERATION_CONFIG] [--max-new-tokens MAX_NEW_TOKENS] [--num-workers NUM_WORKERS]
                     [--max-concurrency MAX_CONCURRENCY]

Run MLPerf inference benchmarks for gpt-oss

options:
  -h, --help            show this help message and exit
  --scenario {offline,server}
                        MLPerf scenario mode
  --input-file INPUT_FILE
                        Path to tokenized dataset (parquet or pickle file)
  --max-samples MAX_SAMPLES
                        Maximum number of samples to use (None for all)
  --mlperf-conf MLPERF_CONF
                        Path to MLPerf configuration file
  --user-conf USER_CONF
                        Path to user configuration file
  --accuracy            Run accuracy mode instead of performance
  --output-dir OUTPUT_DIR
                        Directory for MLPerf output logs
  --backend {sglang}    Backend to use for inference
  --server-url SERVER_URL
                        Server URL for backend (SGLang)
  --generation-config GENERATION_CONFIG
                        Path to generation configuration JSON file
  --max-new-tokens MAX_NEW_TOKENS
                        Override max_new_tokens from generation config (default: use value from config)
  --num-workers NUM_WORKERS
                        Number of worker threads (for server scenario)
  --max-concurrency MAX_CONCURRENCY
                        Maximum concurrent requests to backend (SGLang handles batching internally)

```
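
The `--max-concurrency` option caps how many requests are in flight at once; SGLang forms batches internally from whatever is pending. As a rough illustration of that pattern (not the reference client, and the endpoint path and JSON fields below are assumptions about SGLang's native `/generate` API), a bounded thread pool is enough:

```python
# Illustrative concurrency cap for an HTTP text-generation backend.
# The /generate path and payload fields are assumptions; check the reference
# backend code for the real request schema.
from concurrent.futures import ThreadPoolExecutor

import requests

SERVER_URL = "http://localhost:30000"  # hypothetical server address
MAX_CONCURRENCY = 8


def send_request(prompt: str) -> str:
    payload = {
        "text": prompt,
        "sampling_params": {"max_new_tokens": 64, "temperature": 0.0},
    }
    resp = requests.post(f"{SERVER_URL}/generate", json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json().get("text", "")


prompts = [f"Question {i}: ..." for i in range(32)]
with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
    outputs = list(pool.map(send_request, prompts))
```

The pool size, not explicit client-side batching, is what bounds backend load here.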

### Evaluate the accuracy
Run `run_mlperf.py` with `--accuracy`, and then use the generated `mlperf_log_accuracy.json` to evaluate the accuracy of the run.
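
Each entry in `mlperf_log_accuracy.json` records a `qsl_idx` plus the raw response bytes as a hex string; decoding those bytes back into token IDs and text is roughly what the evaluation script does. A minimal sketch, assuming the responses are int32 token IDs (verify the dtype against the harness):

```python
# Sketch: decode mlperf_log_accuracy.json entries back into text.
# Assumes each entry's "data" field is a hex string of int32 token IDs;
# this dtype is an assumption to verify against the harness implementation.
import json

import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")

with open("mlperf_results/offline/accuracy/mlperf_log_accuracy.json") as f:
    entries = json.load(f)

for entry in entries[:3]:
    token_ids = np.frombuffer(bytes.fromhex(entry["data"]), dtype=np.int32)
    text = tokenizer.decode(token_ids.tolist(), skip_special_tokens=True)
    print(entry["qsl_idx"], text[:120])
```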

Example usage:
```bash
# Using Parquet format (recommended)
python3 eval_mlperf_accuracy.py \
    --mlperf-log mlperf_results/offline/accuracy/mlperf_log_accuracy.json \
    --reference-data /path/to/acc_eval_inputs.parquet \
    --tokenizer openai/gpt-oss-120b

# Using Pickle format (backward compatible)
python3 eval_mlperf_accuracy.py \
    --mlperf-log mlperf_results/offline/accuracy/mlperf_log_accuracy.json \
    --reference-data /path/to/acc_eval_inputs.pkl \
    --tokenizer openai/gpt-oss-120b
```

Full command-line options:
```bash
python3 eval_mlperf_accuracy.py --help
usage: eval_mlperf_accuracy.py [-h] --mlperf-log MLPERF_LOG --reference-data REFERENCE_DATA [--tokenizer TOKENIZER] [--output-file OUTPUT_FILE]
                               [--save-outputs SAVE_OUTPUTS] [--num-lcb-workers NUM_LCB_WORKERS] [--verbose]

Evaluate MLPerf accuracy logs for gpt-oss-120b

options:
  -h, --help            show this help message and exit
  --mlperf-log MLPERF_LOG
                        Path to mlperf_log_accuracy.json
  --reference-data REFERENCE_DATA
                        Path to reference parquet or pickle file (DataFrame with dataset, ground_truth, etc.)
  --tokenizer TOKENIZER
                        HuggingFace tokenizer name or path
  --output-file OUTPUT_FILE
                        Output JSON file for results (optional)
  --save-outputs SAVE_OUTPUTS
                        Save detokenized outputs to pickle file (ordered by qsl_idx) for debugging
  --num-lcb-workers NUM_LCB_WORKERS
                        Number of parallel workers for LiveCodeBench evaluation (default: 64)
  --verbose             Verbose logging

```