
@v-shobhit
Contributor

Modern LLM evaluation techniques often repeat each sample multiple times to judge model accuracy. This MR introduces a new config parameter, repeats_per_sample, which causes each sample to be sent multiple times independently. In PerformanceOnly mode, repeats_per_sample is asserted to be 1. In AccuracyOnly mode, each repeat is logged as a separate entry in mlperf_log_accuracy.json, as follows (with repeats_per_sample=5):

[
  { "seq_id": 1, "qsl_idx": 0, "repeat_idx": 0, "data": "...", "token_count": 100 },
  { "seq_id": 2, "qsl_idx": 0, "repeat_idx": 1, "data": "...", "token_count": 105 },
  { "seq_id": 3, "qsl_idx": 0, "repeat_idx": 2, "data": "...", "token_count": 98 },
  { "seq_id": 4, "qsl_idx": 0, "repeat_idx": 3, "data": "...", "token_count": 110 },
  { "seq_id": 5, "qsl_idx": 0, "repeat_idx": 4, "data": "...", "token_count": 102 },
  { "seq_id": 6, "qsl_idx": 1, "repeat_idx": 0, "data": "...", "token_count": 150 },
  ...
]
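
For anyone consuming this log, a minimal sketch of how the repeated entries might be regrouped per sample is shown here. It assumes only the fields visible in the example above (qsl_idx, repeat_idx, token_count); the helper names group_by_sample and check_repeat_counts are hypothetical and are not part of loadgen or of this MR.

```python
from collections import defaultdict

def group_by_sample(entries):
    """Group accuracy-log entries by qsl_idx, each group sorted by repeat_idx."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["qsl_idx"]].append(entry)
    for repeats in grouped.values():
        repeats.sort(key=lambda e: e["repeat_idx"])
    return dict(grouped)

def check_repeat_counts(grouped, repeats_per_sample):
    """Return the qsl_idx values whose repeat_idx values are not exactly 0..n-1."""
    expected = list(range(repeats_per_sample))
    return [
        qsl_idx
        for qsl_idx, repeats in grouped.items()
        if [e["repeat_idx"] for e in repeats] != expected
    ]

# Entries copied from the example log above (first sample only).
entries = [
    {"seq_id": 1, "qsl_idx": 0, "repeat_idx": 0, "data": "...", "token_count": 100},
    {"seq_id": 2, "qsl_idx": 0, "repeat_idx": 1, "data": "...", "token_count": 105},
    {"seq_id": 3, "qsl_idx": 0, "repeat_idx": 2, "data": "...", "token_count": 98},
    {"seq_id": 4, "qsl_idx": 0, "repeat_idx": 3, "data": "...", "token_count": 110},
    {"seq_id": 5, "qsl_idx": 0, "repeat_idx": 4, "data": "...", "token_count": 102},
]

grouped = group_by_sample(entries)
incomplete = check_repeat_counts(grouped, repeats_per_sample=5)
```

In practice the entries would be read with json.load from mlperf_log_accuracy.json rather than written inline. An accuracy script could then score each group of repeats together, and a non-empty incomplete list would flag samples with missing or duplicated repeats.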

To enable repeats, add a line to user.conf. The default is 1 (no repeats).

# For all scenarios (use wildcard *)
model-name.*.repeats_per_sample = 5

# Or for specific scenario only
model-name.Offline.repeats_per_sample = 5
model-name.Server.repeats_per_sample = 5

@v-shobhit requested a review from a team as a code owner on December 1, 2025 18:44
@github-actions
Contributor

github-actions bot commented Dec 1, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

