@mrzzy mrzzy commented Nov 15, 2025

Motivation

The LLaMA-2-70B benchmark (Offline Scenario) currently does not have multinode support.

Contents

This PR adds multinode inference support to the LLaMA-2-70B benchmark (Offline Scenario) by enabling SUT_API.py to issue requests to multiple OpenAI-compatible endpoints (e.g., vLLM, TensorRT-LLM) concurrently. Prompts are partitioned (mostly) evenly across servers.

  • Multi-server API mode for SUT_API (--vllm) with even prompt distribution across multiple OpenAI-compatible endpoints.
  • Unit tests for API-related logic (query_batch and query_servers).
  • Documentation updates and example commands for multinode usage.
  • Additional dependencies specified in READMEs.
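
To illustrate the even prompt distribution described above, here is a minimal sketch, not the code in this PR. The names `partition_evenly`, `fan_out`, and the `send` callable are hypothetical; the sketch assumes contiguous chunks whose sizes differ by at most one, with one worker thread per server, and the PR's actual `query_batch` and `query_servers` may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def partition_evenly(prompts: Sequence[str], n_servers: int) -> List[List[str]]:
    """Split prompts into n_servers contiguous chunks whose sizes differ by at most one."""
    if n_servers <= 0:
        raise ValueError("n_servers must be positive")
    base, extra = divmod(len(prompts), n_servers)
    chunks: List[List[str]] = []
    start = 0
    for i in range(n_servers):
        # The first `extra` chunks each take one additional prompt.
        size = base + (1 if i < extra else 0)
        chunks.append(list(prompts[start:start + size]))
        start += size
    return chunks


def fan_out(
    prompts: Sequence[str],
    servers: Sequence[str],
    send: Callable[[str, List[str]], List[str]],
) -> List[str]:
    """Send one chunk to each server concurrently; return outputs in prompt order.

    `send(server, chunk)` is a hypothetical callable that performs the request
    and returns one output per prompt in the chunk.
    """
    chunks = partition_evenly(prompts, len(servers))
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # pool.map preserves the order of `servers`, so outputs stay aligned.
        per_server = list(pool.map(send, servers, chunks))
    return [out for outputs in per_server for out in outputs]
```

With 10 prompts and 3 servers this yields chunks of 4, 3, and 3 prompts, which is one reading of "(mostly) evenly": the remainder is spread over the first servers.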

User-facing Changes

Usage Example (Offline + Multinode API mode)

python3 -u main.py --scenario Offline \
    --vllm \
    --api-model-name ${MODEL_NAME} \
    --api-server http://node1:8000 \
    --api-server http://node2:8000 \
    --api-server http://node3:8000 \
    --model-path ${CHECKPOINT_PATH} \
    --user-conf user.conf \
    --total-sample-count 24576 \
    --dataset-path ${DATASET_PATH} \
    --output-log-dir offline-logs

Each --api-server argument registers an endpoint; SUT_API distributes prompts across them automatically.
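
One common way to accept a repeatable flag like this is argparse's `append` action. The sketch below is an assumption about how such a flag could be parsed, not the parser in main.py; the `parse_servers` helper and the `api_servers` destination name are hypothetical.

```python
import argparse
from typing import List, Optional


def parse_servers(argv: Optional[List[str]] = None) -> List[str]:
    """Collect every --api-server occurrence into a list, in command-line order."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--api-server",
        action="append",      # each occurrence appends one endpoint URL
        default=[],
        dest="api_servers",   # hypothetical attribute name
        help="OpenAI-compatible endpoint; may be given multiple times",
    )
    args = parser.parse_args(argv)
    return args.api_servers


if __name__ == "__main__":
    print(parse_servers())
```

Passing the flag three times, as in the usage example, would produce a three-element list, and omitting it would produce an empty list.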


mrzzy added 22 commits November 7, 2025 09:35
This reverts commit 325526f6aed7b2c18d93c19e561d743d3246a3b2.
This reverts commit b10eceddd3b9ecb824cd6d7248df8ada786ef132.
…rompt load over servers"

This reverts commit f6769548b45a5872dd20a5c0a329a005c05a8664.
@github-actions
Contributor

MLCommons CLA bot:
Thank you very much for your submission, we really appreciate it. Before we can accept your contribution, we ask that you sign the MLCommons CLA (Apache 2). Please use this [Google form](https://forms.gle/Ew1KkBVpyeJDuRw67) to initiate authorization. If you are from an MLCommons member organization, we will request that you be added to the CLA. If you are not from a member organization, we will email you a CLA to sign. For any questions, please contact [email protected].
0 out of 1 committers have signed the MLCommons CLA.
@mrzzy
You can retrigger this bot by commenting recheck in this Pull Request

@mrzzy mrzzy marked this pull request as ready for review November 15, 2025 03:21
@mrzzy mrzzy requested a review from a team as a code owner November 15, 2025 03:21