
[Performance]: vLLM v0.15.0 throughput regression compared to ROCm vLLM v0.14.0 #36454

@Spurthi-Bhat-ScalersAI

Description


Proposal to improve performance

No response

Report of performance regression

Performance observations were conducted for vLLM v0.15.0 in comparison with the ROCm forked vLLM v0.14.0 (now deprecated). Testing was executed on a server equipped with 8× AMD Instinct MI300X GPUs.

Benchmarking was performed using the vLLM bench utility across eight Docker-based vLLM serving instances of the model Qwen3-30B-A3B-Thinking-2507. Traffic distribution across the serving instances was handled through nginx load balancing.

The command executed is shown below:

HF_HOME=/mnt/models HUGGINGFACE_HUB_CACHE=/mnt/models/hub TRANSFORMERS_CACHE=/mnt/models/hub \
vllm bench serve --backend vllm \
  --model Qwen/Qwen3-30B-A3B-Thinking-2507 \
  --tokenizer /mnt/models/hub/models--Qwen--Qwen3-30B-A3B-Thinking-2507/snapshots/144afc2f379b542fdd4e85a1fcd5e1f79112d95d \
  --host localhost --port 8000 \
  --endpoint /v1/completions \
  --dataset-name sharegpt \
  --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
  --output-len 128 --num-prompts 4096 --max-concurrency 4096 \
  2>&1 | tee "$LOG"

Benchmark results are attached along with the Docker Compose configuration used for testing. The configuration remained identical across both runs, with the container image name representing the only modification.

v0.14.0.txt

v0.15.0.txt

docker-compose.yaml

Misc discussion on performance

Benchmark results indicate a clear throughput difference between vLLM v0.15.0 and the ROCm forked v0.14.0. Under the tested configuration, v0.14.0 achieves approximately 1.34× the request throughput and output token throughput of v0.15.0.
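As a minimal sketch of how the 1.34× figure is derived (the throughput values below are hypothetical placeholders, not the measured numbers, which are in the attached v0.14.0.txt and v0.15.0.txt logs):

```python
# Regression ratio = old throughput / new throughput.
# Placeholder values for illustration only; real numbers are in the
# attached benchmark logs.
v014_req_throughput = 53.6  # requests/s, hypothetical
v015_req_throughput = 40.0  # requests/s, hypothetical

ratio = v014_req_throughput / v015_req_throughput
print(f"v0.14.0 vs v0.15.0 request throughput: {ratio:.2f}x")  # 1.34x
```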

However, v0.15.0 shows a slightly lower time to first token (TTFT) than v0.14.0.

The throughput gap widens further when v0.15.0 is compared against even older releases of rocm/vllm.

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata


    Labels

    performance (Performance-related issues), rocm (Related to AMD ROCm)
