- `REMOTE_PORT`: Port of the remote vLLM service to benchmark. Defaults to an empty string.
The nightly benchmark is triggered when:
- A commit is pushed to a PR that carries both the `perf-benchmarks` and `nightly-benchmarks` labels.
## Performance benchmark details
See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests.json`, `tests/throughput-tests.json`, `tests/serving-tests.json` to configure the test cases.
> NOTE: For Intel® Xeon® Processors, use `tests/latency-tests-cpu.json`, `tests/throughput-tests-cpu.json`, `tests/serving-tests-cpu.json` instead.
### Latency test
Here is an example of one test inside `latency-tests.json`:
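A minimal sketch of one entry, assuming the usual `test_name` plus `parameters` layout; the model name and iteration counts below are illustrative and not taken from this excerpt:

```json
[
  {
    "test_name": "latency_llama8B_tp1",
    "parameters": {
      "model": "meta-llama/Meta-Llama-3-8B",
      "tensor_parallel_size": 1,
      "load_format": "dummy",
      "num_iters_warmup": 5,
      "num_iters": 15
    }
  }
]
```

In this layout, each `test_name` should be unique, and the keys under `parameters` are typically forwarded to the latency benchmark script as command-line arguments; treat the exact fields as version-dependent.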
Here is an example of using the script to compare result_a and result_b, including the detailed test names.
---
*Latest News* 🔥
- [2025/05] We hosted [NYC vLLM Meetup](https://lu.ma/c1rqyf1f)! Please find the meetup slides [here](https://docs.google.com/presentation/d/1_q_aW_ioMJWUImf1s1YM-ZhjXz8cUeL0IJvaquOYBeA/edit?usp=sharing).
- [2025/05] vLLM is now a hosted project under PyTorch Foundation! Please find the announcement [here](https://pytorch.org/blog/pytorch-foundation-welcomes-vllm/).
- [2025/04] We hosted [Asia Developer Day](https://www.sginnovate.com/event/limited-availability-morning-evening-slots-remaining-inaugural-vllm-asia-developer-day)! Please find the meetup slides from the vLLM team [here](https://docs.google.com/presentation/d/19cp6Qu8u48ihB91A064XfaXruNYiBOUKrBxAmDOllOo/edit?usp=sharing).
---
## About
vLLM is a fast and easy-to-use library for LLM inference and serving.
vLLM is flexible and easy to use with:
- Multi-LoRA support
vLLM seamlessly supports most popular open-source models on HuggingFace, including:
- Transformer-like LLMs (e.g., Llama)
- Mixture-of-Experts LLMs (e.g., Mixtral, Deepseek-V2 and V3)
- Embedding Models (e.g., E5-Mistral)
Install vLLM with `pip`:

```bash
pip install vllm
```
Visit our [documentation](https://docs.vllm.ai/en/latest/) to learn more.
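As a quick orientation (this snippet is not part of the excerpt above; the model name and sampling settings are arbitrary small examples), a minimal offline-inference script with the installed package looks roughly like this:

```python
from vllm import LLM, SamplingParams

# Load any HuggingFace model; "facebook/opt-125m" is just a small example model.
llm = LLM(model="facebook/opt-125m")

# Sampling settings here are illustrative, not recommendations.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Generate completions for a batch of prompts and print the generated text.
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```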
---

RELEASE.md
Before each release, we perform end-to-end performance validation to ensure no regressions are introduced. This validation uses the [vllm-benchmark workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-benchmark.yml) on PyTorch CI.
**Current Coverage:**
* Models: Llama3, Llama4, and Mixtral
* Hardware: NVIDIA H100 and AMD MI300x
* _Note: Coverage may change based on new model releases and hardware availability._
**Performance Validation Process:**
Request write access to the [pytorch/pytorch-integration-testing](https://github.com/pytorch/pytorch-integration-testing) repository.
**Step 2: Review Benchmark Setup**
Familiarize yourself with the benchmark configurations: