Commit 23a3584

hmellor authored and vadiklyutiy committed
[Docs] Switch to better markdown linting pre-commit hook (vllm-project#21851)
Signed-off-by: Harry Mellor <[email protected]>
1 parent 949801e commit 23a3584


55 files changed, +274 −199 lines changed

.buildkite/nightly-benchmarks/README.md

Lines changed: 5 additions & 0 deletions
@@ -28,6 +28,7 @@ See [vLLM performance dashboard](https://perf.vllm.ai) for the latest performanc
 ## Trigger the benchmark
 
 Performance benchmark will be triggered when:
+
 - A PR being merged into vllm.
 - Every commit for those PRs with `perf-benchmarks` label AND `ready` label.

@@ -38,6 +39,7 @@ bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh
 ```
 
 Runtime environment variables:
+
 - `ON_CPU`: set the value to '1' on Intel® Xeon® Processors. Default value is 0.
 - `SERVING_JSON`: JSON file to use for the serving tests. Default value is empty string (use default file).
 - `LATENCY_JSON`: JSON file to use for the latency tests. Default value is empty string (use default file).

@@ -46,12 +48,14 @@ Runtime environment variables:
 - `REMOTE_PORT`: Port for the remote vLLM service to benchmark. Default value is empty string.
 
 Nightly benchmark will be triggered when:
+
 - Every commit for those PRs with `perf-benchmarks` label and `nightly-benchmarks` label.
 
 ## Performance benchmark details
 
 See [performance-benchmarks-descriptions.md](performance-benchmarks-descriptions.md) for detailed descriptions, and use `tests/latency-tests.json`, `tests/throughput-tests.json`, `tests/serving-tests.json` to configure the test cases.
 > NOTE: For Intel® Xeon® Processors, use `tests/latency-tests-cpu.json`, `tests/throughput-tests-cpu.json`, `tests/serving-tests-cpu.json` instead.
+>
 ### Latency test
 
 Here is an example of one test inside `latency-tests.json`:

@@ -149,6 +153,7 @@ Here is an example using the script to compare result_a and result_b without det
 
 Here is an example using the script to compare result_a and result_b with detail test name.
 `python3 compare-json-results.py -f results_a/benchmark_results.json -f results_b/benchmark_results.json`
+
 | | results_a/benchmark_results.json_name | results_a/benchmark_results.json | results_b/benchmark_results.json_name | results_b/benchmark_results.json | perf_ratio |
 |---|---------------------------------------------|----------------------------------------|---------------------------------------------|----------------------------------------|----------|
 | 0 | serving_llama8B_tp1_sharegpt_qps_1 | 142.633982 | serving_llama8B_tp1_sharegpt_qps_1 | 156.526018 | 1.097396 |
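For reference, a minimal sketch of combining the runtime environment variables above with the benchmark entry point named in the hunk header (the variable names and script path are taken from the diff; the JSON file name is a hypothetical placeholder):

```bash
# Hypothetical local run on an Intel® Xeon® host with a custom serving test file.
# ON_CPU and SERVING_JSON are documented above; my-serving-tests.json is made up.
export ON_CPU=1
export SERVING_JSON=./my-serving-tests.json
bash .buildkite/nightly-benchmarks/scripts/run-performance-benchmarks.sh
```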
Lines changed: 11 additions & 10 deletions
@@ -1,3 +1,4 @@
+# Nightly benchmark annotation
 
 ## Description
 

@@ -13,15 +14,15 @@ Please download the visualization scripts in the post
 
 - Find the docker we use in `benchmarking pipeline`
 - Deploy the docker, and inside the docker:
-  - Download `nightly-benchmarks.zip`.
-  - In the same folder, run the following code:
-
-  ```bash
-  export HF_TOKEN=<your HF token>
-  apt update
-  apt install -y git
-  unzip nightly-benchmarks.zip
-  VLLM_SOURCE_CODE_LOC=./ bash .buildkite/nightly-benchmarks/scripts/run-nightly-benchmarks.sh
-  ```
+    - Download `nightly-benchmarks.zip`.
+    - In the same folder, run the following code:
+
+    ```bash
+    export HF_TOKEN=<your HF token>
+    apt update
+    apt install -y git
+    unzip nightly-benchmarks.zip
+    VLLM_SOURCE_CODE_LOC=./ bash .buildkite/nightly-benchmarks/scripts/run-nightly-benchmarks.sh
+    ```
 
 And the results will be inside `./benchmarks/results`.

.buildkite/nightly-benchmarks/nightly-descriptions.md

Lines changed: 17 additions & 17 deletions
@@ -13,25 +13,25 @@ Latest reproduction guilde: [github issue link](https://github.com/vllm-project/
 ## Setup
 
 - Docker images:
-  - vLLM: `vllm/vllm-openai:v0.6.2`
-  - SGLang: `lmsysorg/sglang:v0.3.2-cu121`
-  - LMDeploy: `openmmlab/lmdeploy:v0.6.1-cu12`
-  - TensorRT-LLM: `nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3`
-    - *NOTE: we uses r24.07 as the current implementation only works for this version. We are going to bump this up.*
-  - Check [nightly-pipeline.yaml](nightly-pipeline.yaml) for the concrete docker images, specs and commands we use for the benchmark.
+    - vLLM: `vllm/vllm-openai:v0.6.2`
+    - SGLang: `lmsysorg/sglang:v0.3.2-cu121`
+    - LMDeploy: `openmmlab/lmdeploy:v0.6.1-cu12`
+    - TensorRT-LLM: `nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3`
+        - *NOTE: we uses r24.07 as the current implementation only works for this version. We are going to bump this up.*
+    - Check [nightly-pipeline.yaml](nightly-pipeline.yaml) for the concrete docker images, specs and commands we use for the benchmark.
 - Hardware
-  - 8x Nvidia A100 GPUs
+    - 8x Nvidia A100 GPUs
 - Workload:
-  - Dataset
-    - ShareGPT dataset
-    - Prefill-heavy dataset (in average 462 input tokens, 16 tokens as output)
-    - Decode-heavy dataset (in average 462 input tokens, 256 output tokens)
-    - Check [nightly-tests.json](tests/nightly-tests.json) for the concrete configuration of datasets we use.
-  - Models: llama-3 8B, llama-3 70B.
-    - We do not use llama 3.1 as it is incompatible with trt-llm r24.07. ([issue](https://github.com/NVIDIA/TensorRT-LLM/issues/2105)).
-  - Average QPS (query per second): 2, 4, 8, 16, 32 and inf.
-    - Queries are randomly sampled, and arrival patterns are determined via Poisson process, but all with fixed random seed.
-  - Evaluation metrics: Throughput (higher the better), TTFT (time to the first token, lower the better), ITL (inter-token latency, lower the better).
+    - Dataset
+        - ShareGPT dataset
+        - Prefill-heavy dataset (in average 462 input tokens, 16 tokens as output)
+        - Decode-heavy dataset (in average 462 input tokens, 256 output tokens)
+        - Check [nightly-tests.json](tests/nightly-tests.json) for the concrete configuration of datasets we use.
+    - Models: llama-3 8B, llama-3 70B.
+        - We do not use llama 3.1 as it is incompatible with trt-llm r24.07. ([issue](https://github.com/NVIDIA/TensorRT-LLM/issues/2105)).
+    - Average QPS (query per second): 2, 4, 8, 16, 32 and inf.
+        - Queries are randomly sampled, and arrival patterns are determined via Poisson process, but all with fixed random seed.
+    - Evaluation metrics: Throughput (higher the better), TTFT (time to the first token, lower the better), ITL (inter-token latency, lower the better).
 
 ## Known issues
 
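As a side note, the serving engines listed in this setup can be fetched directly with the image tags shown above; a minimal sketch using standard `docker pull` (tags copied from the diff; the TensorRT-LLM image may additionally require NGC credentials):

```bash
# Fetch the exact images named in the nightly benchmark setup.
docker pull vllm/vllm-openai:v0.6.2
docker pull lmsysorg/sglang:v0.3.2-cu121
docker pull openmmlab/lmdeploy:v0.6.1-cu12
docker pull nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
```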

.buildkite/nightly-benchmarks/performance-benchmarks-descriptions.md

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
+# Performance benchmarks descriptions
 
 ## Latency tests
 
.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,5 @@
-## Essential Elements of an Effective PR Description Checklist
+# Essential Elements of an Effective PR Description Checklist
+
 - [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
 - [ ] The test plan, such as providing test command.
 - [ ] The test results, such as pasting the results comparison before and after, or e2e results

@@ -14,5 +15,4 @@ PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE B
 
 ## (Optional) Documentation Update
 
-<!--- pyml disable-next-line no-emphasis-as-heading -->
 **BEFORE SUBMITTING, PLEASE READ <https://docs.vllm.ai/en/latest/contributing>** (anything written below this line will be removed by GitHub Actions)

.markdownlint.yaml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+MD007:
+  indent: 4
+MD013: false
+MD024:
+  siblings_only: true
+MD033: false
+MD042: false
+MD045: false
+MD046: false
+MD051: false
+MD052: false
+MD053: false
+MD059: false
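For reference, a minimal sketch of applying this configuration outside of pre-commit, assuming `markdownlint-cli` (the tool the hook below wraps) is installed via npm; `markdownlint` picks up `.markdownlint.yaml` at the repo root on its own:

```bash
# Assumes Node.js/npm are available; the version pin mirrors the pre-commit rev.
npm install -g [email protected]
markdownlint --fix '**/*.md'   # auto-fix violations where possible
markdownlint '**/*.md'         # report anything that still fails
```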

.pre-commit-config.yaml

Lines changed: 3 additions & 4 deletions
@@ -35,12 +35,11 @@ repos:
     exclude: 'csrc/(moe/topk_softmax_kernels.cu|quantization/gguf/(ggml-common.h|dequantize.cuh|vecdotq.cuh|mmq.cuh|mmvq.cuh))|vllm/third_party/.*'
     types_or: [c++, cuda]
     args: [--style=file, --verbose]
-- repo: https://github.com/jackdewinter/pymarkdown
-  rev: v0.9.29
+- repo: https://github.com/igorshubovych/markdownlint-cli
+  rev: v0.45.0
   hooks:
-  - id: pymarkdown
+  - id: markdownlint-fix
     exclude: '.*\.inc\.md'
-    args: [fix]
 - repo: https://github.com/rhysd/actionlint
   rev: v1.7.7
   hooks:
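A minimal sketch of exercising the swapped-in hook locally, using standard `pre-commit` commands (the hook id `markdownlint-fix` comes from the config above):

```bash
# Run only the new markdown hook across the whole repository.
pre-commit run markdownlint-fix --all-files

# Install the git hook so it runs on every commit.
pre-commit install
```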

README.md

Lines changed: 7 additions & 0 deletions
@@ -1,3 +1,4 @@
+<!-- markdownlint-disable MD001 MD041 -->
 <p align="center">
 <picture>
 <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/vllm-project/vllm/main/docs/assets/logos/vllm-logo-text-dark.png">

@@ -16,6 +17,7 @@ Easy, fast, and cheap LLM serving for everyone
 ---
 
 *Latest News* 🔥
+
 - [2025/05] We hosted [NYC vLLM Meetup](https://lu.ma/c1rqyf1f)! Please find the meetup slides [here](https://docs.google.com/presentation/d/1_q_aW_ioMJWUImf1s1YM-ZhjXz8cUeL0IJvaquOYBeA/edit?usp=sharing).
 - [2025/05] vLLM is now a hosted project under PyTorch Foundation! Please find the announcement [here](https://pytorch.org/blog/pytorch-foundation-welcomes-vllm/).
 - [2025/04] We hosted [Asia Developer Day](https://www.sginnovate.com/event/limited-availability-morning-evening-slots-remaining-inaugural-vllm-asia-developer-day)! Please find the meetup slides from the vLLM team [here](https://docs.google.com/presentation/d/19cp6Qu8u48ihB91A064XfaXruNYiBOUKrBxAmDOllOo/edit?usp=sharing).

@@ -46,6 +48,7 @@ Easy, fast, and cheap LLM serving for everyone
 </details>
 
 ---
+
 ## About
 
 vLLM is a fast and easy-to-use library for LLM inference and serving.

@@ -75,6 +78,7 @@ vLLM is flexible and easy to use with:
 - Multi-LoRA support
 
 vLLM seamlessly supports most popular open-source models on HuggingFace, including:
+
 - Transformer-like LLMs (e.g., Llama)
 - Mixture-of-Expert LLMs (e.g., Mixtral, Deepseek-V2 and V3)
 - Embedding Models (e.g., E5-Mistral)

@@ -91,6 +95,7 @@ pip install vllm
 ```
 
 Visit our [documentation](https://docs.vllm.ai/en/latest/) to learn more.
+
 - [Installation](https://docs.vllm.ai/en/latest/getting_started/installation.html)
 - [Quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html)
 - [List of Supported Models](https://docs.vllm.ai/en/latest/models/supported_models.html)

@@ -107,13 +112,15 @@ vLLM is a community project. Our compute resources for development and testing a
 <!-- Note: Please sort them in alphabetical order. -->
 <!-- Note: Please keep these consistent with docs/community/sponsors.md -->
 Cash Donations:
+
 - a16z
 - Dropbox
 - Sequoia Capital
 - Skywork AI
 - ZhenFund
 
 Compute Resources:
+
 - AMD
 - Anyscale
 - AWS

RELEASE.md

Lines changed: 4 additions & 1 deletion
@@ -60,9 +60,10 @@ Please note: **No feature work allowed for cherry picks**. All PRs that are cons
 Before each release, we perform end-to-end performance validation to ensure no regressions are introduced. This validation uses the [vllm-benchmark workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-benchmark.yml) on PyTorch CI.
 
 **Current Coverage:**
+
 * Models: Llama3, Llama4, and Mixtral
 * Hardware: NVIDIA H100 and AMD MI300x
-* *Note: Coverage may change based on new model releases and hardware availability*
+* _Note: Coverage may change based on new model releases and hardware availability_
 
 **Performance Validation Process:**
 

@@ -71,11 +72,13 @@ Request write access to the [pytorch/pytorch-integration-testing](https://github
 
 **Step 2: Review Benchmark Setup**
 Familiarize yourself with the benchmark configurations:
+
 * [CUDA setup](https://github.com/pytorch/pytorch-integration-testing/tree/main/vllm-benchmarks/benchmarks/cuda)
 * [ROCm setup](https://github.com/pytorch/pytorch-integration-testing/tree/main/vllm-benchmarks/benchmarks/rocm)
 
 **Step 3: Run the Benchmark**
 Navigate to the [vllm-benchmark workflow](https://github.com/pytorch/pytorch-integration-testing/actions/workflows/vllm-benchmark.yml) and configure:
+
 * **vLLM branch**: Set to the release branch (e.g., `releases/v0.9.2`)
 * **vLLM commit**: Set to the RC commit hash
 