Commit 37f86d9

[Docs] use uv in GPU installation docs (#20277)
Signed-off-by: David Xia <[email protected]>
1 parent 58b11b2 commit 37f86d9

File tree: 1 file changed (+44 −40 lines)


docs/getting_started/installation/gpu/cuda.inc.md

Lines changed: 44 additions & 40 deletions
````diff
@@ -20,16 +20,16 @@ Therefore, it is recommended to install vLLM with a **fresh new** environment. I
 # --8<-- [end:set-up-using-python]
 # --8<-- [start:pre-built-wheels]
 
-You can install vLLM using either `pip` or `uv pip`:
-
 ```bash
-# Install vLLM with CUDA 12.8.
-# If you are using pip.
-pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
-# If you are using uv.
 uv pip install vllm --torch-backend=auto
 ```
 
+??? console "pip"
+    ```bash
+    # Install vLLM with CUDA 12.8.
+    pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
+    ```
+
 We recommend leveraging `uv` to [automatically select the appropriate PyTorch index at runtime](https://docs.astral.sh/uv/guides/integration/pytorch/#automatic-backend-selection) by inspecting the installed CUDA driver version via `--torch-backend=auto` (or `UV_TORCH_BACKEND=auto`). To select a specific backend (e.g., `cu126`), set `--torch-backend=cu126` (or `UV_TORCH_BACKEND=cu126`). If this doesn't work, try running `uv self update` to update `uv` first.
 
 !!! note
````
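The `--torch-backend=auto` behavior described in this hunk boils down to mapping the driver-reported CUDA version to a PyTorch index tag. A minimal sketch of that mapping (`backend_for` is a hypothetical helper for illustration; the real selection logic lives inside `uv` and inspects the installed driver itself):

```shell
# Hypothetical helper illustrating backend-tag derivation, e.g. "12.8" -> "cu128".
# This is NOT uv's implementation; it only shows the naming convention.
backend_for() {
    echo "cu$(printf '%s' "$1" | tr -d '.')"
}

backend_for "12.8"   # prints cu128
backend_for "12.6"   # prints cu126
```

The derived tag is what you would pass explicitly as `--torch-backend=cu126` (or `UV_TORCH_BACKEND=cu126`) when auto-detection is unavailable.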
````diff
@@ -50,36 +50,22 @@ uv pip install https://github.com/vllm-project/vllm/releases/download/v${VLLM_VE
 
 LLM inference is a fast-evolving field, and the latest code may contain bug fixes, performance improvements, and new features that are not released yet. To allow users to try the latest code without waiting for the next release, vLLM provides wheels for Linux running on a x86 platform with CUDA 12 for every commit since `v0.5.3`.
 
-##### Install the latest code using `pip`
-
-```bash
-pip install -U vllm \
-    --pre \
-    --extra-index-url https://wheels.vllm.ai/nightly
-```
-
-`--pre` is required for `pip` to consider pre-released versions.
-
-Another way to install the latest code is to use `uv`:
-
 ```bash
 uv pip install -U vllm \
     --torch-backend=auto \
     --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
-##### Install specific revisions using `pip`
+??? console "pip"
+    ```bash
+    pip install -U vllm \
+        --pre \
+        --extra-index-url https://wheels.vllm.ai/nightly
+    ```
 
-If you want to access the wheels for previous commits (e.g. to bisect the behavior change, performance regression), due to the limitation of `pip`, you have to specify the full URL of the wheel file by embedding the commit hash in the URL:
+    `--pre` is required for `pip` to consider pre-released versions.
 
-```bash
-export VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd # use full commit hash from the main branch
-pip install https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
-```
-
-Note that the wheels are built with Python 3.8 ABI (see [PEP 425](https://peps.python.org/pep-0425/) for more details about ABI), so **they are compatible with Python 3.8 and later**. The version string in the wheel file name (`1.0.0.dev`) is just a placeholder to have a unified URL for the wheels, the actual versions of wheels are contained in the wheel metadata (the wheels listed in the extra index url have correct versions). Although we don't support Python 3.8 any more (because PyTorch 2.5 dropped support for Python 3.8), the wheels are still built with Python 3.8 ABI to keep the same wheel name as before.
-
-##### Install specific revisions using `uv`
+##### Install specific revisions
 
 If you want to access the wheels for previous commits (e.g. to bisect the behavior change, performance regression), you can specify the commit hash in the URL:
````
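As background for the `--pre` note above: `pip` skips versions carrying PEP 440 pre-release or development markers unless `--pre` is given, and the nightly wheels carry a `.dev` suffix. A rough sketch of the distinction (`is_prerelease` is an illustrative helper, not part of any tool; real resolvers parse versions properly rather than pattern-matching):

```shell
# Illustrative check for common PEP 440 pre-release/dev suffixes.
is_prerelease() {
    case "$1" in
        *.dev*|*a[0-9]*|*b[0-9]*|*rc[0-9]*) echo yes ;;
        *) echo no ;;
    esac
}

is_prerelease "0.9.2"          # prints no  -> pip installs it by default
is_prerelease "1.0.0.dev123"   # prints yes -> pip needs --pre
```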

````diff
@@ -92,17 +78,35 @@ uv pip install vllm \
 
 The `uv` approach works for vLLM `v0.6.6` and later and offers an easy-to-remember command. A unique feature of `uv` is that packages in `--extra-index-url` have [higher priority than the default index](https://docs.astral.sh/uv/pip/compatibility/#packages-that-exist-on-multiple-indexes). If the latest public release is `v0.6.6.post1`, `uv`'s behavior allows installing a commit before `v0.6.6.post1` by specifying the `--extra-index-url`. In contrast, `pip` combines packages from `--extra-index-url` and the default index, choosing only the latest version, which makes it difficult to install a development version prior to the released version.
 
+??? note "pip"
+    If you want to access the wheels for previous commits (e.g. to bisect the behavior change,
+    performance regression), due to the limitation of `pip`, you have to specify the full URL of the
+    wheel file by embedding the commit hash in the URL:
+
+    ```bash
+    export VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd # use full commit hash from the main branch
+    pip install https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
+    ```
+
+    Note that the wheels are built with Python 3.8 ABI (see [PEP
+    425](https://peps.python.org/pep-0425/) for more details about ABI), so **they are compatible
+    with Python 3.8 and later**. The version string in the wheel file name (`1.0.0.dev`) is just a
+    placeholder to have a unified URL for the wheels, the actual versions of wheels are contained in
+    the wheel metadata (the wheels listed in the extra index url have correct versions). Although we
+    don't support Python 3.8 any more (because PyTorch 2.5 dropped support for Python 3.8), the
+    wheels are still built with Python 3.8 ABI to keep the same wheel name as before.
+
 # --8<-- [end:pre-built-wheels]
 # --8<-- [start:build-wheel-from-source]
 
 #### Set up using Python-only build (without compilation)
 
-If you only need to change Python code, you can build and install vLLM without compilation. Using `pip`'s [`--editable` flag](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs), changes you make to the code will be reflected when you run vLLM:
+If you only need to change Python code, you can build and install vLLM without compilation. Using `uv pip`'s [`--editable` flag](https://docs.astral.sh/uv/pip/packages/#editable-packages), changes you make to the code will be reflected when you run vLLM:
 
 ```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
-VLLM_USE_PRECOMPILED=1 pip install --editable .
+VLLM_USE_PRECOMPILED=1 uv pip install --editable .
 ```
 
 This command will do the following:
````
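The per-commit wheel URL in the `pip` fallback above is fully determined by the commit hash, since the version segment is a fixed placeholder. A sketch assembling it, plus splitting the PEP 425 compatibility tags out of the file name (pure string handling, no network access):

```shell
# Build the wheel URL from a commit hash; "1.0.0.dev" is the fixed
# placeholder version the note above describes.
VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd
WHEEL="vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl"
URL="https://wheels.vllm.ai/${VLLM_COMMIT}/${WHEEL}"
echo "$URL"

# Split the PEP 425 tags out of the file name:
# cp38 = built for CPython 3.8, abi3 = stable ABI (usable on 3.8+),
# manylinux1_x86_64 = platform tag.
IFS='-' read -r dist version pytag abitag plat <<EOF
${WHEEL%.whl}
EOF
echo "$pytag $abitag $plat"   # prints: cp38 abi3 manylinux1_x86_64
```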
````diff
@@ -121,7 +125,7 @@ In case you see an error about wheel not found when running the above command, i
 ```bash
 export VLLM_COMMIT=72d9c316d3f6ede485146fe5aabd4e61dbc59069 # use full commit hash from the main branch
 export VLLM_PRECOMPILED_WHEEL_LOCATION=https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
-pip install --editable .
+uv pip install --editable .
 ```
 
 You can find more information about vLLM's wheels in [install-the-latest-code][install-the-latest-code].
````
````diff
@@ -137,7 +141,7 @@ If you want to modify C++ or CUDA code, you'll need to build vLLM from source. T
 ```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
-pip install -e .
+uv pip install -e .
 ```
 
 !!! tip
````
````diff
@@ -152,23 +156,23 @@ pip install -e .
 The following environment variables can be set to configure the vLLM `sccache` remote: `SCCACHE_BUCKET=vllm-build-sccache SCCACHE_REGION=us-west-2 SCCACHE_S3_NO_CREDENTIALS=1`. We also recommend setting `SCCACHE_IDLE_TIMEOUT=0`.
 
 !!! note "Faster Kernel Development"
-    For frequent C++/CUDA kernel changes, after the initial `pip install -e .` setup, consider using the [Incremental Compilation Workflow](../../contributing/incremental_build.md) for significantly faster rebuilds of only the modified kernel code.
+    For frequent C++/CUDA kernel changes, after the initial `uv pip install -e .` setup, consider using the [Incremental Compilation Workflow](../../contributing/incremental_build.md) for significantly faster rebuilds of only the modified kernel code.
 
 ##### Use an existing PyTorch installation
 
-There are scenarios where the PyTorch dependency cannot be easily installed via pip, e.g.:
+There are scenarios where the PyTorch dependency cannot be easily installed with `uv`, e.g.:
 
 - Building vLLM with PyTorch nightly or a custom PyTorch build.
-- Building vLLM with aarch64 and CUDA (GH200), where the PyTorch wheels are not available on PyPI. Currently, only the PyTorch nightly has wheels for aarch64 with CUDA. You can run `pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124` to [install PyTorch nightly](https://pytorch.org/get-started/locally/), and then build vLLM on top of it.
+- Building vLLM with aarch64 and CUDA (GH200), where the PyTorch wheels are not available on PyPI. Currently, only the PyTorch nightly has wheels for aarch64 with CUDA. You can run `uv pip install --index-url https://download.pytorch.org/whl/nightly/cu128 torch torchvision torchaudio` to [install PyTorch nightly](https://pytorch.org/get-started/locally/) and then build vLLM on top of it.
 
 To build vLLM using an existing PyTorch installation:
 
 ```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 python use_existing_torch.py
-pip install -r requirements/build.txt
-pip install --no-build-isolation -e .
+uv pip install -r requirements/build.txt
+uv pip install --no-build-isolation -e .
 ```
 
 ##### Use the local cutlass for compilation
````
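The `sccache` settings named in the hunk above, gathered into one copy-pasteable block. The bucket and region are the vLLM CI values quoted in the docs; point them at your own cache if you are not building through vLLM's infrastructure:

```shell
# Remote compiler-cache settings quoted in the docs above.
export SCCACHE_BUCKET=vllm-build-sccache
export SCCACHE_REGION=us-west-2
export SCCACHE_S3_NO_CREDENTIALS=1   # anonymous (read-only) S3 access
export SCCACHE_IDLE_TIMEOUT=0        # never shut down the sccache server
```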
````diff
@@ -179,7 +183,7 @@ To achieve this, you can set the environment variable VLLM_CUTLASS_SRC_DIR to po
 ```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
-VLLM_CUTLASS_SRC_DIR=/path/to/cutlass pip install -e .
+VLLM_CUTLASS_SRC_DIR=/path/to/cutlass uv pip install -e .
 ```
 
 ##### Troubleshooting
````
````diff
@@ -189,7 +193,7 @@ to be run simultaneously, via the environment variable `MAX_JOBS`. For example:
 
 ```bash
 export MAX_JOBS=6
-pip install -e .
+uv pip install -e .
 ```
 
 This is especially useful when you are building on less powerful machines. For example, when you use WSL it only [assigns 50% of the total memory by default](https://learn.microsoft.com/en-us/windows/wsl/wsl-config#main-wsl-settings), so using `export MAX_JOBS=1` can avoid compiling multiple files simultaneously and running out of memory.
````
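A rough rule of thumb for picking `MAX_JOBS` on memory-constrained machines such as WSL: allow one parallel compile job per few GiB of RAM. The 4 GiB-per-job figure below is an illustrative guess, not a vLLM recommendation, and `jobs_for_mem_gib` is a hypothetical helper:

```shell
# Pick a conservative MAX_JOBS from available memory, assuming ~4 GiB
# per compile job (an illustrative estimate only).
jobs_for_mem_gib() {
    jobs=$(( $1 / 4 ))
    [ "$jobs" -lt 1 ] && jobs=1   # never go below one job
    echo "$jobs"
}

export MAX_JOBS=$(jobs_for_mem_gib 24)
echo "$MAX_JOBS"   # prints 6
```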
````diff
@@ -228,7 +232,7 @@ Simply disable the `VLLM_TARGET_DEVICE` environment variable before installing:
 
 ```bash
 export VLLM_TARGET_DEVICE=empty
-pip install -e .
+uv pip install -e .
 ```
 
 # --8<-- [end:build-wheel-from-source]
````
