Commit 90c16b4
fix: update CUDA to 12.4.1 for Blackwell GPU support (#251)
* fix: update CUDA to 12.4.1 for Blackwell GPU support
- Update Dockerfile base image from CUDA 12.1.0 to 12.4.1
- Update ldconfig path to cuda-12.4
- Update FlashInfer installation to use flashinfer-python package
- Add NVIDIA B200 (Blackwell) to supported gpuIds in hub.json
This fixes the "imagePullAsync: failed to get self-hosted image registry auth"
error when deploying on Blackwell GPUs (RTX PRO 6000, B200) by aligning
the Docker image CUDA version with the allowedCudaVersions in hub.json.
Fixes: DR-1118
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* revert: remove NVIDIA B200 from default gpuIds
The gpuIds field in hub.json controls default GPU selection for deployments,
not GPU compatibility. The CUDA 12.4 upgrade is sufficient to enable
Blackwell GPU support.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: remove FlashInfer to avoid JIT compilation errors
FlashInfer requires nvcc to JIT-compile CUDA kernels at runtime for
new GPU architectures (like Blackwell SM 10.0). Since we use the CUDA
base image without the toolkit, nvcc is not available.
vLLM will use its built-in fallback sampling methods instead.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 6f2381a commit 90c16b4
1 file changed: 4 additions, 5 deletions
Diff summary (line contents not preserved): original line 1 replaced; original line 6 replaced; original lines 14-16 replaced by new lines 14-15.
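The commit message describes two conditions: the image's CUDA version must match an entry in allowedCudaVersions in hub.json, and FlashInfer's JIT path needs nvcc at runtime. A minimal sketch of how those two checks might be scripted is shown here. It is not part of this commit. The `nvidia/cuda:<version>-...` tag shape, the example tag, and the assumption that allowedCudaVersions is a list of version strings are illustrative guesses, and all function names are hypothetical.

```python
import re
import shutil
from typing import List, Optional


def base_image_cuda_version(dockerfile_text: str) -> Optional[str]:
    """Return the CUDA version from the first `FROM nvidia/cuda:<version>...` line.

    Assumes the base image tag starts with the CUDA version (hypothetical layout).
    """
    match = re.search(
        r"^FROM\s+nvidia/cuda:(\d+\.\d+(?:\.\d+)?)",
        dockerfile_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


def major_minor(version: str) -> str:
    """Reduce a version such as '12.4.1' to 'major.minor' form ('12.4')."""
    return ".".join(version.split(".")[:2])


def cuda_version_allowed(image_version: str, allowed: List[str]) -> bool:
    """Check the image CUDA version against an allow-list, comparing major.minor only.

    `allowed` stands in for the allowedCudaVersions list; its real format is assumed.
    """
    return major_minor(image_version) in {major_minor(v) for v in allowed}


def nvcc_available() -> bool:
    """Report whether an `nvcc` executable is on PATH (needed for runtime JIT builds)."""
    return shutil.which("nvcc") is not None


if __name__ == "__main__":
    # Hypothetical Dockerfile content, used only to exercise the helpers.
    sample = "FROM nvidia/cuda:12.4.1-base-ubuntu22.04\n"
    version = base_image_cuda_version(sample)
    print("image CUDA version:", version)
    print("allowed:", version is not None and cuda_version_allowed(version, ["12.4"]))
    print("nvcc on PATH:", nvcc_available())
```

Comparing only major.minor is a design choice in this sketch: the image carries a patch-level version (12.4.1) while the ldconfig path in the commit message uses cuda-12.4, so a patch-insensitive match seemed the safer default.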