Commit 90c16b4

fix: update CUDA to 12.4.1 for Blackwell GPU support (#251)
* fix: update CUDA to 12.4.1 for Blackwell GPU support

  - Update Dockerfile base image from CUDA 12.1.0 to 12.4.1
  - Update ldconfig path to cuda-12.4
  - Update FlashInfer installation to use the flashinfer-python package
  - Add NVIDIA B200 (Blackwell) to supported gpuIds in hub.json

  This fixes the "imagePullAsync: failed to get self-hosted image registry auth" error when deploying on Blackwell GPUs (RTX PRO 6000, B200) by aligning the Docker image CUDA version with the allowedCudaVersions in hub.json.

  Fixes: DR-1118

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* revert: remove NVIDIA B200 from default gpuIds

  The gpuIds in hub.json controls default GPU selection for deployments, not GPU compatibility. The CUDA 12.4 upgrade is sufficient to enable Blackwell GPU support.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: remove FlashInfer to avoid JIT compilation errors

  FlashInfer requires nvcc to JIT-compile CUDA kernels at runtime for new GPU architectures (like Blackwell SM 10.0). Since we use the CUDA base image without the toolkit, nvcc is not available. vLLM will use its built-in fallback sampling methods instead.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 6f2381a commit 90c16b4

File tree

1 file changed: +4 −5 lines


Dockerfile

Lines changed: 4 additions & 5 deletions

@@ -1,19 +1,18 @@
-FROM nvidia/cuda:12.1.0-base-ubuntu22.04
+FROM nvidia/cuda:12.4.1-base-ubuntu22.04
 
 RUN apt-get update -y \
     && apt-get install -y python3-pip
 
-RUN ldconfig /usr/local/cuda-12.1/compat/
+RUN ldconfig /usr/local/cuda-12.4/compat/
 
 # Install Python dependencies
 COPY builder/requirements.txt /requirements.txt
 RUN --mount=type=cache,target=/root/.cache/pip \
     python3 -m pip install --upgrade pip && \
     python3 -m pip install --upgrade -r /requirements.txt
 
-# Install vLLM (switching back to pip installs since issues that required building fork are fixed and space optimization is not as important since caching) and FlashInfer
-RUN python3 -m pip install vllm==0.11.0 && \
-    python3 -m pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3
+# Install vLLM
+RUN python3 -m pip install vllm==0.11.0
 
 # Setup for Option 2: Building the Image with the Model included
 ARG MODEL_NAME=""
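The commit removes FlashInfer because the `-base` CUDA image ships runtime libraries only, with no nvcc for JIT compilation. If FlashInfer were ever needed again, one alternative (a sketch, not what this commit does) would be to switch to a `-devel` base image, which includes the full CUDA toolkit with nvcc, at the cost of a much larger image:

```dockerfile
# Hypothetical alternative, NOT part of this commit: the -devel image
# includes nvcc, so FlashInfer could JIT-compile kernels for new
# architectures (e.g. Blackwell SM 10.0). Trade-off: a far larger image.
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04

RUN apt-get update -y \
    && apt-get install -y python3-pip

RUN python3 -m pip install vllm==0.11.0 flashinfer-python
```

The commit instead keeps the slim `-base` image and relies on vLLM's built-in fallback sampling kernels.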
