Commit 90c16b4

fix: update CUDA to 12.4.1 for Blackwell GPU support (#251)
* fix: update CUDA to 12.4.1 for Blackwell GPU support

  - Update Dockerfile base image from CUDA 12.1.0 to 12.4.1
  - Update ldconfig path to cuda-12.4
  - Update FlashInfer installation to use the flashinfer-python package
  - Add NVIDIA B200 (Blackwell) to supported gpuIds in hub.json

  This fixes the "imagePullAsync: failed to get self-hosted image registry auth" error when deploying on Blackwell GPUs (RTX PRO 6000, B200) by aligning the Docker image CUDA version with the allowedCudaVersions in hub.json.

  Fixes: DR-1118

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* revert: remove NVIDIA B200 from default gpuIds

  The gpuIds in hub.json controls default GPU selection for deployments, not GPU compatibility. The CUDA 12.4 upgrade is sufficient to enable Blackwell GPU support.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix: remove FlashInfer to avoid JIT compilation errors

  FlashInfer requires nvcc to JIT-compile CUDA kernels at runtime for new GPU architectures (like Blackwell SM 10.0). Since we use the CUDA base image without the toolkit, nvcc is not available. vLLM will use its built-in fallback sampling methods instead.

  Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 6f2381a commit 90c16b4

File tree

1 file changed: +4 −5 lines


Dockerfile

Lines changed: 4 additions & 5 deletions

@@ -1,19 +1,18 @@
-FROM nvidia/cuda:12.1.0-base-ubuntu22.04
+FROM nvidia/cuda:12.4.1-base-ubuntu22.04
 
 RUN apt-get update -y \
     && apt-get install -y python3-pip
 
-RUN ldconfig /usr/local/cuda-12.1/compat/
+RUN ldconfig /usr/local/cuda-12.4/compat/
 
 # Install Python dependencies
 COPY builder/requirements.txt /requirements.txt
 RUN --mount=type=cache,target=/root/.cache/pip \
     python3 -m pip install --upgrade pip && \
     python3 -m pip install --upgrade -r /requirements.txt
 
-# Install vLLM (switching back to pip installs since issues that required building fork are fixed and space optimization is not as important since caching) and FlashInfer
-RUN python3 -m pip install vllm==0.11.0 && \
-    python3 -m pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3
+# Install vLLM
+RUN python3 -m pip install vllm==0.11.0
 
 # Setup for Option 2: Building the Image with the Model included
 ARG MODEL_NAME=""
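The commit removes FlashInfer because the `-base` CUDA image ships runtime libraries only, with no nvcc for JIT compilation. If FlashInfer were ever needed again, one alternative (a sketch, not what this commit does) would be to switch to a `-devel` base image, which includes the full CUDA toolkit with nvcc, at the cost of a much larger image:

```dockerfile
# Hypothetical alternative, NOT part of this commit: the -devel image
# includes nvcc, so FlashInfer could JIT-compile kernels for new
# architectures (e.g. Blackwell SM 10.0). Trade-off: a far larger image.
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04

RUN apt-get update -y \
    && apt-get install -y python3-pip

RUN python3 -m pip install vllm==0.11.0 flashinfer-python
```

The commit instead keeps the slim `-base` image and relies on vLLM's built-in fallback sampling kernels.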
