
[VLLM][ARM64] Currency Release #5154

Draft · wants to merge 18 commits into base: master
4 changes: 2 additions & 2 deletions dlc_developer_config.toml
@@ -15,7 +15,7 @@ neuronx_mode = false
graviton_mode = false
# Please only set it to true if you are preparing a ARM64 related PR
# Do remember to revert it back to false before merging any PR (including ARM64 dedicated PR)
arm64_mode = false
arm64_mode = true
# Please only set it to True if you are preparing a HABANA related PR
# Do remember to revert it back to False before merging any PR (including HABANA dedicated PR)
habana_mode = false
@@ -37,7 +37,7 @@ deep_canary_mode = false
[build]
# Add in frameworks you would like to build. By default, builds are disabled unless you specify building an image.
# available frameworks - ["base", "vllm", "autogluon", "huggingface_tensorflow", "huggingface_pytorch", "huggingface_tensorflow_trcomp", "huggingface_pytorch_trcomp", "pytorch_trcomp", "tensorflow", "pytorch", "stabilityai_pytorch"]
build_frameworks = []
build_frameworks = ["vllm"]


# By default we build both training and inference containers. Set true/false values to determine which to build.
2 changes: 1 addition & 1 deletion scripts/install_efa.sh
@@ -72,7 +72,7 @@ function install_efa {
apt-get autoremove -y
rm -rf /var/lib/apt/lists/*
ldconfig
check_libnccl_net_so
# check_libnccl_net_so
}

# idiomatic parameter and option handling in sh
11 changes: 11 additions & 0 deletions vllm/CHANGELOG.md
@@ -2,6 +2,17 @@

All notable changes to vLLM Deep Learning Containers will be documented in this file.

## [0.10.0] - 2025-08-04
### Updated
- vllm/vllm-openai version `v0.10.0`
- EFA installer version `1.43.1`
- Architecture ARM64
### Sample ECR URI
```
763104351884.dkr.ecr.us-east-1.amazonaws.com/0.10-gpu-py312-arm64
763104351884.dkr.ecr.us-east-1.amazonaws.com/0.10.0-gpu-py312-cu128-ubuntu22.04-arm64
```

## [0.10.0] - 2025-08-04
### Updated
- vllm/vllm-openai version `v0.10.0`
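For context, pulling one of the published images would look roughly like the sketch below; the `vllm` repository name and the exact tag are assumptions taken from the sample ECR URIs in the CHANGELOG above, not part of this diff.

```sh
# Sketch: authenticate against the public DLC ECR account, then pull the ARM64 image.
# Repository name ("vllm") and tag are assumed from the CHANGELOG sample URIs.
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com

docker pull 763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.10.0-gpu-py312-cu128-ubuntu22.04-arm64
```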
161 changes: 161 additions & 0 deletions vllm/arm64/gpu/Dockerfile.arm64.gpu
@@ -0,0 +1,161 @@
# Base arguments
ARG CUDA_VERSION=12.8.1
ARG PYTHON_VERSION=3.12
ARG VLLM_VERSION="v0.10.1"
ARG BUILD_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04
ARG FINAL_BASE_IMAGE=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu22.04

#################### BASE STAGE ####################
FROM ${BUILD_BASE_IMAGE} AS base
ARG CUDA_VERSION
ARG PYTHON_VERSION
ARG TARGETPLATFORM=linux/arm64

ENV DEBIAN_FRONTEND=noninteractive

# Install basic dependencies
RUN apt-get update && apt-get install -y \
ccache \
software-properties-common \
git \
curl \
wget \
sudo \
vim \
ffmpeg \
libsm6 \
libxext6 \
libgl1 \
libibverbs-dev \
gcc-10 \
g++-10

# Set up GCC 10
RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 110 --slave /usr/bin/g++ g++ /usr/bin/g++-10

# Install Python
RUN add-apt-repository -y ppa:deadsnakes/ppa \
&& apt-get update \
&& apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-venv \
&& update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 1 \
&& update-alternatives --set python3 /usr/bin/python${PYTHON_VERSION} \
&& ln -sf /usr/bin/python${PYTHON_VERSION}-config /usr/bin/python3-config \
&& curl -sS https://bootstrap.pypa.io/get-pip.py | python${PYTHON_VERSION} \
&& ln -s /usr/bin/python3 /usr/bin/python

RUN python3 -m pip install \
--index-url https://download.pytorch.org/whl/nightly/cu$(echo $CUDA_VERSION | cut -d. -f1,2 | tr -d '.') \
--pre \
torch \
torchvision \
pytorch_triton;

RUN python3 -c "import torch; print(f'PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}')"


RUN ldconfig /usr/local/cuda-$(echo $CUDA_VERSION | cut -d. -f1,2)/compat/

#################### WHEEL BUILD STAGE ####################
FROM base AS wheel
ARG VLLM_VERSION

WORKDIR /workspace
RUN git clone https://github.com/vllm-project/vllm.git /vllm && \
cd /vllm && \
git checkout ${VLLM_VERSION}

# Install build dependencies
WORKDIR /vllm
COPY requirements/build.txt requirements/build.txt
RUN python3 -m pip install -r requirements/build.txt

# Build wheel
ENV TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a 10.0a 12.0"
RUN python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38

#################### BUILD STAGE ####################
FROM base AS build
ARG VLLM_VERSION

# Copy wheel from wheel stage
COPY --from=wheel /vllm/dist/*.whl /tmp/

# Install vLLM wheel
RUN python3 -m pip install /tmp/*.whl

# Build FlashInfer
ARG FLASHINFER_GIT_REF="v0.2.11"
RUN git clone --depth 1 --recursive --shallow-submodules \
--branch ${FLASHINFER_GIT_REF} \
https://github.com/flashinfer-ai/flashinfer.git && \
cd flashinfer && \
TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a 10.0a 12.0" python3 -m flashinfer.aot && \
TORCH_CUDA_ARCH_LIST="7.5 8.0 8.9 9.0a 10.0a 12.0" python3 -m pip install --no-build-isolation --force-reinstall --no-deps .

#################### FINAL STAGE ####################
FROM ${FINAL_BASE_IMAGE} AS final

ARG PYTHON="python3"
ARG EFA_VERSION="1.43.1"
LABEL maintainer="Amazon AI"
LABEL dlc_major_version="1"
ENV DEBIAN_FRONTEND=noninteractive \
LANG=C.UTF-8 \
LC_ALL=C.UTF-8 \
DLC_CONTAINER_TYPE=base \
# Python won’t try to write .pyc or .pyo files on the import of source modules
# Force stdin, stdout and stderr to be totally unbuffered. Good for logging
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PYTHONIOENCODING=UTF-8 \
LD_LIBRARY_PATH="/usr/local/lib:/opt/amazon/ofi-nccl/lib/aarch64-linux-gnu:/opt/amazon/openmpi/lib:/opt/amazon/efa/lib:/usr/local/cuda/lib64:${LD_LIBRARY_PATH}" \
PATH="/opt/amazon/openmpi/bin:/opt/amazon/efa/bin:/usr/local/cuda/bin:${PATH}"

WORKDIR /

# COPY install_efa.sh install_efa.sh
# COPY deep_learning_container.py /usr/local/bin/deep_learning_container.py
# COPY bash_telemetry.sh /usr/local/bin/bash_telemetry.sh
# COPY dockerd_entrypoint.sh /usr/local/bin/dockerd_entrypoint.sh
# RUN chmod +x /usr/local/bin/deep_learning_container.py && \
# chmod +x /usr/local/bin/bash_telemetry.sh && \
# chmod +x /usr/local/bin/dockerd_entrypoint.sh && \
# echo 'source /usr/local/bin/bash_telemetry.sh' >> /etc/bash.bashrc && \
# # Install EFA
# bash install_efa.sh ${EFA_VERSION} && \
# rm install_efa.sh && \
# # OSS compliance and software update
# apt-get update && \
# apt-get upgrade -y && \
# apt-get install -y --allow-change-held-packages --no-install-recommends unzip && \
# apt-get clean && \
# HOME_DIR=/root && \
# curl -o ${HOME_DIR}/oss_compliance.zip https://aws-dlinfra-utilities.s3.amazonaws.com/oss_compliance.zip && \
# unzip ${HOME_DIR}/oss_compliance.zip -d ${HOME_DIR}/ && \
# cp ${HOME_DIR}/oss_compliance/test/testOSSCompliance /usr/local/bin/testOSSCompliance && \
# chmod +x /usr/local/bin/testOSSCompliance && \
# chmod +x ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh && \
# ${HOME_DIR}/oss_compliance/generate_oss_compliance.sh ${HOME_DIR} ${PYTHON} && \
# # create symlink for python
# ln -s /usr/bin/python3 /usr/bin/python && \
# # clean up
# rm -rf ${HOME_DIR}/oss_compliance* && \
# rm -rf /tmp/tmp* && \
# rm -rf /tmp/uv* && \
# rm -rf /var/lib/apt/lists/* && \
# rm -rf /root/.cache | true

RUN mkdir -p /tmp/nvjpeg \
&& cd /tmp/nvjpeg \
&& wget https://developer.download.nvidia.com/compute/cuda/redist/libnvjpeg/linux-aarch64/libnvjpeg-linux-aarch64-12.4.0.76-archive.tar.xz \
&& tar -xvf libnvjpeg-linux-aarch64-12.4.0.76-archive.tar.xz \
&& rm -rf /usr/local/cuda/targets/sbsa-linux/lib/libnvjpeg* \
&& rm -rf /usr/local/cuda/targets/sbsa-linux/include/nvjpeg.h \
&& cp libnvjpeg-linux-aarch64-12.4.0.76-archive/lib/libnvjpeg* /usr/local/cuda/targets/sbsa-linux/lib/ \
&& cp libnvjpeg-linux-aarch64-12.4.0.76-archive/include/* /usr/local/cuda/targets/sbsa-linux/include/ \
&& rm -rf /tmp/nvjpeg \
# patch cuobjdump and nvdisasm
&& rm -rf /usr/local/cuda/bin/cuobjdump* \
&& rm -rf /usr/local/cuda/bin/nvdisasm*

ENTRYPOINT ["/usr/local/bin/dockerd_entrypoint.sh"]
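For local validation of the new Dockerfile, an ARM64 build could be driven roughly as sketched below; the use of `docker buildx` and the local image name are assumptions for illustration, not how the DLC build pipeline invokes the build.

```sh
# Sketch: build the ARM64 GPU image locally with BuildKit.
# Build args mirror the defaults declared at the top of the Dockerfile; the -t name is hypothetical.
docker buildx build \
  --platform linux/arm64 \
  --build-arg CUDA_VERSION=12.8.1 \
  --build-arg PYTHON_VERSION=3.12 \
  --build-arg VLLM_VERSION=v0.10.1 \
  --target final \
  -f vllm/arm64/gpu/Dockerfile.arm64.gpu \
  -t vllm-dlc:0.10.1-gpu-py312-arm64-local \
  .
```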
9 changes: 5 additions & 4 deletions vllm/buildspec-arm64.yml
@@ -2,7 +2,7 @@ account_id: &ACCOUNT_ID <set-$ACCOUNT_ID-in-environment>
prod_account_id: &PROD_ACCOUNT_ID 763104351884
region: &REGION <set-$REGION-in-environment>
framework: &FRAMEWORK vllm
version: &VERSION "0.10.0"
version: &VERSION "0.10.1"
short_version: &SHORT_VERSION "0.10"
arch_type: &ARCH_TYPE arm64
autopatch_build: "False"
@@ -39,14 +39,15 @@ images:
python_version: &DOCKER_PYTHON_VERSION py3
tag_python_version: &TAG_PYTHON_VERSION py312
os_version: &OS_VERSION ubuntu22.04
tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-ec2" ]
latest_release_tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-ec2" ]
docker_file: !join [ *FRAMEWORK, /, *ARCH_TYPE, /, *DEVICE_TYPE, /Dockerfile ]
tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-arm64" ]
latest_release_tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION, "-", *OS_VERSION, "-arm64" ]
docker_file: !join [ *FRAMEWORK, /, *ARCH_TYPE, /, *DEVICE_TYPE, /Dockerfile.arm64., *DEVICE_TYPE ]
target: final
build: true
enable_common_stage_build: false
test_configs:
test_platforms:
- sanity
- security
- ec2
- eks
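For reference, with the anchors above the `!join` expressions should resolve roughly as follows; `cu128` is assumed from the CHANGELOG sample URI, since the `CUDA_VERSION` anchor is defined outside this hunk.

```sh
# Approximate resolution of the !join values above (cu128 assumed):
#   tag / latest_release_tag -> 0.10.1-gpu-py312-cu128-ubuntu22.04-arm64
#   docker_file              -> vllm/arm64/gpu/Dockerfile.arm64.gpu
```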