Commit 1083fe5

Reenable pp and lightweight-serving serving on 0.6.6 (#12814)
* Reenable pp and lightweight serving on 0.6.6
* Update readme
* Update
* Update tag
1 parent af69342 commit 1083fe5

File tree

5 files changed: +22 −5 lines

docker/llm/serving/xpu/docker/Dockerfile

Lines changed: 7 additions & 1 deletion
```diff
@@ -93,6 +93,12 @@ RUN apt-get update && \
     cp -r ./ipex-llm/python/llm/dev/benchmark/ ./benchmark && \
     cp -r ./ipex-llm/python/llm/example/GPU/HuggingFace/LLM ./examples && \
     cp -r ./ipex-llm/python/llm/example/GPU/vLLM-Serving/ ./vLLM-Serving && \
+    # Download pp_serving
+    mkdir -p /llm/pp_serving && \
+    cp ./ipex-llm/python/llm/example/GPU/Pipeline-Parallel-Serving/*.py /llm/pp_serving/ && \
+    # Download lightweight_serving
+    mkdir -p /llm/lightweight_serving && \
+    cp ./ipex-llm/python/llm/example/GPU/Lightweight-Serving/*.py /llm/lightweight_serving/ && \
     rm -rf ./ipex-llm && \
     # Install vllm dependencies
     pip install --upgrade fastapi && \
@@ -120,7 +126,7 @@ RUN apt-get update && \
     cd /llm && \
     rm -rf /tmp/neo && \
     # Install vllm
-    git clone -b 0.6.6-pre https://github.com/analytics-zoo/vllm.git /llm/vllm && \
+    git clone -b 0.6.6 https://github.com/analytics-zoo/vllm.git /llm/vllm && \
     cd /llm/vllm && \
     pip install setuptools-scm && \
     pip install --upgrade cmake && \
```
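The new Dockerfile lines stage the example scripts into `/llm/pp_serving` and `/llm/lightweight_serving` inside the image. A minimal Python sketch of that copy step, run against a scratch directory (all paths and the sample file name are stand-ins for the image filesystem):

```python
# Mirrors the added "mkdir -p ... && cp .../*.py ..." Dockerfile lines in a
# scratch directory; the directory layout and file name are illustrative.
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
src = root / "ipex-llm/python/llm/example/GPU/Pipeline-Parallel-Serving"
src.mkdir(parents=True)
(src / "pipeline_serving.py").write_text('print("pp")\n')

dst = root / "llm/pp_serving"
dst.mkdir(parents=True, exist_ok=True)   # mkdir -p /llm/pp_serving
for py in src.glob("*.py"):              # cp .../*.py /llm/pp_serving/
    shutil.copy(py, dst)

print(sorted(p.name for p in dst.iterdir()))  # → ['pipeline_serving.py']
```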

docker/llm/serving/xpu/docker/README.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -49,14 +49,14 @@ Currently, we provide two different serving engines in the image, which are Fast

 To run Lightweight serving on one intel gpu using `IPEX-LLM` as backend, you can refer to this [readme](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Lightweight-Serving).

-For convenience, we have included a file `/llm/start-lightweight_serving-service` in the image.
+For convenience, we have included a file `/llm/start-lightweight_serving-service` in the image. Note that the appropriate transformers version needs to be installed first, e.g. `pip install transformers==4.37.0`.


 #### Pipeline parallel serving engine

 To run Pipeline parallel serving using `IPEX-LLM` as backend, you can refer to this [readme](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/Pipeline-Parallel-FastAPI).

-For convenience, we have included a file `/llm/start-pp_serving-service.sh` in the image.
+For convenience, we have included a file `/llm/start-pp_serving-service.sh` in the image. Note that the appropriate transformers version needs to be installed first, e.g. `pip install transformers==4.37.0`.


 #### vLLM serving engine
```
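Both README notes now ask for a matching transformers install before the start scripts run. A small pre-flight check along those lines (the helper name and the `4.37.0` pin here are illustrative, taken from the README's example):

```python
# Check an installed package's version against a required pin before
# launching a serving script; returns False if the package is missing.
from importlib import metadata

def package_matches(name: str, required: str) -> bool:
    """True iff `name` is installed at exactly version `required`."""
    try:
        return metadata.version(name) == required
    except metadata.PackageNotFoundError:
        return False

# Example gate, as the README suggests for the start scripts:
if not package_matches("transformers", "4.37.0"):
    print("run: pip install transformers==4.37.0")
```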
docker/llm/serving/xpu/docker/start-lightweight_serving-service.sh

Lines changed: 7 additions & 0 deletions
```diff
@@ -0,0 +1,7 @@
+# need to update transformers version first
+# pip install transformers==4.37.0
+cd /llm/lightweight_serving
+export IPEX_LLM_NOT_USE_VLLM=True
+model_path="/llm/models/Llama-2-7b-chat-hf"
+low_bit="sym_int4"
+python lightweight_serving.py --repo-id-or-model-path $model_path --low-bit $low_bit
```

docker/llm/serving/xpu/docker/start-pp_serving-service.sh

Lines changed: 4 additions & 1 deletion
```diff
@@ -1,9 +1,12 @@
+# update transformers version first
+# pip install transformers==4.37.0
 source /opt/intel/oneapi/setvars.sh --force
+export IPEX_LLM_NOT_USE_VLLM=True
 export no_proxy=localhost
 export FI_PROVIDER=tcp
 export OMP_NUM_THREADS=32

-export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
+#export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 basekit_root=/opt/intel/oneapi
 source $basekit_root/setvars.sh --force
 # source $basekit_root/ccl/latest/env/vars.sh --force
```

python/llm/src/ipex_llm/transformers/convert.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -70,8 +70,9 @@ def is_auto_awq_available():

 def is_vllm_available():
     global _IS_VLLM_AVAILABLE
+    _IS_VLLM_AVAILABLE = os.getenv("IPEX_LLM_NOT_USE_VLLM", None)
     if _IS_VLLM_AVAILABLE is not None:
-        return _IS_VLLM_AVAILABLE
+        return False
     import sys
     original_path = sys.path
     # Temporally remove current directory
```
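The `convert.py` change turns `IPEX_LLM_NOT_USE_VLLM` into an opt-out override: whenever the variable is set, `is_vllm_available()` now returns `False` instead of the raw environment string. A standalone sketch of the resulting gate (simplified; the real function otherwise falls through to a `sys.path`-adjusted import probe):

```python
# Sketch of the gating behavior after this commit: the env var, when set,
# short-circuits vLLM detection to False before any import is attempted.
import os

def is_vllm_available() -> bool:
    # Only the *presence* of the variable is checked, so even
    # IPEX_LLM_NOT_USE_VLLM=False disables vLLM detection.
    if os.getenv("IPEX_LLM_NOT_USE_VLLM") is not None:
        return False
    try:
        import vllm  # noqa: F401
        return True
    except ImportError:
        return False

os.environ["IPEX_LLM_NOT_USE_VLLM"] = "True"
print(is_vllm_available())  # → False
```

This is what the two start scripts rely on when they `export IPEX_LLM_NOT_USE_VLLM=True`: they force the non-vLLM code path in `ipex_llm` regardless of whether vLLM is installed in the image.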
