
Commit fb2abcc

[FEATURE] [IPEX-LLM] [BUG FIX] (#13)
# ADDED

## DOC

- Update `README.md` to include usage of the precompiled engine executable.

# CHANGES / FIXES

## Engine

1. Restructure the configuration to specify which backend and device launch the `ipex-llm` model.
2. Fix non-streaming mode of ONNX returning the prompt in the response. #12

## PyInstaller Executable

1. Update `ellm_api_server.spec` to support compiling `ipex-llm` into an executable. #14

---------

Co-authored-by: tjtanaa <[email protected]>
1 parent 7ed931f commit fb2abcc

19 files changed, +255 −75 lines changed

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# CHANGELOG
+
+## [Unrelease]
+
+### ADDED
+
+DOC
+- Update `README.md` to include usage of precompiled engine executable.
+
+### CHANGES / FIXES
+
+Engine
+- Re-structure the configuration to specify which backend and device to launch the `ipex-llm` model.
+- Fixed Non-Streaming Mode of ONNX is returning the Prompt in the Response #12
+
+PyInstaller Executable
+- Update the `ellm_api_server.spec` to support compilation of `ipex-llm` into executable. #14

README.md

Lines changed: 18 additions & 7 deletions
@@ -29,6 +29,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 - [Launch Chatbot Web UI](#launch-chatbot-web-ui)
 - [Launch Model Management UI](#launch-model-management-ui)
 - [Compile OpenAI-API Compatible Server into Windows Executable](#compile-openai-api-compatible-server-into-windows-executable)
+- [Prebuilt Binary (Alpha)](#compile-openai-api-compatible-server-into-windows-executable)
 - [Acknowledgements](#acknowledgements)

 ## Supported Models (Quick Start)
@@ -59,39 +60,39 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E

 1. Custom Setup:

-   - **XPU**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate llm`.
+   - **IPEX(XPU)**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate ellm`.
    - **DirectML**: If you are using Conda Environment. Install additional dependencies: `conda install conda-forge::vs2015_runtime`.

 2. Install embeddedllm package. `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .`. Note: currently support `cpu`, `directml` and `cuda`.

    - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml]`
    - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]`
    - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
-   - **XPU:** `$env:ELLM_TARGET_DEVICE='xpu'; pip install -e .[xpu]`
+   - **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop`
    - **With Web UI**:
      - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]`
      - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]`
      - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]`
-     - **XPU:** `$env:ELLM_TARGET_DEVICE='xpu'; pip install -e .[xpu,webui]`
+     - **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt`

 - **Linux**

 1. Custom Setup:

-   - **XPU**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate llm`.
+   - **IPEX(XPU)**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate ellm`.
    - **DirectML**: If you are using Conda Environment. Install additional dependencies: `conda install conda-forge::vs2015_runtime`.

 2. Install embeddedllm package. `ELLM_TARGET_DEVICE='directml' pip install -e .`. Note: currently support `cpu`, `directml` and `cuda`.

    - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml]`
    - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]`
    - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
-   - **XPU:** `ELLM_TARGET_DEVICE='xpu' pip install -e .[xpu]`
+   - **IPEX:** `ELLM_TARGET_DEVICE='ipex' python setup.py develop`
    - **With Web UI**:
      - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]`
      - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]`
      - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]`
-     - **XPU:** `ELLM_TARGET_DEVICE='xpu' pip install -e .[xpu,webui]`
+     - **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt`

 ### Launch OpenAI API Compatible Server

@@ -131,9 +132,19 @@ It is an interface that allows you to download and deploy OpenAI API compatible
 ## Compile OpenAI-API Compatible Server into Windows Executable

 1. Install `embeddedllm`.
-2. Install PyInstaller: `pip install pyinstaller`.
+2. Install PyInstaller: `pip install pyinstaller==6.9.0`.
 3. Compile Windows Executable: `pyinstaller .\ellm_api_server.spec`.
 4. You can find the executable in the `dist\ellm_api_server`.
+5. Use it like `ellm_server`. `.\ellm_api_server.exe --model_path <path/to/model/weight>`.
+
+## Prebuilt OpenAI API Compatible Windows Executable (Alpha)
+You can find the prebuilt OpenAI API Compatible Windows Executable in the Release page.
+
+*Powershell/Terminal Usage (Use it like `ellm_server`)*:
+```powershell
+.\ellm_api_server.exe --model_path <path/to/model/weight>
+```

 ## Acknowledgements
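Once the server is running (whether launched as `ellm_server` or as the compiled `ellm_api_server.exe`), any OpenAI-compatible client can talk to it. A minimal sketch of such a call; the base URL, port, and model name below are illustrative assumptions, not values taken from this commit — check the server's startup log for the real ones:

```python
# Hedged sketch: query the local OpenAI-API-compatible server with the
# official openai client. The base_url, api_key, and model name are
# placeholders; substitute the values your server actually reports.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:6979/v1",  # hypothetical host/port
    api_key="EMPTY",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="phi3-mini-int4",  # hypothetical model identifier
    messages=[{"role": "user", "content": "What is an iGPU?"}],
)
print(response.choices[0].message.content)
```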

docs/model/ipex_models.md

Lines changed: 12 additions & 0 deletions
@@ -63,3 +63,15 @@
 | MiniCPM | [link](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) |

 Resources from: https://github.com/intel-analytics/ipex-llm/
+
+
+## Qwen2 Model (Experimental)
+1. Upgrade `transformers`. `pip install --upgrade transformers~=4.42.3`.
+2. Edit `lib\site-packages\transformers\models\qwen2\modeling_qwen2.py`.
+3. Change `from transformers.models.qwen2.modeling_qwen2 import _prepare_4d_causal_attention_mask` to
+`from transformers.modeling_attn_mask_utils import _prepare_4d_causal_attention_mask`.
+
+### FAQ
+```
+ImportError: cannot import name '_prepare_4d_causal_attention_mask' from 'transformers.models.qwen2.modeling_qwen2' (C:\Users\hpintel\anaconda3\envs\ellmipex\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py)
+```
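The manual edit in steps 2–3 can be scripted. A hedged sketch that locates `modeling_qwen2.py` without importing it (importing the module would trip the very `ImportError` quoted in the FAQ) and rewrites the import line in place; it assumes the old import appears verbatim, so back up the file first:

```python
# Sketch: apply the Qwen2 import fix described in the steps above.
# Assumes the old import statement appears verbatim in the file;
# back up modeling_qwen2.py before editing a site-packages install.
import importlib.util
from pathlib import Path

OLD = "from transformers.models.qwen2.modeling_qwen2 import _prepare_4d_causal_attention_mask"
NEW = "from transformers.modeling_attn_mask_utils import _prepare_4d_causal_attention_mask"

# find_spec resolves the module's file path without executing the module.
spec = importlib.util.find_spec("transformers.models.qwen2.modeling_qwen2")
path = Path(spec.origin)

source = path.read_text(encoding="utf-8")
if OLD in source:
    path.write_text(source.replace(OLD, NEW), encoding="utf-8")
    print(f"Patched {path}")
else:
    print(f"Old import not found in {path}; nothing to do.")
```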

ellm_api_server.spec

Lines changed: 61 additions & 11 deletions
@@ -1,43 +1,93 @@
 # -*- mode: python ; coding: utf-8 -*-

 from pathlib import Path
-from PyInstaller.utils.hooks import collect_all
+from PyInstaller.utils.hooks import collect_all, collect_data_files
+import importlib.metadata
+import re
+import sys
+import os

-binaries_list = []
+CONDA_PATH=Path(sys.executable)
+print(dir(CONDA_PATH))
+print(CONDA_PATH)
+print(CONDA_PATH.parent)
+
+excluded_modules = ['torch.distributions'] # <<< ADD THIS LINE
+
+def get_embeddedllm_backend():
+    try:
+        # Get the version of embeddedllm
+        version = importlib.metadata.version("embeddedllm")
+
+        # Use regex to extract the backend
+        match = re.search(r"\+(directml|cpu|cuda|ipex)$", version)
+
+        if match:
+            backend = match.group(1)
+            return backend
+        else:
+            return "Unknown backend"
+
+    except importlib.metadata.PackageNotFoundError:
+        return "embeddedllm not installed"

-print(Path("src/owl/entrypoints/api.py").resolve().as_posix())
+
+backend = get_embeddedllm_backend()
+
+binaries_list = []

 datas_list = [
-    (Path("src/embeddedllm/entrypoints/api_server.py").resolve().as_posix(), 'embeddedllm/entrypoints')
+    (Path("src/embeddedllm/entrypoints/api_server.py").resolve().as_posix(), 'embeddedllm/entrypoints'),
 ]
+datas_list.extend(collect_data_files('torch', include_py_files=True))

 hiddenimports_list = ['multipart']
+# Add missing hidden imports
+#hiddenimports_list.extend([
+#    'torch', 'torchvision', 'intel_extension_for_pytorch',
+#    'intel_extension_for_pytorch.xpu', 'intel_extension_for_pytorch.xpu.fp8',
+#    'intel_extension_for_pytorch.nn.utils'
+#])
+
+pathex = []

 def add_package(package_name):
     datas, binaries, hiddenimports = collect_all(package_name)
     datas_list.extend(datas)
     binaries_list.extend(binaries)
     hiddenimports_list.extend(hiddenimports)

-add_package('onnxruntime')
-add_package('onnxruntime_genai')
+if backend in ('directml', 'cpu', 'cuda'):
+    add_package('onnxruntime')
+    add_package('onnxruntime_genai')
+elif backend == 'ipex':
+    add_package('ipex_llm')
+    add_package('torch')
+    add_package('torchvision')
+    add_package('intel_extension_for_pytorch')
+    add_package('trl')
+    add_package('embeddedllm')
+    add_package('numpy')
+    binaries_list.append((f'{CONDA_PATH.parent}/Library/bin/*', '.'))

 print(binaries_list)
+
 with open("binary.txt", 'w') as f:
     f.write(str(binaries_list))
-
+block_cipher = None
 a = Analysis(
-    ['src\\embeddedllm\\entrypoints\\api_server.py'],
-    pathex=[],
+    ['src\\embeddedllm\\entrypoints\\api_server.py'],
+    pathex=pathex,
     binaries=binaries_list,
     datas=datas_list,
     hiddenimports=hiddenimports_list,
     hookspath=[],
     hooksconfig={},
     runtime_hooks=[],
-    excludes=[],
+    excludes=excluded_modules,
+    block_cipher=block_cipher,
     noarchive=False,
-    optimize=0,
+    optimize=1,
 )
 pyz = PYZ(a.pure)

pyproject.toml

Lines changed: 0 additions & 1 deletion
@@ -57,7 +57,6 @@ requires = [
     "setuptools>=62.0",
     "packaging",
     "setuptools>=49.4.0",
-    "torch==2.3.1",
     "wheel"
 ]
 build-backend = "setuptools.build_meta"

requirements-build.txt

Lines changed: 0 additions & 1 deletion
@@ -1,5 +1,4 @@
 # Should be mirrored in pyproject.toml
 packaging
 setuptools>=49.4.0
-torch
 wheel

requirements-common.txt

Lines changed: 5 additions & 4 deletions
@@ -1,16 +1,17 @@
 huggingface-hub[cli]
-fastapi~=0.110.0
-gunicorn~=21.2.0
+fastapi
+gunicorn~=22.0.0
 loguru~=0.7.2
 numpy~=1.26.4
 pydantic-settings>=2.3.3
 pydantic-core~=2.18.4
 pydantic~=2.7.4
 loguru
 openai
-torch
 transformers
 uvicorn
 filetype~=1.2.0
 Pillow~=10.3.0
-torchvision
+torchvision
+aiohttp<4
+fsspec[http]<=2024.5.0,>=2023.1.0

requirements-cpu.txt

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 torch==2.3.1
 torchvision~=0.18.1
 onnxruntime
-onnxruntime-genai==0.3.0rc2
+onnxruntime-genai==0.3.0rc2
+transformers>=4.43.3

requirements-cuda.txt

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 torch==2.3.1
 torchvision~=0.18.1
 onnxruntime-gpu~=1.18.0
-onnxruntime-genai-cuda~=0.3.0rc2
+onnxruntime-genai-cuda~=0.3.0rc2
+transformers>=4.43.3

requirements-directml.txt

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 torch==2.3.1
 torchvision~=0.18.1
 onnxruntime-directml~=1.18.0
-onnxruntime-genai-directml~=0.3.0
+onnxruntime-genai-directml~=0.3.0
+transformers>=4.43.3
