
Commit fb2abcc

[FEATURE] [IPEX-LLM] [BUG FIX] (#13)
# ADDED

## DOC

- Update `README.md` to include usage of the precompiled engine executable.

# CHANGES / FIXES

## Engine

1. Restructure the configuration to specify which backend and device launch the `ipex-llm` model.
2. Fix non-streaming mode of ONNX returning the prompt in the response. #12

## PyInstaller Executable

1. Update `ellm_api_server.spec` to support compiling `ipex-llm` into an executable. #14

---------

Co-authored-by: tjtanaa <[email protected]>
1 parent 7ed931f commit fb2abcc

19 files changed, +255 −75 lines changed

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# CHANGELOG
+
+## [Unrelease]
+
+### ADDED
+
+DOC
+- Update `README.md` to include usage of precompiled engine executable.
+
+### CHANGES / FIXES
+
+Engine
+- Re-structure the configuration to specify which backend and device to launch the `ipex-llm` model.
+- Fixed Non-Streaming Mode of ONNX is returning the Prompt in the Response #12
+
+PyInstaller Executable
+- Update the `ellm_api_server.spec` to support compilation of `ipex-llm` into executable. #14

README.md

Lines changed: 18 additions & 7 deletions
@@ -29,6 +29,7 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 - [Launch Chatbot Web UI](#launch-chatbot-web-ui)
 - [Launch Model Management UI](#launch-model-management-ui)
 - [Compile OpenAI-API Compatible Server into Windows Executable](#compile-openai-api-compatible-server-into-windows-executable)
+- [Prebuilt Binary (Alpha)](#compile-openai-api-compatible-server-into-windows-executable)
 - [Acknowledgements](#acknowledgements)

 ## Supported Models (Quick Start)
@@ -59,39 +60,39 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E

 1. Custom Setup:

-   - **XPU**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate llm`.
+   - **IPEX(XPU)**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate ellm`.
    - **DirectML**: If you are using Conda Environment. Install additional dependencies: `conda install conda-forge::vs2015_runtime`.

 2. Install embeddedllm package. `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .`. Note: currently support `cpu`, `directml` and `cuda`.

    - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml]`
    - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]`
    - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
-   - **XPU:** `$env:ELLM_TARGET_DEVICE='xpu'; pip install -e .[xpu]`
+   - **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop`
    - **With Web UI**:
      - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]`
      - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]`
      - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]`
-     - **XPU:** `$env:ELLM_TARGET_DEVICE='xpu'; pip install -e .[xpu,webui]`
+     - **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt`

 - **Linux**

 1. Custom Setup:

-   - **XPU**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate llm`.
+   - **IPEX(XPU)**: Requires anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate ellm`.
    - **DirectML**: If you are using Conda Environment. Install additional dependencies: `conda install conda-forge::vs2015_runtime`.

 2. Install embeddedllm package. `ELLM_TARGET_DEVICE='directml' pip install -e .`. Note: currently support `cpu`, `directml` and `cuda`.

    - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml]`
    - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]`
    - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
-   - **XPU:** `ELLM_TARGET_DEVICE='xpu' pip install -e .[xpu]`
+   - **IPEX:** `ELLM_TARGET_DEVICE='ipex' python setup.py develop`
    - **With Web UI**:
      - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]`
      - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]`
      - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]`
-     - **XPU:** `ELLM_TARGET_DEVICE='xpu' pip install -e .[xpu,webui]`
+     - **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt`

 ### Launch OpenAI API Compatible Server

@@ -131,9 +132,19 @@ It is an interface that allows you to download and deploy OpenAI API compatible
 ## Compile OpenAI-API Compatible Server into Windows Executable

 1. Install `embeddedllm`.
-2. Install PyInstaller: `pip install pyinstaller`.
+2. Install PyInstaller: `pip install pyinstaller==6.9.0`.
 3. Compile Windows Executable: `pyinstaller .\ellm_api_server.spec`.
 4. You can find the executable in the `dist\ellm_api_server`.
+5. Use it like `ellm_server`. `.\ellm_api_server.exe --model_path <path/to/model/weight>`.
+
+## Prebuilt OpenAI API Compatible Windows Executable (Alpha)
+You can find the prebuilt OpenAI API Compatible Windows Executable in the Release page.
+
+*Powershell/Terminal Usage (Use it like `ellm_server`)*:
+```powershell
+.\ellm_api_server.exe --model_path <path/to/model/weight>
+```

 ## Acknowledgements
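Once the server is running (whether launched as `ellm_server` or as the compiled `ellm_api_server.exe`), any OpenAI-compatible client can talk to it. A minimal sketch of such a call; the base URL, port, and model name below are illustrative assumptions, not values taken from this commit — check the server's startup log for the real ones:

```python
# Hedged sketch: query the local OpenAI-API-compatible server with the
# official openai client. The base_url, api_key, and model name are
# placeholders; substitute the values your server actually reports.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:6979/v1",  # hypothetical host/port
    api_key="EMPTY",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="phi3-mini-int4",  # hypothetical model identifier
    messages=[{"role": "user", "content": "What is an iGPU?"}],
)
print(response.choices[0].message.content)
```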

docs/model/ipex_models.md

Lines changed: 12 additions & 0 deletions
@@ -63,3 +63,15 @@
 | MiniCPM | [link](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) |

 Resources from: https://github.com/intel-analytics/ipex-llm/
+
+
+## Qwen2 Model (Experimental)
+1. Upgrade `transformers`. `pip install --upgrade transformers~=4.42.3`.
+2. Edit `lib\site-packages\transformers\models\qwen2\modeling_qwen2.py`.
+3. Change `from transformers.models.qwen2.modeling_qwen2 import _prepare_4d_causal_attention_mask` to
+`from transformers.modeling_attn_mask_utils import _prepare_4d_causal_attention_mask`.
+
+### FAQ
+```
+ImportError: cannot import name '_prepare_4d_causal_attention_mask' from 'transformers.models.qwen2.modeling_qwen2' (C:\Users\hpintel\anaconda3\envs\ellmipex\lib\site-packages\transformers\models\qwen2\modeling_qwen2.py)
+```
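The manual edit in steps 2–3 can be scripted. A hedged sketch that locates `modeling_qwen2.py` without importing it (importing the module would trip the very `ImportError` quoted in the FAQ) and rewrites the import line in place; it assumes the old import appears verbatim, so back up the file first:

```python
# Sketch: apply the Qwen2 import fix described in the steps above.
# Assumes the old import statement appears verbatim in the file;
# back up modeling_qwen2.py before editing a site-packages install.
import importlib.util
from pathlib import Path

OLD = "from transformers.models.qwen2.modeling_qwen2 import _prepare_4d_causal_attention_mask"
NEW = "from transformers.modeling_attn_mask_utils import _prepare_4d_causal_attention_mask"

# find_spec resolves the module's file path without executing the module.
spec = importlib.util.find_spec("transformers.models.qwen2.modeling_qwen2")
path = Path(spec.origin)

source = path.read_text(encoding="utf-8")
if OLD in source:
    path.write_text(source.replace(OLD, NEW), encoding="utf-8")
    print(f"Patched {path}")
else:
    print(f"Old import not found in {path}; nothing to do.")
```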

ellm_api_server.spec

Lines changed: 61 additions & 11 deletions
@@ -1,43 +1,93 @@
 # -*- mode: python ; coding: utf-8 -*-

 from pathlib import Path
-from PyInstaller.utils.hooks import collect_all
+from PyInstaller.utils.hooks import collect_all, collect_data_files
+import importlib.metadata
+import re
+import sys
+import os

-binaries_list = []
+CONDA_PATH=Path(sys.executable)
+print(dir(CONDA_PATH))
+print(CONDA_PATH)
+print(CONDA_PATH.parent)
+
+excluded_modules = ['torch.distributions'] # <<< ADD THIS LINE
+
+def get_embeddedllm_backend():
+    try:
+        # Get the version of embeddedllm
+        version = importlib.metadata.version("embeddedllm")
+
+        # Use regex to extract the backend
+        match = re.search(r"\+(directml|cpu|cuda|ipex)$", version)
+
+        if match:
+            backend = match.group(1)
+            return backend
+        else:
+            return "Unknown backend"
+
+    except importlib.metadata.PackageNotFoundError:
+        return "embeddedllm not installed"

-print(Path("src/owl/entrypoints/api.py").resolve().as_posix())
+
+backend = get_embeddedllm_backend()
+
+binaries_list = []

 datas_list = [
-    (Path("src/embeddedllm/entrypoints/api_server.py").resolve().as_posix(), 'embeddedllm/entrypoints')
+    (Path("src/embeddedllm/entrypoints/api_server.py").resolve().as_posix(), 'embeddedllm/entrypoints'),
 ]
+datas_list.extend(collect_data_files('torch', include_py_files=True))

 hiddenimports_list = ['multipart']
+# Add missing hidden imports
+#hiddenimports_list.extend([
+#    'torch', 'torchvision', 'intel_extension_for_pytorch',
+#    'intel_extension_for_pytorch.xpu', 'intel_extension_for_pytorch.xpu.fp8',
+#    'intel_extension_for_pytorch.nn.utils'
+#])
+
+pathex = []

 def add_package(package_name):
     datas, binaries, hiddenimports = collect_all(package_name)
     datas_list.extend(datas)
     binaries_list.extend(binaries)
     hiddenimports_list.extend(hiddenimports)

-add_package('onnxruntime')
-add_package('onnxruntime_genai')
+if backend in ('directml', 'cpu', 'cuda'):
+    add_package('onnxruntime')
+    add_package('onnxruntime_genai')
+elif backend == 'ipex':
+    add_package('ipex_llm')
+    add_package('torch')
+    add_package('torchvision')
+    add_package('intel_extension_for_pytorch')
+    add_package('trl')
+    add_package('embeddedllm')
+    add_package('numpy')
+    binaries_list.append((f'{CONDA_PATH.parent}/Library/bin/*', '.'))

 print(binaries_list)
+
 with open("binary.txt", 'w') as f:
     f.write(str(binaries_list))
-
+block_cipher = None
 a = Analysis(
-    ['src\\embeddedllm\\entrypoints\\api_server.py'],
-    pathex=[],
+    ['src\\embeddedllm\\entrypoints\\api_server.py'],
+    pathex=pathex,
     binaries=binaries_list,
     datas=datas_list,
     hiddenimports=hiddenimports_list,
     hookspath=[],
     hooksconfig={},
     runtime_hooks=[],
-    excludes=[],
+    excludes=excluded_modules,
+    block_cipher=block_cipher,
     noarchive=False,
-    optimize=0,
+    optimize=1,
 )
 pyz = PYZ(a.pure)

pyproject.toml

Lines changed: 0 additions & 1 deletion
@@ -57,7 +57,6 @@ requires = [
     "setuptools>=62.0",
     "packaging",
     "setuptools>=49.4.0",
-    "torch==2.3.1",
     "wheel"
 ]
 build-backend = "setuptools.build_meta"

requirements-build.txt

Lines changed: 0 additions & 1 deletion
@@ -1,5 +1,4 @@
 # Should be mirrored in pyproject.toml
 packaging
 setuptools>=49.4.0
-torch
 wheel

requirements-common.txt

Lines changed: 5 additions & 4 deletions
@@ -1,16 +1,17 @@
 huggingface-hub[cli]
-fastapi~=0.110.0
-gunicorn~=21.2.0
+fastapi
+gunicorn~=22.0.0
 loguru~=0.7.2
 numpy~=1.26.4
 pydantic-settings>=2.3.3
 pydantic-core~=2.18.4
 pydantic~=2.7.4
 loguru
 openai
-torch
 transformers
 uvicorn
 filetype~=1.2.0
 Pillow~=10.3.0
-torchvision
+torchvision
+aiohttp<4
+fsspec[http]<=2024.5.0,>=2023.1.0

requirements-cpu.txt

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 torch==2.3.1
 torchvision~=0.18.1
 onnxruntime
-onnxruntime-genai==0.3.0rc2
+onnxruntime-genai==0.3.0rc2
+transformers>=4.43.3

requirements-cuda.txt

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 torch==2.3.1
 torchvision~=0.18.1
 onnxruntime-gpu~=1.18.0
-onnxruntime-genai-cuda~=0.3.0rc2
+onnxruntime-genai-cuda~=0.3.0rc2
+transformers>=4.43.3

requirements-directml.txt

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 torch==2.3.1
 torchvision~=0.18.1
 onnxruntime-directml~=1.18.0
-onnxruntime-genai-directml~=0.3.0
+onnxruntime-genai-directml~=0.3.0
+transformers>=4.43.3
