Commit 1fab2f5

[Feature] support qwen2.5-vl for pytorch engine (#3194)
* support qwen2.5-vl for pytorch engine
* reuse qwen2 code
* update doc
* reuse vl qwen2 code
* update doc
1 parent ab00e39 commit 1fab2f5

14 files changed: 1057 additions, 2 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -153,6 +153,7 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
 <li>InternLM-XComposer2.5 (7B)</li>
 <li>Qwen-VL (7B)</li>
 <li>Qwen2-VL (2B, 7B, 72B)</li>
+<li>Qwen2.5-VL (3B, 7B, 72B)</li>
 <li>DeepSeek-VL (7B)</li>
 <li>DeepSeek-VL2 (3B, 16B, 27B)</li>
 <li>InternVL-Chat (v1.1-v1.5)</li>

README_ja.md

Lines changed: 2 additions & 0 deletions
@@ -150,6 +150,8 @@ LMDeploy TurboMindエンジンは卓越した推論能力を持ち、さまざ
 <li>InternLM-XComposer2 (7B, 4khd-7B)</li>
 <li>InternLM-XComposer2.5 (7B)</li>
 <li>Qwen-VL (7B)</li>
+<li>Qwen2-VL (2B, 7B, 72B)</li>
+<li>Qwen2.5-VL (3B, 7B, 72B)</li>
 <li>DeepSeek-VL (7B)</li>
 <li>DeepSeek-VL2 (3B, 16B, 27B)</li>
 <li>InternVL-Chat (v1.1-v1.5)</li>

README_zh-CN.md

Lines changed: 1 addition & 0 deletions
@@ -155,6 +155,7 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
 <li>InternLM-XComposer2.5 (7B)</li>
 <li>Qwen-VL (7B)</li>
 <li>Qwen2-VL (2B, 7B, 72B)</li>
+<li>Qwen2.5-VL (3B, 7B, 72B)</li>
 <li>DeepSeek-VL (7B)</li>
 <li>DeepSeek-VL2 (3B, 16B, 27B)</li>
 <li>InternVL-Chat (v1.1-v1.5)</li>

docs/en/multi_modal/index.rst

Lines changed: 1 addition & 0 deletions
@@ -14,4 +14,5 @@ Vision-Language Models
 phi3.md
 mllama.md
 qwen2_vl.md
+qwen2_5_vl.md
 molmo.md

docs/en/multi_modal/qwen2_5_vl.md

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# Qwen2.5-VL

LMDeploy supports the following Qwen2.5-VL models:

| Model | Size | Supported Inference Engine |
| :--------: | :---------: | :------------------------: |
| Qwen2.5-VL | 3B, 7B, 72B | PyTorch |

The following sections demonstrate how to deploy a Qwen2.5-VL model using LMDeploy, with [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) as an example.

## Installation

Install LMDeploy by following the [installation guide](../get_started/installation.md), then install the additional packages that Qwen2.5-VL needs:

```shell
# Qwen2.5-VL requires the latest transformers (transformers >= 4.49.0)
pip install git+https://github.com/huggingface/transformers
# The `[decord]` extra is highly recommended for faster video loading.
pip install qwen-vl-utils[decord]==0.0.8
```
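
Because the command above installs transformers from source, it can be worth confirming that the environment really satisfies the `>= 4.49.0` requirement before loading the model. The sketch below is one way to do that. `version_tuple`, `meets_minimum`, and `check_transformers` are hypothetical helper names introduced here, not part of LMDeploy, and the parsing is deliberately simple: it compares only the first three numeric components of the version string.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = '4.49.0'  # minimum version stated in the install notes above


def version_tuple(version: str) -> tuple:
    """Return the first three numeric components, e.g. '4.50.0.dev0' -> (4, 50, 0)."""
    parts = [int(p) for p in re.findall(r'\d+', version)[:3]]
    # Pad short versions such as '4.49' so tuple comparison stays well-defined.
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Compare two version strings by their leading numeric components."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_transformers() -> bool:
    """Return True if transformers is installed and new enough (assumed check)."""
    try:
        return meets_minimum(metadata.version('transformers'))
    except metadata.PackageNotFoundError:
        return False
```

Calling `check_transformers()` after installation returns `False` when the package is missing or older than the stated minimum.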

## Offline inference

The following sample code shows the basic usage of the VLM pipeline. For details, refer to [VLM Offline Inference Pipeline](./vl_pipeline.md).

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Build an inference pipeline for the model
pipe = pipeline('Qwen/Qwen2.5-VL-7B-Instruct')

# Load an image from a URL and pass it together with a text prompt
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

More examples are listed below:

<details>
<summary>
<b>multi-image multi-round conversation, combined images</b>
</summary>

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('Qwen/Qwen2.5-VL-7B-Instruct', log_level='INFO')
# First round: one text prompt plus two images in a single user message
messages = [
    dict(role='user', content=[
        dict(type='text', text='Describe the two images in detail.'),
        dict(type='image_url', image_url=dict(url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Beijing_Small.jpeg')),
        dict(type='image_url', image_url=dict(url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Chongqing_Small.jpeg'))
    ])
]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))

# Second round: append the reply and a follow-up question, then send the full history
messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content='What are the similarities and differences between these two images?'))
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
```

</details>
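
The example above continues the conversation by appending the assistant's reply and the next user question to `messages`, then calling the pipeline again with the full history. A minimal sketch of that bookkeeping as a helper follows. `extend_conversation` is a hypothetical name introduced for illustration, not an LMDeploy API, and the reply text is an invented placeholder.

```python
from typing import Dict, List


def extend_conversation(messages: List[Dict], assistant_text: str, user_text: str) -> List[Dict]:
    """Return a new history with the assistant reply and the next user turn appended."""
    return messages + [
        dict(role='assistant', content=assistant_text),
        dict(role='user', content=user_text),
    ]


# Placeholder history and reply; in practice the reply would come from `out.text`.
history = [dict(role='user', content='Describe the two images in detail.')]
history = extend_conversation(history, 'Both images show city skylines.', 'What are the differences?')
print([m['role'] for m in history])
```

Returning a new list rather than mutating the input keeps earlier rounds reusable, for example when branching the same first round into several follow-up questions.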

<details>
<summary>
<b>image resolution for performance boost</b>
</summary>

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('Qwen/Qwen2.5-VL-7B-Instruct', log_level='INFO')

# Per-image pixel budget; a smaller budget means less image data to process
min_pixels = 64 * 28 * 28
max_pixels = 64 * 28 * 28
messages = [
    dict(role='user', content=[
        dict(type='text', text='Describe the two images in detail.'),
        dict(type='image_url', image_url=dict(min_pixels=min_pixels, max_pixels=max_pixels, url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Beijing_Small.jpeg')),
        dict(type='image_url', image_url=dict(min_pixels=min_pixels, max_pixels=max_pixels, url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Chongqing_Small.jpeg'))
    ])
]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))

messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content='What are the similarities and differences between these two images?'))
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
```

</details>
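
To give a feel for what the pixel budget means: Qwen2-VL-style preprocessing resizes each image so that both sides are multiples of 28 and the total area falls within `[min_pixels, max_pixels]`, and each 28 x 28 block then corresponds to roughly one visual token. The sketch below approximates that resizing rule in plain Python. It is an illustration written for this page, under the assumption that Qwen2.5-VL follows the same rule. `fit_to_pixel_budget` is a hypothetical name, and this is not the code that LMDeploy or `qwen-vl-utils` actually runs.

```python
import math

FACTOR = 28  # assumed side length (in pixels) of the block that maps to one visual token


def fit_to_pixel_budget(height: int, width: int, min_pixels: int, max_pixels: int, factor: int = FACTOR):
    """Resize (height, width) to multiples of `factor` whose area fits the pixel budget."""
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    if h * w > max_pixels:
        # Too large: shrink both sides by the same ratio, rounding down.
        scale = math.sqrt(height * width / max_pixels)
        h = max(factor, math.floor(height / scale / factor) * factor)
        w = max(factor, math.floor(width / scale / factor) * factor)
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same ratio, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / factor) * factor
        w = math.ceil(width * scale / factor) * factor
    return h, w


budget = 64 * 28 * 28  # same value as min_pixels and max_pixels in the example above
h, w = fit_to_pixel_budget(768, 1024, min_pixels=budget, max_pixels=budget)
print(h, w, (h // FACTOR) * (w // FACTOR))
```

For a 768 x 1024 input this sketch yields a 168 x 252 image, that is 6 x 9 = 54 blocks, which stays under the 64-block budget because both sides are rounded down.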

<details>
<summary>
<b>video multi-round conversation</b>
</summary>

```python
import numpy as np
from lmdeploy import pipeline, GenerationConfig
from decord import VideoReader, cpu
from lmdeploy.vl.constants import IMAGE_TOKEN
from lmdeploy.vl.utils import encode_image_base64
from PIL import Image
pipe = pipeline('Qwen/Qwen2.5-VL-7B-Instruct', log_level='INFO')


def get_index(bound, fps, max_frame, first_idx=0, num_segments=32):
    # `bound` is an optional (start, end) pair in seconds; without it the whole video is used
    if bound:
        start, end = bound[0], bound[1]
    else:
        start, end = -100000, 100000
    start_idx = max(first_idx, round(start * fps))
    end_idx = min(round(end * fps), max_frame)
    # Split the range into equal segments and take one frame near the middle of each
    seg_size = float(end_idx - start_idx) / num_segments
    frame_indices = np.array([
        int(start_idx + (seg_size / 2) + np.round(seg_size * idx))
        for idx in range(num_segments)
    ])
    return frame_indices


def load_video(video_path, bound=None, num_segments=32):
    vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
    max_frame = len(vr) - 1
    fps = float(vr.get_avg_fps())
    frame_indices = get_index(bound, fps, max_frame, first_idx=0, num_segments=num_segments)
    imgs = []
    for frame_index in frame_indices:
        # Convert each sampled frame to an RGB PIL image
        img = Image.fromarray(vr[frame_index].asnumpy()).convert('RGB')
        imgs.append(img)
    return imgs


video_path = 'red-panda.mp4'
imgs = load_video(video_path, num_segments=8)

# Build a prompt with one image placeholder per sampled frame
question = ''
for i in range(len(imgs)):
    question = question + f'Frame{i+1}: {IMAGE_TOKEN}\n'

question += 'What is the red panda doing?'

# Attach each frame as a base64-encoded image
content = [{'type': 'text', 'text': question}]
for img in imgs:
    content.append({'type': 'image_url', 'image_url': {'max_dynamic_patch': 1, 'url': f'data:image/jpeg;base64,{encode_image_base64(img)}'}})

messages = [dict(role='user', content=content)]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))

messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content='Describe this video in detail. Don\'t repeat.'))
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
```

</details>
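
To make the frame sampling in `get_index` concrete: when no `bound` is given, the frame range is split into `num_segments` equal segments and one frame near the middle of each segment is taken. The sketch below restates that arithmetic without NumPy or decord so the indices can be inspected directly. `sample_frame_indices` is a hypothetical name, and the 100-frame clip is an invented input.

```python
def sample_frame_indices(max_frame: int, num_segments: int, first_idx: int = 0):
    """Pick one frame index near the middle of each of `num_segments` equal segments."""
    seg_size = (max_frame - first_idx) / num_segments
    return [int(first_idx + seg_size / 2 + round(seg_size * idx)) for idx in range(num_segments)]


# A clip with 100 frames has max_frame = 99, matching `len(vr) - 1` in load_video.
indices = sample_frame_indices(max_frame=99, num_segments=8)
print(indices)
```

With these inputs the sampled indices are spread evenly across the clip, from frame 6 to frame 93, so every part of the video contributes one frame to the prompt.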

docs/en/supported_models/supported_models.md

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 | QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
 | Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
 | QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes |
+| QWen2.5-VL | 3B - 72B | MLLM | Yes | No | No | No | No |
 | DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
 | DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
 | DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |

docs/zh_cn/multi_modal/index.rst

Lines changed: 1 addition & 0 deletions
@@ -14,4 +14,5 @@
 phi3.md
 mllama.md
 qwen2_vl.md
+qwen2_5_vl.md
 molmo.md

docs/zh_cn/multi_modal/qwen2_5_vl.md

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# Qwen2.5-VL

LMDeploy supports the Qwen-VL series of models, as detailed below:

| Model | Size | Supported Inference Engine |
| :--------: | :---------: | :------------------------: |
| Qwen2.5-VL | 3B, 7B, 72B | PyTorch |

This document uses [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) as an example to demonstrate how to deploy Qwen2.5-VL models with LMDeploy.

## Installation

Install LMDeploy by following the [installation guide](../get_started/installation.md), then install the dependencies required by the upstream Qwen2.5-VL model library.

```shell
# Qwen2.5-VL requires the latest transformers (transformers >= 4.49.0)
pip install git+https://github.com/huggingface/transformers
# The `[decord]` extra is highly recommended for faster video loading.
pip install qwen-vl-utils[decord]==0.0.8
```

## Offline inference

The following example uses the pipeline for offline inference. For more usage, see [VLM Offline Inference Pipeline](./vl_pipeline.md).

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Build an inference pipeline for the model
pipe = pipeline('Qwen/Qwen2.5-VL-7B-Instruct')

# Load an image from a URL and pass it together with a text prompt
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

More examples are listed below:

<details>
<summary>
<b>multi-image multi-round conversation</b>
</summary>

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('Qwen/Qwen2.5-VL-7B-Instruct', log_level='INFO')
# First round: one text prompt plus two images in a single user message
messages = [
    dict(role='user', content=[
        dict(type='text', text='Describe the two images in detail.'),
        dict(type='image_url', image_url=dict(url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Beijing_Small.jpeg')),
        dict(type='image_url', image_url=dict(url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Chongqing_Small.jpeg'))
    ])
]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))

# Second round: append the reply and a follow-up question, then send the full history
messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content='What are the similarities and differences between these two images?'))
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
```

</details>

<details>
<summary>
<b>control image resolution to speed up inference</b>
</summary>

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('Qwen/Qwen2.5-VL-7B-Instruct', log_level='INFO')

# Per-image pixel budget; a smaller budget means less image data to process
min_pixels = 64 * 28 * 28
max_pixels = 64 * 28 * 28
messages = [
    dict(role='user', content=[
        dict(type='text', text='Describe the two images in detail.'),
        dict(type='image_url', image_url=dict(min_pixels=min_pixels, max_pixels=max_pixels, url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Beijing_Small.jpeg')),
        dict(type='image_url', image_url=dict(min_pixels=min_pixels, max_pixels=max_pixels, url='https://raw.githubusercontent.com/QwenLM/Qwen-VL/master/assets/mm_tutorial/Chongqing_Small.jpeg'))
    ])
]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))

messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content='What are the similarities and differences between these two images?'))
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
```

</details>

<details>
<summary>
<b>video multi-round conversation</b>
</summary>

```python
import numpy as np
from lmdeploy import pipeline, GenerationConfig
from decord import VideoReader, cpu
from lmdeploy.vl.constants import IMAGE_TOKEN
from lmdeploy.vl.utils import encode_image_base64
from PIL import Image
pipe = pipeline('Qwen/Qwen2.5-VL-7B-Instruct', log_level='INFO')


def get_index(bound, fps, max_frame, first_idx=0, num_segments=32):
    # `bound` is an optional (start, end) pair in seconds; without it the whole video is used
    if bound:
        start, end = bound[0], bound[1]
    else:
        start, end = -100000, 100000
    start_idx = max(first_idx, round(start * fps))
    end_idx = min(round(end * fps), max_frame)
    # Split the range into equal segments and take one frame near the middle of each
    seg_size = float(end_idx - start_idx) / num_segments
    frame_indices = np.array([
        int(start_idx + (seg_size / 2) + np.round(seg_size * idx))
        for idx in range(num_segments)
    ])
    return frame_indices


def load_video(video_path, bound=None, num_segments=32):
    vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
    max_frame = len(vr) - 1
    fps = float(vr.get_avg_fps())
    frame_indices = get_index(bound, fps, max_frame, first_idx=0, num_segments=num_segments)
    imgs = []
    for frame_index in frame_indices:
        # Convert each sampled frame to an RGB PIL image
        img = Image.fromarray(vr[frame_index].asnumpy()).convert('RGB')
        imgs.append(img)
    return imgs


video_path = 'red-panda.mp4'
imgs = load_video(video_path, num_segments=8)

# Build a prompt with one image placeholder per sampled frame
question = ''
for i in range(len(imgs)):
    question = question + f'Frame{i+1}: {IMAGE_TOKEN}\n'

question += 'What is the red panda doing?'

# Attach each frame as a base64-encoded image
content = [{'type': 'text', 'text': question}]
for img in imgs:
    content.append({'type': 'image_url', 'image_url': {'max_dynamic_patch': 1, 'url': f'data:image/jpeg;base64,{encode_image_base64(img)}'}})

messages = [dict(role='user', content=content)]
out = pipe(messages, gen_config=GenerationConfig(top_k=1))

messages.append(dict(role='assistant', content=out.text))
messages.append(dict(role='user', content='Describe this video in detail. Don\'t repeat.'))
out = pipe(messages, gen_config=GenerationConfig(top_k=1))
```

</details>

docs/zh_cn/supported_models/supported_models.md

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@
 | QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
 | Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
 | QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes |
+| QWen2.5-VL | 3B - 72B | MLLM | Yes | No | No | No | No |
 | DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
 | DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
 | DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |

lmdeploy/archs.py

Lines changed: 2 additions & 1 deletion
@@ -119,7 +119,8 @@ def check_vl_llm(config: dict) -> bool:
         'LlavaLlamaForCausalLM', 'LlavaMistralForCausalLM', 'CogVLMForCausalLM', 'InternLMXComposer2ForCausalLM',
         'InternVLChatModel', 'MiniGeminiLlamaForCausalLM', 'MGMLlamaForCausalLM', 'MiniCPMV',
         'LlavaForConditionalGeneration', 'LlavaNextForConditionalGeneration', 'Phi3VForCausalLM',
-        'Qwen2VLForConditionalGeneration', 'MllamaForConditionalGeneration', 'MolmoForCausalLM'
+        'Qwen2VLForConditionalGeneration', 'Qwen2_5_VLForConditionalGeneration', 'MllamaForConditionalGeneration',
+        'MolmoForCausalLM'
     ])
     if arch == 'QWenLMHeadModel' and 'visual' in config:
         return True
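
The change above adds `Qwen2_5_VLForConditionalGeneration` to the list of architecture names that `check_vl_llm` treats as vision-language models. A simplified sketch of that kind of lookup follows, assuming the architecture name is read from the `architectures` field of a Hugging Face `config.json`. `is_vl_arch` and the shortened name set are illustrative only, not the actual LMDeploy implementation.

```python
# Shortened, illustrative subset of the architecture names in the hunk above.
VL_ARCHS = {
    'Qwen2VLForConditionalGeneration',
    'Qwen2_5_VLForConditionalGeneration',
    'MllamaForConditionalGeneration',
    'MolmoForCausalLM',
}


def is_vl_arch(config: dict) -> bool:
    """Return True if the first listed architecture is a known vision-language model."""
    archs = config.get('architectures') or []
    if not archs:
        return False
    arch = archs[0]
    # Special case from the hunk: this class name counts only when a 'visual' key is present.
    if arch == 'QWenLMHeadModel' and 'visual' in config:
        return True
    return arch in VL_ARCHS


print(is_vl_arch({'architectures': ['Qwen2_5_VLForConditionalGeneration']}))
print(is_vl_arch({'architectures': ['Qwen2ForCausalLM']}))
```

A set lookup keeps the check constant-time, and registering a new model family amounts to adding one string, which is what this commit does for Qwen2.5-VL.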
