Commit e4b21f6

chore: update blog with vlm support
1 parent 2c17a94 commit e4b21f6

File tree

1 file changed (+63, -0 lines changed)

_posts/2025-04-11-transformers-backend.md

Lines changed: 63 additions & 0 deletions
@@ -20,6 +20,69 @@ vLLM will therefore optimize throughput/latency on top of existing transformers
In this post, we’ll explore how vLLM leverages the transformers backend to combine **flexibility**
with **efficiency**, enabling you to deploy state-of-the-art models faster and smarter.

## Updates

This section will hold all the updates that have taken place since the first release of this blog post (11th April 2025).

### Support for Vision Language Models (21st July 2025)

vLLM with the transformers backend now supports Vision Language Models. Here is how one would use
the API.

```python
from vllm import LLM, SamplingParams
from PIL import Image
import requests
from transformers import AutoProcessor

model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
hf_processor = AutoProcessor.from_pretrained(model_id)  # required to dynamically update the chat template

# The image entry only inserts the image placeholder token into the prompt;
# the actual image is supplied via `multi_modal_data` below.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "dummy_image.jpg"},
            {"type": "text", "text": "What is the content of this image?"},
        ],
    },
]
prompt = hf_processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image = Image.open(
    requests.get(
        "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True
    ).raw
)

# initialize the VLM with the transformers backend via `model_impl="transformers"`
vlm = LLM(
    model=model_id,
    model_impl="transformers",
    disable_mm_preprocessor_cache=True,  # we disable the mm preprocessor cache for the time being
    enable_prefix_caching=False,
    enable_chunked_prefill=False,
)

outputs = vlm.generate(
    {
        "prompt": prompt,
        "multi_modal_data": {"image": image},
    },
    sampling_params=SamplingParams(max_tokens=100),
)

for o in outputs:
    generated_text = o.outputs[0].text
    print(generated_text)

# OUTPUTS:
# In the tranquil setting of this image, two feline companions are enjoying a peaceful slumber on a
# cozy pink couch. The couch, adorned with a plush red fabric across the seating area, serves as their perfect resting place.
#
# On the left side of the couch, a gray tabby cat is curled up at rest, its body relaxed in a display
# of feline serenity. One paw playfully stretches out, perhaps in mid-jump or simply exploring its surroundings.
```
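
If you need to process several images, `LLM.generate` also accepts a list of prompt dictionaries, so the requests can be batched. Below is a minimal sketch (not from the original post) that reuses `vlm`, `prompt`, `Image`, and `requests` from the example above; the second COCO image URL is only an illustrative assumption.

```python
# Minimal batching sketch (assumption, not part of the original post):
# reuses `vlm`, `prompt`, `Image`, and `requests` from the example above.
image_urls = [
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    "http://images.cocodataset.org/val2017/000000000139.jpg",  # hypothetical second image
]

batched_inputs = [
    {
        "prompt": prompt,
        "multi_modal_data": {"image": Image.open(requests.get(url, stream=True).raw)},
    }
    for url in image_urls
]

outputs = vlm.generate(batched_inputs, sampling_params=SamplingParams(max_tokens=100))
for o in outputs:
    print(o.outputs[0].text)
```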

## Transformers and vLLM: Inference in Action

Let’s start with a simple text generation task using the `meta-llama/Llama-3.2-1B` model to see how
