From 7275d55c926217be373ffe3047757cf886f8a2ac Mon Sep 17 00:00:00 2001 From: ariG23498 Date: Mon, 21 Jul 2025 18:21:36 +0530 Subject: [PATCH 1/9] chore: update blog with vlm support Signed-off-by: ariG23498 --- _posts/2025-04-11-transformers-backend.md | 63 +++++++++++++++++++++++ 1 file changed, 63 insertions(+) diff --git a/_posts/2025-04-11-transformers-backend.md b/_posts/2025-04-11-transformers-backend.md index 84a08c3..5c25fdc 100644 --- a/_posts/2025-04-11-transformers-backend.md +++ b/_posts/2025-04-11-transformers-backend.md @@ -20,6 +20,69 @@ vLLM will therefore optimize throughput/latency on top of existing transformers In this post, we’ll explore how vLLM leverages the transformers backend to combine **flexibility** with **efficiency**, enabling you to deploy state-of-the-art models faster and smarter. +## Updates + +This section will hold all the updates that have been taken place over the course of the first release of the blog psot (11th April 2025). + +### Support for Vision Language Models (21st July 2025) + +vLLM with the transformers backend now supports Vision Langauge Models. Here is how one would use +the API. + +```python +from vllm import LLM, SamplingParams +from PIL import Image +import requests +from transformers import AutoProcessor + +model_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf" +hf_processor = AutoProcessor.from_pretrained(model_id) # required to dynamically update the chat template + +messages = [ + { + "role": "user", + "content": [ + {"type": "image", "url": "dummy_image.jpg"}, + {"type": "text", "text": "What is the content of this image?"}, + ], + }, +] +prompt = hf_processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) +image = Image.open( + requests.get( + "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True + ).raw +) + +# initialize the vlm using the `model_impl="transformers"` +vlm = LLM( + model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf", + model_impl="transformers", + disable_mm_preprocessor_cache=True, # we disable the mm preprocessor cache for the time being + enable_prefix_caching=False, + enable_chunked_prefill=False +) + +outputs = vlm.generate( + { + "prompt": prompt, + "multi_modal_data": {"image": image}, + }, + sampling_params=SamplingParams(max_tokens=100) +) + +for o in outputs: + generated_text = o.outputs[0].text + print(generated_text) + +# OUTPUTS: +# In the tranquil setting of this image, two feline companions are enjoying a peaceful slumber on a +# cozy pink couch. The couch, adorned with a plush red fabric across the seating area, serves as their perfect resting place. +# +# On the left side of the couch, a gray tabby cat is curled up at rest, its body relaxed in a display +# of feline serenity. One paw playfully stretches out, perhaps in mid-jump or simply exploring its surroundings. 
+``` + ## Transformers and vLLM: Inference in Action Let’s start with a simple text generation task using the `meta-llama/Llama-3.2-1B` model to see how From 09e614d217c9fe2ebf3351caee15b925c80653b8 Mon Sep 17 00:00:00 2001 From: Aritra Roy Gosthipaty Date: Tue, 22 Jul 2025 10:23:32 +0530 Subject: [PATCH 2/9] Update _posts/2025-04-11-transformers-backend.md Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: ariG23498 --- _posts/2025-04-11-transformers-backend.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-04-11-transformers-backend.md b/_posts/2025-04-11-transformers-backend.md index 5c25fdc..2063371 100644 --- a/_posts/2025-04-11-transformers-backend.md +++ b/_posts/2025-04-11-transformers-backend.md @@ -22,7 +22,7 @@ with **efficiency**, enabling you to deploy state-of-the-art models faster and s ## Updates -This section will hold all the updates that have been taken place over the course of the first release of the blog psot (11th April 2025). +This section will hold all the updates that have been taken place over the course of the first release of the blog post (11th April 2025). ### Support for Vision Language Models (21st July 2025) From 402cad6afd1687407cd03e228d84baad83099087 Mon Sep 17 00:00:00 2001 From: Aritra Roy Gosthipaty Date: Tue, 22 Jul 2025 10:24:00 +0530 Subject: [PATCH 3/9] Update _posts/2025-04-11-transformers-backend.md Co-authored-by: Sergio Paniego Blanco Signed-off-by: ariG23498 --- _posts/2025-04-11-transformers-backend.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-04-11-transformers-backend.md b/_posts/2025-04-11-transformers-backend.md index 2063371..5123909 100644 --- a/_posts/2025-04-11-transformers-backend.md +++ b/_posts/2025-04-11-transformers-backend.md @@ -26,7 +26,7 @@ This section will hold all the updates that have been taken place over the cours ### Support for Vision Language Models (21st July 2025) -vLLM with the transformers backend now supports Vision Langauge Models. Here is how one would use +vLLM with the transformers backend now supports Vision Language Models. Here is how one would use the API. ```python From d22035abc1cf2a90a22a0470f2c8fcdeaa6720ad Mon Sep 17 00:00:00 2001 From: ariG23498 Date: Tue, 22 Jul 2025 10:25:33 +0530 Subject: [PATCH 4/9] review suggestions Signed-off-by: ariG23498 --- _posts/2025-04-11-transformers-backend.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-04-11-transformers-backend.md b/_posts/2025-04-11-transformers-backend.md index 5123909..41823b4 100644 --- a/_posts/2025-04-11-transformers-backend.md +++ b/_posts/2025-04-11-transformers-backend.md @@ -22,7 +22,7 @@ with **efficiency**, enabling you to deploy state-of-the-art models faster and s ## Updates -This section will hold all the updates that have been taken place over the course of the first release of the blog post (11th April 2025). +This section will hold all the updates that have taken place since the blog post was first released (11th April 2025). 
 ### Support for Vision Language Models (21st July 2025)

From 3a0c98b7ace199d21c0ea918018679c041e9a622 Mon Sep 17 00:00:00 2001
From: ariG23498 
Date: Tue, 22 Jul 2025 10:30:10 +0530
Subject: [PATCH 5/9] vb's suggestions

Signed-off-by: ariG23498 
---
 _posts/2025-04-11-transformers-backend.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/_posts/2025-04-11-transformers-backend.md b/_posts/2025-04-11-transformers-backend.md
index 41823b4..3f13b40 100644
--- a/_posts/2025-04-11-transformers-backend.md
+++ b/_posts/2025-04-11-transformers-backend.md
@@ -26,8 +26,10 @@ This section will hold all the updates that have taken place since the blog post

 ### Support for Vision Language Models (21st July 2025)

-vLLM with the transformers backend now supports Vision Language Models. Here is how one would use
-the API.
+vLLM with the transformers backend now supports **Vision Language Models**. When the user adds `model_impl="transformers"`,
+the correct class for text-only or multimodal models will be deduced and loaded.
+
+Here is how one would use the API.

 ```python
 from vllm import LLM, SamplingParams

From e7178f6803ec13354c0544db47548c1fabb860f2 Mon Sep 17 00:00:00 2001
From: ariG23498 
Date: Tue, 22 Jul 2025 16:58:59 +0530
Subject: [PATCH 6/9] adding openai consumption and serving

Signed-off-by: ariG23498 
---
 _posts/2025-04-11-transformers-backend.md | 39 ++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/_posts/2025-04-11-transformers-backend.md b/_posts/2025-04-11-transformers-backend.md
index 3f13b40..88ab936 100644
--- a/_posts/2025-04-11-transformers-backend.md
+++ b/_posts/2025-04-11-transformers-backend.md
@@ -29,7 +29,44 @@ This section will hold all the updates that have taken place since the blog post
 vLLM with the transformers backend now supports **Vision Language Models**. When the user adds `model_impl="transformers"`,
 the correct class for text-only or multimodal models will be deduced and loaded.

-Here is how one would use the API.
+Here is how one can serve a multimodal model using the transformers backend.
+```bash
+vllm serve llava-hf/llava-onevision-qwen2-0.5b-ov-hf \
+--model_impl transformers \
+--disable-mm-preprocessor-cache \
+--no-enable-prefix-caching \
+--no-enable-chunked-prefill
+```
+
+To consume the model, one can use the `openai` client like so:
+```python
+from openai import OpenAI
+openai_api_key = "EMPTY"
+openai_api_base = "http://localhost:8000/v1"
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+chat_response = client.chat.completions.create(
+    model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf",
+    messages=[{
+        "role": "user",
+        "content": [
+            {"type": "text", "text": "What's in this image?"},
+            {
+                "type": "image_url",
+                "image_url": {
+                    "url": "http://images.cocodataset.org/val2017/000000039769.jpg",
+                },
+            },
+        ],
+    }],
+)
+print("Chat response:", chat_response)
+```
+
+You can also directly initialize the vLLM engine using the `LLM` API. Here is the same model being
+served that way.
```python from vllm import LLM, SamplingParams From c6bef35f709e22d0be3f09643703371cc3e6f5f2 Mon Sep 17 00:00:00 2001 From: Aritra Roy Gosthipaty Date: Wed, 23 Jul 2025 07:16:32 +0530 Subject: [PATCH 7/9] Update _posts/2025-04-11-transformers-backend.md Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: ariG23498 --- _posts/2025-04-11-transformers-backend.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/_posts/2025-04-11-transformers-backend.md b/_posts/2025-04-11-transformers-backend.md index 88ab936..9eeaaf2 100644 --- a/_posts/2025-04-11-transformers-backend.md +++ b/_posts/2025-04-11-transformers-backend.md @@ -97,9 +97,6 @@ image = Image.open( vlm = LLM( model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf", model_impl="transformers", - disable_mm_preprocessor_cache=True, # we disable the mm preprocessor cache for the time being - enable_prefix_caching=False, - enable_chunked_prefill=False ) outputs = vlm.generate( From 58ca04c741b5ba76b4140bad6c2a17e2a703e342 Mon Sep 17 00:00:00 2001 From: Aritra Roy Gosthipaty Date: Wed, 23 Jul 2025 07:16:38 +0530 Subject: [PATCH 8/9] Update _posts/2025-04-11-transformers-backend.md Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: ariG23498 --- _posts/2025-04-11-transformers-backend.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/_posts/2025-04-11-transformers-backend.md b/_posts/2025-04-11-transformers-backend.md index 9eeaaf2..88691b9 100644 --- a/_posts/2025-04-11-transformers-backend.md +++ b/_posts/2025-04-11-transformers-backend.md @@ -33,9 +33,6 @@ Here is how one can serve a multimodal model using the transformers backend. ```bash vllm serve llava-hf/llava-onevision-qwen2-0.5b-ov-hf \ --model_impl transformers \ ---disable-mm-preprocessor-cache \ ---no-enable-prefix-caching \ ---no-enable-chunked-prefill ``` To consume the model one can use the `openai` API like so: From 5492bbbc5a7982f391749b7eb077e2af1172299a Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Tue, 22 Jul 2025 12:37:14 +0100 Subject: [PATCH 9/9] Bump nokogiri from 1.18.8 to 1.18.9 (#62) Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.18.8 to 1.18.9. - [Release notes](https://github.com/sparklemotion/nokogiri/releases) - [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md) - [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.18.8...v1.18.9) --- updated-dependencies: - dependency-name: nokogiri dependency-version: 1.18.9 dependency-type: indirect ... Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: ariG23498 --- Gemfile.lock | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Gemfile.lock b/Gemfile.lock index 1c3abb9..6e67a2f 100644 --- a/Gemfile.lock +++ b/Gemfile.lock @@ -240,9 +240,9 @@ GEM minitest (5.25.4) net-http (0.6.0) uri - nokogiri (1.18.8-arm64-darwin) + nokogiri (1.18.9-arm64-darwin) racc (~> 1.4) - nokogiri (1.18.8-x86_64-linux-gnu) + nokogiri (1.18.9-x86_64-linux-gnu) racc (~> 1.4) octicons (19.15.1) octokit (4.25.1)