support qwen25 vl w8a8 quantization #2778
Conversation
Code Review
This pull request adds support for W8A8 quantization of the Qwen2.5 Vision-Language model on Ascend hardware. The changes add new padding functions for quantized weights, update the weight-loading logic to handle the new parameters, and add corresponding unit tests. My review identified a few areas for improvement: the new padding functions can avoid redundant reshape operations, and, more importantly, the weight-loading logic for the projection-related quantization parameters appears incomplete, which may be a bug. Please see the detailed comments for suggestions.
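For orientation, here is a minimal, hypothetical sketch of the head-dim padding idea for a fused-QKV per-channel scale, based only on the reshape/split pattern visible in the diffs below. The function name, the padded head size argument, the zero pad value, and the exact output layout are illustrative assumptions, not the PR's actual implementation.

import torch
import torch.nn.functional as F

def pad_qkv_scale_sketch(data: torch.Tensor,
                         origin_head_size: int,
                         pad_head_size: int) -> torch.Tensor:
    # Split each head's scale into two halves (mirroring the half-split in the
    # PR's diffs) and pad each half up to half of the padded head size.
    half_origin = origin_head_size // 2
    half_pad = pad_head_size // 2
    reshaped = data.reshape(-1, 3, origin_head_size, 1)
    first_half = reshaped[:, :, :half_origin, :]
    second_half = reshaped[:, :, half_origin:, :]
    # F.pad pads dims from the last backwards: (0, 0) leaves the trailing dim,
    # (0, half_pad - half_origin) extends the head-size dim on the right.
    pad = (0, 0, 0, half_pad - half_origin)
    padded = torch.cat([F.pad(first_half, pad), F.pad(second_half, pad)], dim=2)
    return padded.reshape(-1, 1)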
vllm_ascend/models/qwen2_5_vl.py
Outdated
data1 = data.reshape(
    -1, 3, self.origin_hidden_size_per_attention_head, 1
)[:, :, :self.half_origin_hidden_size_per_attention_head, :]
data2 = data.reshape(
    -1, 3, self.origin_hidden_size_per_attention_head, 1
)[:, :, self.half_origin_hidden_size_per_attention_head:, :]
The data.reshape(...) operation is performed twice with the same arguments. This is inefficient. You can store the result of the reshape operation in a variable and reuse it to improve performance and readability.
reshaped_data = data.reshape(
-1, 3, self.origin_hidden_size_per_attention_head, 1
)
data1 = reshaped_data[:, :, :self.half_origin_hidden_size_per_attention_head, :]
data2 = reshaped_data[:, :, self.half_origin_hidden_size_per_attention_head:, :]
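As a quick, illustrative check (with arbitrary stand-in sizes, not the model's real dimensions), the single-reshape refactor yields exactly the same slices as reshaping twice:

import torch

origin_head, half = 80, 40
data = torch.randn(4 * 3 * origin_head, 1)

reshaped = data.reshape(-1, 3, origin_head, 1)
assert torch.equal(
    reshaped[:, :, :half, :],
    data.reshape(-1, 3, origin_head, 1)[:, :, :half, :],
)
assert torch.equal(
    reshaped[:, :, half:, :],
    data.reshape(-1, 3, origin_head, 1)[:, :, half:, :],
)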
vllm_ascend/models/qwen2_5_vl.py
Outdated
data1 = data.reshape(
    -1, 3, self.origin_hidden_size_per_attention_head
)[:, :, :self.half_origin_hidden_size_per_attention_head]
data2 = data.reshape(
    -1, 3, self.origin_hidden_size_per_attention_head
)[:, :, self.half_origin_hidden_size_per_attention_head:]
Similar to pad_qkv_weight_scale_offset, the data.reshape(...) operation is performed twice here. This is inefficient. Please store the result of the reshape operation in a variable and reuse it.
reshaped_data = data.reshape(
-1, 3, self.origin_hidden_size_per_attention_head
)
data1 = reshaped_data[:, :, :self.half_origin_hidden_size_per_attention_head]
data2 = reshaped_data[:, :, self.half_origin_hidden_size_per_attention_head:]
vllm_ascend/models/qwen2_5_vl.py
Outdated
if ("attn.proj.weight_scale" in name or "attn.proj.weight_offset" in name) and self.enable_pad: | ||
... | ||
elif ("attn.proj.deq_scale" in name or "attn.proj.quant_bias" in name) and self.enable_pad: | ||
... |
The conditional blocks for handling attn.proj.weight_scale, attn.proj.weight_offset, attn.proj.deq_scale, and attn.proj.quant_bias contain only an ellipsis (...). This suggests an incomplete implementation. If these projection-related quantization parameters require padding when self.enable_pad is true, this is a bug that could lead to incorrect model behavior. Please implement the necessary padding logic for these parameters, similar to how it is done for the attn.qkv parameters, or add a comment explaining why they are intentionally skipped.
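For illustration only, here is a minimal sketch of the second option the reviewer mentions (making the skip explicit), written under the assumption that head-dim padding widens only the projection's input dimension, so its per-output-channel quantization parameters keep their shape; whether that assumption holds for this model is for the PR author to confirm.

# Sketch (assumption): head-dim padding only changes attn.proj's input
# dimension, so per-output-channel params (scale/offset/deq_scale/quant_bias)
# are unchanged and can be loaded as-is.
if ("attn.proj.weight_scale" in name
        or "attn.proj.weight_offset" in name) and self.enable_pad:
    # Intentionally no padding: these are per output channel of the
    # projection, and the output dimension is not padded.
    pass
elif ("attn.proj.deq_scale" in name
        or "attn.proj.quant_bias" in name) and self.enable_pad:
    # Same reasoning as above; load unchanged.
    pass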
5bce20e to d1c4720 (Compare)
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: lijiaojiao <[email protected]>
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2778      +/-   ##
==========================================
- Coverage   72.99%   72.73%    -0.27%
==========================================
  Files         153      153
  Lines       21331    21343      +12
==========================================
- Hits        15571    15523      -48
- Misses       5760     5820      +60
### What this PR does / why we need it?
support qwen25 vl w8a8 quantization

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@62f66be

---------
Signed-off-by: lijiaojiao <[email protected]>
Co-authored-by: lijiaojiao <[email protected]>
Signed-off-by: Yizhou Liu <[email protected]>
Signed-off-by: offline0806 <[email protected]>
What this PR does / why we need it?
support qwen25 vl w8a8 quantization
Does this PR introduce any user-facing change?
N/A
How was this patch tested?