Commit e3f6861
[GPU] Add IncreasePositionIdsPrecision for Qwen3-VL models (#34716)
### Description of the issue (symptom, root cause, how it was resolved)
- Symptom: the Qwen3-VL-4B-Instruct INT4 model produces incorrect output on
GPU for long input sequences (>2048 tokens). The first-token prediction is
wrong, so the generated text is completely incoherent; CPU output is
correct.
- Root-cause: The `position_ids` (integer values) are converted to FP16
before the frequency MatMul in the RoPE computation path. FP16 has only
10 mantissa bits, so integers in the range [4096, 8192) are rounded to the
nearest multiple of 4 (e.g., 4173→4172, 4174→4176). This corrupts the
sin/cos positional embeddings fed into every transformer layer. The
existing `IncreasePositionIdsPrecision` transformation has 4
model-specific patterns but none matches Qwen3-VL because: (1) Unsqueeze
is decomposed to Reshape by the frontend, and (2) the path between
MatMul and Sin/Cos includes a complex `Gather×3 → ScatterNDUpdate` chain
for 3D position assignment (temporal, height, width) that is unique to
Qwen3-VL.
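The rounding behavior above is easy to reproduce with NumPy's float16 type (a standalone illustration of the numeric issue, not the plugin code):

```python
import numpy as np

# In [4096, 8192) an FP16 value has a spacing (ulp) of 4, because only
# 10 mantissa bits are available: 2**12 / 2**10 == 4.
positions = np.array([4173, 4174, 5545], dtype=np.int32)

as_f16 = positions.astype(np.float16)  # positions snap to the nearest multiple of 4
as_f32 = positions.astype(np.float32)  # f32 (23 mantissa bits) holds these integers exactly

print(as_f16)
print(as_f32)
```

Note that 4174 sits exactly between 4172 and 4176, so IEEE round-half-to-even picks 4176, matching the example in the description.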
- Resolution: Added `IncreasePositionIdsPrecisionForQwen3VL` matcher
pass that pattern-matches
`Convert→Reshape|Unsqueeze→Convert(i32→f16)→MatMul(Broadcast,...)`, then
uses a forward BFS from the MatMul to locate the downstream Sin/Cos nodes. The
transformation upgrades the position_ids computation path from f16 to
f32, and inserts f32→f16 converts after Sin/Cos to restore original
precision at the boundary.
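The forward-BFS step can be sketched as a generic graph traversal (plain Python over a toy adjacency map, not the OpenVINO matcher API; all node names here are invented for illustration):

```python
from collections import deque

def find_downstream(start, targets, successors):
    """BFS forward from `start`, collecting nodes whose op type is in `targets`.

    `successors` maps a node name to its consumer node names, standing in
    for following each output tensor to its consumer ops in the graph.
    """
    found, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt.split("_")[0] in targets:
                found.append(nxt)   # matched Sin/Cos; stop descending here
            else:
                queue.append(nxt)
    return found

# Toy graph mirroring the Qwen3-VL path:
# MatMul -> Gather x3 -> ScatterNDUpdate -> Sin / Cos
graph = {
    "MatMul": ["Gather_0", "Gather_1", "Gather_2"],
    "Gather_0": ["Scatter"], "Gather_1": ["Scatter"], "Gather_2": ["Scatter"],
    "Scatter": ["Sin_0", "Cos_0"],
}
print(find_downstream("MatMul", {"Sin", "Cos"}, graph))  # -> ['Sin_0', 'Cos_0']
```

A BFS (rather than matching the exact intermediate chain) keeps the pass robust to the model-specific `Gather×3 → ScatterNDUpdate` structure between the MatMul and the Sin/Cos nodes.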
#### The code and line that caused this issue (if it is not changed directly)
- `intel_gpu/src/plugin/transformations/increase_position_ids_precision.cpp`
- `IncreasePositionIdsPrecision::run_on_model()`: the 4 existing
sub-passes (ForRoPE, ForQwen25VL, ForLtxVideo, ForGPTOSS) all failed to
match the Qwen3-VL graph pattern, so no precision upgrade was applied.
#### Reproduction step and snapshot (if applicable; do not attach for customer model)
- `python genai/tools/llm_bench/benchmark.py -m Qwen3-VL-4B-Instruct/INT4 -d GPU.1 --task visual_text_gen -pf raw_prompt.jsonl -ic 128 -lc config.json`
- where config.json = `{"ATTENTION_BACKEND": "PA", "CACHE_DIR": ""}`
- Input: 5545 tokens (tool-calling prompt without image)
#### Problematic graph
- Qwen3-VL RoPE position_ids path in the language model subgraph:
<img width="509" height="1014" alt="image"
src="https://github.com/user-attachments/assets/2c5632a0-ad75-440e-b218-4f42f16f9726"
/>
- The fix changes Convert(i32→f16) to Convert(i32→f32), inserts
Convert(f16→f32) after Broadcast, and inserts Convert(f32→f16) after
Sin/Cos.
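Numerically, the boundary change amounts to the following (NumPy sketch; `inv_freq = 0.01` is an arbitrary stand-in for one RoPE inverse frequency, not a value from the model):

```python
import numpy as np

pos, inv_freq = 4173, 0.01  # illustrative position and frequency

# Old path: position cast to f16 before the frequency MatMul,
# so both the position and the product are quantized in f16.
bad = np.float16(np.sin(np.float16(pos) * np.float16(inv_freq)))

# New path: keep f32 through the MatMul and Sin, and convert to f16
# only after Sin/Cos, at the precision boundary the fix inserts.
good = np.float16(np.sin(np.float32(pos) * np.float32(inv_freq)))

exact = np.sin(pos * inv_freq)  # f64 reference
print(abs(float(bad) - exact), abs(float(good) - exact))
```

The f16-after-Sin/Cos value stays within one half-precision ulp of the reference, while the all-f16 path carries the accumulated phase error into the embedding.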
#### Checklist
- [x] Is it a proper fix? (not a workaround)
- [x] Did you include test case for this fix, if necessary?
- [x] Did you review existing tests that can be extended to cover this
scenario? Which test did you review?
  - Reviewed the `IncreasePositionIdsPrecisionForQwen25VL` test and added a
new dedicated test, `IncreasePositionIdsPrecisionForQwen3VL`.
### Tickets:
- [CVS-182656](https://jira.devtools.intel.com/browse/CVS-182656)
Signed-off-by: Andrew Park <andrew.park@intel.com>
File tree (3 files changed: +215 −0)
- src/plugins/intel_gpu
  - src/plugin/transformations
  - tests/unit/transformations

Lines changed: 110 additions & 0 deletions
Lines changed: 6 additions & 0 deletions
Lines changed: 99 additions & 0 deletions