Add Qwen3-VL export support and example runner by seyeong-han · Pull Request #17572 · pytorch/executorch

seyeong-han · 2026-02-19T22:25:39Z

Companion PR - Optimum-Executorch - Add Qwen3-VL export support for multimodal text-to-text pipeline

Overview

Adds export and runtime support for Qwen3-VL-2B-Instruct, a 2.2B-parameter vision-language model. Export goes through optimum-executorch via the existing multimodal-text-to-text task and produces a single .pte with three methods: vision_encoder, text_decoder, and token_embedding.

The optimum-executorch changes (in a companion PR) handle three Qwen3-VL-specific concerns during torch.export: pre-computing M-RoPE vision positions that use data-dependent ops, injecting position_ids via a forward hook so the text decoder export doesn't hit get_rope_index, and falling back to AutoModelForImageTextToText when AutoModelForPreTraining doesn't resolve.
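The third concern, falling back to a second loader class when the first does not resolve, can be sketched as a small fallback chain. This is only an illustration of the pattern, not the companion PR's code: `load_with_fallback` and both stub loaders are hypothetical names, and treating `ValueError` as the "does not resolve" signal is an assumption.

```python
from typing import Callable, Sequence


def load_with_fallback(model_id: str, loaders: Sequence[Callable[[str], object]]) -> object:
    """Try each loader in order and return the first model that resolves."""
    errors = []
    for loader in loaders:
        try:
            return loader(model_id)
        except ValueError as exc:  # assumed signal that this loader cannot resolve the model
            errors.append(f"{getattr(loader, '__name__', loader)}: {exc}")
    raise ValueError(f"No loader resolved {model_id!r}: " + "; ".join(errors))


# Hypothetical stand-ins for AutoModelForPreTraining.from_pretrained and
# AutoModelForImageTextToText.from_pretrained, so the sketch runs without
# downloading any weights.
def pretraining_loader(model_id: str) -> object:
    raise ValueError("unrecognized configuration class")


def image_text_to_text_loader(model_id: str) -> object:
    return {"model_id": model_id, "loaded_by": "image_text_to_text_loader"}


model = load_with_fallback(
    "Qwen3-VL-2B-Instruct", [pretraining_loader, image_text_to_text_loader]
)
print(model["loaded_by"])
```

Collecting the per-loader error messages keeps the final failure informative if neither class resolves.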

This PR adds the ExecuTorch-side example:

examples/models/qwen3_vl/run_qwen3_vl.py — Python runtime that loads the .pte via ExecuTorchModule.run_method, driving token_embedding and text_decoder through ExecuTorch. The vision encoder runs in PyTorch eager because the portable runtime's aten::convolution.out does not yet support 5D inputs (Conv3d).
examples/models/qwen3_vl/README.md — Export command, runtime usage, method shapes, quantization config, and architecture notes.
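The way a runner like this might drive token_embedding and text_decoder can be sketched as a greedy decode loop. This is a minimal sketch under stated assumptions, not the script's actual code: `embed` and `decode` are hypothetical callables standing in for the corresponding `run_method` calls, their signatures are assumed, and the toy stubs exist only so the loop runs without a .pte file.

```python
from typing import Callable, List, Sequence

Embedding = List[float]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (greedy token choice)."""
    return max(range(len(values)), key=lambda i: values[i])


def greedy_generate(
    embed: Callable[[int], Embedding],                    # stand-in for token_embedding
    decode: Callable[[List[Embedding], int], List[float]],  # stand-in for text_decoder
    prompt_embeds: List[Embedding],
    eos_id: int,
    max_new_tokens: int,
) -> List[int]:
    # Prefill: feed the whole prompt (text and image embeddings) once.
    logits = decode(prompt_embeds, 0)
    pos = len(prompt_embeds)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        token = argmax(logits)
        if token == eos_id:
            break
        generated.append(token)
        # Decode step: feed only the new token's embedding at its position.
        logits = decode([embed(token)], pos)
        pos += 1
    return generated


# Toy stubs (assumptions): a 4-token vocabulary where each step predicts
# (last token + 1) % 4, and token 3 acts as end-of-sequence.
def toy_embed(token: int) -> Embedding:
    return [float(token)]


def toy_decode(embeds: List[Embedding], start_pos: int) -> List[float]:
    nxt = (int(embeds[-1][0]) + 1) % 4
    return [1.0 if i == nxt else 0.0 for i in range(4)]


generated = greedy_generate(toy_embed, toy_decode, [[0.0]], eos_id=3, max_new_tokens=10)
print(generated)
```

Splitting prefill from the per-token decode step mirrors the separate prefill time and decode rate that the runner reports.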

Quantized model is ~1.4 GB (8da4w decoder, 8da4w/8da8w encoder, 8w embeddings).
Decode rate is ~25 tokens/sec on Apple Silicon M-series via XNNPACK.

Run Qwen3-VL-2B

python qwen3/run_qwen3_vl.py \
  --model_path qwen3/Qwen3-VL-2B-Instruct-xnnpack/model.pte \
  --image_path qwen3/test_image.jpg \
  --prompt "What is in this image?" \
  --max_new_tokens 200

Output

Image embeddings shape: torch.Size([651, 2048])
Prefill done: 667 tokens in 4.12s

Prompt: What is in this image?
--------------------------------------------------
Response: This image shows a wooden dock or pier extending into a calm, blue lake. The dock is constructed from weathered wooden planks and leads from the foreground toward the center of the lake. On the right side of the dock, there is a metal railing, and on the left side, there is a metal ladder. The lake is surrounded by a dense forest of green trees, and in the background, there are snow-capped mountains under a cloudy sky. The entire scene is captured in a wide, horizontal view, with the dock as the main focal point.
--------------------------------------------------
Prompt tokens:    666
Generated tokens: 114
Prefill time:     4.119s
Decode rate:      26.59 tokens/sec
Total time:       8.369s
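As a rough cross-check of the statistics above, the following sketch parses the reported numbers and derives the prefill throughput and the decode time implied by the reported rate. The parsing helper `parse_stats` is a hypothetical name, and exactly which phases the script counts toward the decode rate is not stated in the log, so the last two figures are only expected to agree approximately.

```python
import re

# Statistics copied from the run output above.
STATS = """\
Prompt tokens:    666
Generated tokens: 114
Prefill time:     4.119s
Decode rate:      26.59 tokens/sec
Total time:       8.369s
"""


def parse_stats(text: str) -> dict:
    """Map each 'Label: value' line to a float, ignoring trailing units."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"^(.+?):\s+([0-9.]+)", line)
        if match:
            stats[match.group(1)] = float(match.group(2))
    return stats


stats = parse_stats(STATS)
prefill_rate = stats["Prompt tokens"] / stats["Prefill time"]
implied_decode_time = stats["Generated tokens"] / stats["Decode rate"]
remaining = stats["Total time"] - stats["Prefill time"]

print(f"Prefill throughput: {prefill_rate:.1f} tokens/sec")
print(f"Decode time implied by rate: {implied_decode_time:.2f}s")
print(f"Total minus prefill: {remaining:.2f}s")
```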

This commit introduces the Qwen3-VL model, a vision-language model with a 2.2B parameter architecture. It includes a comprehensive README detailing prerequisites, export instructions, and usage examples. Additionally, a runtime script is provided to facilitate multimodal inference using ExecuTorch and PyTorch eager mode for the vision encoder.

Key features:
- Instructions for exporting the model using optimum-executorch.
- Example usage for running inference with image and text inputs.
- Details on exported methods and quantization configurations.

This addition enhances the functionality of ExecuTorch for multimodal applications.

pytorch-bot · 2026-02-19T22:25:43Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17572

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 8 Awaiting Approval

As of commit bac415e with merge base a24d3e7:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-19T22:26:23Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

seyeong-han requested a review from lucylq as a code owner February 19, 2026 22:25

meta-cla bot added the CLA Signed label Feb 19, 2026

seyeong-han wants to merge 1 commit into pytorch:main from seyeong-han:qwen3-vl_xnnpack
