feat: Add video support for Qwen3-VL model #1968

pedrohenriqueamartins · 2026-01-16T13:35:42Z

What does this PR do?

Add video support for Qwen3VLModel, enabling training with video inputs alongside images.

Changes

Remove blocking assertions for video inputs
Handle pixel_values_videos and video_grid_thw parameters
Concatenate image and video vision data when both are present
Split vision embeddings by video_start_index for proper masking
Combine image_mask | video_mask for deepstack processing
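
The changes listed above might fit together roughly as follows. This is a minimal sketch, not the code in this PR: it uses plain Python lists in place of tensors so it runs without any framework, the token ids and helper names (`IMAGE_TOKEN_ID`, `VIDEO_TOKEN_ID`, `concat_vision_inputs`, `split_vision_embeds`, `build_masks`, `scatter`) are hypothetical, and it assumes one vision embedding per input patch, with image patches placed before video patches.

```python
# Hypothetical placeholder token ids, chosen for illustration only.
IMAGE_TOKEN_ID = 1
VIDEO_TOKEN_ID = 2


def concat_vision_inputs(pixel_values=None, pixel_values_videos=None):
    """Concatenate image patches followed by video patches.

    Returns the combined patches and the index where video patches begin.
    """
    image_patches = list(pixel_values or [])
    video_patches = list(pixel_values_videos or [])
    video_start_index = len(image_patches)
    return image_patches + video_patches, video_start_index


def split_vision_embeds(vision_embeds, video_start_index):
    """Split encoder output back into image and video embeddings."""
    return vision_embeds[:video_start_index], vision_embeds[video_start_index:]


def build_masks(input_ids):
    """Build per-token masks; the combined mask covers every visual token."""
    image_mask = [tok == IMAGE_TOKEN_ID for tok in input_ids]
    video_mask = [tok == VIDEO_TOKEN_ID for tok in input_ids]
    visual_mask = [i or v for i, v in zip(image_mask, video_mask)]
    return image_mask, video_mask, visual_mask


def scatter(text_embeds, mask, vision_embeds):
    """Replace masked positions with vision embeddings, in order."""
    if sum(mask) != len(vision_embeds):
        raise ValueError("number of masked tokens must match number of embeddings")
    it = iter(vision_embeds)
    return [next(it) if m else e for e, m in zip(text_embeds, mask)]


# Toy batch: two image tokens and three video tokens among text tokens.
input_ids = [0, 1, 1, 0, 2, 2, 2, 0]
pixel_values = [[0.1], [0.2]]
pixel_values_videos = [[0.3], [0.4], [0.5]]

patches, video_start_index = concat_vision_inputs(pixel_values, pixel_values_videos)
vision_embeds = patches  # stand-in for the vision encoder (identity here)
image_embeds, video_embeds = split_vision_embeds(vision_embeds, video_start_index)

image_mask, video_mask, visual_mask = build_masks(input_ids)
text_embeds = [[0.0] for _ in input_ids]
merged = scatter(text_embeds, image_mask, image_embeds)
merged = scatter(merged, video_mask, video_embeds)
```

In this sketch, `visual_mask` plays the role of the combined `image_mask | video_mask`, so a later stage such as deepstack processing can treat image and video tokens uniformly, while the separate masks keep each modality's embeddings in their own positions.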

Testing

Successfully trained models with video data on NeMo-RL using the Megatron backend.

copy-pr-bot · 2026-01-16T13:35:45Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yaoyu-33 · 2026-01-16T17:11:44Z

@pedrohenriqueamartins: thanks for the contribution. We are working on something similar and might cherry-pick your change directly. We will also need to verify it in the mbridge SFT pipeline before this can be merged.

feat: Add video support for Qwen3-VL model

f25e65d

github-actions bot added the community-request label Jan 16, 2026
