Add distributed infer for qwen3_vl_moe by Blaizzy · Pull Request #730 · Blaizzy/mlx-vlm

Blaizzy · 2026-02-13T13:22:56Z

No description provided.

mlx_vlm/utils.py

pcuenca

Getting 18 tps with my changes, on two M3 Ultra over Ethernet.

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

Blaizzy · 2026-02-20T00:19:41Z

Thanks, Pedro! I'm getting 32 tok/s over Thunderbolt!

pcuenca · 2026-02-20T19:15:48Z

Awesome! 🔥

pcuenca · 2026-02-23T16:45:40Z

mlx_vlm/utils.py

        model.shard(tensor_group)

-    mx.eval(model.language_model.parameters())
+    mx.eval(model.parameters())


This is still not working for me. Did you revert the model.language_model change?

To be clear: mx.eval(model.language_model.parameters()) is what works for me.

I tried evaluating the whole model and it works, just like in the normal path.

What was missing is model.eval()

My biggest issue right now is that my setup is too limited to fully test things.

In my setup, execution never reaches the new model.eval() you added.
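The disagreement above comes down to which parameter subtree is passed to the evaluation call. The sketch below is a pure-Python toy, not MLX code: `LazyParam`, `evaluate`, `pending`, and the `vision_tower` key are hypothetical stand-ins chosen for illustration. It only shows what each scope covers; it does not settle which call is correct for the sharded path, which the thread leaves open.

```python
from dataclasses import dataclass


@dataclass
class LazyParam:
    """Hypothetical stand-in for a lazily materialized weight (not an MLX array)."""
    name: str
    materialized: bool = False


def tree_leaves(tree):
    """Flatten a nested dict of parameters into a flat list of leaves."""
    if isinstance(tree, dict):
        return [leaf for sub in tree.values() for leaf in tree_leaves(sub)]
    return [tree]


def evaluate(tree):
    """Toy analogue of an eval call: materialize every leaf in the given tree."""
    for param in tree_leaves(tree):
        param.materialized = True


def pending(tree):
    """Names of parameters that have not been materialized yet."""
    return [p.name for p in tree_leaves(tree) if not p.materialized]


def make_model():
    # Assumed layout: a language model plus a vision component.
    return {
        "language_model": {
            "embed": LazyParam("language_model.embed"),
            "layer0": LazyParam("language_model.layer0"),
        },
        "vision_tower": {
            "patch_embed": LazyParam("vision_tower.patch_embed"),
        },
    }


# Scope 1: evaluate only the language-model subtree.
narrow = make_model()
evaluate(narrow["language_model"])
narrow_pending = pending(narrow)

# Scope 2: evaluate the whole parameter tree.
full = make_model()
evaluate(full)
full_pending = pending(full)

print(narrow_pending)  # ['vision_tower.patch_embed']
print(full_pending)    # []
```

With the narrower scope, anything outside the language-model subtree stays unmaterialized until something else touches it. That difference is one place to look when the two calls behave differently across setups.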

Blaizzy added 4 commits February 11, 2026 00:39

add distributed infer for qwen3_vl_moe

02ea4c6

fix image handling

f2d193f

Merge branch 'main' into pc/qwen3-moe-dist

8ff9e6a

Merge branch 'main' into pc/qwen3-moe-dist

d7b9b5f

Blaizzy marked this pull request as ready for review February 14, 2026 16:22

Merge branch 'main' into pc/qwen3-moe-dist

f0d9c3b

Blaizzy mentioned this pull request Feb 17, 2026

Distributed inference for Kimi K2.5 #689

Open

pcuenca reviewed Feb 19, 2026

mlx_vlm/utils.py

pcuenca reviewed Feb 19, 2026

mlx_vlm/utils.py (outdated)

pcuenca reviewed Feb 19, 2026

Blaizzy and others added 4 commits February 20, 2026 01:05

Update mlx_vlm/utils.py

862385f

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

Update mlx_vlm/utils.py

184a1b9

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

Merge branch 'main' into pc/qwen3-moe-dist

31a0727

Refactor import statements in convert.py to conditionally import quantize_model and dequantize_model based on usage

598e46b

eval model

a78374d

Add image processor loading in sharded_load function in utils.py

59e93f4

pcuenca reviewed Feb 23, 2026

Blaizzy wants to merge 11 commits into main from pc/qwen3-moe-dist

Blaizzy commented Feb 13, 2026

pcuenca left a comment

Blaizzy commented Feb 20, 2026

pcuenca commented Feb 20, 2026

pcuenca Feb 23, 2026

pcuenca Feb 23, 2026

Blaizzy Feb 23, 2026

pcuenca Feb 23, 2026

2 participants
