
@bulutyigit

PR Description

What
Adds ColQwen3 (Tomoro tomoro-colqwen3-embed-8b) support to mlx-vlm for multimodal retrieval embeddings.

Why
Tomoro ColQwen3 is a ColBERT-style multi-vector embedding model on top of a Qwen3-VL backbone.
Without native support, MLX users cannot convert/load the checkpoint or produce token-level embeddings.

Changes

  • Adds colqwen3 model type integration for convert/load
  • Implements embedding_proj_layer and encode() to output token-level embeddings of shape [B, T, D] (batch, tokens, embedding dim)
  • Adds helper APIs:
    • encode_queries(processor, texts)
    • encode_images(processor, images) (returns visual-token embeddings; well suited to page images such as rendered PDF pages)
    • maxsim(q, d) for ColBERT MaxSim scoring
  • Adds weight-key sanitization to map Tomoro/HF keys to MLX module names (vlm.model.* -> vlm.*)
  • Fixes the hidden-state forward path used for embedding extraction so that it correctly respects attention masks
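The PR text does not include the projection code itself. As an illustration only, here is a minimal pure-Python sketch of what a ColBERT-style `encode()` step typically does with the [B, T, H] hidden states. It assumes a bias-free linear projection to D dimensions, per-token L2 normalization, and zeroing of padded positions via the attention mask. The function name `project_and_normalize` and the `weight` layout are hypothetical and are not taken from the PR.

```python
from typing import List, Sequence

Vector = List[float]


def project_and_normalize(
    hidden: Sequence[Sequence[Sequence[float]]],  # [B, T, H] hidden states
    weight: Sequence[Sequence[float]],            # [D, H] projection (hypothetical layout)
    mask: Sequence[Sequence[int]],                # [B, T]: 1 = real token, 0 = padding
    eps: float = 1e-12,
) -> List[List[Vector]]:
    """Map [B, T, H] hidden states to [B, T, D] token embeddings.

    Assumption: unmasked tokens are L2-normalized to unit length and
    padded tokens are zeroed out, mirroring the attention mask.
    """
    out: List[List[Vector]] = []
    for seq, seq_mask in zip(hidden, mask):
        tokens: List[Vector] = []
        for h, keep in zip(seq, seq_mask):
            # Bias-free linear projection: e[d] = sum_k W[d][k] * h[k]
            e = [sum(w_k * h_k for w_k, h_k in zip(row, h)) for row in weight]
            if not keep:
                # Padding position: emit an all-zero vector of size D
                tokens.append([0.0] * len(e))
                continue
            # Divide by the L2 norm; eps guards against a zero vector
            norm = max(sum(x * x for x in e) ** 0.5, eps)
            tokens.append([x / norm for x in e])
        out.append(tokens)
    return out
```

Zeroing padded rows keeps the [B, T, D] shape rectangular, but a zero vector still has a similarity of 0, so a downstream scorer should skip masked positions explicitly rather than rely on the zeros alone.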

Testing

  • Verified load(<HF repo>, trust_remote_code=True) works
  • Verified text embeddings + image embeddings produce valid shapes and MaxSim scores
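For reference, the ColBERT MaxSim score behind the `maxsim(q, d)` helper listed under Changes takes each query token, finds its best dot product against any document token, and sums those maxima. The sketch below uses plain Python lists instead of MLX arrays and is not the PR's implementation. The optional `d_mask` argument is a hypothetical addition that skips padded document tokens.

```python
from typing import Optional, Sequence


def maxsim(
    q: Sequence[Sequence[float]],            # [Tq, D] query token embeddings
    d: Sequence[Sequence[float]],            # [Td, D] document token embeddings
    d_mask: Optional[Sequence[int]] = None,  # [Td]: 1 = real token, 0 = padding (hypothetical)
) -> float:
    """Late-interaction score: sum over query tokens of the maximum
    dot product against any unmasked document token."""
    # Drop padded document tokens so they can never be selected by max()
    doc = [t for i, t in enumerate(d) if d_mask is None or d_mask[i]]
    if not doc:
        raise ValueError("document has no unmasked tokens")
    score = 0.0
    for qt in q:
        score += max(sum(a * b for a, b in zip(qt, dt)) for dt in doc)
    return score
```

With unit-normalized embeddings each per-token maximum is a cosine similarity, so a score is bounded above by the number of query tokens. That bound makes a quick sanity check when testing shapes and scores.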

Notes
This PR focuses on embedding usage, not generation; no changes to the public generation APIs are expected.

- Add colqwen3 model type for mlx_vlm.convert/load
- Implement ColBERT-style multi-vector embedding via embedding_proj_layer
- Add weight-key sanitization for Tomoro checkpoints (vlm.model.* -> vlm.*)
- Provide encode/encode_queries/encode_images helpers and MaxSim scoring
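The key sanitization described above (vlm.model.* -> vlm.*) amounts to a prefix rename over the checkpoint's weight dictionary. The standalone function below is a hypothetical sketch of that rename, not the code in this PR. The collision check is an extra safeguard added for illustration, and the example key names in any usage are also hypothetical.

```python
from typing import Dict, TypeVar

T = TypeVar("T")

# Prefix pair from the mapping described in this PR: vlm.model.* -> vlm.*
SRC_PREFIX = "vlm.model."
DST_PREFIX = "vlm."


def sanitize(weights: Dict[str, T]) -> Dict[str, T]:
    """Rename checkpoint keys so they match the MLX module tree.

    Keys without the source prefix are passed through unchanged.
    """
    out: Dict[str, T] = {}
    for key, value in weights.items():
        if key.startswith(SRC_PREFIX):
            key = DST_PREFIX + key[len(SRC_PREFIX):]
        # Fail loudly instead of silently overwriting a tensor
        if key in out:
            raise KeyError(f"duplicate key after sanitization: {key}")
        out[key] = value
    return out
```

For example, `sanitize({"vlm.model.layers.0.weight": w})` returns `{"vlm.layers.0.weight": w}`, while a key such as `embedding_proj_layer.weight` passes through untouched.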
@Blaizzy
Owner

Blaizzy commented Jan 2, 2026

Hey @bulutyigit,
Happy new year, this is an awesome addition!
I actually built a package called mlx-embeddings which would be the perfect home for this port. Would you mind redirecting the PR there? I'll review and merge it there.
Thanks for the contribution!
