## Summary
Switch all text embeddings to Vercel AI Gateway using `alibaba/qwen3-embedding-4b` as the canonical embedding model, while keeping Together.ai for reranking via the AI SDK. This finalizes the decision to migrate off Together embeddings for higher quality and consistent gateway routing.
## Decision (final)
- Embeddings: Use the AI Gateway model ID `alibaba/qwen3-embedding-4b` for all embedding generation (API, RAG, memory). The model page shows `embed` usage with this exact ID.
- Reranking: Keep `@ai-sdk/togetherai` for reranking (Mixedbread Mxbai-Rerank-Large-V2), as today.
- Dimensions: Update all embedding dimensionality expectations and storage to 2560 (Qwen3-Embedding-4B).
## Background / current state
- Embedding model selection lives in `src/lib/ai/embeddings/text-embedding-model.ts` and is currently `openai/text-embedding-3-small` (1536-d) when AI Gateway/OpenAI keys are present; otherwise a deterministic 1536-d fallback is used.
- Embeddings are used in `/api/embeddings`, RAG indexing and retrieval, and memory search (`src/lib/rag/*`, `src/lib/memory/*`).
- Reranking is already implemented via the AI SDK `rerank()` with Together.ai in `src/lib/rag/reranker.ts`.
## Rationale
- Vercel AI Gateway lists `alibaba/qwen3-embedding-4b` and provides direct `embed` usage with that model ID.
- The AI SDK's `embed`/`embedMany` are the canonical embedding APIs; the AI SDK's `rerank` is the canonical reranking API.
- A user-provided research summary (UNVERIFIED) indicates Qwen3-Embedding-4B outperforms older 2023–2024 embedding models on MTEB (English and multilingual), with stronger retrieval and instruction-tuned multilingual/code performance.
## Implementation spec
1) Embedding model + dimensions
- Set `TEXT_EMBEDDING_MODEL_ID = "alibaba/qwen3-embedding-4b"` in `src/lib/ai/embeddings/text-embedding-model.ts`.
- Set `TEXT_EMBEDDING_DIMENSIONS = 2560` and align the deterministic fallback to 2560-d (or remove the deterministic fallback if it conflicts with production expectations).
- Preferred: when `AI_GATEWAY_API_KEY` is set, always use the gateway string ID. Decide whether to remove the OpenAI fallback to avoid silent model divergence; if it is kept, it must be explicit and documented.
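The constants and fallback above could look like the following sketch. The hashing scheme for the deterministic fallback is an assumption for illustration, not the project's current implementation; the gateway `embed` call itself is omitted to keep the sketch self-contained.

```typescript
// Sketch of src/lib/ai/embeddings/text-embedding-model.ts after the switch.
import { createHash } from "node:crypto";

export const TEXT_EMBEDDING_MODEL_ID = "alibaba/qwen3-embedding-4b";
export const TEXT_EMBEDDING_DIMENSIONS = 2560;

// Deterministic fallback (assumed scheme): derive a stable 2560-d unit vector
// from the input text so dev paths without AI_GATEWAY_API_KEY stay consistent.
export function deterministicFallbackEmbedding(text: string): number[] {
  const out = new Array<number>(TEXT_EMBEDDING_DIMENSIONS);
  let seed = createHash("sha256").update(text).digest();
  let offset = 0;
  for (let i = 0; i < TEXT_EMBEDDING_DIMENSIONS; i++) {
    if (offset >= seed.length) {
      // Extend the byte stream by re-hashing the previous digest.
      seed = createHash("sha256").update(seed).digest();
      offset = 0;
    }
    out[i] = seed[offset++] / 255 - 0.5; // roughly centered in [-0.5, 0.5]
  }
  const norm = Math.hypot(...out);
  return out.map((v) => v / norm); // unit-length, like real embeddings
}
```

If the fallback is kept, pinning it to `TEXT_EMBEDDING_DIMENSIONS` (rather than a literal 2560) keeps the dimension change in one place.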
2) Update embedding call sites
- Ensure all usages of `getTextEmbeddingModel()`/`TEXT_EMBEDDING_DIMENSIONS` are updated and validated:
  - `src/app/api/embeddings/route.ts`
  - `src/lib/rag/indexer.ts` (`embedMany`)
  - `src/lib/rag/retriever.ts`
  - `src/lib/memory/supabase-adapter.ts`
  - any other embedding usage surfaced by `rg "embed|embedMany|TEXT_EMBEDDING_DIMENSIONS"`
- Verify that cache keys incorporating the model ID (e.g., RAG search) include the new ID to prevent collisions.
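A hypothetical helper illustrating the cache-key concern; the actual key shape in `src/lib/rag/*` may differ, but folding the model ID into the key is what prevents stale 1536-d entries from being served after the switch.

```typescript
import { createHash } from "node:crypto";

const TEXT_EMBEDDING_MODEL_ID = "alibaba/qwen3-embedding-4b";

// Hypothetical RAG search cache key: entries written under the old
// openai/text-embedding-3-small model hash to different keys, so they can
// never collide with Qwen3-era results.
export function ragSearchCacheKey(query: string, topK: number): string {
  const payload = JSON.stringify({
    model: TEXT_EMBEDDING_MODEL_ID,
    query,
    topK,
  });
  return "rag:search:" + createHash("sha256").update(payload).digest("hex");
}
```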
3) Database + schema migration
- Update pgvector columns from `vector(1536)` → `vector(2560)`:
  - `public.accommodation_embeddings.embedding`
  - `public.rag_documents.embedding`
  - `memories.turn_embeddings.embedding`
- Update RPCs: `match_accommodation_embeddings`, `match_rag_documents`, `hybrid_rag_search`, `match_turn_embeddings`.
- Rebuild HNSW/IVFFlat indexes and revisit `ef_search` defaults for 2560-d vectors.
- Update Supabase types (`src/lib/supabase/database.types.ts`) and schema sources (`src/domain/schemas/supabase.ts`).
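A hedged migration sketch for one table (the other columns and RPCs follow the same pattern). One caveat worth verifying: pgvector's HNSW/IVFFlat indexes on the `vector` type are limited to 2000 dimensions, so a 2560-d column likely needs an expression index over a `halfvec(2560)` cast rather than a plain index rebuild. The index name and parameters below are assumptions to tune, not recommendations.

```sql
-- Sketch for public.rag_documents; repeat for accommodation_embeddings and
-- memories.turn_embeddings, and update the matching RPC signatures.
BEGIN;

DROP INDEX IF EXISTS rag_documents_embedding_idx;  -- assumed index name

-- Retype the column; existing 1536-d data cannot be cast to 2560-d, so it
-- is nulled here and regenerated by the backfill step (assumes nullable).
ALTER TABLE public.rag_documents
  ALTER COLUMN embedding TYPE vector(2560) USING NULL;

-- HNSW on vector tops out at 2000 dims; index a halfvec cast instead.
CREATE INDEX rag_documents_embedding_idx
  ON public.rag_documents
  USING hnsw ((embedding::halfvec(2560)) halfvec_cosine_ops)
  WITH (m = 16, ef_construction = 64);

COMMIT;
```

If the `halfvec` approach is used, queries and the listed RPCs must apply the same cast (e.g. `ORDER BY embedding::halfvec(2560) <=> $1::halfvec(2560)`) for the index to be eligible.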
4) Backfill / reindex plan
- Provide a migration/backfill step to re-embed existing rows and rebuild indexes.
- Reindex:
- RAG documents
- accommodation embeddings
- memory turn embeddings
- Document batching, timeouts, and rate limits to avoid runaway costs.
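The batching and pacing requirements above can be sketched as follows. `embedBatch` is a hypothetical injected function — in the real code it would wrap the AI SDK's `embedMany` plus the row updates — so the sketch stays self-contained; batch size and pause are placeholders to tune against gateway rate limits.

```typescript
// Split work into fixed-size batches.
export function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Re-embed texts in controlled batches with a pause between calls to bound
// throughput (crude rate limiting; a token-bucket would be more precise).
export async function backfill(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 64,
  pauseMs = 500,
): Promise<number[][]> {
  const all: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    all.push(...(await embedBatch(batch)));
    await sleep(pauseMs);
  }
  return all;
}
```

Recording per-batch progress (e.g. last processed row ID) makes the backfill resumable after timeouts, which matters at re-embedding scale.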
5) Docs & configuration
- Update runbooks and quick-start docs that reference 1536 dims and `text-embedding-3-small`.
- Update architecture docs that mention `vector(1536)` or OpenAI embedding defaults.
6) Tests & telemetry
- Update unit tests that assert 1536 dimensions and any mocks to 2560.
- Ensure telemetry captures model ID and any provider metadata for gateway usage.
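As a minimal illustration of the telemetry requirement, an event payload builder might look like this; every field name here is an assumption about this codebase's event shape — the requirement is only that the model ID and provider travel with each embedding event.

```typescript
// Hypothetical telemetry payload for a batch of embedding calls.
export function embeddingTelemetry(count: number, latencyMs: number) {
  return {
    event: "embedding.generate",
    modelId: "alibaba/qwen3-embedding-4b", // assert on this in dashboards
    provider: "vercel-ai-gateway",
    dimensions: 2560,
    count,
    latencyMs,
  };
}
```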
## Acceptance criteria
- All production embeddings use the AI Gateway model `alibaba/qwen3-embedding-4b`.
- All vector columns and RPCs accept 2560-d vectors, and queries pass.
- Reranking remains Together-based and optional when `TOGETHER_AI_API_KEY` is absent.
- Backfill plan documented and executed for existing data.
- Tests and docs updated.
## Risks / mitigations
- Larger vectors increase storage/index costs and query latency; mitigate by tuning HNSW params and monitoring latency/recall.
- Backfill time/cost; mitigate with controlled batch sizes and off-peak execution.
## References
- https://vercel.com/ai-gateway/models/qwen3-embedding-4b
- https://vercel.com/docs/ai-gateway/models-and-providers/provider-options
- https://ai-sdk.dev/docs/ai-sdk-core/embeddings
- https://ai-sdk.dev/docs/ai-sdk-core/reranking
## User research summary (UNVERIFIED)
- Qwen3-Embedding-4B reportedly leads recent MTEB (English + multilingual) with strong retrieval gains over older models.
- Qwen3 models are instruction-aware, multilingual, and support long context; 4B variant chosen for highest quality.
- Together's `m2-bert-80M-32k-retrieval` is considered older and uncompetitive versus 2025+ models.