Switch embeddings to AI Gateway Qwen3-Embedding-4B; keep Together reranking #666

@BjornMelin

Description

Summary

Switch all text embeddings to Vercel AI Gateway using alibaba/qwen3-embedding-4b as the canonical embedding model, while keeping Together.ai for reranking via AI SDK. This finalizes the decision to migrate off Together embeddings for higher quality and consistent gateway routing.

Decision (final)

  • Embeddings: Use AI Gateway model ID alibaba/qwen3-embedding-4b for all embedding generation (API, RAG, memory). The model page shows embed usage with this exact ID.
  • Reranking: Keep @ai-sdk/togetherai for reranking (Mixedbread Mxbai-Rerank-Large-V2) as today.
  • Dimensions: Update all embedding dimensionality expectations and storage to 2560 (Qwen3-Embedding-4B).

Background / current state

  • Embedding model selection lives in src/lib/ai/embeddings/text-embedding-model.ts and currently resolves to openai/text-embedding-3-small (1536-d) when AI Gateway/OpenAI keys are present; otherwise a deterministic 1536-d fallback is used.
  • Embeddings are used in /api/embeddings, RAG indexing and retrieval, and memory search (src/lib/rag/*, src/lib/memory/*).
  • Reranking is already implemented via AI SDK rerank() with Together.ai in src/lib/rag/reranker.ts.

Rationale

  • Vercel AI Gateway lists alibaba/qwen3-embedding-4b and provides direct embed usage with that model ID.
  • AI SDK embed / embedMany are the canonical embedding APIs; AI SDK rerank is the canonical reranking API.
  • User-provided research summary (UNVERIFIED) indicates Qwen3-Embedding-4B outperforms older 2023–2024 embedding models on MTEB (English and multilingual) with stronger retrieval, and offers instruction-tuned multilingual/code performance.

Implementation spec

1) Embedding model + dimensions

  • Set TEXT_EMBEDDING_MODEL_ID = "alibaba/qwen3-embedding-4b" in src/lib/ai/embeddings/text-embedding-model.ts.
  • Set TEXT_EMBEDDING_DIMENSIONS = 2560 and align deterministic fallback to 2560-d (or remove deterministic fallback if it conflicts with production expectations).
  • Preferred: When AI_GATEWAY_API_KEY is set, always use the gateway string ID. Decide whether to remove OpenAI fallback to avoid silent model divergence; if kept, it must be explicit and documented.
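The config change above can be sketched as follows. This is a minimal sketch, not the actual file contents: the constant names match the issue, but `deterministicFallbackEmbedding` and its hashing scheme are assumptions standing in for whatever deterministic fallback the repo currently implements, realigned to 2560-d and unit-normalized.

```typescript
// Sketch of src/lib/ai/embeddings/text-embedding-model.ts after the switch.
// Constant names are from this issue; the fallback implementation is assumed.
export const TEXT_EMBEDDING_MODEL_ID = "alibaba/qwen3-embedding-4b";
export const TEXT_EMBEDDING_DIMENSIONS = 2560;

// Deterministic fallback: seed a small PRNG from the input text so the same
// string always maps to the same unit-length 2560-d vector (dev/test only).
export function deterministicFallbackEmbedding(text: string): number[] {
  // FNV-1a hash of the input text seeds the generator.
  let seed = 2166136261;
  for (let i = 0; i < text.length; i++) {
    seed = Math.imul(seed ^ text.charCodeAt(i), 16777619);
  }
  const out = new Array<number>(TEXT_EMBEDDING_DIMENSIONS);
  let norm = 0;
  for (let i = 0; i < TEXT_EMBEDDING_DIMENSIONS; i++) {
    // xorshift32 step produces the next pseudo-random 32-bit value.
    seed ^= seed << 13;
    seed ^= seed >>> 17;
    seed ^= seed << 5;
    const v = (seed >>> 0) / 0xffffffff - 0.5;
    out[i] = v;
    norm += v * v;
  }
  norm = Math.sqrt(norm);
  return out.map((v) => v / norm);
}
```

If the fallback is kept, pinning it to TEXT_EMBEDDING_DIMENSIONS like this keeps dev/test vectors shape-compatible with the 2560-d production columns.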

2) Update embedding call sites

  • Ensure all usages of getTextEmbeddingModel() / TEXT_EMBEDDING_DIMENSIONS are updated and validated:
    • src/app/api/embeddings/route.ts
    • src/lib/rag/indexer.ts (embedMany)
    • src/lib/rag/retriever.ts
    • src/lib/memory/supabase-adapter.ts
    • Any other embedding usage surfaced by rg "embed|embedMany|TEXT_EMBEDDING_DIMENSIONS".
  • Verify cache keys incorporating model ID (e.g., RAG search) include the new ID to prevent collisions.
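The cache-key point can be illustrated with a small helper. The function name and key layout are hypothetical (the repo's actual cache-key code may differ); the point is that hashing the model ID together with the query means stale 1536-d results cached under the old model can never be returned for Qwen3 queries.

```typescript
import { createHash } from "node:crypto";

// Hypothetical cache-key builder for RAG search results. Including the model
// ID in the hashed material prevents collisions between embeddings produced
// by different models (e.g. the old 1536-d model vs. Qwen3-Embedding-4B).
export function ragSearchCacheKey(modelId: string, query: string): string {
  // A NUL separator avoids ambiguity between (modelId, query) pairs.
  return createHash("sha256").update(`${modelId}\u0000${query}`).digest("hex");
}
```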

3) Database + schema migration

  • Update pgvector columns from vector(1536) → vector(2560):
    • public.accommodation_embeddings.embedding
    • public.rag_documents.embedding
    • memories.turn_embeddings.embedding
    • RPCs: match_accommodation_embeddings, match_rag_documents, hybrid_rag_search, match_turn_embeddings.
  • Rebuild HNSW/IVFFlat indexes and revisit ef_search defaults for 2560-d vectors.
  • Update Supabase types (src/lib/supabase/database.types.ts) and schema sources (src/domain/schemas/supabase.ts).
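A migration for one of the tables above might look like the sketch below (index name and nullability are assumptions). Note one caveat worth verifying before executing: pgvector's HNSW index supports at most 2,000 dimensions on the plain vector type, so a 2560-d column typically needs an expression index over a halfvec(2560) cast (or a halfvec column) rather than a direct vector HNSW index.

```sql
-- Hypothetical migration sketch for public.rag_documents (index name assumed).
-- Old 1536-d vectors are invalid at the new size, so they are nulled out here
-- and regenerated by the backfill step; assumes the column is nullable.
DROP INDEX IF EXISTS rag_documents_embedding_idx;

ALTER TABLE public.rag_documents
  ALTER COLUMN embedding TYPE vector(2560) USING NULL;

-- HNSW over a halfvec(2560) cast, since plain vector HNSW caps at 2,000 dims.
CREATE INDEX rag_documents_embedding_idx
  ON public.rag_documents
  USING hnsw ((embedding::halfvec(2560)) halfvec_cosine_ops);
```

The RPCs listed above (match_rag_documents, hybrid_rag_search, etc.) would need their vector(1536) parameter and cast sites updated in the same migration so queries hit the new index.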

4) Backfill / reindex plan

  • Provide a migration/backfill step to re-embed existing rows and rebuild indexes.
  • Reindex:
    • RAG documents
    • accommodation embeddings
    • memory turn embeddings
  • Document batching, timeouts, and rate limits to avoid runaway costs.
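The batching concern can be factored into a small pure helper like the one below. The function name is hypothetical and the batch size is a tuning assumption; the driver loop that calls embedMany per batch (with delays sized to provider rate limits) would sit around it.

```typescript
// Hypothetical backfill batching helper: splits rows into fixed-size batches
// so each embedMany call stays under provider payload and rate limits.
export function toBatches<T>(rows: T[], size: number): T[][] {
  if (size <= 0) throw new Error("batch size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}
```

Keeping batching pure like this also makes the cost controls (batch size, pause between batches) easy to unit test independently of the embedding provider.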

5) Docs & configuration

  • Update runbooks + quick-start docs that reference 1536 dims and text-embedding-3-small.
  • Update architecture docs that mention vector(1536) or OpenAI embedding defaults.

6) Tests & telemetry

  • Update unit tests that assert 1536 dimensions and any mocks to 2560.
  • Ensure telemetry captures model ID and any provider metadata for gateway usage.
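A shared shape guard, sketched below, is one way to make the 1536 → 2560 expectation a single point of change for both tests and ingest paths. The names are assumptions, not existing code.

```typescript
// Hypothetical shared guard: unit tests and ingest code assert against one
// constant, so a future dimension change is again a one-line edit.
export const EXPECTED_EMBEDDING_DIMENSIONS = 2560;

export function assertEmbeddingShape(embedding: number[]): void {
  if (embedding.length !== EXPECTED_EMBEDDING_DIMENSIONS) {
    throw new Error(
      `expected ${EXPECTED_EMBEDDING_DIMENSIONS}-d embedding, got ${embedding.length}`,
    );
  }
  if (embedding.some((v) => !Number.isFinite(v))) {
    throw new Error("embedding contains non-finite values");
  }
}
```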

Acceptance criteria

  • All production embeddings use AI Gateway alibaba/qwen3-embedding-4b.
  • All vector columns and RPCs accept 2560-d vectors and queries pass.
  • Reranking remains Together-based and optional when TOGETHER_AI_API_KEY is absent.
  • Backfill plan documented and executed for existing data.
  • Tests and docs updated.

Risks / mitigations

  • Larger vectors increase storage/index costs and query latency; mitigate by tuning HNSW params and monitoring latency/recall.
  • Backfill time/cost; mitigate with controlled batch sizes and off-peak execution.

References

User research summary (UNVERIFIED)

  • Qwen3-Embedding-4B reportedly leads recent MTEB (English + multilingual) with strong retrieval gains over older models.
  • Qwen3 models are instruction-aware, multilingual, and support long context; 4B variant chosen for highest quality.
  • Together m2-bert-80M-32k-retrieval is considered older and uncompetitive vs 2025+ models.
