| Backend | External Embedding (Nomic/Ollama) | Embedding Source |
|---|---|---|
| `sqlite_vec` | ✅ Works | Ollama / vLLM / TEI |
| `cloudflare` | ❌ Blocked | Workers AI (bge-base-en-v1.5) |
| `hybrid` | ❌ Blocked | Local ONNX/SentenceTransformer + Workers AI |
The Blocker Is Architectural, Not Dimensional
Interestingly, the dimensions actually match: both Nomic-embed-text and Cloudflare Workers AI's bge-base-en-v1.5 produce 768-dimensional vectors. This means that if the Cloudflare backend were refactored to accept external embeddings, Nomic vectors would be dimensionally compatible with the existing Vectorize index.
The real blocker is purely architectural: the Cloudflare Worker code calls Workers AI directly for embeddings, and there is no hook to substitute an external embedding source.
Note: The dimension mismatch only exists between the default local ONNX model (all-MiniLM-L6-v2, 384-dim) and Nomic (768-dim) — relevant when switching models on the sqlite_vec backend (requires re-embedding all memories).
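The dimensional situation above boils down to a simple compatibility check. A minimal sketch (the function name is illustrative, not from the codebase):

```python
def check_dim_compat(vec: list[float], index_dim: int) -> bool:
    """A vector can only be inserted into an index of matching dimensionality."""
    return len(vec) == index_dim

# Nomic-embed-text and bge-base-en-v1.5 both emit 768-dim vectors,
# so a Nomic vector fits an existing 768-dim Vectorize index:
assert check_dim_compat([0.0] * 768, 768)

# The default local ONNX model (all-MiniLM-L6-v2) emits 384 dims,
# so switching to Nomic on sqlite_vec requires re-embedding:
assert not check_dim_compat([0.0] * 384, 768)
```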
Recommendation
For users on the hybrid backend (recommended for production), Nomic-embed-text is not a viable option today. The path forward would be:
- Option A: Refactor the Cloudflare backend to accept external embeddings via API — the matching 768 dimensions mean Nomic vectors could slot into the existing Vectorize index without re-indexing
- Option B: Use Nomic only in `sqlite_vec` mode and accept no cloud sync
- Option C: Wait for Cloudflare to support custom embedding models in Workers AI
Option A is more feasible than initially thought, precisely because the dimensions already align. The refactor would primarily involve routing embedding generation through an external API instead of Workers AI, while keeping the Vectorize storage layer unchanged.
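To make Option A concrete, here is a rough sketch of what "routing embedding generation through an external API" could look like as an abstraction. All class and function names are hypothetical; the real Worker is JavaScript calling Workers AI directly, and the HTTP calls are stubbed out here:

```python
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Anything that turns text into a fixed-dimension vector."""
    dimensions: int

    def embed(self, text: str) -> list[float]: ...


class WorkersAIProvider:
    """Current behavior: embeddings come straight from Workers AI (stubbed)."""
    dimensions = 768

    def embed(self, text: str) -> list[float]:
        return [0.0] * self.dimensions  # stand-in for the Workers AI call


class ExternalAPIProvider:
    """Option A: delegate to an external endpoint, e.g. Ollama running Nomic."""
    dimensions = 768

    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def embed(self, text: str) -> list[float]:
        return [0.0] * self.dimensions  # stand-in for the external HTTP call


def store_memory(text: str, provider: EmbeddingProvider,
                 index_dim: int = 768) -> list[float]:
    """The Vectorize storage layer stays unchanged; only the provider varies."""
    vec = provider.embed(text)
    if len(vec) != index_dim:
        raise ValueError("vector does not match the Vectorize index dimension")
    return vec  # would be upserted into the existing Vectorize index
```

Because both providers emit 768-dim vectors, `store_memory` accepts either without touching the index, which is the whole point of the matching dimensions.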
This should be documented more prominently in the external embeddings guide.
Context
With external embedding API support merged (PR #386) and the embedding migration script available (#556), I benchmarked nomic-embed-text via Ollama against the default SentenceTransformer (all-MiniLM-L6-v2) on an M-series Mac.
Benchmark Results
Latency
Embedding Dimensions
Migration between dimensions is handled by `scripts/maintenance/migrate_embeddings.py`.
Similarity Quality
Observation: Nomic is strong on broad semantic similarity but weaker on domain-specific technical terms. The BM25 hybrid search (enabled by default, 0.3/0.7 weights) effectively compensates by catching exact keyword matches.
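The compensation effect falls out of the score fusion itself. A minimal sketch of 0.3/0.7 weighted blending (function name illustrative; it assumes the BM25 score has already been normalized to [0, 1]):

```python
def hybrid_score(bm25: float, cosine: float,
                 keyword_weight: float = 0.3, vector_weight: float = 0.7) -> float:
    """Blend a normalized BM25 keyword score with vector cosine similarity."""
    return keyword_weight * bm25 + vector_weight * cosine

# A domain-specific term the embedding model scores poorly (cosine 0.2)
# still ranks reasonably if BM25 catches the exact keyword (normalized 0.9):
print(round(hybrid_score(0.9, 0.2), 2))  # 0.41, versus 0.2 for vector-only search
```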
Cost
Configuration
Recommendation
Nomic-embed-text is a viable local alternative — faster than API calls, zero cost, decent quality. The dimension mismatch (768 vs 384) means migration is required. Best time to switch: during a major version upgrade or when re-embedding is needed anyway.
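For intuition, the 384 → 768 migration reduces to re-embedding every stored memory with the new model; old vectors are useless because vectors of different dimensionality are not comparable. A toy sketch (the real logic lives in `scripts/maintenance/migrate_embeddings.py`; the names and stub embedder here are illustrative):

```python
def migrate(memories: list[dict], embed_fn, new_dim: int) -> list[dict]:
    """Re-embed each memory's content and discard the old vector."""
    migrated = []
    for m in memories:
        vec = embed_fn(m["content"])
        if len(vec) != new_dim:
            raise ValueError(f"expected {new_dim}-dim vector, got {len(vec)}")
        migrated.append({**m, "embedding": vec})
    return migrated

# Stub standing in for nomic-embed-text served by Ollama:
fake_nomic = lambda text: [0.0] * 768
rows = migrate([{"content": "note one"}, {"content": "note two"}], fake_nomic, 768)
```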
The already-active BM25 hybrid search compensates for embedding model weaknesses on exact keyword matches regardless of which model you use.
Questions for the Community