Continuing the theme of exposing Apple's on-device ML stack through the afm API surface, I'd like to propose adding a `/v1/embeddings` endpoint. Starting scope is intentionally narrow: Apple's `NLContextualEmbedding` only — no MLX embedding backend, no third-party models. Once the surface is stable we can add more backends behind the same API.
## What this adds
### Embeddings (`/v1/embeddings`)
OpenAI-compatible embeddings endpoint backed by `NLContextualEmbedding` from the NaturalLanguage framework. Accepts a string or an array of strings, returns float vectors plus usage counts. Rejects empty/whitespace inputs with 400.
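To make the wire contract concrete, here's a minimal sketch of the shapes in Swift. The JSON field names (`input`, `model`, `embedding`, `prompt_tokens`, `total_tokens`) follow the OpenAI embeddings spec; the Swift type names match the ones proposed under Implementation below, but the bodies are illustrative, not the actual afm code:

```swift
// OpenAI-style request: `input` may be a single string or an array of strings.
enum EmbeddingInput: Decodable {
    case single(String)
    case batch([String])

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if let text = try? container.decode(String.self) {
            self = .single(text)
        } else {
            self = .batch(try container.decode([String].self))
        }
    }

    // Normalized view for validation (empty/whitespace inputs get a 400).
    var texts: [String] {
        switch self {
        case .single(let text): return [text]
        case .batch(let texts): return texts
        }
    }
}

struct EmbeddingRequest: Decodable {
    let model: String
    let input: EmbeddingInput
}

struct EmbeddingData: Encodable {
    let object = "embedding"
    let index: Int
    let embedding: [Float]
}

struct EmbeddingUsage: Encodable {
    let promptTokens: Int
    let totalTokens: Int

    enum CodingKeys: String, CodingKey {
        case promptTokens = "prompt_tokens"
        case totalTokens = "total_tokens"
    }
}

struct EmbeddingResponse: Encodable {
    let object = "list"
    let data: [EmbeddingData]
    let model: String
    let usage: EmbeddingUsage
}
```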
Shipped model IDs:
- `apple-nl-contextual-en` — English
- `apple-nl-contextual-multi` — Latin-script multilingual (Apple's NL contextual multilingual model is Latin-script only; non-Latin scripts are out of scope for this backend)
Native dimension and max input length come from the framework at load time (`NLContextualEmbedding.dimension` / `maximumSequenceLength`) rather than being hard-coded, so the values stay correct across OS updates.
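For a sense of the backend's load/embed path, here's a sketch against the real NaturalLanguage APIs (`NLContextualEmbedding(language:)`, `load()`, `embeddingResult(for:language:)`, `dimension`, `maximumSequenceLength`). The mean pooling over token vectors is my assumption about how a single sentence vector gets produced, and asset-availability handling is omitted:

```swift
import NaturalLanguage

enum ContextualEmbeddingError: Error {
    case modelUnavailable
    case emptyInput
}

// Sketch: load Apple's English contextual model and mean-pool its per-token
// vectors into one sentence vector. The pooling strategy is a choice made
// here, not something the framework prescribes.
func embedEnglish(_ text: String) throws -> [Float] {
    guard let model = NLContextualEmbedding(language: .english) else {
        throw ContextualEmbeddingError.modelUnavailable
    }
    try model.load()

    // Native width (and the input bound, model.maximumSequenceLength) come
    // from the loaded model, not from hard-coded constants.
    var sum = [Double](repeating: 0, count: model.dimension)
    var tokenCount = 0

    let result = try model.embeddingResult(for: text, language: .english)
    result.enumerateTokenVectors(in: text.startIndex..<text.endIndex) { vector, _ in
        for (i, value) in vector.enumerated() { sum[i] += value }
        tokenCount += 1
        return true  // keep enumerating
    }

    guard tokenCount > 0 else { throw ContextualEmbeddingError.emptyInput }
    return sum.map { Float($0 / Double(tokenCount)) }
}
```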
- CLI: `afm embeddings -m apple-nl-contextual-en -i "hello world"`
- CLI: `afm embeddings --list` (enumerates shipped model IDs)
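The subcommand itself stays small; a sketch of the ArgumentParser wiring (option names mirror the examples above, and the body is a placeholder for the real controller call):

```swift
import ArgumentParser
import Foundation

// Sketch of the `afm embeddings` subcommand; embedEnglish is the backend
// sketch above, standing in for the real controller call.
struct Embeddings: ParsableCommand {
    @Option(name: .shortAndLong, help: "Model ID, e.g. apple-nl-contextual-en")
    var model: String = "apple-nl-contextual-en"

    @Option(name: .shortAndLong, help: "Text to embed")
    var input: String?

    @Flag(help: "List shipped embedding model IDs")
    var list = false

    func run() throws {
        if list {
            print("apple-nl-contextual-en")
            print("apple-nl-contextual-multi")
            return
        }
        guard let input, !input.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty else {
            throw ValidationError("--input must be a non-empty string")
        }
        let vector = try embedEnglish(input)
        print("\(vector.count) dims:", vector.prefix(4), "…")
    }
}
```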
### Model listing
Embedding models show up in `GET /v1/models` alongside chat models so existing OpenAI clients can discover them.
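Discovery is just appending the embedding IDs to the same list the chat models already populate. A sketch of the entry shape, using the standard OpenAI model-object fields (the `owned_by` value here is a placeholder):

```swift
import Foundation

// One entry in the GET /v1/models payload; embedding model IDs are
// appended alongside the chat model IDs.
struct ModelEntry: Encodable {
    let id: String
    let object = "model"
    let created = Int(Date().timeIntervalSince1970)
    let ownedBy = "apple"  // placeholder owner string

    enum CodingKeys: String, CodingKey {
        case id, object, created
        case ownedBy = "owned_by"
    }
}

let embeddingModels = ["apple-nl-contextual-en", "apple-nl-contextual-multi"]
    .map { ModelEntry(id: $0) }
```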
## Why this matters
Embeddings are the missing piece for a lot of local-first workflows — RAG, semantic search, clustering, dedup — and everything else in afm is already on-device. Adding embeddings through the same OpenAI-compatible interface means existing tooling (LangChain, LlamaIndex, custom scripts) works against afm without code changes.
`NLContextualEmbedding` runs on the ANE where available, requires no model download beyond what the OS ships, and has no network dependency. That's the right default for a local server.
## Implementation
One feature commit plus review-hardening follow-ups:
- `EmbeddingsController` + `EmbeddingBackend` protocol + `NLContextualEmbeddingBackend` + `EmbeddingModelRegistry` (the backend protocol is sketched after this list)
- Request/response types (`EmbeddingRequest`, `EmbeddingResponse`, usage accounting)
- CLI (`afm embeddings`) wired through ArgumentParser
- Route registration and `/v1/models` integration
- Unit tests for the controller and registry, plus `Scripts/test-embeddings.sh` and an `openai` Python client round-trip
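For reference, roughly the shape I have in mind for the backend boundary (the type name is from the list above; the members are assumptions). The point is that a future MLX backend conforms to the same protocol without the controller changing:

```swift
// Hypothetical backend boundary. NLContextualEmbeddingBackend is the only
// conformer for now; an MLX backend would slot in behind the same surface.
protocol EmbeddingBackend {
    var modelID: String { get }
    var dimension: Int { get }
    var maximumSequenceLength: Int { get }

    // Batch-in, batch-out keeps single and array inputs on one code path.
    func embed(_ texts: [String]) throws -> [[Float]]
}
```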
Deliberately out of scope for this first cut:
- MLX embedding backend (sentence-transformers, BGE, etc.)
- Non-Latin script coverage
- Matryoshka / dimension truncation (the NL backend doesn't support it)
- Batch endpoint parity
Happy to open the PR against main once there's interest — branch is ready locally.