
Enhancement: Add on-device text embeddings via Apple NaturalLanguage #118

@jesserobbins

Description

Continuing the theme of exposing Apple's on-device ML stack through the afm API surface, I'd like to propose adding a /v1/embeddings endpoint. Starting scope is intentionally narrow: Apple's NLContextualEmbedding only — no MLX embedding backend, no third-party models. Once the surface is stable we can add more backends behind the same API.

What this adds

Embeddings (/v1/embeddings)

OpenAI-compatible embeddings endpoint backed by NLContextualEmbedding from the NaturalLanguage framework. Accepts a single string or an array of strings and returns float vectors plus usage counts. Empty or whitespace-only inputs are rejected with a 400.
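
To make the wire format concrete: requests and responses follow the standard OpenAI embeddings shape. A minimal client-side sketch of building a payload, with validation mirroring the 400 behavior described above (the helper name is hypothetical, not afm code):

```python
def build_embeddings_request(model: str, inputs) -> dict:
    """Build an OpenAI-style /v1/embeddings payload from a string
    or a list of strings, mirroring the server-side validation:
    empty/whitespace-only inputs would be rejected with a 400."""
    items = [inputs] if isinstance(inputs, str) else list(inputs)
    if not items or any(not s.strip() for s in items):
        raise ValueError("input must be non-empty, non-whitespace text")
    return {"model": model, "input": items}

# A successful response follows the OpenAI embeddings shape:
# {
#   "object": "list",
#   "data": [{"object": "embedding", "index": 0, "embedding": [...]}],
#   "model": "apple-nl-contextual-en",
#   "usage": {"prompt_tokens": ..., "total_tokens": ...}
# }
```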

Shipped model IDs:

  • apple-nl-contextual-en — English
  • apple-nl-contextual-multi — Latin-script multilingual (Apple's NL contextual multilingual model is Latin-script only; non-Latin scripts are out of scope for this backend)

Native dimension and max input length come from the framework at load time (NLContextualEmbedding.dimension / maximumSequenceLength) rather than being hard-coded, so the values stay correct across OS updates.

CLI: afm embeddings -m apple-nl-contextual-en -i "hello world"
CLI: afm embeddings --list (enumerates shipped model IDs)

Model listing

Embedding models show up in GET /v1/models alongside chat models so existing OpenAI clients can discover them.

Why this matters

Embeddings are the missing piece for a lot of local-first workflows — RAG, semantic search, clustering, dedup — and everything else in afm is already on-device. Adding embeddings through the same OpenAI-compatible interface means existing tooling (LangChain, LlamaIndex, custom scripts) works against afm without code changes.
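
As a concrete example of the semantic-search workflow this unlocks: once vectors come back from /v1/embeddings, ranking is plain cosine similarity. A generic sketch (not afm code; function names are illustrative):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank corpus entries by similarity to the query vector."""
    return sorted(corpus, key=lambda key: cosine(query, corpus[key]), reverse=True)[:k]
```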

NLContextualEmbedding runs on the ANE where available, requires no model download beyond what the OS ships, and has no network dependency. That's the right default for a local server.

Implementation

One feature commit plus review-hardening follow-ups:

  1. EmbeddingsController + EmbeddingBackend protocol + NLContextualEmbeddingBackend + EmbeddingModelRegistry
  2. Request/response types (EmbeddingRequest, EmbeddingResponse, usage accounting)
  3. CLI (afm embeddings) wired through ArgumentParser
  4. Route registration and /v1/models integration
  5. Unit tests for the controller and registry, plus Scripts/test-embeddings.sh and an openai Python client round-trip
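
For reference, the client round-trip mentioned in step 5 could look roughly like the stdlib sketch below. The base URL is an assumption (whatever afm serves on locally), and fetch_embeddings / parse_embeddings_response are hypothetical helpers, not part of the branch:

```python
import json
import urllib.request

def parse_embeddings_response(body: dict) -> list[list[float]]:
    """Pull vectors out of an OpenAI-style embeddings response,
    ordered by each item's index field."""
    data = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in data]

def fetch_embeddings(base_url: str, model: str, inputs: list[str]) -> list[list[float]]:
    """POST to {base_url}/embeddings and return the vectors.
    base_url is an assumption, e.g. "http://127.0.0.1:9999/v1"."""
    req = urllib.request.Request(
        f"{base_url}/embeddings",
        data=json.dumps({"model": model, "input": inputs}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_embeddings_response(json.load(resp))

# Example (requires a running afm server):
# vectors = fetch_embeddings("http://127.0.0.1:9999/v1",
#                            "apple-nl-contextual-en", ["hello world"])
```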

Deliberately out of scope for this first cut:

  • MLX embedding backend (sentence-transformers, BGE, etc.)
  • Non-Latin script coverage
  • Matryoshka / dimension truncation (the NL backend doesn't support it)
  • Batch endpoint parity

Happy to open the PR against main once there's interest — branch is ready locally.
