[Enh]: Embedding Phase 1 & 2: Internal subsystem for embedding text. #3103

@JerryNixon

Description

What?

A subsystem in DAB that fetches embeddings for text from a configured provider.

Note

Handle chunking natively.

Overview

  1. Provide the embedding subsystem
  2. Provide the /embed endpoint
  3. Provide a chunking mechanism
  4. Cache embeddings in L1 & L2 cache

Why?

Lays the groundwork for planned features that consume embeddings, such as semantic cache lookups and parameter substitution.

Configuration (runtime.embeddings)

{
  "runtime": {
    "embeddings": {
      "enabled": true, 
      "provider": "azure-openai", // Enum: azure-openai | openai
      "base-url": "@env('EMBEDDINGS_ENDPOINT')",
      "api-key": "@env('EMBEDDINGS_API_KEY')",
      "model": "@env('EMBEDDINGS_MODEL')",
      "dimensions": 1536,
      "timeout-ms": 30000,
      "api-version": "@env('EMBEDDINGS_API_VERSION')",
      "chunking": {
        "enabled": true,
        "size-chars": 800,
        "overlap-chars": 100
      },
      "cache": {
        "enabled": true,
        "level": "L1", // Enum: L1 | L1L2
        "ttl-seconds": 86400 
      },
      "endpoint": {
        "enabled": false,
        "path": "/embed",
        "roles": ["authenticated"]
      },
      "health": {
        "enabled": false, // Enable embedding health checks
        "threshold-ms": 1000, // Max acceptable latency in milliseconds
        "test-text": "health check", // Input used to validate provider
        "expected-dimensions": 1536 // Expected embedding vector size
      }
    }
  }
}

Properties

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | bool | No | true | Master toggle for the embedding subsystem |
| provider | string | Yes* | - | `azure-openai` or `openai` |
| base-url | string | Yes* | - | Provider base URL |
| api-key | string | Yes* | - | Authentication key (supports `@env()`) |
| model | string | Yes* | - | Deployment name (Azure) or model name (OpenAI) |
| dimensions | int | No | 1536 | Embedding vector size |
| timeout-ms | int | No | 30000 | HTTP request timeout in milliseconds (min: 1, max: 300000) |
| api-version | string | No | 2023-05-15 | Azure OpenAI API version (Azure only) |
| chunking.enabled | bool | No | true | Enable automatic chunking |
| chunking.size-chars | int | No | 800 | Chunk size in characters |
| chunking.overlap-chars | int | No | 100 | Overlap between chunks in characters |
| cache.enabled | bool | No | true | Enable the embedding cache |
| cache.level | string | No | L1 | Cache level: `L1` (in-memory) or `L1L2` (memory + distributed) |
| cache.ttl-seconds | int | No | 86400 | Cache time-to-live in seconds |
| endpoint.enabled | bool | No | false | Whether to expose the `/embed` REST endpoint |
| endpoint.path | string | No | /embed | Embed endpoint path |
| endpoint.roles | string[] | No | ["authenticated"] | Roles allowed to access the endpoint |
| health.enabled | bool | No | false | Enable embedding health checks |
| health.threshold-ms | int | No | 1000 | Max acceptable health-check latency in milliseconds |
| health.test-text | string | No | "health check" | Input text used to validate the embedding provider |
| health.expected-dimensions | int | No | - | Expected embedding vector size for validation |

* Required only when enabled: true
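The cache settings above imply a lookup key that changes whenever the embedding output could change. The issue does not specify a key format, so the one below is purely illustrative: it combines provider, model, and dimensions with a hash of the input text.

```python
import hashlib

def embedding_cache_key(provider: str, model: str, dimensions: int, text: str) -> str:
    """Build a cache key that varies with everything that affects the vector.

    The key layout here is an assumption for illustration; the issue does
    not define one. Hashing the text keeps keys short and bounded.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"embed:{provider}:{model}:{dimensions}:{digest}"
```

Two identical inputs produce the same key (a cache hit), while changing the text, model, or dimensions produces a different key.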

Rules

runtime.embeddings
- when enabled=true:
  - provider, base-url, api-key, and model must all be present

provider-specific
- api-version is valid only when provider=azure-openai
- api-version should be ignored or rejected when provider=openai

chunking
- chunking.overlap-chars must be less than chunking.size-chars

endpoint
- endpoint.path should not contain a query string
- endpoint.path should not contain a fragment
- endpoint.roles should contain unique values
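The rules above can be sketched as a single validation pass. This is not DAB's actual validator, just a minimal illustration of the stated constraints, assuming endpoint.path may contain neither a query string nor a fragment.

```python
def validate_embeddings_config(cfg: dict) -> list[str]:
    """Return a list of rule violations for a runtime.embeddings section.

    An empty list means the configuration is valid. Field names mirror
    the configuration sample; defaults mirror the properties table.
    """
    errors = []
    if not cfg.get("enabled", True):
        return errors  # nothing else is required when the subsystem is off
    # provider, base-url, api-key, and model must all be present
    for field in ("provider", "base-url", "api-key", "model"):
        if not cfg.get(field):
            errors.append(f"{field} is required when enabled=true")
    # api-version is valid only for Azure OpenAI
    if cfg.get("provider") == "openai" and "api-version" in cfg:
        errors.append("api-version is valid only when provider=azure-openai")
    # overlap must be strictly smaller than the chunk size
    chunking = cfg.get("chunking", {})
    if chunking.get("overlap-chars", 100) >= chunking.get("size-chars", 800):
        errors.append("chunking.overlap-chars must be less than chunking.size-chars")
    # path must be a plain path; roles must be unique
    endpoint = cfg.get("endpoint", {})
    path = endpoint.get("path", "/embed")
    if "?" in path or "#" in path:
        errors.append("endpoint.path must not contain a query string or fragment")
    roles = endpoint.get("roles", ["authenticated"])
    if len(roles) != len(set(roles)):
        errors.append("endpoint.roles must contain unique values")
    return errors
```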

Endpoint

Query Parameters (Override Defaults)

  • $chunking.size-chars=512 — Override chunk size
  • $chunking.overlap-chars=50 — Override overlap
  • $chunking.enabled=false — Disable chunking (treat entire text as one)

POST https://{server}/embed?$chunking.size-chars=512&$chunking.overlap-chars=50
Content-Type: application/json

Request

[
  {
    "key": "document-1",
    "text": "Long document text here that may exceed chunk size..."
  },
  {
    "key": "document-2",
    "text": "Another document..."
  }
]

Response

[
  {
    "key": "document-1",
    "data": [ [0.123, 0.456, ...], [0.789, 0.101, ...] ]
  },
  {
    "key": "document-2",
    "data": [ [0.111, 0.222, ...] ]
  }
]
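The response returns one vector per chunk, which is why "document-1" above yields two vectors. The exact splitting strategy is not specified in this issue; the sketch below assumes a simple sliding window over characters driven by the two chunking parameters.

```python
def chunk_text(text: str, size_chars: int = 800, overlap_chars: int = 100) -> list[str]:
    """Split text into windows of size_chars characters, each overlapping
    the previous window by overlap_chars characters.

    A character-based sliding window is an assumption for illustration;
    real chunkers may also respect word or sentence boundaries.
    """
    if overlap_chars >= size_chars:
        raise ValueError("overlap-chars must be less than size-chars")
    step = size_chars - overlap_chars  # how far the window advances each time
    return [text[i:i + size_chars] for i in range(0, len(text), step)] if text else []
```

With the defaults (800/100), a 1000-character document produces two chunks, and therefore two vectors in the response.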

Command Line

dab configure --runtime.embeddings.enabled true
dab configure --runtime.embeddings.provider "azure-openai"
dab configure --runtime.embeddings.base-url "@env('EMBEDDINGS_ENDPOINT')"
dab configure --runtime.embeddings.api-key "@env('EMBEDDINGS_API_KEY')"
dab configure --runtime.embeddings.model "@env('EMBEDDINGS_MODEL')"
dab configure --runtime.embeddings.dimensions 1536
dab configure --runtime.embeddings.timeout-ms 30000
dab configure --runtime.embeddings.api-version "@env('EMBEDDINGS_API_VERSION')"

dab configure --runtime.embeddings.chunking.enabled true
dab configure --runtime.embeddings.chunking.size-chars 800
dab configure --runtime.embeddings.chunking.overlap-chars 100

dab configure --runtime.embeddings.cache.enabled true
dab configure --runtime.embeddings.cache.level "L1"
dab configure --runtime.embeddings.cache.ttl-seconds 86400

dab configure --runtime.embeddings.endpoint.enabled false
dab configure --runtime.embeddings.endpoint.path /embed
dab configure --runtime.embeddings.endpoint.roles "authenticated"

REST Endpoint

  • Optionally exposes /embed for on-demand embedding, secured by role
  • When host.mode is development, anonymous is automatically allowed

Health, Hot Reload, Startup

  • Startup validation fails fast if enabled: true and any required field is missing
  • Health check covers embedding provider connectivity
  • Listens for hot-reload events and reloads configuration
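The health check described above can be sketched as follows. The `embed` callable stands in for the real provider call (an assumption for illustration); the pass/fail criteria follow the `health.*` settings.

```python
import time

def embedding_health_check(embed, test_text="health check",
                           threshold_ms=1000, expected_dimensions=1536):
    """Probe the embedding provider and report a health verdict.

    Healthy means: the call completed within threshold_ms and returned a
    vector of the expected size. `embed` is a placeholder for the real
    provider client, passed in so the check stays testable.
    """
    start = time.monotonic()
    vector = embed(test_text)
    elapsed_ms = (time.monotonic() - start) * 1000
    healthy = elapsed_ms <= threshold_ms and len(vector) == expected_dimensions
    return {"healthy": healthy,
            "latency-ms": round(elapsed_ms, 1),
            "dimensions": len(vector)}
```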

REST API Reference

Azure OpenAI:

  • URL: POST {base-url}/openai/deployments/{model}/embeddings?api-version={api-version}
  • Headers: api-key: {api-key}
  • Body: { "input": string | string[], "dimensions": 1536 }

OpenAI:

  • URL: POST {base-url}/v1/embeddings
  • Headers: Authorization: Bearer {api-key}
  • Body: { "model": "{model}", "input": string | string[], "dimensions": 1536 }
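The two provider shapes above differ only in URL layout, auth header, and whether the model goes in the path or the body. A sketch of assembling the request (no network call is made; values come straight from the runtime.embeddings settings):

```python
def build_embedding_request(cfg: dict, inputs: list[str]) -> dict:
    """Assemble url/headers/body for either provider, per the reference above.

    Azure OpenAI puts the model (deployment) in the path and uses an
    api-key header; OpenAI uses a fixed path, a Bearer token, and puts
    the model in the body.
    """
    base = cfg["base-url"].rstrip("/")
    body = {"input": inputs, "dimensions": cfg.get("dimensions", 1536)}
    if cfg["provider"] == "azure-openai":
        url = (f"{base}/openai/deployments/{cfg['model']}/embeddings"
               f"?api-version={cfg.get('api-version', '2023-05-15')}")
        headers = {"api-key": cfg["api-key"]}
    else:  # openai
        url = f"{base}/v1/embeddings"
        headers = {"Authorization": f"Bearer {cfg['api-key']}"}
        body["model"] = cfg["model"]
    return {"url": url, "headers": headers, "body": body}
```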
