
RAG demo improvements: model picker, haystack fixes#1775

Merged
chocobar merged 12 commits into main from rag-demo on Mar 2, 2026
Conversation

@chocobar (Collaborator) commented Feb 27, 2026

Summary

  • Fix haystack source metadata preservation (web page references no longer 404)
  • Add RAG embedding model configuration to Dashboard > System Settings (mirrors Kodit pattern)
  • Haystack sends rag-embedding placeholder via socket; Go handler substitutes with admin-configured provider/model
  • Remove direct API fallback from haystack embeddings — all requests go through Helix socket for accounting
  • Add shared_preload_libraries to pgvector service for vchord extensions

Test plan

  • cd api && go build ./... compiles
  • cd frontend && yarn build compiles
  • go test -run TestOpenAIEmbeddingsSuite ./pkg/server/ — 4 tests pass (including 2 new placeholder substitution tests)
  • Dashboard > System Settings shows new "RAG Embedding Model" row
  • Select an embedding model (e.g., openai/text-embedding-3-small) — saves and displays correctly
  • Upload a knowledge source PDF — haystack embeds via socket using configured model
  • Clear the setting — shows "Not configured" warning
  • Web page knowledge source references link correctly (no 404)

🤖 Generated with Claude Code

chocobar and others added 12 commits February 27, 2026 13:03
The haystack service used VLLM_BASE_URL and VLLM_API_KEY internally,
even though it just needs any OpenAI-compatible embeddings API. The
docker-compose files were renaming RAG_HAYSTACK_EMBEDDINGS_* to VLLM_*,
making it impossible to trace what config was actually being used.

Now the same RAG_HAYSTACK_EMBEDDINGS_API_BASE_URL and
RAG_HAYSTACK_EMBEDDINGS_API_KEY names flow straight through from .env
to docker-compose to the python service with no renaming.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
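The pass-through described above might look like the following docker-compose fragment (a sketch: the `haystack` service name is assumed; the variable names are from the commit):

```yaml
# docker-compose.yaml (sketch) — no renaming: the same variable names
# flow from .env through compose into the Python service.
services:
  haystack:   # service name assumed for illustration
    environment:
      - RAG_HAYSTACK_EMBEDDINGS_API_BASE_URL=${RAG_HAYSTACK_EMBEDDINGS_API_BASE_URL}
      - RAG_HAYSTACK_EMBEDDINGS_API_KEY=${RAG_HAYSTACK_EMBEDDINGS_API_KEY}
```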
Instead of hardcoding claude-opus-4-6/anthropic as the default model,
show the AdvancedModelPicker dialog when clicking "New Agent" so users
can select any available provider+model. This fixes setups where
Anthropic isn't configured and the server rejects the request.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When the crawler is enabled but a URL points to a document (PDF, DOCX,
etc.), extract it directly via Unstructured instead of sending it through
the HTML crawler which silently fails on non-HTML content.

Detection uses two strategies: file extension check (fast path) then a
HEAD request to inspect Content-Type for ambiguous URLs like
arxiv.org/pdf/2602.23242 that have no extension.

Also fixes unsafe mutation of Knowledge.Source.Web.URLs by cloning the
struct before passing to the crawler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
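The two-strategy detection above can be sketched as pure helpers (names and the extension set are illustrative, not the actual Helix functions; note this approach is reverted later in the PR):

```python
# Sketch of the two detection strategies: extension fast path, then
# Content-Type from a HEAD request for extension-less URLs.
from urllib.parse import urlparse

DOC_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xlsx"}  # assumed set

def has_document_extension(url: str) -> bool:
    """Fast path: classify by file extension in the URL path."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in DOC_EXTENSIONS)

def content_type_is_document(content_type: str) -> bool:
    """Slow path: classify from a HEAD request's Content-Type header,
    for ambiguous URLs like arxiv.org/pdf/2602.23242."""
    mime = content_type.split(";")[0].strip().lower()
    return mime in {
        "application/pdf",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    }
```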
The enabled and disabled model rendering paths were near-identical
(~150 lines each). Collapse into a single listItemContent, conditionally
wrapped in a Tooltip for disabled items. Pricing is hidden for disabled
models via a guard in the unified path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The vchord_bm25-postgres image ships vchord, vchord_bm25, and vector
extensions but they must be loaded via shared_preload_libraries.
Without this, the haystack service fails at startup with
"vchord must be loaded via shared_preload_libraries".

Applied to both docker-compose.dev.yaml and docker-compose.yaml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
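A sketch of the compose change, assuming the image accepts a `command` override and that the loadable library files are named after the extensions (the `.so` names and the `pgvector` service name are assumptions, not taken from the diff):

```yaml
# docker-compose.yaml (sketch) — preload the extensions at server start
# so "vchord must be loaded via shared_preload_libraries" goes away.
services:
  pgvector:   # service name assumed
    image: tensorchord/vchord_bm25-postgres   # tag omitted
    command:
      - postgres
      - -c
      - shared_preload_libraries=vchord.so,vchord_bm25.so
```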
URLs like arxiv.org/pdf/2602.23242 produce filename "2602.23242" with
no valid extension. Haystack uses the extension to determine the file
converter, so it fails with "Unable to get page count" when saving
a PDF with a .23242 suffix. Detect file type from content magic bytes
(e.g. %PDF header) and append the correct extension before sending.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reverts commits 103b4a5 and 2f856a9. The URL classification
(extension-based + HEAD request) approach doesn't fully solve PDF
indexing from URLs - the extracted text still gets misidentified by
haystack. Will revisit with a different approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Prefer RAG_HAYSTACK_EMBEDDINGS_API_BASE_URL over HELIX_EMBEDDINGS_SOCKET
  when both are configured, fixing text-embedding-3-small not found error
- Preserve source metadata in process_and_index() instead of overwriting
  with filename, fixing 404s on web page knowledge source references
- Fail fast if neither embeddings backend is configured

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reverts the API-over-socket priority change per feedback that all
embedding calls should go through helix for accounting. Socket is
the primary path; API URL is only a fallback when no socket is set.
Keeps the fail-fast ValueError when neither backend is configured.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a new "RAG Embedding Model" setting in Dashboard > System Settings,
following the same placeholder model substitution pattern as Kodit.

Haystack sends "rag-embedding" as the model name via the Unix socket.
The Go embeddings handler looks up SystemSettings for the configured
provider + model and substitutes them before forwarding to the backend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
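The substitution pattern above, mirrored in Python as a sketch (the real handler is Go; the settings field names and return shape here are assumptions):

```python
# Placeholder model substitution: if the client sent "rag-embedding",
# swap in the admin-configured provider and model from SystemSettings
# before forwarding to the backend.
PLACEHOLDER = "rag-embedding"

def substitute_model(request_model: str, settings: dict) -> tuple[str, str]:
    if request_model != PLACEHOLDER:
        return ("", request_model)  # not the placeholder: pass through untouched
    provider = settings.get("rag_embedding_provider")
    model = settings.get("rag_embedding_model")
    if not provider or not model:
        raise ValueError("RAG embedding model not configured in System Settings")
    return (provider, model)
```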
All embedding requests must go through the Helix Unix socket for
accounting. Removes EMBEDDINGS_API_BASE_URL/EMBEDDINGS_API_KEY config,
the OpenAI embedder fallback path, and corresponding docker-compose
env vars. Socket is now validated once in __init__.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
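The once-in-`__init__` validation might look like this sketch (the env var name appears earlier in the PR; the class and field names are assumptions):

```python
# Fail fast at construction time if the Helix socket is not configured,
# since it is now the only embeddings path.
import os

class SocketEmbedder:
    def __init__(self) -> None:
        socket_path = os.environ.get("HELIX_EMBEDDINGS_SOCKET", "")
        if not socket_path:
            raise ValueError(
                "HELIX_EMBEDDINGS_SOCKET must be set: all embedding "
                "requests go through the Helix socket for accounting"
            )
        self.socket_path = socket_path
```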
…model filter

Three pre-existing bugs exposed when haystack embedding requests were routed
through Helix middleware (after removing direct API fallback):

1. URL path was /v1/embeddings but baseURL already includes /v1, producing
   /v1/v1/embeddings (404 from OpenAI). Fixed to /embeddings.

2. OpenAI Python SDK (used by haystack) sends encoding_format: "base64" by
   default. Our response struct expects []float32, not base64 strings. Force
   encoding_format: "float" in the rag-embedding handler and as a client default.

3. OpenAI model list filter dropped text-embedding-* models. Added them to the
   allowed prefix list with type "embed" so AdvancedModelPicker can show them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
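Fixes 1 and 2 above can be sketched as pure helpers (function names are illustrative; the actual code lives in the Go handler and client):

```python
# Fix 1: join the embeddings path without doubling /v1 when the base
# URL already includes it (the /v1/v1/embeddings 404).
def embeddings_url(base_url: str) -> str:
    return base_url.rstrip("/") + "/embeddings"

# Fix 2: force encoding_format="float" so the response parses as
# []float32 rather than the OpenAI SDK's base64 default.
def embedding_request(model: str, text: str) -> dict:
    return {"model": model, "input": text, "encoding_format": "float"}
```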
chocobar merged commit c4f20a4 into main on Mar 2, 2026
3 checks passed
chocobar deleted the rag-demo branch on March 2, 2026 at 11:41