Your team spends hours searching SharePoint, Notion, and email threads for answers that already exist in your documents.
KnowledgeBase AI gives every employee instant, cited answers from your internal knowledge in under 2 seconds, with a full audit trail and zero data leaving your infrastructure.
| | Before | After |
|---|---|---|
| Find a policy answer | 15–30 min search + email thread | 2 seconds, cited answer |
| Onboard a new hire | 40 hrs reading docs + Q&A | Day-1 chat access to all documents |
| Repeated questions | Same question, different person, every week | Self-serve, zero repeat tickets |
| Compliance audit | Manual document trail | Exported conversation + citation log |
ROI: 100 employees × 3 knowledge questions/day × 15 min average search time = 75 hours/day recovered, or 375 hours over a five-day week. At a $50/hr blended rate, that is roughly $975,000/year in reclaimed productivity. KnowledgeBase AI cost: ~$0.003/query × 300 queries/day ≈ $27/month.
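The ROI arithmetic can be re-run with your own headcount and rates. The sketch below is only an illustration: `estimateRoi` and its field names are hypothetical, and the five-day week, 52-week year, and 30-day month are assumptions. Note that 100 × 3 × 15 minutes comes to 75 hours per working day.

```typescript
interface RoiInputs {
  employees: number;
  questionsPerEmployeePerDay: number;
  minutesPerSearch: number;
  hourlyRateUsd: number;
  costPerQueryUsd: number;
  workdaysPerWeek: number; // assumption: not stated in the text
  weeksPerYear: number; // assumption: not stated in the text
}

function estimateRoi(i: RoiInputs) {
  const queriesPerDay = i.employees * i.questionsPerEmployeePerDay;
  const hoursPerDay = (queriesPerDay * i.minutesPerSearch) / 60;
  const hoursPerWeek = hoursPerDay * i.workdaysPerWeek;
  const savingsPerYearUsd = hoursPerWeek * i.weeksPerYear * i.hourlyRateUsd;
  const costPerMonthUsd = queriesPerDay * i.costPerQueryUsd * 30; // 30-day month
  return { queriesPerDay, hoursPerDay, hoursPerWeek, savingsPerYearUsd, costPerMonthUsd };
}

const roi = estimateRoi({
  employees: 100,
  questionsPerEmployeePerDay: 3,
  minutesPerSearch: 15,
  hourlyRateUsd: 50,
  costPerQueryUsd: 0.003,
  workdaysPerWeek: 5,
  weeksPerYear: 52,
});

// hoursPerDay is 75, hoursPerWeek is 375, savingsPerYearUsd is 975000,
// and costPerMonthUsd is about 27.
console.log(roi);
```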
Screenshots: AI Chat with Inline Citations · Document Manager · Analytics Dashboard · Workspace Settings & RBAC · Login (Magic Link Auth)
```mermaid
flowchart TD
    subgraph Ingestion["Ingestion Pipeline"]
        A[Upload PDF / DOCX / URL] --> B[Supabase Storage]
        B --> C[Inngest Step Function]
        C --> D[Extract Text]
        D --> E[Semantic Chunker\n400 token max]
        E --> F[OpenAI Embeddings\ntext-embedding-3-small]
        F --> G[(pgvector · chunks)]
        G --> H[status: ready]
    end
    subgraph Query["RAG Query Pipeline"]
        I[User Question] --> J[Embed Query\nRedis cache]
        J --> K[Hybrid Search RPC\nBM25 + Vector + RRF]
        K --> L[Top 20 Candidates]
        L --> M[Cohere Rerank 3\nCross-encoder]
        M --> N[Top 5 Chunks]
        N --> O[Claude via AI Gateway\nclaude-sonnet-4.6]
        O --> P[Streaming + Citations]
    end
    G -.->|hybrid_search RPC| K
```
```mermaid
sequenceDiagram
    actor User
    participant UI as Next.js UI
    participant API as API Route
    participant Stor as Supabase Storage
    participant DB as Supabase DB
    participant Inn as Inngest
    participant OAI as OpenAI
    User->>UI: Drop PDF / paste URL
    UI->>API: POST /api/workspaces/{id}/documents
    API->>Stor: Upload file
    API->>DB: INSERT (status: pending)
    API->>Inn: Send document/process event
    API-->>UI: 200 { document }
    Inn->>Inn: Step 1 — extract-text
    Inn->>Inn: Step 2 — chunk-text (400 token max)
    Inn->>OAI: Batch embed chunks
    OAI-->>Inn: float32[1536] per chunk
    Inn->>DB: INSERT chunks + UPDATE status: ready
    DB-->>UI: Realtime update
```
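Step 2 of the ingestion flow (chunk-text, 400 token max) can be pictured with a small chunker. This is a minimal sketch, not the project's chunker: `chunkMarkdown` and `Chunk` are hypothetical names, tokens are approximated by whitespace-separated words, and headings are assumed to be Markdown `#` lines separated from body text by blank lines.

```typescript
interface Chunk {
  content: string;
  headingContext: string | null; // mirrors chunks.heading_context
}

function chunkMarkdown(text: string, maxTokens = 400): Chunk[] {
  const chunks: Chunk[] = [];
  let heading: string | null = null;
  let buffer: string[] = [];
  let bufferTokens = 0;

  // Emit whatever paragraphs are buffered as one chunk under the current heading.
  const flush = (): void => {
    if (buffer.length > 0) {
      chunks.push({ content: buffer.join("\n\n"), headingContext: heading });
      buffer = [];
      bufferTokens = 0;
    }
  };

  for (const block of text.split(/\n{2,}/)) {
    const paragraph = block.trim();
    if (paragraph === "") continue;

    const headingMatch = /^#{1,6}\s+(.*)$/.exec(paragraph);
    if (headingMatch) {
      flush(); // a chunk never spans two sections
      heading = headingMatch[1] ?? null;
      continue;
    }

    // Assumption: one word is roughly one token.
    const words = paragraph.split(/\s+/).filter(Boolean);
    if (words.length > maxTokens) {
      // Oversized paragraph: fall back to fixed word windows.
      flush();
      for (let i = 0; i < words.length; i += maxTokens) {
        chunks.push({
          content: words.slice(i, i + maxTokens).join(" "),
          headingContext: heading,
        });
      }
      continue;
    }
    if (bufferTokens + words.length > maxTokens) flush();
    buffer.push(paragraph);
    bufferTokens += words.length;
  }
  flush();
  return chunks;
}
```

A production chunker would count tokens with the embedding model's tokenizer rather than by words; the paragraph-boundary and heading-tracking logic is the part this sketch is meant to show.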
```mermaid
erDiagram
    organizations ||--o{ workspaces : "has"
    workspaces ||--o{ workspace_members : "has"
    workspaces ||--o{ documents : "contains"
    workspaces ||--o{ conversations : "has"
    documents ||--o{ chunks : "split into"
    conversations ||--o{ messages : "has"
    chunks {
        uuid id PK
        uuid document_id FK
        vector_1536 embedding
        tsvector fts
        text content
        text heading_context
    }
    messages {
        uuid id PK
        text content
        uuid_array cited_chunk_ids
        numeric faithfulness_score
        numeric relevance_score
    }
```
Most RAG tutorials use pure vector similarity. This project uses three layers:
- Hybrid BM25 + vector search — a single Postgres RPC fuses keyword matches (BM25) and semantic matches (cosine) via Reciprocal Rank Fusion. A query like "FMLA leave policy section 4" returns the exact policy section even if the embedding distance is mediocre.
- Cross-encoder reranking (Cohere Rerank 3) — the top 20 candidates are re-scored by a model that reads both the query and the document together, not just their embeddings independently. This is the difference between a retrieval system and a good one.
- Semantic chunking — chunks preserve `heading_context` from the nearest preceding heading, so the model always knows what section a chunk belongs to even without surrounding text.
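Reciprocal Rank Fusion itself is a short formula: each candidate scores the sum of 1 / (k + rank) over every ranked list it appears in. The project performs this fusion inside a Postgres RPC; the TypeScript below is only an illustrative sketch of the scoring, with a hypothetical function name and k = 60 assumed as the commonly used constant.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Fuse several ranked lists of chunk IDs (best first) into one ranking.
function reciprocalRankFusion(rankedLists: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks start at 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Hypothetical chunk IDs: one list from keyword search, one from vector search.
const keywordHits = ["c4", "c9", "c2"];
const vectorHits = ["c2", "c4", "c7"];
const fused = reciprocalRankFusion([keywordHits, vectorHits]);
console.log(fused.map((r) => r.id)); // [ 'c4', 'c2', 'c9', 'c7' ]
```

Chunks that appear in both lists (`c4`, `c2`) outrank chunks found by only one retriever, which is why a keyword-heavy query can still surface the exact section when its embedding distance is mediocre.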
Every [1], [2], [3] in the streaming response is a live citation badge. Hover to see the source document name, page number, heading context, and Cohere confidence score. A sliding sources panel shows all citations for the current message side by side.
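One way to turn `[n]` markers into citation data is to scan the response text and look each number up in the list of chunks that was sent to the model. The sketch below assumes markers are 1-based indexes into that list; `SourceChunk`, its fields, and `extractCitations` are hypothetical names, not the project's API.

```typescript
interface SourceChunk {
  id: string;
  documentName: string;
  page: number;
  headingContext: string | null;
  relevanceScore: number; // reranker score shown as confidence
}

// Return the cited chunks in order of first appearance, without duplicates.
function extractCitations(answer: string, sources: SourceChunk[]): SourceChunk[] {
  const seen = new Set<number>();
  const cited: SourceChunk[] = [];
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    const n = Number(match[1]);
    const source = sources[n - 1]; // [1] refers to the first chunk
    if (source !== undefined && !seen.has(n)) {
      seen.add(n);
      cited.push(source);
    }
  }
  return cited;
}
```

Markers that point past the end of the source list are ignored here, so a stray `[9]` in model output does not produce a broken badge.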
After every response, an Inngest step function fires a second Claude call (Haiku, not Sonnet — cheap) that scores the response on two dimensions: faithfulness (is every claim grounded in the retrieved context?) and relevance (does the answer actually address the question?). Scores appear as color-coded badges below assistant messages.
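The scoring step can be sketched as parsing the evaluator's reply and mapping each score to a badge color. Everything specific here is an assumption: the JSON shape with `faithfulness` and `relevance` in the range 0 to 1, the function names, and the 0.8 / 0.5 thresholds are illustrative and not taken from the project.

```typescript
type BadgeColor = "green" | "yellow" | "red";

interface Evaluation {
  faithfulness: number;
  relevance: number;
}

// Parse the evaluator's JSON reply; return null if it is not usable.
function parseEvaluation(raw: string): Evaluation | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return null;
    const { faithfulness, relevance } = parsed as Record<string, unknown>;
    if (typeof faithfulness !== "number" || typeof relevance !== "number") return null;
    const clamp = (x: number): number => Math.min(1, Math.max(0, x));
    return { faithfulness: clamp(faithfulness), relevance: clamp(relevance) };
  } catch {
    return null; // malformed JSON: skip the badge rather than fail the chat
  }
}

// Hypothetical thresholds for the color-coded badges.
function badgeColor(score: number): BadgeColor {
  if (score >= 0.8) return "green";
  if (score >= 0.5) return "yellow";
  return "red";
}
```

Because the evaluation runs after the response has been delivered, returning `null` on a bad reply only hides the badge and never blocks the answer.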
Each workspace is a hard boundary at the Postgres layer. Supabase RLS policies prevent any query from crossing workspace boundaries — even if application code has a bug. Tested by attempting direct REST calls across user accounts.
Every query is recorded in query_events with: user ID, workspace, retrieved chunk IDs, reranked chunk IDs, response token count, cost in USD, and latency. Conversations can be exported to PDF with citations intact — useful for compliance, onboarding audits, and knowledge handoffs.
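The fields listed above suggest a record shape along these lines. This is a sketch under assumptions: the camelCase field names, `buildQueryEvent`, and the per-million-token rate parameters are hypothetical, and the rates in the usage example are placeholders rather than real pricing.

```typescript
interface QueryEvent {
  userId: string;
  workspaceId: string;
  retrievedChunkIds: string[];
  rerankedChunkIds: string[];
  responseTokens: number;
  costUsd: number;
  latencyMs: number;
}

interface TokenRates {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

interface QueryMeasurement {
  userId: string;
  workspaceId: string;
  retrievedChunkIds: string[];
  rerankedChunkIds: string[];
  promptTokens: number;
  responseTokens: number;
  startedAtMs: number;
  finishedAtMs: number;
}

// Derive cost and latency from raw measurements taken around the model call.
function buildQueryEvent(m: QueryMeasurement, rates: TokenRates): QueryEvent {
  const costUsd =
    (m.promptTokens * rates.inputUsdPerMillion +
      m.responseTokens * rates.outputUsdPerMillion) / 1e6;
  return {
    userId: m.userId,
    workspaceId: m.workspaceId,
    retrievedChunkIds: m.retrievedChunkIds,
    rerankedChunkIds: m.rerankedChunkIds,
    responseTokens: m.responseTokens,
    costUsd,
    latencyMs: m.finishedAtMs - m.startedAtMs,
  };
}

const event = buildQueryEvent(
  {
    userId: "user-1",
    workspaceId: "ws-1",
    retrievedChunkIds: ["c1", "c2", "c3"],
    rerankedChunkIds: ["c2", "c1"],
    promptTokens: 2000,
    responseTokens: 500,
    startedAtMs: 1000,
    finishedAtMs: 2850,
  },
  { inputUsdPerMillion: 1, outputUsdPerMillion: 2 }, // placeholder rates
);
console.log(event.latencyMs, event.costUsd);
```

Keeping both the retrieved and the reranked chunk IDs on the same row makes it possible to audit later how much the reranker changed the context for any given answer.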
- Hybrid BM25 + vector search with Reciprocal Rank Fusion — catches both semantic and keyword matches that either alone would miss
- Cross-encoder reranking (Cohere Rerank 3) — re-orders top 20 candidates by true semantic relevance, not just cosine distance
- Semantic chunking — paragraph-boundary aware, 400 token max, preserves `heading_context` for each chunk
- Streaming with inline citation badges — `[1]`, `[2]` appear as content streams; HoverCard shows source + page + confidence %
- Real-time document processing — `pending → processing → ready` via Supabase Realtime
- Async RAG evaluation — faithfulness + relevance scores shown below each assistant response (Claude Haiku, fire-and-forget via Inngest)
- Conversation export to PDF — audit trail with cited document names per message (`@react-pdf/renderer`)
- Multi-tenant workspaces — Supabase RLS enforces tenant isolation at the database layer; workspaces cannot read each other's data
- RBAC — owner / editor / viewer roles per workspace; checked on every API route
- Workspace system prompt — owners customize the assistant persona per workspace
- Analytics dashboard — queries/day chart, top cited documents, avg latency, 7-day cost (Recharts + TanStack Query)
- Soft deletes — documents are never hard-deleted; 30-day purge via Inngest nightly cron
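The owner / editor / viewer check that runs on every API route can be modeled as a rank comparison. In this sketch only the three role names and the owner-only system prompt come from the feature list; the action names and the rest of the minimum-role mapping are assumptions for illustration.

```typescript
type Role = "owner" | "editor" | "viewer";
type Action =
  | "read"
  | "upload_document"
  | "delete_document"
  | "manage_members"
  | "edit_system_prompt";

const ROLE_RANK: Record<Role, number> = { viewer: 0, editor: 1, owner: 2 };

// Minimum role per action (assumed mapping, apart from the system prompt).
const MIN_ROLE: Record<Action, Role> = {
  read: "viewer",
  upload_document: "editor",
  delete_document: "editor",
  manage_members: "owner",
  edit_system_prompt: "owner",
};

function can(role: Role, action: Action): boolean {
  return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[action]];
}

console.log(can("editor", "upload_document")); // true
console.log(can("viewer", "upload_document")); // false
```

Typing the mapping as `Record<Action, Role>` means that adding a new action without assigning it a minimum role fails at compile time.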
| Layer | Technology | Why |
|---|---|---|
| Framework | Next.js 16 App Router + React 19 + TypeScript strict | App Router co-locates streaming API routes with UI; noUncheckedIndexedAccess catches RAG array bugs at compile time |
| Styling | Tailwind v4 + shadcn/ui (new-york) | shadcn gives copy-owned components — no version conflicts with Radix upgrades |
| AI Generation | Claude `claude-sonnet-4.6` via Vercel AI Gateway (OIDC auth) | Gateway adds <20ms routing overhead but gives provider failover, per-user cost attribution, and zero API key rotation |
| Embeddings | OpenAI `text-embedding-3-small` (1536d) — direct fetch, Redis-cached | Best cost/quality ratio for RAG; SHA-256 Redis cache eliminates redundant embedding calls on repeated queries |
| Reranking | Cohere Rerank 3 — cross-encoder, graceful fallback to RRF order | Cross-encoders see the full (query, document) pair simultaneously — fundamentally more accurate than bi-encoder similarity |
| Database | Supabase PostgreSQL + pgvector (HNSW index, m=16) | Single service handles relational data, vector search, file storage, RLS, Realtime, and auth — zero infra sprawl |
| Background Jobs | Inngest v4 step functions | Idempotent retries per step, not per job — a failed embed step doesn't re-extract or re-chunk |
| Rate Limiting | Upstash Redis + `@upstash/ratelimit` (sliding window) | Serverless-native; shares the same Redis instance as embedding cache |
| Charts | Recharts + TanStack Query | Lazy-loaded via next/dynamic — ~450KB deferred until first analytics visit |
| PDF Export | `@react-pdf/renderer` v4 | Declarative React components render to PDF server-side; no headless browser required |
| Error Monitoring | Sentry (`src/instrumentation.ts`) | Next.js 16 instrumentation hook captures both server and client errors in one DSN |
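To make the sliding-window idea in the Rate Limiting row concrete, here is a simplified in-memory sketch. It is not the `@upstash/ratelimit` algorithm or API: the function name is hypothetical, state lives in a local `Map` instead of Redis, and it keeps a plain log of timestamps per key.

```typescript
interface LimitResult {
  allowed: boolean;
  retryAfterSec: number; // value for a Retry-After header when denied
}

function createSlidingWindowLimiter(limit: number, windowMs: number) {
  const hits = new Map<string, number[]>();

  return function check(key: string, now: number = Date.now()): LimitResult {
    const cutoff = now - windowMs;
    // Keep only the requests that still fall inside the window.
    const recent = (hits.get(key) ?? []).filter((t) => t > cutoff);

    if (recent.length >= limit) {
      hits.set(key, recent);
      const oldest = recent[0] ?? now;
      // The caller may retry once the oldest request leaves the window.
      return {
        allowed: false,
        retryAfterSec: Math.ceil((oldest + windowMs - now) / 1000),
      };
    }

    recent.push(now);
    hits.set(key, recent);
    return { allowed: true, retryAfterSec: 0 };
  };
}

// Example: at most 2 requests per user per minute.
const checkChat = createSlidingWindowLimiter(2, 60_000);
console.log(checkChat("user-1", 0)); // allowed
console.log(checkChat("user-1", 1_000)); // allowed
console.log(checkChat("user-1", 2_000)); // denied, retryAfterSec: 58
```

In a serverless deployment the per-key state has to live in a shared store such as Redis, since separate function instances do not share memory.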
- Your data stays in your infrastructure — documents stored in Supabase (your account, your region)
- Zero training on your data — Claude API contract prohibits using API calls for model training
- Row-Level Security — database-enforced tenant isolation at the Postgres layer
- RBAC — owner / editor / viewer roles per workspace, enforced on every mutation
- Audit log — every query logged with user ID, timestamp, latency, and token cost in `query_events`
- SSRF protection — URL ingestion resolves hostnames via DNS and blocks all private IP ranges (`169.254.x.x`, `10.x.x.x`, `172.16-31.x.x`, `192.168.x.x`, IPv6 link-local) before any network request is made (`src/lib/rag/extractors/ssrf.ts`)
- Prompt injection defense — chunk content sanitized before insertion into Claude's context window; injection phrases replaced with `[redacted]`
- Rate limiting — 20 chat requests/user/min + 100 requests/IP/min + 10 uploads/user/10min with `Retry-After` headers
- Security headers — CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, Permissions-Policy configured in `next.config.ts`
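The range test behind the SSRF protection can be sketched as below. This is not the contents of `src/lib/rag/extractors/ssrf.ts`: the function names are hypothetical, the DNS resolution step is left out, and the loopback and `0.x.x.x` checks are additions beyond the ranges listed above. Unparseable IPv4 input is treated as blocked.

```typescript
// True if an already-resolved IPv4 address must not be fetched.
function isBlockedIPv4(ip: string): boolean {
  const octets = ip.split(".");
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) return true; // fail closed on anything unparseable

  const a = Number(octets[0]);
  const b = Number(octets[1]);
  return (
    a === 10 || // 10.0.0.0/8
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) || // 169.254.0.0/16 link-local (cloud metadata)
    a === 127 || // loopback (assumed addition)
    a === 0 // "this network" (assumed addition)
  );
}

// True for IPv6 link-local addresses (fe80::/10, first hextet fe80 to febf).
function isLinkLocalIPv6(ip: string): boolean {
  const first = parseInt(ip.toLowerCase().split(":")[0] ?? "", 16);
  return first >= 0xfe80 && first <= 0xfebf;
}

function isBlockedAddress(ip: string): boolean {
  return ip.includes(":") ? isLinkLocalIPv6(ip) : isBlockedIPv4(ip);
}

console.log(isBlockedAddress("169.254.169.254")); // true
console.log(isBlockedAddress("8.8.8.8")); // false
```

The check is meant to run on every address a hostname resolves to, after DNS resolution and before the request, so that a public-looking hostname pointing at an internal address is still rejected.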
See SETUP.md for the complete 10-command bootstrap.
```bash
git clone https://github.com/your-username/rag-knowledge-base
cd rag-knowledge-base
vercel link && vercel env pull .env.local
npx supabase db push
npm run db:seed
npm run dev
```

Prerequisites: Node.js 20+, a Vercel project, Supabase project, OpenAI API key (embeddings), Cohere API key (reranking).
All documentation visuals are reproducible from source. No running app or live API required for the screenshots:
```bash
npm run generate:all
# Outputs:
#   docs/architecture.svg          (Mermaid CLI, dark theme)
#   docs/ingestion.svg
#   docs/schema.svg
#   docs/hero.png                  (Gemini Imagen 4.0 — requires GEMINI_API_KEY)
#   docs/chat-interface.png        (Playwright screenshot, no running app)
#   docs/document-manager.png
#   docs/analytics-dashboard.png
#   docs/login-page.png
#   docs/workspace-settings.png
```

```
src/
├── app/
│   ├── (auth)/login/                  # Magic Link auth
│   ├── (app)/workspaces/[id]/
│   │   ├── chat/                      # RAG chat interface
│   │   ├── documents/                 # Document manager + Realtime status
│   │   ├── analytics/                 # Usage analytics dashboard
│   │   └── settings/                  # Workspace settings + member management
│   └── api/workspaces/[id]/
│       ├── chat/                      # Streaming RAG endpoint
│       ├── documents/                 # Upload + soft delete
│       ├── analytics/                 # Query analytics
│       ├── members/                   # RBAC invite/remove
│       ├── conversations/[id]/export  # PDF conversation export
│       └── health/                    # Unauthenticated health check
├── lib/
│   ├── rag/                           # search, rerank, chunker, embedder, prompts
│   ├── inngest/functions/             # process-document, evaluate-response
│   ├── analytics/                     # trackQueryEvent, getAnalytics
│   ├── pdf/                           # @react-pdf/renderer template
│   ├── validation/                    # Zod schemas per domain
│   ├── config.ts                      # All magic numbers centralized
│   └── errors.ts                      # Typed error classes
└── components/
    ├── chat/                          # ChatInterface, MessageBubble, CitationBadge
    ├── documents/                     # DocumentManager, UploadZone, DocumentList
    ├── analytics/                     # AnalyticsDashboard, QueriesChart, KpiCard
    └── settings/                      # SettingsForm, MembersTable
```
Can I use this with my existing documents without re-uploading everything?
Not directly — documents must be processed through the ingestion pipeline to generate embeddings and chunks. You can automate bulk ingestion via the API route if you have many files. See scripts/seed-demo-docs.ts for the pattern.
Why Cohere Rerank and not just more vector search results?
Vector similarity measures how close two embeddings are; it doesn't model the relationship between a query and a document. Cross-encoders (what Cohere Rerank uses) read both texts together and produce a relevance score. In practice, the top-ranked vector result is often not the most relevant chunk when the query is keyword-heavy or domain-specific. Reranking typically improves answer quality measurably on document-heavy knowledge bases.
What document types are supported?
PDF, DOCX, TXT, MD, and any public URL (web scraping via Cheerio). File size limit: 20MB. URLs are SSRF-filtered before fetch.
Does this work without Cohere?
Yes. If Cohere reranking fails (rate limit, bad key, API down), the system falls back gracefully to the raw RRF order from the hybrid search. You'll get slightly lower answer quality but the system stays live.
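The fallback behavior can be sketched as a wrapper that tries the reranker and, on any error, keeps the incoming RRF order. The names and the injected `Reranker` signature are hypothetical; a real implementation would adapt the Cohere client's response to this shape.

```typescript
interface Candidate {
  id: string;
  content: string;
}

interface RerankHit {
  index: number; // position of the candidate in the input list
  relevanceScore: number;
}

type Reranker = (query: string, candidates: Candidate[]) => Promise<RerankHit[]>;

async function rerankWithFallback(
  query: string,
  candidates: Candidate[], // already in RRF order
  rerank: Reranker,
  topN = 5,
): Promise<{ chunks: Candidate[]; reranked: boolean }> {
  try {
    const hits = await rerank(query, candidates);
    const chunks = hits
      .slice()
      .sort((a, b) => b.relevanceScore - a.relevanceScore)
      .map((hit) => candidates[hit.index])
      .filter((c): c is Candidate => c !== undefined)
      .slice(0, topN);
    return { chunks, reranked: true };
  } catch {
    // Rate limit, bad key, or outage: keep the hybrid-search order.
    return { chunks: candidates.slice(0, topN), reranked: false };
  }
}
```

Returning the `reranked` flag alongside the chunks lets the caller log how often the fallback path is taken.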
Can I deploy this without Vercel?
The AI Gateway integration uses Vercel OIDC for authentication, which requires a Vercel project. You can replace `gateway('anthropic/claude-sonnet-4.6')` with `createAnthropic()('claude-sonnet-4-6')` and remove the gateway dependency, then deploy anywhere Node.js runs.
Is there a hosted demo?
No. This is a portfolio/reference project. Deploy it to your own infrastructure following SETUP.md.
See CONTRIBUTING.md for development setup and contribution guidelines.
MIT — see LICENSE.

![Chat interface showing streaming response with [1][2] citation badges and a sources panel](/RexOwenDev/rag-chat-app/raw/main/docs/chat-interface.png)



