HIPAA-compliant RAG (Retrieval-Augmented Generation) framework for building modular, open-source health data applications on AWS -- with pluggable vector backends including Amazon S3 Vectors (GA Dec 2025).
A production-grade, HIPAA-compliant RAG chatbot that lets patients query their personal health data across Apple HealthKit, FHIR R4, and legacy EHR systems. Designed for 10M+ daily users with $0 idle cost.
Key differentiators:
- Patient isolation by design -- `patient_id` injected from JWT, never user input
- PHI redaction before embedding -- raw PHI never enters the vector store
- Pluggable backends -- swap vector store, LLM, or embedder with one env var
- $0 idle cost -- S3 Vectors + Lambda + DynamoDB = pay only when queried
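
The patient-isolation rule can be illustrated with a minimal sketch. It assumes the JWT has already been verified upstream (for example by Cognito and API Gateway) and that the patient identifier lives in a claim named `custom:patient_id`. That claim name, the `ScopedQuery` type, and the `build_scoped_query` helper are hypothetical and not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedQuery:
    """A retrieval request that is always bound to exactly one patient."""
    question: str
    patient_id: str


def build_scoped_query(claims: dict, body: dict,
                       claim_name: str = "custom:patient_id") -> ScopedQuery:
    """Build a query whose patient_id comes only from verified JWT claims.

    Any patient_id supplied in the request body is ignored on purpose.
    The claim name is an assumption made for this sketch.
    """
    patient_id = claims.get(claim_name)
    if not patient_id:
        raise PermissionError("token carries no patient identifier")
    return ScopedQuery(question=str(body.get("question", "")),
                       patient_id=str(patient_id))


# The body tries to target another patient; the claim still wins.
query = build_scoped_query(
    claims={"custom:patient_id": "synthetic-patient-001"},
    body={"question": "What was my sleep score?", "patient_id": "someone-else"},
)
print(query.patient_id)  # prints synthetic-patient-001
```

Because the request body is never consulted for the identifier, a caller cannot widen the retrieval scope by editing the payload.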
graph TB
Patient["Patient (Health App)"] -->|HTTPS| CF["CloudFront + WAF (optional edge layer)"]
CF -.-> APIGW["API Gateway"]
APIGW -->|Cognito JWT| Lambda["Lambda: Query Orchestrator"]
Lambda --> HR["Hybrid Retriever"]
HR --> VR["Vector Search (top 20)"]
HR --> BM["BM25 Keywords (top 20)"]
VR --> S3V["S3 Vectors / ChromaDB"]
BM --> S3V
Lambda --> RR["Reranker (top 5)"]
Lambda --> LLM["Claude Haiku 4.5"]
Lambda --> GR["Guardrails"]
GR --> PHI["PHI Check"]
GR --> TOPIC["Denied Topics"]
GR --> GROUND["Grounding"]
subgraph "HIPAA Controls (Architectural)"
ISO["Patient Isolation<br/>patient_id from JWT, never user input"]
REDACT["PHI Redaction<br/>Comprehend Medical before embedding"]
AUDIT["Audit Trail<br/>CloudTrail all API calls"]
end
| Decision | Rationale | ADR |
|---|---|---|
| S3 Vectors over OpenSearch/Qdrant | $0 idle, ~100ms latency, 2B vectors/index | ADR-001 |
| Cognita patterns, not codebase | Interface contracts adopted, archived codebase avoided | ADR-002 |
| DynamoDB over Aurora | Zero idle cost, Lambda-native, free tier | ADR-003 |
| Async queue at >500 QPS | SQS buffer + WebSocket for Bedrock throttle prevention | ADR-004 |
| Hybrid retrieval (vector + BM25) | Medical terminology needs exact match | ADR-005 |
| Claude Haiku 4.5 | Current model, $0.0045/query, lifecycle-aware | ADR-006 |
| Lambda inference optimisation | Provisioned concurrency, DLQ, context budget | ADR-007 |
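
The hybrid retrieval decision (ADR-005) combines a vector ranking with a BM25 ranking before reranking to the top 5. The README does not say how the two lists are merged, so the sketch below uses reciprocal rank fusion (RRF) purely as a stand-in technique. The function name, the constant `k=60`, and the sample document IDs are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]],
                           k: int = 60, top_n: int = 5) -> list[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher score is better.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


vector_hits = ["d1", "d2", "d3"]  # hypothetical results from vector search
bm25_hits = ["d3", "d1", "d4"]    # hypothetical results from BM25
print(reciprocal_rank_fusion([vector_hits, bm25_hits], top_n=3))
# prints ['d1', 'd3', 'd2']
```

Documents found by both retrievers (`d1`, `d3`) rise above those found by only one, which is the behaviour wanted when an exact medical term and a semantic match point at the same chunk.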
# Clone
git clone https://github.com/melroyanthony/healthstream-rag.git
cd healthstream-rag
# Option A: Docker (recommended)
cd solution && docker compose up --build -d
curl -s http://localhost:8000/health | python3 -m json.tool
# Option B: Local dev
cd solution/backend
uv sync
MOCK_AUTH=true uv run uvicorn app.api.main:app --reload --port 8000
# Ingest sample data + query
curl -X POST http://localhost:8000/api/v1/ingest \
-H "Content-Type: application/json" \
-H "Authorization: Bearer synthetic-patient-001" \
-d '{"documents": [{"text": "Sleep session: sleep score 88, AHI 2.8", "source_type": "healthkit", "source_id": "s1"}]}'
curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-H "Authorization: Bearer synthetic-patient-001" \
-d '{"question": "What was my sleep score?"}'

healthstream-rag/
├── problem/
│ └── problem.md # Problem statement, architecture overview, SDLC walkthrough
│
├── solution/ # All implementation artifacts
│ ├── backend/ # FastAPI application
│ │ ├── app/ # Application code
│ │ │ ├── api/ # Routes, query controller, Lambda handler
│ │ │ ├── core/ # Base interfaces (Cognita-inspired)
│ │ │ ├── vector_db/ # ChromaDB + S3 Vectors backends
│ │ │ ├── retrievers/ # Vector, BM25, hybrid retriever
│ │ │ ├── generators/ # Anthropic + Bedrock generators
│ │ │ ├── embedders/ # Local + Bedrock Titan embedders
│ │ │ ├── loaders/ # HealthKit, FHIR, EHR data loaders
│ │ │ ├── middleware/ # Patient isolation + PHI redaction
│ │ │ └── guardrails/ # PHI check, grounding, disclaimer
│ │ ├── tests/ # 35 unit tests
│ │ ├── data/ # Sample data + 15 golden test Q&A pairs
│ │ └── scripts/ # Evaluation, ingestion, Lambda packaging
│ │
│ ├── infra/terraform/ # AWS IaC (6 modules)
│ │ └── modules/ # networking, compute, storage, security, monitoring, edge
│ │
│ ├── docs/
│ │ ├── architecture/ # System design, OpenAPI, database schema
│ │ │ ├── c4/ # 6 C4 Mermaid diagrams
│ │ │ └── workspace.dsl # Structurizr DSL (canonical C4 source)
│ │ ├── decisions/ # 7 ADRs (001-007)
│ │ └── deployment/ # AWS deployment guide
│ │
│ ├── Makefile # dev, test, lint, docker, deploy, eval
│ ├── docker-compose.yml # Local dev stack
│ └── README.md # Detailed solution documentation
│
├── .github/ # CI/CD, issue templates, Copilot review config
│ ├── workflows/ # CI (tests + Docker), release (semantic versioning)
│ └── ISSUE_TEMPLATE/ # Bug, feature forms
│
├── LICENSE # MIT
├── CONTRIBUTING.md # Contribution guidelines
├── SECURITY.md # Vulnerability disclosure policy
└── README.md # This file
| Layer | Local Dev | Production (AWS) |
|---|---|---|
| API | FastAPI + Uvicorn | Lambda + API Gateway + Cognito |
| Vector Store | ChromaDB | S3 Vectors |
| LLM | Anthropic direct API | Bedrock Claude Haiku 4.5 |
| Embeddings | sentence-transformers (384d) | Bedrock Titan V2 (1024d) |
| BM25 Retrieval | ChromaDB corpus | DynamoDB corpus |
| PHI Redaction | Regex patterns | AWS Comprehend Medical |
| Auth | Mock (Bearer token) | Cognito JWT |
| IaC | Docker Compose | Terraform (6 modules) |
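
For local development the table lists regex patterns as the PHI redaction mechanism. A minimal sketch of redaction before embedding might look like the following. The three patterns (US SSN, email, phone) and the placeholder tokens are illustrative assumptions, not the project's actual pattern set, and they cover only a small fraction of HIPAA identifiers.

```python
import re

# Illustrative patterns only; a real deployment needs far broader coverage.
PHI_PATTERNS = [
    ("[SSN]", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("[EMAIL]", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("[PHONE]", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def redact_phi(text: str) -> str:
    """Replace each matched identifier with a placeholder token."""
    for placeholder, pattern in PHI_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


raw = "Contact jane.doe@example.com or 555-123-4567. SSN 123-45-6789. Sleep score 88."
print(redact_phi(raw))
# prints Contact [EMAIL] or [PHONE]. SSN [SSN]. Sleep score 88.
```

The clinical value (`Sleep score 88`) survives redaction, so the embedded text stays useful for retrieval while the identifiers do not reach the vector store.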
All configuration is done through environment variables. Copy the appropriate profile to `.env` (read by both `uv run` and `docker compose`):
# Local dev (ChromaDB + Anthropic)
cp solution/backend/.env.local solution/backend/.env
# AWS production (S3 Vectors + Bedrock)
cp solution/backend/.env.aws.example solution/backend/.env

| Variable | Default | Description |
|---|---|---|
| `VECTOR_BACKEND` | `chroma` | Vector store: `chroma`, `s3vectors` |
| `LLM_BACKEND` | `anthropic` | LLM: `anthropic`, `bedrock` |
| `EMBEDDER_BACKEND` | `local` | Embedder: `local`, `bedrock` |
| `ANTHROPIC_API_KEY` | (empty) | Anthropic API key (leave blank for mock) |
| `MOCK_AUTH` | `true` | Use mock JWT authentication |
| `AWS_REGION` | `eu-west-1` | AWS region for production services |
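
Selecting a backend from a single variable such as `VECTOR_BACKEND` can be sketched with a small registry. Only the variable name, its default (`chroma`), and its allowed values come from the configuration table. The `VectorStore` protocol, the two placeholder classes, and `make_vector_store` are hypothetical names used for illustration.

```python
import os
from typing import Callable, Mapping, Optional, Protocol


class VectorStore(Protocol):
    """Minimal interface every backend is assumed to satisfy."""
    name: str


class ChromaStore:
    """Placeholder for a ChromaDB-backed store."""
    name = "chroma"


class S3VectorsStore:
    """Placeholder for an S3 Vectors-backed store."""
    name = "s3vectors"


# Maps each allowed VECTOR_BACKEND value to a factory.
VECTOR_BACKENDS: dict[str, Callable[[], VectorStore]] = {
    "chroma": ChromaStore,
    "s3vectors": S3VectorsStore,
}


def make_vector_store(env: Optional[Mapping[str, str]] = None) -> VectorStore:
    """Instantiate the backend named by VECTOR_BACKEND (default: chroma)."""
    env = os.environ if env is None else env
    backend = env.get("VECTOR_BACKEND", "chroma").strip().lower()
    try:
        return VECTOR_BACKENDS[backend]()
    except KeyError:
        allowed = ", ".join(sorted(VECTOR_BACKENDS))
        raise ValueError(
            f"unknown VECTOR_BACKEND {backend!r}; expected one of: {allowed}"
        ) from None


print(make_vector_store({"VECTOR_BACKEND": "s3vectors"}).name)  # prints s3vectors
```

Failing fast on an unknown value keeps a typo in `.env` from silently falling back to the local backend in production.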
| Document | Description |
|---|---|
| C4 Context | System context -- patients, clinicians, data sources |
| C4 Container | Containers -- API GW, Query Orchestrator, data stores |
| C4 Component: Query | RAG pipeline internals |
| C4 Component: Ingestion | Ingestion pipeline |
| C4 Deployment | AWS deployment topology |
| HIPAA Controls | 4-layer defense model |
| System Design | Scale analysis, patterns, trade-offs |
| OpenAPI Spec | 8 endpoints, full schemas |
| Database Schema | Vector store + DynamoDB tables |
| AWS Deployment Guide | Step-by-step deploy |
cd solution/backend
# Unit tests (35 tests, ~5s)
MOCK_AUTH=true uv run pytest tests/ -v
# RAGAS evaluation (15 golden Q&A pairs)
MOCK_AUTH=true uv run python scripts/evaluate.py
# E2E happy path (requires running server)
bash ../scripts/test-e2e.sh

| Test Suite | Count | What It Validates |
|---|---|---|
| Unit tests | 35 | Health, query, ingest, collections, vector DB, patient isolation, PHI redaction, guardrails |
| RAGAS eval | 15 | Faithfulness, answer relevancy, context precision, context recall, PHI leakage (=0), patient isolation (PASS) |
| E2E | 9 | Full CRUD flow against running API |
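
The patient isolation (PASS) criterion in the evaluation row can be expressed as a simple predicate over retrieved chunks. This is a sketch under the assumption that each chunk carries a `patient_id` metadata field. The function name and sample records are hypothetical and do not reflect how `scripts/evaluate.py` is written.

```python
def isolation_passes(caller_patient_id: str, retrieved: list[dict]) -> bool:
    """Return True only if every retrieved chunk belongs to the caller.

    An empty result set passes, since nothing was leaked.
    """
    return all(chunk.get("patient_id") == caller_patient_id for chunk in retrieved)


chunks = [
    {"patient_id": "synthetic-patient-001", "text": "sleep score 88"},
    {"patient_id": "synthetic-patient-002", "text": "AHI 4.1"},
]
print(isolation_passes("synthetic-patient-001", chunks[:1]))  # prints True
print(isolation_passes("synthetic-patient-001", chunks))      # prints False
```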
See CONTRIBUTING.md for development setup, code standards, and pull request process.
See SECURITY.md for vulnerability disclosure policy and HIPAA security design.
Melroy Anthony -- AI Architect & Lead Software Engineer | Dublin, Ireland
Architecture designed for patient impact -- not dashboards.
Built with Claude Code