melroyanthony/healthstream-rag

HealthStream RAG

HIPAA-compliant RAG (Retrieval-Augmented Generation) framework for building modular, open-source health data applications on AWS -- with pluggable vector backends including Amazon S3 Vectors (GA Dec 2025).

CI Python 3.13 License: MIT AWS Built With


What Is This?

A production-grade, HIPAA-compliant RAG chatbot that lets patients query their personal health data across Apple HealthKit, FHIR R4, and legacy EHR systems. Designed for 10M+ daily users with $0 idle cost.

Key differentiators:

  • Patient isolation by design -- patient_id injected from JWT, never user input
  • PHI redaction before embedding -- raw PHI never enters the vector store
  • Pluggable backends -- swap vector store, LLM, or embedder with one env var
  • $0 idle cost -- S3 Vectors + Lambda + DynamoDB = pay only when queried
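The patient-isolation point can be illustrated with a minimal sketch. It assumes a mock-auth convention in which the bearer token value (for example `synthetic-patient-001`) is itself the patient ID; in production the ID would come from a verified Cognito JWT claim instead. The function names `patient_id_from_header` and `build_retrieval_filter` are hypothetical and are not taken from the repository.

```python
def patient_id_from_header(authorization: str) -> str:
    """Extract the patient ID from a mock 'Bearer <id>' header (assumed format)."""
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise PermissionError("missing or malformed bearer token")
    return token.strip()


def build_retrieval_filter(authorization: str, request_body: dict) -> dict:
    """Build the metadata filter applied to every retrieval call.

    The patient_id always comes from the auth header. Any patient_id the
    caller places in the request body is deliberately ignored.
    """
    return {"patient_id": patient_id_from_header(authorization)}
```

Because the filter never reads `request_body["patient_id"]`, a caller cannot widen the search to another patient's records by editing the request payload.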

Architecture

graph TB
    Patient["Patient (Health App)"] -->|HTTPS| CF["CloudFront + WAF (optional edge layer)"]
    CF -.-> APIGW["API Gateway"]
    APIGW -->|Cognito JWT| Lambda["Lambda: Query Orchestrator"]

    Lambda --> HR["Hybrid Retriever"]
    HR --> VR["Vector Search (top 20)"]
    HR --> BM["BM25 Keywords (top 20)"]
    VR --> S3V["S3 Vectors / ChromaDB"]
    BM --> S3V

    Lambda --> RR["Reranker (top 5)"]
    Lambda --> LLM["Claude Haiku 4.5"]
    Lambda --> GR["Guardrails"]
    GR --> PHI["PHI Check"]
    GR --> TOPIC["Denied Topics"]
    GR --> GROUND["Grounding"]

    subgraph "HIPAA Controls (Architectural)"
        ISO["Patient Isolation<br/>patient_id from JWT, never user input"]
        REDACT["PHI Redaction<br/>Comprehend Medical before embedding"]
        AUDIT["Audit Trail<br/>CloudTrail all API calls"]
    end
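
The diagram shows the hybrid retriever drawing candidates from both vector search and BM25, but it does not say how the two ranked lists are merged. One common option is reciprocal rank fusion (RRF), sketched below. This is an assumption for illustration, not the repository's confirmed method; the name `reciprocal_rank_fusion` and the constant `k=60` are illustrative choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 20
) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]
```

In the pipeline drawn above, a fused list like this would then be passed to the reranker, which keeps the top 5.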

Key Design Decisions

| Decision | Rationale | ADR |
| --- | --- | --- |
| S3 Vectors over OpenSearch/Qdrant | $0 idle, ~100ms latency, 2B vectors/index | ADR-001 |
| Cognita patterns, not codebase | Interface contracts adopted, archived codebase avoided | ADR-002 |
| DynamoDB over Aurora | Zero idle cost, Lambda-native, free tier | ADR-003 |
| Async queue at >500 QPS | SQS buffer + WebSocket for Bedrock throttle prevention | ADR-004 |
| Hybrid retrieval (vector + BM25) | Medical terminology needs exact match | ADR-005 |
| Claude Haiku 4.5 | Current model, $0.0045/query, lifecycle-aware | ADR-006 |
| Lambda inference optimisation | Provisioned concurrency, DLQ, context budget | ADR-007 |

Quick Start

# Clone
git clone https://github.com/melroyanthony/healthstream-rag.git
cd healthstream-rag

# Option A: Docker (recommended)
cd solution && docker compose up --build -d
curl -s http://localhost:8000/health | python3 -m json.tool

# Option B: Local dev
cd solution/backend
uv sync
MOCK_AUTH=true uv run uvicorn app.api.main:app --reload --port 8000

# Ingest sample data + query
curl -X POST http://localhost:8000/api/v1/ingest \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer synthetic-patient-001" \
  -d '{"documents": [{"text": "Sleep session: sleep score 88, AHI 2.8", "source_type": "healthkit", "source_id": "s1"}]}'

curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer synthetic-patient-001" \
  -d '{"question": "What was my sleep score?"}'
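
For scripted use, the same query call can be made from Python. The sketch below uses only the standard library and mirrors the curl example above. The helper name `build_query_request` is hypothetical, and nothing is assumed about the shape of the response.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the Quick Start


def build_query_request(question: str, token: str) -> urllib.request.Request:
    """Build the POST /api/v1/query request without sending it."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/query",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request; that step needs the server from Option A or Option B to be running.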

Repository Structure

healthstream-rag/
├── problem/
│   └── problem.md                # Problem statement, architecture overview, SDLC walkthrough
│
├── solution/                     # All implementation artifacts
│   ├── backend/                  # FastAPI application
│   │   ├── app/                  # Application code
│   │   │   ├── api/              # Routes, query controller, Lambda handler
│   │   │   ├── core/             # Base interfaces (Cognita-inspired)
│   │   │   ├── vector_db/        # ChromaDB + S3 Vectors backends
│   │   │   ├── retrievers/       # Vector, BM25, hybrid retriever
│   │   │   ├── generators/       # Anthropic + Bedrock generators
│   │   │   ├── embedders/        # Local + Bedrock Titan embedders
│   │   │   ├── loaders/          # HealthKit, FHIR, EHR data loaders
│   │   │   ├── middleware/       # Patient isolation + PHI redaction
│   │   │   └── guardrails/       # PHI check, grounding, disclaimer
│   │   ├── tests/                # 35 unit tests
│   │   ├── data/                 # Sample data + 15 golden test Q&A pairs
│   │   └── scripts/              # Evaluation, ingestion, Lambda packaging
│   │
│   ├── infra/terraform/          # AWS IaC (6 modules)
│   │   └── modules/              # networking, compute, storage, security, monitoring, edge
│   │
│   ├── docs/
│   │   ├── architecture/         # System design, OpenAPI, database schema
│   │   │   ├── c4/               # 6 C4 Mermaid diagrams
│   │   │   └── workspace.dsl     # Structurizr DSL (canonical C4 source)
│   │   ├── decisions/            # 7 ADRs (001-007)
│   │   └── deployment/           # AWS deployment guide
│   │
│   ├── Makefile                  # dev, test, lint, docker, deploy, eval
│   ├── docker-compose.yml        # Local dev stack
│   └── README.md                 # Detailed solution documentation
│
├── .github/                      # CI/CD, issue templates, Copilot review config
│   ├── workflows/                # CI (tests + Docker), release (semantic versioning)
│   └── ISSUE_TEMPLATE/           # Bug, feature forms
│
├── LICENSE                       # MIT
├── CONTRIBUTING.md               # Contribution guidelines
├── SECURITY.md                   # Vulnerability disclosure policy
└── README.md                     # This file

Technology Stack

| Layer | Local Dev | Production (AWS) |
| --- | --- | --- |
| API | FastAPI + Uvicorn | Lambda + API Gateway + Cognito |
| Vector Store | ChromaDB | S3 Vectors |
| LLM | Anthropic direct API | Bedrock Claude Haiku 4.5 |
| Embeddings | sentence-transformers (384d) | Bedrock Titan V2 (1024d) |
| BM25 Retrieval | ChromaDB corpus | DynamoDB corpus |
| PHI Redaction | Regex patterns | AWS Comprehend Medical |
| Auth | Mock (Bearer token) | Cognito JWT |
| IaC | Docker Compose | Terraform (6 modules) |
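
For local development the table lists regex patterns as the PHI redaction mechanism. A minimal sketch of that idea follows. The three patterns (US SSN, phone number, email address) and the name `redact_phi` are illustrative assumptions, not the repository's actual rule set, and real PHI coverage is much broader than this.

```python
import re

# Illustrative patterns only; each match is replaced by its label in brackets.
PHI_PATTERNS: dict[str, re.Pattern[str]] = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact_phi(text: str) -> str:
    """Replace every pattern match with a placeholder before embedding."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running redaction before the embedder sees the text is what keeps identifiers out of the vector store, while clinical values such as a sleep score pass through unchanged.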

Configuration

All configuration is done through environment variables. Copy the appropriate profile to .env (used by both uv run and docker compose):

# Local dev (ChromaDB + Anthropic)
cp solution/backend/.env.local solution/backend/.env

# AWS production (S3 Vectors + Bedrock)
cp solution/backend/.env.aws.example solution/backend/.env

| Variable | Default | Description |
| --- | --- | --- |
| VECTOR_BACKEND | chroma | Vector store: chroma, s3vectors |
| LLM_BACKEND | anthropic | LLM: anthropic, bedrock |
| EMBEDDER_BACKEND | local | Embedder: local, bedrock |
| ANTHROPIC_API_KEY | (empty) | Anthropic API key (leave blank for mock) |
| MOCK_AUTH | true | Use mock JWT authentication |
| AWS_REGION | eu-west-1 | AWS region for production services |

Architecture Documentation

| Document | Description |
| --- | --- |
| C4 Context | System context -- patients, clinicians, data sources |
| C4 Container | Containers -- API GW, Query Orchestrator, data stores |
| C4 Component: Query | RAG pipeline internals |
| C4 Component: Ingestion | Ingestion pipeline |
| C4 Deployment | AWS deployment topology |
| HIPAA Controls | 4-layer defense model |
| System Design | Scale analysis, patterns, trade-offs |
| OpenAPI Spec | 8 endpoints, full schemas |
| Database Schema | Vector store + DynamoDB tables |
| AWS Deployment Guide | Step-by-step deploy |

Testing

cd solution/backend

# Unit tests (35 tests, ~5s)
MOCK_AUTH=true uv run pytest tests/ -v

# RAGAS evaluation (15 golden Q&A pairs)
MOCK_AUTH=true uv run python scripts/evaluate.py

# E2E happy path (requires running server)
bash ../scripts/test-e2e.sh

| Test Suite | Count | What It Validates |
| --- | --- | --- |
| Unit tests | 35 | Health, query, ingest, collections, vector DB, patient isolation, PHI redaction, guardrails |
| RAGAS eval | 15 | Faithfulness, answer relevancy, context precision, context recall, PHI leakage (=0), patient isolation (PASS) |
| E2E | 9 | Full CRUD flow against running API |

Contributing

See CONTRIBUTING.md for development setup, code standards, and pull request process.

Security

See SECURITY.md for vulnerability disclosure policy and HIPAA security design.

License

MIT


Melroy Anthony -- AI Architect & Lead Software Engineer | Dublin, Ireland

Architecture designed for patient impact -- not dashboards.

Built with Claude Code
