AWS re:Invent 2025 Workshop | For educational purposes - demonstrates production patterns
Duration: 60 minutes | Lab 1: 25 min | Lab 2: 20 min
Learn to build enterprise-grade hybrid search combining semantic similarity, lexical matching, and fuzzy search with Aurora PostgreSQL. Integrate Model Context Protocol (MCP) for natural language database queries with Row-Level Security (RLS).
What You'll Build:
- Multi-modal search system over 21,704 products
- AI agent with natural language database access
- Secure multi-tenant system with PostgreSQL RLS
├── lab1-hybrid-search/
│   ├── notebook/
│   │   └── dat409-hybrid-search-notebook.ipynb   # Lab 1: Hybrid search implementation
│   ├── data/
│   │   └── amazon-products.csv                   # 21,704 product dataset
│   └── requirements.txt
├── lab2-mcp-agent/
│   ├── streamlit_app.py                          # Lab 2: Interactive demo app
│   ├── test_personas.sh                          # RLS testing script
│   └── requirements.txt
├── scripts/
│   ├── bootstrap-code-editor.sh                  # Environment setup
│   ├── setup-database.sh                         # Database initialization
│   └── setup/                                    # Helper utilities
└── solutions/                                    # Reference implementations
Lab 1: Build a multi-modal search system combining three complementary techniques:
Method | Technology | Use Case |
---|---|---|
Semantic | pgvector + HNSW + Cohere | Conceptual queries ("eco-friendly products") |
Keyword | PostgreSQL tsvector + GIN | Exact terms ("iPhone 15 Pro") |
Fuzzy | pg_trgm + GIN | Typo tolerance ("wireles hedphones") |
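To see why trigram matching tolerates typos, the sketch below approximates in plain Python what pg_trgm's `similarity()` computes: each word is lowercased, padded, split into three-character windows, and the two trigram sets are compared by shared trigrams over total distinct trigrams. This is an illustration only; the function names are made up for this example, it handles ASCII text only, and the real extension may differ in edge cases.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm trigram extraction (ASCII-only illustration)."""
    grams = set()
    # Non-alphanumeric characters act as word separators.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Each word is padded with two leading spaces and one trailing space.
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


query = "wireles hedphones"
for candidate in ["wireless headphones", "wired earbuds", "phone case"]:
    print(f"{candidate!r}: {trigram_similarity(query, candidate):.2f}")
```

The misspelled query still shares most of its trigrams with "wireless headphones", so it scores well above pg_trgm's default similarity threshold of 0.3, while the unrelated titles fall below it.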
What You'll Learn:
- When to use semantic vs keyword search
- Index strategies for production workloads (HNSW vs IVFFlat)
- Result fusion with Reciprocal Rank Fusion (RRF)
- Cohere Rerank for ML-based result optimization
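Reciprocal Rank Fusion merges ranked lists using only each item's position, so scores from different search methods never need to be normalized against each other. A minimal sketch follows, assuming the commonly used constant k = 60; the product IDs are placeholders, and the notebook's own implementation may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists: score(d) = sum of 1 / (k + rank of d)."""
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sort is stable).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder result lists, best match first, one per search method.
semantic = ["p1", "p2", "p3"]
keyword = ["p2", "p4", "p1"]
fuzzy = ["p2", "p1", "p5"]

fused = reciprocal_rank_fusion([semantic, keyword, fuzzy])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Items that appear near the top of several lists (here `p2` and `p1`) rise above items that rank well in only one list. A larger `k` flattens the difference between adjacent ranks.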
Hands-On:
cd /workshop/lab1-hybrid-search/notebook
# Open dat409-hybrid-search-notebook.ipynb
You'll implement fuzzy search, semantic search, and hybrid RRF queries with TODO blocks guiding you through each step.
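As a rough preview of what the three per-method queries can look like, here is a sketch of parameterized SQL templates held in Python. The schema is an assumption made for illustration (a `products` table with `product_id`, `title`, a 1024-dimensional `embedding` column, and a `search_vector` tsvector column); the notebook's actual table and column names may differ. The operators and functions are the standard ones: `<=>` is pgvector's cosine distance, `@@` with `websearch_to_tsquery` is PostgreSQL full-text matching, and `%` with `similarity()` comes from pg_trgm.

```python
# Hypothetical schema:
#   products(product_id, title, embedding vector(1024), search_vector tsvector)
QUERIES = {
    # Cosine distance ordering; 1 - distance gives cosine similarity.
    "semantic": """
        SELECT product_id, title,
               1 - (embedding <=> %(embedding)s::vector) AS score
        FROM products
        ORDER BY embedding <=> %(embedding)s::vector
        LIMIT %(limit)s
    """,
    # Full-text match against a precomputed tsvector column.
    "keyword": """
        SELECT product_id, title,
               ts_rank(search_vector, websearch_to_tsquery('english', %(query)s)) AS score
        FROM products
        WHERE search_vector @@ websearch_to_tsquery('english', %(query)s)
        ORDER BY score DESC
        LIMIT %(limit)s
    """,
    # pg_trgm's % operator is written %% so the driver does not treat it
    # as a placeholder.
    "fuzzy": """
        SELECT product_id, title, similarity(title, %(query)s) AS score
        FROM products
        WHERE title %% %(query)s
        ORDER BY score DESC
        LIMIT %(limit)s
    """,
}


def to_vector_literal(embedding):
    """Format a sequence of floats as pgvector's text form, e.g. [0.1,0.2]."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_params(query, embedding, limit=10):
    """Named parameters shared by all three templates."""
    return {"query": query, "embedding": to_vector_literal(embedding), "limit": limit}


params = build_params("wireles hedphones", [0.5, 1.0, -0.25])
print(params["embedding"])
```

With a psycopg cursor, each template would run as `cur.execute(QUERIES["fuzzy"], params)`; the three ranked ID lists can then be fused with RRF.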
Lab 2: Build an AI agent that queries databases using natural language:
User: "Show warranty info for headphones"
↓
Strands Agent (Claude Sonnet 4)
↓
MCP Tools → SQL Query
↓
Aurora PostgreSQL (RLS filtered)
↓
Results based on user persona
What You'll Learn:
- Model Context Protocol (MCP) for standardized database access
- Application-level security with PostgreSQL RLS
- AI agent patterns for database queries
- Multi-tenant data isolation strategies
Hands-On:
cd /workshop/lab2-mcp-agent
./test_personas.sh # Test RLS policies
streamlit run streamlit_app.py # Interactive demo
Explore how different personas (customer, support agent, product manager) see different data through RLS policies.
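The sketch below shows one way persona-based RLS can be expressed, plus a small Python model of the same predicate so the effect is visible without a database. Everything named here is hypothetical: the `knowledge_base` table, its `persona_access text[]` column, the `app.persona` setting, and the sample rows are assumptions for illustration, not the workshop's actual schema or policies.

```python
# Hypothetical DDL: rows are visible only when the session's persona
# appears in the row's persona_access array.
RLS_SETUP_SQL = """
ALTER TABLE knowledge_base ENABLE ROW LEVEL SECURITY;
ALTER TABLE knowledge_base FORCE ROW LEVEL SECURITY;
CREATE POLICY persona_read ON knowledge_base
    FOR SELECT
    USING (current_setting('app.persona', true) = ANY (persona_access));
"""

# Run inside each transaction before querying; the third argument (true)
# scopes the setting to the current transaction.
SET_PERSONA_SQL = "SELECT set_config('app.persona', %s, true)"

PERSONAS = ("customer", "support_agent", "product_manager")

# Placeholder rows standing in for table contents.
ROWS = [
    {"doc": "Warranty summary", "persona_access": ["customer", "support_agent", "product_manager"]},
    {"doc": "Internal repair notes", "persona_access": ["support_agent", "product_manager"]},
    {"doc": "Return-rate analytics", "persona_access": ["product_manager"]},
]


def visible_rows(rows, persona):
    """Python model of the policy's USING clause."""
    if persona not in PERSONAS:
        raise ValueError(f"unknown persona: {persona!r}")
    return [row for row in rows if persona in row["persona_access"]]


for persona in PERSONAS:
    print(persona, [row["doc"] for row in visible_rows(ROWS, persona)])
```

One PostgreSQL detail to keep in mind: superusers and roles with `BYPASSRLS` are never subject to policies, and table owners are exempt unless the table uses `FORCE ROW LEVEL SECURITY`. Check which role the connection actually runs as when testing policies.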
For AWS re:Invent Participants:
- Access your Code Editor environment via the provided CloudFront URL
- Navigate to /workshop/lab1-hybrid-search/notebook/
- Open dat409-hybrid-search-notebook.ipynb
- Follow the guided TODO blocks
Environment Includes:
- Aurora PostgreSQL 17.5 with pgvector
- 21,704 products pre-loaded with embeddings
- Python 3.13 + Jupyter + all dependencies
- Amazon Bedrock access (Cohere models)
- MCP server pre-configured
Component | Technology | Purpose |
---|---|---|
Database | Aurora PostgreSQL 17.5 | Vector storage with pgvector extension |
Vector Index | HNSW (pgvector) | Fast ANN search with 95%+ recall |
Embeddings | Cohere Embed English v3 | 1024-dimensional dense vectors |
Full-Text | PostgreSQL tsvector + GIN | Lexical search and BM25-style ranking |
Fuzzy Match | pg_trgm + GIN | Trigram similarity for typo tolerance |
MCP Server | awslabs.postgres-mcp-server | Standardized database access tools |
AI Agent | Strands Agent Framework | Tool-calling orchestration layer |
LLM | Claude Sonnet 4 | Natural language → SQL translation |
RLS | PostgreSQL Row-Level Security | Declarative multi-tenancy |
Data API | Aurora Data API | Serverless, IAM-authenticated access |
Python | 3.13 (pandas, psycopg3, boto3) | Data loading and orchestration |
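For reference, the three index types in the table map to DDL along these lines. This is a sketch under assumptions: the `products` table, its column names, and the index names are hypothetical, and the HNSW parameters shown (`m = 16`, `ef_construction = 64`) are pgvector's defaults rather than values tuned for this dataset. An IVFFlat variant is included for comparison.

```python
EXTENSIONS = [
    "CREATE EXTENSION IF NOT EXISTS vector;",
    "CREATE EXTENSION IF NOT EXISTS pg_trgm;",
]

# Hypothetical table, column, and index names.
VECTOR_INDEXES = {
    # Graph-based ANN index; no training step, slower to build.
    "hnsw": (
        "CREATE INDEX IF NOT EXISTS products_embedding_hnsw ON products "
        "USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);"
    ),
    # Cluster-based ANN index; build it after the data is loaded.
    "ivfflat": (
        "CREATE INDEX IF NOT EXISTS products_embedding_ivfflat ON products "
        "USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);"
    ),
}

TEXT_INDEXES = [
    # Full-text search over a precomputed tsvector column.
    "CREATE INDEX IF NOT EXISTS products_search_gin ON products USING gin (search_vector);",
    # Trigram index for fuzzy matching on titles.
    "CREATE INDEX IF NOT EXISTS products_title_trgm ON products USING gin (title gin_trgm_ops);",
]


def setup_statements(vector_index="hnsw"):
    """Return DDL in execution order for the chosen vector index type."""
    if vector_index not in VECTOR_INDEXES:
        raise ValueError(f"unknown vector index type: {vector_index!r}")
    return EXTENSIONS + [VECTOR_INDEXES[vector_index]] + TEXT_INDEXES


for statement in setup_statements("hnsw"):
    print(statement)
```

At query time, HNSW recall can be traded against latency with `SET hnsw.ef_search = ...` (pgvector's default is 40), and IVFFlat with `SET ivfflat.probes = ...`.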
How Natural Language Queries Work:
"Show warranty info" → Strands Agent → MCP Tools → Aurora PostgreSQL → Filtered Results
                             ↓             ↓               ↓
                      Claude Sonnet 4  SQL Query      RLS Policies
Key Components:
Component | Role | Technology |
---|---|---|
Strands Agent | Orchestration & tool calling | Python framework |
Claude Sonnet 4 | Natural language → SQL | Amazon Bedrock |
MCP Client | Standardized database tools | awslabs.postgres-mcp-server |
Aurora Data API | Serverless database access | IAM authentication |
RLS Policies | Row-level security | PostgreSQL |
Why This Pattern?
- Standard Practice: Agent uses admin access; security via application-level filtering
- Serverless: No VPC required with Data API
- Portable: MCP tools work across different AI frameworks
- Intelligent: Multi-step reasoning with context awareness
Production Considerations:
- Agent requires admin credentials (standard for AI agents)
- Data API adds ~10ms latency (acceptable for agentic workflows)
- RLS provides database-level isolation per persona
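To make the Data API path concrete, here is a minimal sketch of a query helper built on the `rds-data` client's `execute_statement` call. The helper name, the stub client, and the ARNs are placeholders for illustration; in a real environment the client would come from `boto3.client("rds-data")` and the ARNs from your cluster and Secrets Manager. The stub exists only so the example runs without AWS access.

```python
import json


def run_sql(client, cluster_arn, secret_arn, database, sql, params=None):
    """Run one statement through the Data API and return rows as dicts."""
    response = client.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,
        database=database,
        sql=sql,  # named parameters are written as :name
        parameters=[
            {"name": name, "value": {"stringValue": str(value)}}
            for name, value in (params or {}).items()
        ],
        formatRecordsAs="JSON",  # rows come back as a JSON string
    )
    return json.loads(response.get("formattedRecords") or "[]")


class FakeRdsData:
    """Stand-in for boto3.client('rds-data'), for offline illustration only."""

    def execute_statement(self, **kwargs):
        self.last_call = kwargs
        return {"formattedRecords": json.dumps([{"title": "Example headphones"}])}


client = FakeRdsData()  # real use: boto3.client("rds-data")
rows = run_sql(
    client,
    cluster_arn="arn:aws:rds:REGION:ACCOUNT:cluster:EXAMPLE",         # placeholder
    secret_arn="arn:aws:secretsmanager:REGION:ACCOUNT:secret:EXAMPLE",  # placeholder
    database="workshop",                                               # placeholder
    sql="SELECT title FROM products WHERE title ILIKE :pattern",
    params={"pattern": "%headphones%"},
)
print(rows)
```

Passing the client in as an argument keeps the helper easy to test and leaves credential handling to the caller.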
Documentation:
- Aurora PostgreSQL - Managed PostgreSQL service
- pgvector - Vector similarity search extension
- Model Context Protocol - Standardized AI tool protocol
- PostgreSQL RLS - Row-level security
Related AWS Services:
- Amazon Bedrock - Cohere embeddings & rerank
- RDS Data API - Serverless database access
- Secrets Manager - Credential management
Found this helpful?
- Star this repository
- Fork for your own use cases
- Report issues
- Submit pull requests
- Share with colleagues
Contributions welcome! See CONTRIBUTING.md for guidelines.
MIT-0 License - See LICENSE
Security issues: CONTRIBUTING.md
AWS re:Invent 2025 | DAT409 Builder's Session
Hybrid Search with Aurora PostgreSQL for MCP Retrieval
© 2025 Shayon Sanyal