aws-samples/sample-dat409-hybrid-search-aurora-mcp

DAT409 - Hybrid Search with Aurora PostgreSQL for MCP Retrieval

Platform & Infrastructure

AWS Aurora pgvector Bedrock

Languages & Frameworks

Python MCP Streamlit

License

🎓 AWS re:Invent 2025 Workshop | For educational purposes - demonstrates production patterns

🚀 Overview

Duration: 60 minutes | Lab 1: 25 min | Lab 2: 20 min

Learn to build enterprise-grade hybrid search combining semantic similarity, lexical matching, and fuzzy search with Aurora PostgreSQL. Integrate Model Context Protocol (MCP) for natural language database queries with Row-Level Security (RLS).

What You'll Build:

  • Multi-modal search system over 21,704 products
  • AI agent with natural language database access
  • Secure multi-tenant system with PostgreSQL RLS

📁 Repository Structure

├── lab1-hybrid-search/
│   ├── notebook/
│   │   └── dat409-hybrid-search-notebook.ipynb  # Lab 1: Hybrid search implementation
│   ├── data/
│   │   └── amazon-products.csv                  # 21,704 product dataset
│   └── requirements.txt
├── lab2-mcp-agent/
│   ├── streamlit_app.py                         # Lab 2: Interactive demo app
│   ├── test_personas.sh                         # RLS testing script
│   └── requirements.txt
├── scripts/
│   ├── bootstrap-code-editor.sh                 # Environment setup
│   ├── setup-database.sh                        # Database initialization
│   └── setup/                                   # Helper utilities
└── solutions/                                   # Reference implementations

🎯 Workshop Labs

Lab 1: Hybrid Search Architecture (25 min)

Build a multi-modal search system combining three complementary techniques:

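| Method   | Technology                | Use Case                                     |
|----------|---------------------------|----------------------------------------------|
| Semantic | pgvector + HNSW + Cohere  | Conceptual queries ("eco-friendly products") |
| Keyword  | PostgreSQL tsvector + GIN | Exact terms ("iPhone 15 Pro")                |
| Fuzzy    | pg_trgm + GIN             | Typo tolerance ("wireles hedphones")         |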
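
To make the fuzzy row concrete, here is a minimal pure-Python sketch of trigram similarity. It approximates what pg_trgm computes (lowercase each word, pad it with two leading spaces and one trailing space, then compare the sets of three-character substrings); it is not the extension's actual implementation, and the helper names `trigrams` and `similarity` are illustrative only.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm: lowercase, split into words, pad, take 3-grams."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


typo_query = "wireles hedphones"
good_match = similarity(typo_query, "wireless headphones")
bad_match = similarity(typo_query, "coffee maker")
```

The misspelled query still shares most of its trigrams with the correct product name, so it clears pg_trgm's default similarity threshold of 0.3, while an unrelated title shares none.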

What You'll Learn:

  • When to use semantic vs keyword search
  • Index strategies for production workloads (HNSW vs IVFFlat)
  • Result fusion with Reciprocal Rank Fusion (RRF)
  • Cohere Rerank for ML-based result optimization
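
As a rough illustration of Reciprocal Rank Fusion, the sketch below merges three ranked ID lists in plain Python. Each document scores the sum of 1 / (k + rank) over every list it appears in. The constant k = 60 is the commonly cited default, not necessarily the value the notebook uses, and the product IDs are invented for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs; returns (doc_id, score) pairs, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the three search methods.
semantic = ["p3", "p1", "p7"]
keyword = ["p1", "p9", "p3"]
fuzzy = ["p1", "p3", "p5"]

fused = reciprocal_rank_fusion([semantic, keyword, fuzzy])
top_ids = [doc_id for doc_id, _ in fused]
```

Because RRF uses only rank positions, it needs no score normalization across methods whose raw scores (cosine distance, text rank, trigram similarity) live on different scales.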

Hands-On:

cd /workshop/lab1-hybrid-search/notebook
# Open dat409-hybrid-search-notebook.ipynb

You'll implement fuzzy search, semantic search, and hybrid RRF queries with TODO blocks guiding you through each step.


Lab 2: MCP Agent with Row-Level Security (20 min)

Build an AI agent that queries databases using natural language:

User: "Show warranty info for headphones"
  ↓
Strands Agent (Claude Sonnet 4)
  ↓
MCP Tools → SQL Query
  ↓
Aurora PostgreSQL (RLS filtered)
  ↓
Results based on user persona

What You'll Learn:

  • Model Context Protocol (MCP) for standardized database access
  • Application-level security with PostgreSQL RLS
  • AI agent patterns for database queries
  • Multi-tenant data isolation strategies

Hands-On:

cd /workshop/lab2-mcp-agent
./test_personas.sh           # Test RLS policies
streamlit run streamlit_app.py  # Interactive demo

Explore how different personas (customer, support agent, product manager) see different data through RLS policies.
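
The persona behavior can be modeled in a few lines of Python. This is only a conceptual sketch: in PostgreSQL the visibility predicate would live in a row-level security policy evaluated by the database, not in application code. The row contents, the `allowed_personas` field, and the persona identifiers are all hypothetical and are not taken from the workshop schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeRow:
    product_id: str
    content: str
    allowed_personas: frozenset  # hypothetical per-row access list


def visible_rows(rows, persona):
    """Return only the rows the given persona may see (models a policy check)."""
    return [row for row in rows if persona in row.allowed_personas]


ROWS = [
    KnowledgeRow("B001", "Warranty: 2 years",
                 frozenset({"customer", "support_agent", "product_manager"})),
    KnowledgeRow("B001", "Internal troubleshooting notes",
                 frozenset({"support_agent", "product_manager"})),
    KnowledgeRow("B001", "Return-rate analytics",
                 frozenset({"product_manager"})),
]

counts = {
    persona: len(visible_rows(ROWS, persona))
    for persona in ("customer", "support_agent", "product_manager")
}
```

The same query ("everything about B001") yields one, two, or three rows depending on who is asking, which is the effect the persona test script is meant to demonstrate.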


πŸŽ“ Workshop Access

For AWS re:Invent Participants:

  1. Access your Code Editor environment via the provided CloudFront URL
  2. Navigate to /workshop/lab1-hybrid-search/notebook/
  3. Open dat409-hybrid-search-notebook.ipynb
  4. Follow the guided TODO blocks

Environment Includes:

  • ✅ Aurora PostgreSQL 17.5 with pgvector
  • ✅ 21,704 products pre-loaded with embeddings
  • ✅ Python 3.13 + Jupyter + all dependencies
  • ✅ Amazon Bedrock access (Cohere models)
  • ✅ MCP server pre-configured

🛠️ Technology Stack

| Component    | Technology                       | Purpose                                |
|--------------|----------------------------------|----------------------------------------|
| Database     | Aurora PostgreSQL 17.5           | Vector storage with pgvector extension |
| Vector Index | HNSW (pgvector)                  | Fast ANN search with 95%+ recall       |
| Embeddings   | Cohere Embed English v3          | 1024-dimensional dense vectors         |
| Full-Text    | PostgreSQL tsvector + GIN        | Lexical search and BM25-style ranking  |
| Fuzzy Match  | pg_trgm + GIN                    | Trigram similarity for typo tolerance  |
| MCP Server   | awslabs.postgres-mcp-server      | Standardized database access tools     |
| AI Agent     | Strands Agent Framework          | Tool-calling orchestration layer       |
| LLM          | Claude Sonnet 4                  | Natural language → SQL translation     |
| RLS          | PostgreSQL Row-Level Security    | Declarative multi-tenancy              |
| Data API     | Aurora Data API                  | Serverless, IAM-authenticated access   |
| Python       | 3.13 (pandas, psycopg3, boto3)   | Data loading and orchestration         |

🤖 MCP Agent Architecture

How Natural Language Queries Work:

"Show warranty info" → Strands Agent → MCP Tools → Aurora PostgreSQL → Filtered Results
                           ↓              ↓              ↓
                    Claude Sonnet 4   SQL Query    RLS Policies

Key Components:

| Component       | Role                         | Technology                  |
|-----------------|------------------------------|-----------------------------|
| Strands Agent   | Orchestration & tool calling | Python framework            |
| Claude Sonnet 4 | Natural language → SQL       | Amazon Bedrock              |
| MCP Client      | Standardized database tools  | awslabs.postgres-mcp-server |
| Aurora Data API | Serverless database access   | IAM authentication          |
| RLS Policies    | Row-level security           | PostgreSQL                  |
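
For a sense of what a Data API call looks like, the sketch below builds a parameterized request for the `rds-data` client's `execute_statement` operation. It is an assumption-laden illustration, not the workshop's code: the `products` table, its `product_id` and `title` columns, and the helper names are hypothetical, and the ARNs must come from your own environment. The client is passed in so the request-building logic can be exercised without AWS access.

```python
def build_statement(cluster_arn, secret_arn, database, search_term):
    """Build keyword arguments for rds-data execute_statement."""
    return {
        "resourceArn": cluster_arn,
        "secretArn": secret_arn,
        "database": database,
        # Hypothetical table and columns; :pattern is a named parameter.
        "sql": (
            "SELECT product_id, title FROM products "
            "WHERE title ILIKE :pattern LIMIT 5"
        ),
        "parameters": [
            {"name": "pattern", "value": {"stringValue": f"%{search_term}%"}},
        ],
    }


def run_statement(client, request):
    """Execute the request and return the raw records (a list of rows)."""
    response = client.execute_statement(**request)
    return response.get("records", [])


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("rds-data")
#   rows = run_statement(client, build_statement(arn, secret, "postgres", "headphones"))
```

Passing the search term as a named parameter rather than interpolating it into the SQL string keeps user input out of the statement text.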

Why This Pattern?

  • ✅ Standard Practice: Agent uses admin access; security via application-level filtering
  • ✅ Serverless: No VPC required with Data API
  • ✅ Portable: MCP tools work across different AI frameworks
  • ✅ Intelligent: Multi-step reasoning with context awareness

Production Considerations:

  • Agent requires admin credentials (standard for AI agents)
  • Data API adds ~10ms latency (acceptable for agentic workflows)
  • RLS provides database-level isolation per persona

📚 Learn More

Documentation:

Related AWS Services:

🤝 Contributing

Found this helpful?

  • ⭐ Star this repository
  • 🍴 Fork for your own use cases
  • 🐛 Report issues
  • 💡 Submit pull requests
  • 📢 Share with colleagues

Contributions welcome! See CONTRIBUTING.md for guidelines.

📄 License

MIT-0 License - See LICENSE

Security issues: CONTRIBUTING.md


AWS re:Invent 2025 | DAT409 Builder's Session

Hybrid Search with Aurora PostgreSQL for MCP Retrieval

© 2025 Shayon Sanyal
