**Purpose:** Expose iris-vector-rag pipelines as MCP tools for Claude Desktop and other MCP clients
**Status:** Production-Ready
**Protocol Version:** MCP 1.0
The Model Context Protocol (MCP) allows iris-vector-rag pipelines to be used directly within Claude Desktop and other MCP-compatible applications. This enables conversational RAG workflows where Claude can query your document collections during conversations.
Key Benefits:

- 🔌 **Zero-code integration** - No API endpoints to maintain
- 💬 **Conversational RAG** - Claude queries your docs during chats
- 🚀 **All pipelines supported** - `basic`, `basic_rerank`, `crag`, `graphrag`, `multi_query_rrf`, `pylate_colbert`
- 🔒 **Local execution** - Documents stay on your machine
- ⚡ **Fast** - Direct Python execution, no HTTP overhead
```bash
# From repository root
python -m iris_vector_rag.mcp

# Server starts on localhost
# Available tools automatically registered
```

Add to your Claude Desktop configuration (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
```json
{
  "mcpServers": {
    "iris-rag": {
      "command": "python",
      "args": ["-m", "iris_vector_rag.mcp"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key",
        "IRIS_HOST": "localhost",
        "IRIS_PORT": "1972",
        "IRIS_NAMESPACE": "USER",
        "IRIS_USER": "_SYSTEM",
        "IRIS_PASSWORD": "SYS"
      }
    }
  }
}
```

In a Claude conversation:
```
You: What does my medical documentation say about diabetes treatment?
Claude: Let me search your documents...
[Uses rag_basic tool automatically]
Claude: Based on your medical documentation, diabetes treatment typically involves...
```
The MCP server exposes the following tools:
**Purpose:** Standard vector similarity search
**Best For:** General Q&A, getting started

```json
{
  "name": "rag_basic",
  "description": "Query documents using basic vector similarity search",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The question to answer"
      },
      "top_k": {
        "type": "number",
        "description": "Number of documents to retrieve (default: 5)"
      }
    },
    "required": ["query"]
  }
}
```

**Purpose:** Vector search with cross-encoder reranking
**Best For:** Higher accuracy requirements
```json
{
  "name": "rag_basic_rerank",
  "description": "Query documents with reranking for improved precision",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "top_k": {"type": "number"},
      "candidate_k": {"type": "number", "description": "Candidates before reranking (default: 50)"}
    },
    "required": ["query"]
  }
}
```

**Purpose:** Self-correcting RAG with web search fallback
**Best For:** Current events, fact-checking
```json
{
  "name": "rag_crag",
  "description": "Corrective RAG with self-evaluation and web search fallback",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "top_k": {"type": "number"}
    },
    "required": ["query"]
  }
}
```

**Purpose:** Hybrid search with knowledge graphs
**Best For:** Entity relationships, complex queries
```json
{
  "name": "rag_graphrag",
  "description": "Query using vector, text, and knowledge graph retrieval",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "top_k": {"type": "number"}
    },
    "required": ["query"]
  }
}
```

**Purpose:** Multiple query variations with fusion
**Best For:** Ambiguous queries, exploratory search
```json
{
  "name": "rag_multi_query_rrf",
  "description": "Generate query variations and fuse results",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "top_k": {"type": "number"},
      "num_variations": {"type": "number", "description": "Query variations to generate (default: 3)"}
    },
    "required": ["query"]
  }
}
```

**Purpose:** Token-level late interaction retrieval
**Best For:** Long documents, fine-grained matching
```json
{
  "name": "rag_pylate_colbert",
  "description": "ColBERT-based retrieval with late interaction",
  "inputSchema": {
    "type": "object",
    "properties": {
      "query": {"type": "string"},
      "top_k": {"type": "number"}
    },
    "required": ["query"]
  }
}
```

**Purpose:** Verify MCP server and database connectivity
```json
{
  "name": "health_check",
  "description": "Check MCP server and database health",
  "inputSchema": {
    "type": "object",
    "properties": {}
  }
}
```

**Purpose:** List all available RAG tools
```json
{
  "name": "list_tools",
  "description": "List all available RAG pipeline tools",
  "inputSchema": {
    "type": "object",
    "properties": {}
  }
}
```

Set these in the `env` section of your Claude Desktop config:
| Variable | Required | Description | Default |
|---|---|---|---|
| `OPENAI_API_KEY` | Yes | OpenAI API key for LLM | - |
| `ANTHROPIC_API_KEY` | No | Anthropic API key (alternative to OpenAI) | - |
| `IRIS_HOST` | Yes | IRIS database host | `localhost` |
| `IRIS_PORT` | Yes | IRIS database port | `1972` |
| `IRIS_NAMESPACE` | Yes | IRIS namespace | `USER` |
| `IRIS_USER` | Yes | IRIS username | `_SYSTEM` |
| `IRIS_PASSWORD` | Yes | IRIS password | `SYS` |
| `TAVILY_API_KEY` | No | Tavily API key (for CRAG web search) | - |
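For orientation, env-driven configuration like the table above is typically assembled at startup; here is a minimal sketch, assuming the variable names and defaults listed (the `load_config` helper itself is hypothetical, not part of the iris-vector-rag API):

```python
import os

def load_config() -> dict:
    """Assemble server settings from the environment variables above."""
    return {
        "iris": {
            "host": os.environ.get("IRIS_HOST", "localhost"),
            "port": int(os.environ.get("IRIS_PORT", "1972")),
            "namespace": os.environ.get("IRIS_NAMESPACE", "USER"),
            "user": os.environ.get("IRIS_USER", "_SYSTEM"),
            "password": os.environ.get("IRIS_PASSWORD", "SYS"),
        },
        # Required: answer generation needs an LLM key.
        "openai_api_key": os.environ["OPENAI_API_KEY"],
        # Optional: only needed for the CRAG web-search fallback.
        "tavily_api_key": os.environ.get("TAVILY_API_KEY"),
    }
```

Note the asymmetry: required variables raise a `KeyError` immediately if missing, while optional ones fall back to `None` or the documented default.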
To use custom pipeline configurations with MCP:

1. Create a pipeline config file: `config/pipelines.yaml`
2. Set the environment variable: `PIPELINE_CONFIG_PATH=config/pipelines.yaml`
3. Restart the MCP server

Example `config/pipelines.yaml`:

```yaml
basic:
  top_k: 10
  chunk_size: 512

graphrag:
  enable_entity_extraction: true
  entity_types: ["Disease", "Medication", "Symptom"]
```

**Claude Desktop Conversation:**
```
User: What does my documentation say about diabetes?
Claude: Let me search your documents...
[Calls rag_basic with query="diabetes"]
Response: Based on your medical documentation, diabetes is a metabolic disorder...
[Shows sources: medical_text.pdf page 45, diabetes_guide.pdf page 12]
```
**Claude Desktop Conversation:**

```
User: What medications interact with metformin?
Claude: Let me use the knowledge graph to find medication interactions...
[Calls rag_graphrag with query="metformin interactions"]
Response: According to your knowledge base, metformin has the following interactions:
1. Insulin - May cause hypoglycemia
2. Alcohol - Increases risk of lactic acidosis
...
[Shows entity relationships and sources]
```
**Claude Desktop Conversation:**

```
User: What's the latest guidance on diabetes treatment?
Claude: Let me check your docs and search for recent updates...
[Calls rag_crag with query="latest diabetes treatment guidance"]
[CRAG evaluates: Internal docs are 2 years old]
[CRAG triggers: Web search fallback]
Response: Your internal documents mention standard treatment, but recent research from 2024 shows...
[Combines internal docs + web search results]
```
Claude automatically selects the appropriate pipeline based on query characteristics:

- Entity-heavy queries → uses `rag_graphrag`
- Time-sensitive queries → uses `rag_crag` (web fallback)
- Ambiguous queries → uses `rag_multi_query_rrf`
- Default → uses `rag_basic`

You can also explicitly request a specific pipeline:

```
User: Use graphrag to find drug interactions for metformin
Claude: [Explicitly calls rag_graphrag]
```
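The selection rules above can be pictured as a toy keyword router. This is illustrative only: the real routing is done by Claude's own reasoning over the tool descriptions, not by code in this project, and the keyword lists here are invented for the sketch:

```python
def pick_tool(query: str) -> str:
    """Toy routing heuristic mirroring the selection rules above."""
    q = query.lower()
    if any(w in q for w in ("interaction", "relationship", "related to")):
        return "rag_graphrag"          # entity-heavy
    if any(w in q for w in ("latest", "recent", "current")):
        return "rag_crag"              # time-sensitive, may need web fallback
    if len(q.split()) <= 3:
        return "rag_multi_query_rrf"   # short/ambiguous
    return "rag_basic"                 # default
```

For example, `pick_tool("metformin drug interactions")` routes to `rag_graphrag`, while a bare `pick_tool("diabetes")` is treated as ambiguous.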
MCP supports multi-turn RAG conversations:

```
User: What is diabetes?
Claude: [Calls rag_basic] Diabetes is a metabolic disorder...
User: What are the treatment options?
Claude: [Calls rag_basic with context] Based on the previous answer, treatment options include...
User: What about side effects of metformin?
Claude: [Calls rag_graphrag for drug info] Metformin side effects include...
```
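One simple strategy for the "with context" call in the transcript above is to fold recent answers into the follow-up query before it reaches the retriever. This is a hedged sketch of that idea, not the actual mechanism Claude or the server uses:

```python
def make_followup_query(history: list[str], question: str) -> str:
    """Fold recent answers into a follow-up so references like
    'the treatment options' resolve against earlier turns."""
    context = " ".join(history[-2:])  # last two turns are usually enough
    return f"{context}\n\nFollow-up: {question}" if context else question

history: list[str] = []
q1 = make_followup_query(history, "What is diabetes?")          # no context yet
history.append("Diabetes is a metabolic disorder...")
q2 = make_followup_query(history, "What are the treatment options?")
```

The first turn passes through unchanged; the second carries the prior answer as retrieval context.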
MCP supports streaming responses for real-time updates: the server streams automatically, and Claude Desktop displays results incrementally.

To smoke-test the server:

```bash
# 1. Start the MCP server
python -m iris_vector_rag.mcp

# 2. Test the health check
curl -X POST http://localhost:8000/mcp/health
```
Expected response:

```json
{
  "status": "healthy",
  "database": "connected",
  "tools": 8,
  "version": "1.0"
}
```

You can also call tools programmatically:

```python
# test_mcp.py
import json
from iris_vector_rag.mcp import MCPServer

server = MCPServer()

# Test rag_basic
result = server.call_tool(
    name="rag_basic",
    arguments={"query": "What is diabetes?", "top_k": 5}
)
print(json.dumps(result, indent=2))
```

Then, in Claude Desktop:

- Open Claude Desktop
- Start new conversation
- Type: "Can you search my documents?"
- Claude should show available RAG tools
- Ask a question about your documents
- Claude should query via MCP and return results
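Under the hood, the query in the last step travels as a JSON-RPC 2.0 `tools/call` request, the standard MCP envelope. A sketch of the message an MCP client would send for `rag_basic` (the `id` value is arbitrary, and exact framing depends on the transport):

```python
import json

# JSON-RPC 2.0 envelope for an MCP "tools/call" invocation of rag_basic.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "rag_basic",
        "arguments": {"query": "What is diabetes?", "top_k": 5},
    },
}
print(json.dumps(request, indent=2))
```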
**Symptom:** `python -m iris_vector_rag.mcp` fails

**Solutions:**

- Check the Python version: `python --version` (requires 3.11+)
- Verify the installation: `pip list | grep iris-vector-rag`
- Check database connectivity: `ping <IRIS_HOST>`
- Review the logs: `tail -f logs/mcp_server.log`
**Symptom:** Claude shows a "Tool unavailable" error

**Solutions:**

- Verify the MCP server is running: `curl http://localhost:8000/mcp/health`
- Check the Claude Desktop config path: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Restart Claude Desktop after config changes
- Check that environment variables are set correctly
**Symptom:** MCP tools execute but return no documents

**Solutions:**

- Verify documents are loaded: `SELECT COUNT(*) FROM documents`
- Check embeddings exist: `SELECT COUNT(*) FROM embeddings`
- Test the pipeline directly: `python -c "from iris_vector_rag import create_pipeline; pipeline = create_pipeline('basic'); print(pipeline.query('test'))"`
- Review document loading: ensure `pipeline.load_documents()` completed successfully
**Symptom:** MCP queries take >5 seconds

**Solutions:**

- Use a faster pipeline: switch from `graphrag` to `basic`
- Reduce `top_k`: `top_k=3` instead of `top_k=10`
- Use a smaller LLM: `gpt-3.5-turbo` instead of `gpt-4`
- Enable IRIS EMBEDDING: model caching speeds up vectorization
**Symptom:** `rag_crag` doesn't trigger web search

**Solutions:**

- Set the Tavily API key: `export TAVILY_API_KEY=your-key`
- Check the relevance threshold: it may be too low (increase to 0.7)
- Verify the CRAG configuration: `cat config/pipelines.yaml`
- Review the logs for evaluation results
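The threshold behavior behind the second bullet can be pictured as follows. This is a hedged sketch: the real CRAG evaluator lives inside the pipeline and its scoring may differ, and `relevance_threshold` here is an illustrative parameter name:

```python
def needs_web_search(doc_scores: list[float], relevance_threshold: float = 0.7) -> bool:
    """Fallback fires when no retrieved document clears the threshold."""
    return max(doc_scores, default=0.0) < relevance_threshold

# With a low threshold, stale internal docs still "pass" evaluation and
# the web search never triggers; raising it makes the fallback more likely.
```

With scores of `[0.4, 0.6]` the fallback fires at the default threshold of 0.7, but would not at a threshold of 0.5, which is why raising the threshold makes the web search trigger more often.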
```
Claude Desktop
      ↓ (MCP protocol)
MCP Server (Python)
      ↓ (function call)
iris_vector_rag.create_pipeline()
      ↓ (SQL queries)
IRIS Database
      ↓ (vector search)
Documents + Embeddings
      ↓ (results)
Claude Desktop
```
Tools are automatically registered on server startup:

```python
# iris_vector_rag/mcp/server.py
def register_tools():
    tools = [
        create_rag_tool("basic"),
        create_rag_tool("basic_rerank"),
        create_rag_tool("crag"),
        create_rag_tool("graphrag"),
        create_rag_tool("multi_query_rrf"),
        create_rag_tool("pylate_colbert"),
        create_health_check_tool(),
        create_list_tools_tool(),
    ]
    return tools
```

Request flow:

1. User query → Claude Desktop
2. Tool selection → Claude chooses the appropriate RAG tool
3. MCP request → sent to the MCP server
4. Pipeline execution → iris_vector_rag processes the query
5. Results → returned via the MCP protocol
6. Response generation → Claude synthesizes an answer
7. Display → results shown to the user
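A plausible shape for the `create_rag_tool` factory used in the registration code above is sketched below. It is an illustration, not the project's actual implementation: the tool is modeled as a plain dict, and `query_fn` is injected here for self-containment where the real server would close over a pipeline from `iris_vector_rag.create_pipeline()`:

```python
from typing import Callable

def create_rag_tool(pipeline_name: str, query_fn: Callable[..., dict]) -> dict:
    """Build an MCP tool definition wrapping one RAG pipeline."""
    def handler(arguments: dict) -> dict:
        # Delegate to the pipeline's query function, defaulting top_k to 5.
        return query_fn(arguments["query"], top_k=arguments.get("top_k", 5))

    return {
        "name": f"rag_{pipeline_name}",
        "description": f"Query documents using the {pipeline_name} pipeline",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "top_k": {"type": "number"},
            },
            "required": ["query"],
        },
        "handler": handler,
    }
```

Keeping the tool definition and its handler together like this means adding a pipeline to the server is one line in `register_tools()`.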
- MCP server runs locally (localhost only by default)
- No data sent to external services (except LLM API for generation)
- Documents stay on your machine
**Best Practice:** Use environment variables, not hardcoded keys, so the value is read from your shell environment:

```json
{
  "mcpServers": {
    "iris-rag": {
      "env": {
        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
      }
    }
  }
}
```

Set the key in your shell:

```bash
export OPENAI_API_KEY=your-key
```

Additionally:

- The MCP server requires IRIS database credentials
- Use a read-only user if you only query (never update)
- Restrict network access to IRIS (localhost or VPN only)
The MCP server uses connection pooling for IRIS:

```yaml
# config/mcp_config.yaml
database:
  pool_size: 10
  max_overflow: 5
```

Enable result caching for repeated queries:
```yaml
# config/mcp_config.yaml
caching:
  enabled: true
  ttl: 3600  # 1 hour
  max_entries: 1000
```

The MCP server supports concurrent tool invocations:
```yaml
# config/mcp_config.yaml
server:
  max_concurrent_requests: 10
```

Create `/etc/systemd/system/iris-rag-mcp.service`:
```ini
[Unit]
Description=IRIS RAG MCP Server
After=network.target

[Service]
Type=simple
User=iris-rag
WorkingDirectory=/opt/iris-vector-rag
Environment="OPENAI_API_KEY=your-key"
Environment="IRIS_HOST=localhost"
ExecStart=/usr/bin/python3 -m iris_vector_rag.mcp
Restart=always

[Install]
WantedBy=multi-user.target
```

Enable and start:
```bash
sudo systemctl enable iris-rag-mcp
sudo systemctl start iris-rag-mcp
sudo systemctl status iris-rag-mcp
```

Example `Dockerfile`:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "-m", "iris_vector_rag.mcp"]
```

Build and run:

```bash
docker build -t iris-rag-mcp .
docker run -d -p 8000:8000 \
  -e OPENAI_API_KEY=your-key \
  -e IRIS_HOST=host.docker.internal \
  iris-rag-mcp
```

- User Guide - Complete iris-vector-rag usage guide
- API Reference - Full API documentation
- Pipeline Guide - Pipeline selection guide
- Model Context Protocol Specification - Official MCP docs