chatguru Agent is a production-ready whitelabel chatbot with RAG capabilities and agentic commerce integration, built with FastAPI, LangChain, and Azure OpenAI.
Brought with ❤️ by Netguru
Read the full docs at: https://github.com/netguru/chatguru
chatguru Agent ships with WebSocket streaming, RAG capabilities, and comprehensive observability!
Key Features:
- Real-time WebSocket streaming for instant responses
- RAG-powered product search and recommendations
- Comprehensive API documentation with Swagger UI
ℹ️ Library supports Python 3.12+
```bash
# Clone the repository
git clone <repository-url>
cd chatguru

# Complete development setup
make setup
```

After installation:

```bash
# Configure environment variables
make env-setup
# Edit .env with your credentials

# Start the development server
make dev
```

Check the live demo at http://localhost:8000/
This is how you can use the WebSocket API in your app:
```python
import asyncio
import json

import websockets


async def chat():
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as websocket:
        # Send message
        await websocket.send(json.dumps({
            "message": "Hello, how are you?",
            "session_id": None
        }))

        # Receive streaming response
        async for message in websocket:
            data = json.loads(message)
            if data["type"] == "token":
                print(data["content"], end="", flush=True)
            elif data["type"] == "end":
                print("\n")
                break
            elif data["type"] == "error":
                print(f"Error: {data['content']}")
                break

asyncio.run(chat())
```

- 🚀 WebSocket Streaming: Real-time streaming chat responses via WebSocket
- 🧪 Minimal Test UI: Lightweight HTML page at `/` for smoke testing only
- 🎨 Whitelabel Design: Easily customizable for different brands and tenants
- 🧠 RAG Capabilities: Semantic product search with sqlite-vec vector database
- 🛒 Agentic Commerce: Ready for MCP (Model Context Protocol) integration
- 📊 Observability: Built-in Langfuse tracing and monitoring
- ✅ Testing: Comprehensive test suite with promptfoo LLM evaluation
- 🐳 Production Ready: Docker containerization with health checks
Simple, modular architecture designed for whitelabel deployment:
```mermaid
graph LR
    subgraph "Current Implementation"
        UI[Web Chat UI] -->|WebSocket| API[FastAPI API]
        API -->|Streaming| AGENT[Agent Service]
        AGENT -->|AzureChatOpenAI| LLM[Azure OpenAI]
        AGENT -->|RAG Tool| PRODUCTDB[Product DB<br/>sqlite-vec]
        AGENT --> LANGFUSE[Langfuse<br/>Tracing]
    end

    subgraph "Future Extensions"
        MCP[MCP Tools<br/>Commerce Platforms]
        AGENT -.-> MCP
    end
```
For detailed architecture documentation, see docs/architecture.md.
- Backend: FastAPI + Uvicorn (async)
- AI/ML: LangChain + Azure OpenAI (direct integration)
- LLM Provider: Azure OpenAI (via langchain-openai)
- Vector Search: sqlite-vec (semantic product search; see the sketch after this list)
- Observability: Langfuse
- Testing: pytest + promptfoo + GenericFakeChatModel
- Code Quality: mypy + ruff + pre-commit
- Containerization: Docker + Docker Compose
- Package Management: uv (Python) + npm (Node.js)
- Development: Makefile for task automation
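To make the Vector Search entry concrete, here is a minimal standalone sqlite-vec sketch, with toy 4-dimensional vectors standing in for real embeddings. It is illustrative only; the repo's actual access path is the ProductStore in src/product_db/.

```python
# Standalone sqlite-vec demo with toy vectors (not the repo's ProductStore).
import sqlite3

import sqlite_vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# A vec0 virtual table holding 4-dimensional float vectors.
db.execute("CREATE VIRTUAL TABLE products USING vec0(embedding float[4])")
db.execute(
    "INSERT INTO products(rowid, embedding) VALUES (?, ?)",
    (1, sqlite_vec.serialize_float32([0.1, 0.2, 0.3, 0.4])),
)

# Nearest-neighbour search: smallest distance first.
rows = db.execute(
    "SELECT rowid, distance FROM products "
    "WHERE embedding MATCH ? ORDER BY distance LIMIT 3",
    (sqlite_vec.serialize_float32([0.1, 0.2, 0.3, 0.4]),),
).fetchall()
print(rows)  # [(1, 0.0)] — an exact match at distance zero
```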
- The previous React/Vite frontend has been moved to a separate repository and is not shipped here.
- This repo only contains a minimal HTML page at `/` (src/api/templates/index.html) for smoke testing.
- For a full experience, run your own frontend (e.g., in another container) that:
  - Connects to the backend WebSocket at `/ws` and supports token-by-token streaming.
  - Sends full conversation history via the `messages` array (role + content) alongside each `message`.
  - Persists/maintains a `session_id` per chat.
The frontend is responsible for maintaining conversation history. The included test UI (index.html) demonstrates the recommended approach:
- Storage: Use `localStorage` to persist conversation history and session ID across page reloads
- Format: Store messages as `[{role: "user"|"assistant", content: "..."}]`
- Sending: Include all previous messages (excluding the current one) in the `messages` array with each request
- Session ID: Extract and save `session_id` from ALL response types (`token`, `end`, `error`)

```javascript
// Example localStorage keys
const STORAGE_KEY = 'chatguru_chat_history'; // Array of {role, content}
const SESSION_KEY = 'chatguru_session_id';   // Session ID string

// On page load: restore history and display messages
// On send: add user message to history, send with messages array
// On response end: add assistant message to history, save session_id
// On error: preserve session_id if valid (not "unknown")
```

Before you begin, ensure you have the following installed:
- Python 3.12+ (Download)
- uv - Fast Python package installer (Installation guide)
- Docker and Docker Compose (optional, for containerized deployment)
- Azure OpenAI account with API access
- Langfuse account (for observability and tracing)
```bash
git clone <repository-url>
cd chatguru
```

```bash
# Install dependencies and set up pre-commit hooks
make setup
```

This command will:

- Install Python dependencies using `uv`
- Install and configure pre-commit hooks
- Set up the development environment

```bash
# Copy environment template
make env-setup

# Edit .env with your credentials
# Required: LLM_* and LANGFUSE_* variables (see Configuration section below)
```

```bash
make dev
```

- Test UI (Minimal): http://localhost:8000/ (for smoke testing only)
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- WebSocket Endpoint: ws://localhost:8000/ws
```bash
git clone <repository-url>
cd chatguru

# Copy and configure environment variables
make env-setup
# Edit .env with your credentials
```

```bash
# Build and start all services
make docker-run

# Or run in background
make docker-run-detached
```

- Test UI (Minimal): http://localhost:8000/ (for smoke testing only)
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- WebSocket Endpoint: ws://localhost:8000/ws
The application uses environment variables for configuration. Copy env.example to .env and configure the following:
Required:

| Variable | Description | Example |
|---|---|---|
| `LLM_ENDPOINT` | Azure OpenAI endpoint URL | `https://your-resource.openai.azure.com/` |
| `LLM_API_KEY` | Azure OpenAI API key | `your-api-key-here` |
| `LLM_DEPLOYMENT_NAME` | Azure OpenAI deployment name | `gpt-4o-mini` |
| `LLM_API_VERSION` | Azure OpenAI API version | `2024-02-15-preview` |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key | `pk-lf-...` |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key | `sk-lf-...` |
| `LANGFUSE_HOST` | Langfuse host URL | `https://cloud.langfuse.com` |

Optional (with defaults):

| Variable | Description | Default |
|---|---|---|
| `FASTAPI_HOST` | API host address | `0.0.0.0` |
| `FASTAPI_PORT` | API port | `8000` |
| `FASTAPI_CORS_ORIGINS` | CORS allowed origins (JSON array) | `["*"]` |
| `APP_NAME` | Application name | `chatguru Agent` |
| `DEBUG` | Enable debug mode | `false` |
| `LOG_LEVEL` | Logging level | `INFO` |
| `VECTOR_DB_TYPE` | Database type | `sqlite` |
| `VECTOR_DB_SQLITE_URL` | SQLite service URL | `http://product-db:8001` |
See env.example for a complete template with detailed comments.
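For orientation, here is a hypothetical sketch of how these variables might map onto a settings model. The project's actual configuration lives in src/config.py and may differ; the `LLMSettings` name and field set below are illustrative assumptions.

```python
# Hypothetical sketch, not the repo's src/config.py: loads the LLM_* variables
# above from the environment or a .env file via pydantic-settings.
from pydantic_settings import BaseSettings, SettingsConfigDict


class LLMSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="LLM_", env_file=".env", extra="ignore")

    endpoint: str
    api_key: str
    deployment_name: str
    api_version: str


settings = LLMSettings()  # raises a validation error if required LLM_* vars are missing
print(settings.deployment_name)
```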
The primary interface for chat is via WebSocket at ws://localhost:8000/ws.
```json
{
  "message": "Your message here",
  "session_id": "optional-session-id",
  "messages": [
    {"role": "user", "content": "previous user message"},
    {"role": "assistant", "content": "previous assistant response"}
  ]
}
```

Responses are streamed as JSON messages:
```jsonc
// Token chunk (streamed multiple times)
{"type": "token", "content": "chunk of text", "session_id": "session-id"}

// End of stream (includes the full response as a fallback)
{"type": "end", "content": "full assistant response", "session_id": "session-id"}

// Error response
{"type": "error", "content": "error message", "session_id": "session-id"}
```

- Health Check: `GET /health`
- API Documentation: `GET /docs` (Swagger UI)
- OpenAPI Schema: `GET /openapi.json`
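As a quick sanity check, you can hit the health endpoint from Python. This is a minimal sketch assuming the default local host and port; `httpx` is illustrative and not a stated project dependency — `curl http://localhost:8000/health` works equally well.

```python
# Probe the health endpoint of a locally running instance.
import httpx

resp = httpx.get("http://localhost:8000/health")
print(resp.status_code, resp.text)  # expect 200 and a small status body
```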
Run make help to see all available commands. Key commands:
```bash
make setup      # Complete development setup
make env-setup  # Copy environment template
make install    # Install production dependencies
```

```bash
make dev  # Start backend development server (auto-reload)
make run  # Start production server (no auto-reload)
```

```bash
make test            # Run all tests
make coverage        # Run tests with coverage report
make promptfoo-eval  # Run LLM evaluation tests
make promptfoo-view  # View evaluation results
```

```bash
make pre-commit-install  # Install pre-commit hooks
make pre-commit          # Run pre-commit checks manually
```

```bash
make docker-build         # Build Docker images
make docker-run           # Run with Docker Compose (foreground)
make docker-run-detached  # Run with Docker Compose (background)
make docker-stop          # Stop services
make docker-down          # Stop and remove containers
make docker-logs          # View logs
make docker-clean         # Clean all Docker resources
```

```bash
make version  # Show current version
make clean    # Clean Python cache files
```

```
chatguru/
├── src/ # Main application code
│ ├── api/ # FastAPI application
│ │ ├── main.py # FastAPI app setup
│ │ ├── templates/ # Minimal HTML test UI
│ │ └── routes/ # API routes
│ │ └── chat.py # WebSocket chat endpoint
│ ├── agent/ # Agent implementation
│ │ ├── service.py # LangChain agent with streaming
│ │ ├── prompt.py # System prompts
│ │ └── __init__.py
│ ├── product_db/ # Product database (sqlite-vec)
│ │ ├── api.py # FastAPI service
│ │ ├── store.py # ProductStore with embeddings
│ │ ├── sqlite.py # HTTP client for agent
│ │ ├── base.py # Abstract interface
│ │ └── factory.py # Database factory
│ ├── rag/ # RAG components
│ │ ├── documents.py # Document handling
│ │ ├── simple_retriever.py # Retriever interface
│ │ └── products.json # Sample products data
│ ├── config.py # Configuration management
│ └── main.py # Application entry point
├── tests/ # Test suite
│ ├── test_api.py # API endpoint tests
│ ├── test_agent.py # Agent tests
│ └── conftest.py # Test configuration
├── docs/ # Documentation
│ └── architecture.md # Architecture documentation
├── promptfoo/ # LLM evaluation config
│ ├── provider.py # Python provider adapter
│ └── promptfooconfig.yaml
├── docker/ # Docker configuration
│ ├── Dockerfile # Backend Dockerfile
│ └── Dockerfile.db # Product database Dockerfile
├── .pre-commit-config.yaml # Pre-commit hooks
├── docker-compose.yml # Docker Compose setup
├── Makefile # Development commands
├── pyproject.toml # Python project configuration
├── env.example # Environment template
└── README.md # This file
```

```bash
# Run all tests
make test

# Run with coverage report
make coverage
```

Tests use GenericFakeChatModel from LangChain for reliable, deterministic testing without API calls.
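To illustrate the pattern, here is a standalone sketch (not a test from this repo's tests/ directory) showing how GenericFakeChatModel replays scripted responses without touching Azure OpenAI:

```python
# Standalone sketch of the fake-model testing pattern used by the suite.
from langchain_core.language_models import GenericFakeChatModel
from langchain_core.messages import AIMessage


def test_scripted_response():
    # The fake model yields the next scripted AIMessage instead of calling an API.
    fake_llm = GenericFakeChatModel(messages=iter([AIMessage(content="Hello there!")]))
    result = fake_llm.invoke("Hello, how are you?")
    assert result.content == "Hello there!"
```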
```bash
# Run evaluation suite
make promptfoo-eval

# View results in browser
make promptfoo-view

# Run specific test file
make promptfoo-test TEST=tests/basic_greeting.yaml
```

Promptfoo tests evaluate response quality, helpfulness, and boundary conditions.
RAGAS (Retrieval-Augmented Generation Assessment) and RAG Evaluator are frameworks for evaluating RAG pipelines. They provide metrics for aspects such as faithfulness, answer relevance, context precision, and retrieval quality.
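For a flavor of what such an evaluation looks like, here is an illustrative sketch with hand-written sample data — not this repo's configuration. The column names follow the classic RAGAS dataset schema, which differs in newer RAGAS releases, and `evaluate()` calls a judge LLM, so provider credentials must be configured.

```python
# Illustrative RAGAS evaluation over a single hand-written sample.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

sample = Dataset.from_dict({
    "question": ["Which trail shoes do you carry?"],
    "answer": ["We carry the TrailRunner X in sizes 7-12."],
    "contexts": [["TrailRunner X: lightweight trail shoe, available in sizes 7-12."]],
})

# evaluate() uses an LLM judge under the hood; credentials are required.
result = evaluate(sample, metrics=[faithfulness, answer_relevancy])
print(result)
```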
For detailed information on RAG testing and evaluation using RAGAS and RAG Evaluator, see docs/rag_eval_readme.md.
```bash
# Build and run backend with Docker Compose
make docker-run
```

```bash
# Build backend image
docker build -f docker/Dockerfile -t chatguru-agent .

# Run backend container
docker run -p 8000:8000 --env-file .env chatguru-agent
```

- Backend API: `8000` (host) → `8000` (container)
- Product DB: `8001` (host) → `8001` (container)
- WebSocket: `ws://localhost:8000/ws`
- Test UI: `http://localhost:8000/` (minimal, not production)
Run your preferred frontend in a separate container or process and point it to the backend:
- HTTP base: `http://<backend-host>:8000`
- WebSocket: `ws://<backend-host>:8000/ws`
- Include full conversation history in every message as `messages: [{role, content}, ...]`.
- Preserve a stable `session_id` per chat thread.
Your frontend must maintain conversation history to enable context-aware responses:
- Persist locally: Store conversation history in `localStorage` (web) or equivalent persistent storage
- Send with each message: Include all previous messages in the `messages` array (excluding the current message being sent)
- Update after responses: Add assistant responses to history when the `end` message is received
- Handle session_id correctly:
  - Extract `session_id` from ALL response types (`token`, `end`, `error`)
  - Use "is not null" checks (an empty string is valid; `"unknown"` is the fallback)
  - Persist `session_id` alongside conversation history
Example WebSocket payload:
```json
{
  "message": "Hi there!",
  "session_id": "chat-123",
  "messages": [
    {"role": "user", "content": "Earlier user message"},
    {"role": "assistant", "content": "Earlier assistant reply"}
  ]
}
```

Example response handling:
```javascript
// Handle all response types
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'token') {
    // Append to current response display
  } else if (data.type === 'end') {
    // Save assistant response to history
    history.push({role: 'assistant', content: fullResponse});
    sessionId = data.session_id;
    saveToStorage();
  } else if (data.type === 'error') {
    // Preserve session_id even on errors
    if (data.session_id && data.session_id !== 'unknown') {
      sessionId = data.session_id;
      saveToStorage();
    }
  }
};
```

**Import errors**

Solution: Ensure dependencies are installed:

```bash
make install
```

**WebSocket connection fails**

Solution:

- Verify the backend is running: `curl http://localhost:8000/health`
- Check the WebSocket endpoint: `ws://localhost:8000/ws`
- Ensure CORS is configured correctly in `.env`
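If those checks pass but clients still cannot connect, a minimal probe (assuming the default local endpoint) helps separate connectivity problems from application logic:

```python
# Minimal connectivity probe for the WebSocket endpoint.
import asyncio

import websockets


async def probe():
    # A successful handshake means the endpoint is reachable and upgradable.
    async with websockets.connect("ws://localhost:8000/ws"):
        print("WebSocket handshake succeeded")

asyncio.run(probe())
```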
**Azure OpenAI errors**

Solution:

- Verify `LLM_ENDPOINT` includes a trailing slash
- Check `LLM_API_KEY` is correct
- Ensure `LLM_DEPLOYMENT_NAME` matches your Azure deployment
- Verify `LLM_API_VERSION` is supported
**Langfuse tracing not working**

Solution:

- Verify Langfuse credentials in `.env`
- Check `LANGFUSE_HOST` is correct (default: `https://cloud.langfuse.com`)
- Ensure network connectivity to Langfuse
**Docker build fails**

Solution:

- Ensure the `uv.lock` file exists (run `uv sync` locally first)
- Check Docker has sufficient resources
- Verify all required files are present
**Port conflicts**

Solution:

- Backend (8000): Stop other services using port 8000 or change `FASTAPI_PORT`
- Frontend: Configure your external frontend to target the correct backend host/port
- Check docs/architecture.md for architecture details
- Review CONTRIBUTING.md for development guidelines
- Open an issue on GitHub for bugs or feature requests
- Architecture Guide - Detailed architecture documentation
- Contributing Guide - How to contribute to the project
- Getting Started Guide - Detailed setup instructions
We welcome contributions! Please see CONTRIBUTING.md for:
- Development setup instructions
- Code style guidelines
- Testing requirements
- Pull request process
- Issue reporting guidelines
- Vector Database Integration: sqlite-vec for semantic search ✅
- Streaming Responses: Real-time chat streaming via WebSocket ✅
- MCP Tools: Integration with commerce platforms (PimCore, Strapi, Medusa.js)
- Authentication: JWT-based API authentication
- Rate Limiting: API rate limiting and quotas
- Session Management: Client-side persistent conversation history (localStorage) ✅
- Server-side Sessions: Backend-persisted conversation history
- Multi-tenancy: Database-backed tenant configuration
This library is available as open source under the terms of the MIT License.
- FastAPI - Modern web framework
- LangChain - LLM application framework
- Langfuse - LLM observability platform
- promptfoo - LLM evaluation framework
For support and questions:
- 📖 Check the documentation
- 🐛 Open an issue for bugs
- 💬 Start a discussion for questions
- 📧 Contact the maintainers