chatguru Agent is a production-ready whitelabel chatbot with RAG capabilities and agentic commerce integration, built with FastAPI, LangChain, and Azure OpenAI.
Brought with ❤️ by Netguru
Read the full docs at: https://github.com/netguru/chatguru
chatguru Agent ships with WebSocket streaming, RAG capabilities, and comprehensive observability!
Key Features:
- Real-time WebSocket streaming for instant responses
- RAG-powered product search and recommendations
- Comprehensive API documentation with Swagger UI
ℹ️ Library supports Python 3.12+
```bash
# Clone the repository
git clone <repository-url>
cd chatguru

# Complete development setup
make setup
```

After installation:

```bash
# Configure environment variables
make env-setup
# Edit .env with your credentials

# Start the development server
make dev
```

Check the live demo at http://localhost:8000/
This is how you can use the WebSocket API in your app:
```python
import asyncio
import json

import websockets


async def chat():
    uri = "ws://localhost:8000/ws"
    async with websockets.connect(uri) as websocket:
        # Send message
        await websocket.send(json.dumps({
            "message": "Hello, how are you?",
            "session_id": None
        }))

        # Receive streaming response
        async for message in websocket:
            data = json.loads(message)
            if data["type"] == "token":
                print(data["content"], end="", flush=True)
            elif data["type"] == "end":
                print("\n")
                break
            elif data["type"] == "error":
                print(f"Error: {data['content']}")
                break

asyncio.run(chat())
```

- 🚀 WebSocket Streaming: Real-time streaming chat responses via WebSocket
- 🧪 Minimal Test UI: Lightweight HTML page at `/` for smoke testing only
- 🎨 Whitelabel Design: Easily customizable for different brands and tenants
- 🧠 RAG Capabilities: Semantic product search with sqlite-vec vector database
- 🛒 Agentic Commerce: Ready for MCP (Model Context Protocol) integration
- 📊 Observability: Built-in Langfuse tracing and monitoring
- ✅ Testing: Comprehensive test suite with promptfoo LLM evaluation
- 🐳 Production Ready: Docker containerization with health checks
Simple, modular architecture designed for whitelabel deployment:
```mermaid
graph LR
    subgraph "Current Implementation"
        UI[Web Chat UI] -->|WebSocket| API[FastAPI API]
        API -->|Streaming| AGENT[Agent Service]
        AGENT -->|AzureChatOpenAI| LLM[Azure OpenAI]
        AGENT -->|RAG Tool| PRODUCTDB[Product DB<br/>sqlite-vec]
        AGENT --> LANGFUSE[Langfuse<br/>Tracing]
    end

    subgraph "Future Extensions"
        MCP[MCP Tools<br/>Commerce Platforms]
        AGENT -.-> MCP
    end
```
For detailed architecture documentation, see docs/architecture.md.
- Backend: FastAPI + Uvicorn (async)
- AI/ML: LangChain + Azure OpenAI (direct integration)
- LLM Provider: Azure OpenAI (via langchain-openai)
- Vector Search: sqlite-vec (semantic product search; see the sketch after this list)
- Observability: Langfuse
- Testing: pytest + promptfoo + GenericFakeChatModel
- Code Quality: mypy + ruff + pre-commit
- Containerization: Docker + Docker Compose
- Package Management: uv (Python) + npm (Node.js)
- Development: Makefile for task automation
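To make the Vector Search entry concrete, here is a minimal standalone sqlite-vec sketch, with toy 4-dimensional vectors standing in for real embeddings. It is illustrative only; the repo's actual access path is the ProductStore in src/product_db/.

```python
# Standalone sqlite-vec demo with toy vectors (not the repo's ProductStore).
import sqlite3

import sqlite_vec

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# A vec0 virtual table holding 4-dimensional float vectors.
db.execute("CREATE VIRTUAL TABLE products USING vec0(embedding float[4])")
db.execute(
    "INSERT INTO products(rowid, embedding) VALUES (?, ?)",
    (1, sqlite_vec.serialize_float32([0.1, 0.2, 0.3, 0.4])),
)

# Nearest-neighbour search: smallest distance first.
rows = db.execute(
    "SELECT rowid, distance FROM products "
    "WHERE embedding MATCH ? ORDER BY distance LIMIT 3",
    (sqlite_vec.serialize_float32([0.1, 0.2, 0.3, 0.4]),),
).fetchall()
print(rows)  # [(1, 0.0)] — an exact match at distance zero
```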
- The previous React/Vite frontend has been moved to a separate repository and is not shipped here.
- This repo only contains a minimal HTML page at `/` (src/api/templates/index.html) for smoke testing.
- For a full experience, run your own frontend (e.g., in another container) that:
  - Connects to the backend WebSocket at `/ws` and supports token-by-token streaming.
  - Sends full conversation history via the `messages` array (role + content) alongside each `message`.
  - Persists/maintains a `session_id` per chat.
The frontend is responsible for maintaining conversation history. The included test UI (index.html) demonstrates the recommended approach:
- Storage: Use `localStorage` to persist conversation history and session ID across page reloads
- Format: Store messages as `[{role: "user"|"assistant", content: "..."}]`
- Sending: Include all previous messages (excluding the current one) in the `messages` array with each request
- Session ID: Extract and save `session_id` from ALL response types (`token`, `end`, `error`)

```javascript
// Example localStorage keys
const STORAGE_KEY = 'chatguru_chat_history'; // Array of {role, content}
const SESSION_KEY = 'chatguru_session_id';   // Session ID string

// On page load: restore history and display messages
// On send: add user message to history, send with messages array
// On response end: add assistant message to history, save session_id
// On error: preserve session_id if valid (not "unknown")
```

Before you begin, ensure you have the following installed:
- Python 3.12+ (Download)
- uv - Fast Python package installer (Installation guide)
- Docker and Docker Compose (optional, for containerized deployment)
- Azure OpenAI account with API access
- Langfuse account (for observability and tracing)
```bash
git clone <repository-url>
cd chatguru
```

```bash
# Install dependencies and set up pre-commit hooks
make setup
```

This command will:

- Install Python dependencies using `uv`
- Install and configure pre-commit hooks
- Set up the development environment

```bash
# Copy environment template
make env-setup

# Edit .env with your credentials
# Required: LLM_* and LANGFUSE_* variables (see Configuration section below)
```

```bash
make dev
```

- Test UI (Minimal): http://localhost:8000/ (for smoke testing only)
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- WebSocket Endpoint: ws://localhost:8000/ws
```bash
git clone <repository-url>
cd chatguru

# Copy and configure environment variables
make env-setup
# Edit .env with your credentials
```

```bash
# Build and start all services
make docker-run

# Or run in background
make docker-run-detached
```

- Test UI (Minimal): http://localhost:8000/ (for smoke testing only)
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- WebSocket Endpoint: ws://localhost:8000/ws
The application uses environment variables for configuration. Copy env.example to .env and configure the following:
Required:

| Variable | Description | Example |
|---|---|---|
| `LLM_ENDPOINT` | Azure OpenAI endpoint URL | `https://your-resource.openai.azure.com/` |
| `LLM_API_KEY` | Azure OpenAI API key | `your-api-key-here` |
| `LLM_DEPLOYMENT_NAME` | Azure OpenAI deployment name | `gpt-4o-mini` |
| `LLM_API_VERSION` | Azure OpenAI API version | `2024-02-15-preview` |
| `LANGFUSE_PUBLIC_KEY` | Langfuse public key | `pk-lf-...` |
| `LANGFUSE_SECRET_KEY` | Langfuse secret key | `sk-lf-...` |
| `LANGFUSE_HOST` | Langfuse host URL | `https://cloud.langfuse.com` |

Optional (with defaults):

| Variable | Description | Default |
|---|---|---|
| `FASTAPI_HOST` | API host address | `0.0.0.0` |
| `FASTAPI_PORT` | API port | `8000` |
| `FASTAPI_CORS_ORIGINS` | CORS allowed origins (JSON array) | `["*"]` |
| `APP_NAME` | Application name | `chatguru Agent` |
| `DEBUG` | Enable debug mode | `false` |
| `LOG_LEVEL` | Logging level | `INFO` |
| `VECTOR_DB_TYPE` | Database type | `sqlite` |
| `VECTOR_DB_SQLITE_URL` | SQLite service URL | `http://product-db:8001` |
See env.example for a complete template with detailed comments.
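For orientation, here is a hypothetical sketch of how these variables might map onto a settings model. The project's actual configuration lives in src/config.py and may differ; the `LLMSettings` name and field set below are illustrative assumptions.

```python
# Hypothetical sketch, not the repo's src/config.py: loads the LLM_* variables
# above from the environment or a .env file via pydantic-settings.
from pydantic_settings import BaseSettings, SettingsConfigDict


class LLMSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="LLM_", env_file=".env", extra="ignore")

    endpoint: str
    api_key: str
    deployment_name: str
    api_version: str


settings = LLMSettings()  # raises a validation error if required LLM_* vars are missing
print(settings.deployment_name)
```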
The primary interface for chat is via WebSocket at ws://localhost:8000/ws.
```json
{
  "message": "Your message here",
  "session_id": "optional-session-id",
  "messages": [
    {"role": "user", "content": "previous user message"},
    {"role": "assistant", "content": "previous assistant response"}
  ]
}
```

Responses are streamed as JSON messages:
```jsonc
// Token chunk (streamed multiple times)
{"type": "token", "content": "chunk of text", "session_id": "session-id"}

// End of stream (includes the full response as a fallback)
{"type": "end", "content": "full assistant response", "session_id": "session-id"}

// Error response
{"type": "error", "content": "error message", "session_id": "session-id"}
```

- Health Check: `GET /health`
- API Documentation: `GET /docs` (Swagger UI)
- OpenAPI Schema: `GET /openapi.json`
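As a quick sanity check, you can hit the health endpoint from Python. This is a minimal sketch assuming the default local host and port; `httpx` is illustrative and not a stated project dependency — `curl http://localhost:8000/health` works equally well.

```python
# Probe the health endpoint of a locally running instance.
import httpx

resp = httpx.get("http://localhost:8000/health")
print(resp.status_code, resp.text)  # expect 200 and a small status body
```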
Run make help to see all available commands. Key commands:
```bash
make setup      # Complete development setup
make env-setup  # Copy environment template
make install    # Install production dependencies
```

```bash
make dev  # Start backend development server (auto-reload)
make run  # Start production server (no auto-reload)
```

```bash
make test            # Run all tests
make coverage        # Run tests with coverage report
make promptfoo-eval  # Run LLM evaluation tests
make promptfoo-view  # View evaluation results
```

```bash
make pre-commit-install  # Install pre-commit hooks
make pre-commit          # Run pre-commit checks manually
```

```bash
make docker-build         # Build Docker images
make docker-run           # Run with Docker Compose (foreground)
make docker-run-detached  # Run with Docker Compose (background)
make docker-stop          # Stop services
make docker-down          # Stop and remove containers
make docker-logs          # View logs
make docker-clean         # Clean all Docker resources
```

```bash
make version  # Show current version
make clean    # Clean Python cache files
```

```
chatguru/
├── src/ # Main application code
│ ├── api/ # FastAPI application
│ │ ├── main.py # FastAPI app setup
│ │ ├── templates/ # Minimal HTML test UI
│ │ └── routes/ # API routes
│ │ └── chat.py # WebSocket chat endpoint
│ ├── agent/ # Agent implementation
│ │ ├── service.py # LangChain agent with streaming
│ │ ├── prompt.py # System prompts
│ │ └── __init__.py
│ ├── product_db/ # Product database (sqlite-vec)
│ │ ├── api.py # FastAPI service
│ │ ├── store.py # ProductStore with embeddings
│ │ ├── sqlite.py # HTTP client for agent
│ │ ├── base.py # Abstract interface
│ │ └── factory.py # Database factory
│ ├── rag/ # RAG components
│ │ ├── documents.py # Document handling
│ │ ├── simple_retriever.py # Retriever interface
│ │ └── products.json # Sample products data
│ ├── config.py # Configuration management
│ └── main.py # Application entry point
├── tests/ # Test suite
│ ├── test_api.py # API endpoint tests
│ ├── test_agent.py # Agent tests
│ └── conftest.py # Test configuration
├── docs/ # Documentation
│ └── architecture.md # Architecture documentation
├── promptfoo/ # LLM evaluation config
│ ├── provider.py # Python provider adapter
│ └── promptfooconfig.yaml
├── docker/ # Docker configuration
│ ├── Dockerfile # Backend Dockerfile
│ └── Dockerfile.db # Product database Dockerfile
├── .pre-commit-config.yaml # Pre-commit hooks
├── docker-compose.yml # Docker Compose setup
├── Makefile # Development commands
├── pyproject.toml # Python project configuration
├── env.example # Environment template
└── README.md # This file
```

```bash
# Run all tests
make test

# Run with coverage report
make coverage
```

Tests use GenericFakeChatModel from LangChain for reliable, deterministic testing without API calls.
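To illustrate the pattern, here is a standalone sketch (not a test from this repo's tests/ directory) showing how GenericFakeChatModel replays scripted responses without touching Azure OpenAI:

```python
# Standalone sketch of the fake-model testing pattern used by the suite.
from langchain_core.language_models import GenericFakeChatModel
from langchain_core.messages import AIMessage


def test_scripted_response():
    # The fake model yields the next scripted AIMessage instead of calling an API.
    fake_llm = GenericFakeChatModel(messages=iter([AIMessage(content="Hello there!")]))
    result = fake_llm.invoke("Hello, how are you?")
    assert result.content == "Hello there!"
```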
```bash
# Run evaluation suite
make promptfoo-eval

# View results in browser
make promptfoo-view

# Run specific test file
make promptfoo-test TEST=tests/basic_greeting.yaml
```

Promptfoo tests evaluate response quality, helpfulness, and boundary conditions.
RAGAS (Retrieval-Augmented Generation Assessment) and RAG Evaluator are frameworks for evaluating RAG pipelines. They provide metrics for aspects such as faithfulness, answer relevance, context precision, and retrieval quality.
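For a flavor of what such an evaluation looks like, here is an illustrative sketch with hand-written sample data — not this repo's configuration. The column names follow the classic RAGAS dataset schema, which differs in newer RAGAS releases, and `evaluate()` calls a judge LLM, so provider credentials must be configured.

```python
# Illustrative RAGAS evaluation over a single hand-written sample.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

sample = Dataset.from_dict({
    "question": ["Which trail shoes do you carry?"],
    "answer": ["We carry the TrailRunner X in sizes 7-12."],
    "contexts": [["TrailRunner X: lightweight trail shoe, available in sizes 7-12."]],
})

# evaluate() uses an LLM judge under the hood; credentials are required.
result = evaluate(sample, metrics=[faithfulness, answer_relevancy])
print(result)
```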
For detailed information on RAG testing and evaluation using RAGAS and RAG Evaluator, see docs/rag_eval_readme.md.
```bash
# Build and run backend with Docker Compose
make docker-run
```

```bash
# Build backend image
docker build -f docker/Dockerfile -t chatguru-agent .

# Run backend container
docker run -p 8000:8000 --env-file .env chatguru-agent
```

- Backend API: `8000` (host) → `8000` (container)
- Product DB: `8001` (host) → `8001` (container)
- WebSocket: `ws://localhost:8000/ws`
- Test UI: `http://localhost:8000/` (minimal, not production)
Run your preferred frontend in a separate container or process and point it to the backend:
- HTTP base: `http://<backend-host>:8000`
- WebSocket: `ws://<backend-host>:8000/ws`
- Include full conversation history in every message as `messages: [{role, content}, ...]`.
- Preserve a stable `session_id` per chat thread.
Your frontend must maintain conversation history to enable context-aware responses:
- Persist locally: Store conversation history in `localStorage` (web) or equivalent persistent storage
- Send with each message: Include all previous messages in the `messages` array (excluding the current message being sent)
- Update after responses: Add assistant responses to history when the `end` message is received
- Handle session_id correctly:
  - Extract `session_id` from ALL response types (`token`, `end`, `error`)
  - Use "is not null" checks (an empty string is valid; `"unknown"` is the fallback)
  - Persist `session_id` alongside conversation history
Example WebSocket payload:
```json
{
  "message": "Hi there!",
  "session_id": "chat-123",
  "messages": [
    {"role": "user", "content": "Earlier user message"},
    {"role": "assistant", "content": "Earlier assistant reply"}
  ]
}
```

Example response handling:
```javascript
// Handle all response types
ws.onmessage = (event) => {
  const data = JSON.parse(event.data);

  if (data.type === 'token') {
    // Append to current response display
  } else if (data.type === 'end') {
    // Save assistant response to history
    history.push({role: 'assistant', content: fullResponse});
    sessionId = data.session_id;
    saveToStorage();
  } else if (data.type === 'error') {
    // Preserve session_id even on errors
    if (data.session_id && data.session_id !== 'unknown') {
      sessionId = data.session_id;
      saveToStorage();
    }
  }
};
```

**Import errors**

Solution: Ensure dependencies are installed:

```bash
make install
```

**WebSocket connection fails**

Solution:

- Verify the backend is running: `curl http://localhost:8000/health`
- Check the WebSocket endpoint: `ws://localhost:8000/ws`
- Ensure CORS is configured correctly in `.env`
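If those checks pass but clients still cannot connect, a minimal probe (assuming the default local endpoint) helps separate connectivity problems from application logic:

```python
# Minimal connectivity probe for the WebSocket endpoint.
import asyncio

import websockets


async def probe():
    # A successful handshake means the endpoint is reachable and upgradable.
    async with websockets.connect("ws://localhost:8000/ws"):
        print("WebSocket handshake succeeded")

asyncio.run(probe())
```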
**Azure OpenAI errors**

Solution:

- Verify `LLM_ENDPOINT` includes a trailing slash
- Check `LLM_API_KEY` is correct
- Ensure `LLM_DEPLOYMENT_NAME` matches your Azure deployment
- Verify `LLM_API_VERSION` is supported
**Langfuse tracing not working**

Solution:

- Verify Langfuse credentials in `.env`
- Check `LANGFUSE_HOST` is correct (default: `https://cloud.langfuse.com`)
- Ensure network connectivity to Langfuse
**Docker build fails**

Solution:

- Ensure the `uv.lock` file exists (run `uv sync` locally first)
- Check Docker has sufficient resources
- Verify all required files are present
**Port conflicts**

Solution:

- Backend (8000): Stop other services using port 8000 or change `FASTAPI_PORT`
- Frontend: Configure your external frontend to target the correct backend host/port
- Check docs/architecture.md for architecture details
- Review CONTRIBUTING.md for development guidelines
- Open an issue on GitHub for bugs or feature requests
- Architecture Guide - Detailed architecture documentation
- Contributing Guide - How to contribute to the project
- Getting Started Guide - Detailed setup instructions
We welcome contributions! Please see CONTRIBUTING.md for:
- Development setup instructions
- Code style guidelines
- Testing requirements
- Pull request process
- Issue reporting guidelines
- Vector Database Integration: sqlite-vec for semantic search ✅
- Streaming Responses: Real-time chat streaming via WebSocket ✅
- MCP Tools: Integration with commerce platforms (PimCore, Strapi, Medusa.js)
- Authentication: JWT-based API authentication
- Rate Limiting: API rate limiting and quotas
- Session Management: Client-side persistent conversation history (localStorage) ✅
- Server-side Sessions: Backend-persisted conversation history
- Multi-tenancy: Database-backed tenant configuration
This library is available as open source under the terms of the MIT License.
- FastAPI - Modern web framework
- LangChain - LLM application framework
- Langfuse - LLM observability platform
- promptfoo - LLM evaluation framework
For support and questions:
- 📖 Check the documentation
- 🐛 Open an issue for bugs
- 💬 Start a discussion for questions
- 📧 Contact the maintainers