Lightweight, production-style RAG backend with a Streamlit chatbot frontend, built for Pinecone, Groq, and LangGraph/LangChain.
Built with the following tools and technologies:

Python | FastAPI | Pinecone | LangChain | LangGraph | Groq | Streamlit | Docling | httpx
This repository contains a lightweight RAG backend built with FastAPI, Pinecone (integrated embeddings), and LangGraph/LangChain for agentic RAG flows, plus a Streamlit chatbot frontend.
At a high level:
- The backend exposes ingestion, semantic search, and production-style RAG chat endpoints (with optional web-search fallback, rate limiting, caching, metrics, and API key protection).
- The frontend is a Streamlit chatbot UI that talks to the backend `/chat` endpoint, supports streaming responses, and offers a modal-based document upload workflow that ingests local files via `/documents/upload-text`.
Backend API
- FastAPI-based RAG backend with Pinecone integrated embeddings.
- Agentic RAG chat powered by LangGraph and LangChain.
- Groq LLM integration via OpenAI-compatible API.
- Optional Tavily web-search fallback.
- Ingestion endpoints for arXiv, OpenAlex, Wikipedia, and manual text uploads.
- Caching, rate limiting, metrics endpoint, and API key protection for secured deployments.
- Dockerized backend suitable for Hugging Face Spaces.
Frontend (Streamlit)
- Chatbot UI using `st.chat_message` and `st.chat_input`.
- Streaming support via `/chat/stream` when available, with automatic fallback to `/chat`.
- Sidebar controls for query behaviour (`top_k`, `min_score`, web fallback, show sources).
- Modal Upload Document dialog to convert and upload local PDFs/MD/TXT/Office/HTML files to the backend.
- Recent uploads panel with quick "Search this document" actions.
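The stream-then-fallback behaviour can be sketched as below. This is a simplification under assumed interfaces — the real UI renders chunks incrementally as they arrive rather than joining them at the end:

```python
from typing import Callable, Iterable

def chat_with_fallback(stream_fn: Callable[[dict], Iterable[str]],
                       plain_fn: Callable[[dict], str],
                       payload: dict) -> str:
    """Try the /chat/stream endpoint first; fall back to /chat if streaming fails."""
    try:
        chunks = list(stream_fn(payload))  # consume the stream eagerly for simplicity
        return "".join(chunks)
    except Exception:
        return plain_fn(payload)
```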
Developer Experience
- Simple configuration via `.env` and Streamlit secrets.
- Utility scripts for seeding, smoke tests, benchmarking, and Docling-based local ingestion.
- Clear work package history and operational runbook under `docs/`.
- Backend API: see `backend/README.md` for setup, environment variables, API key protection, endpoint examples, and deployment instructions (including `/chat`, `/chat/stream`, `/metrics`, and Hugging Face Spaces notes).
Typical flow:
1. Create a Python 3.11+ virtual environment.

2. Install backend dependencies:

   ```shell
   cd backend
   pip install -r requirements.txt
   ```

3. Copy `.env.example` → `.env` and configure:
   - Pinecone (integrated embeddings).
   - Groq LLM parameters.
   - Optional Tavily, LangSmith, rate limiting, caching, and API key (`API_KEY`) for protected deployments.

4. Run the backend locally:

   ```shell
   uvicorn app.main:app --reload --port 8000
   ```

5. Browse:
   - http://localhost:8000/health
   - http://localhost:8000/docs
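A `.env` for step 3 might look roughly like this. Variable names other than `API_KEY` are illustrative guesses — the authoritative list is in `.env.example` and `backend/README.md`:

```shell
# Pinecone (integrated embeddings) -- names are illustrative
PINECONE_API_KEY=...
PINECONE_INDEX=...

# Groq LLM
GROQ_API_KEY=...

# Optional web-search fallback
TAVILY_API_KEY=...

# Protect the deployment (omit to leave endpoints open)
API_KEY=change-me
```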
- Frontend: Streamlit chat app under `frontend/app.py`, intended for Streamlit Community Cloud or local runs.

For local usage:

```shell
pip install -r requirements.txt  # root requirements (Streamlit + frontend deps)
streamlit run frontend/app.py
```

Configure:

- `BACKEND_BASE_URL` (e.g. `http://localhost:8000` or your HF Space URL).
- `API_KEY` (if the backend is protected) via:
  - `st.secrets` in `.streamlit/secrets.toml`, or
  - environment variables.
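For Streamlit Community Cloud, those settings can live in `.streamlit/secrets.toml`; a sketch, assuming the key names match the `BACKEND_BASE_URL` and `API_KEY` variables above:

```toml
# .streamlit/secrets.toml -- read by the app via st.secrets
BACKEND_BASE_URL = "http://localhost:8000"
API_KEY = "change-me"  # only needed when the backend is protected
```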
- Backend API: see `backend/README.md` for setup, environment variables, API key protection, endpoint examples, and deployment instructions (including `/chat`, `/chat/stream`, `/metrics`, and Hugging Face Spaces notes).
- Architecture and design context: see `docs/CONTEXT.md` for work package history, security hardening notes, and operational runbook.
- Frontend: Streamlit chat app under `frontend/app.py` intended for Streamlit Community Cloud or local runs.
- Utility scripts: see the `scripts/` directory for ingestion, smoke-test helpers, Docling-based local ingestion, and benchmarking (including `scripts/bench_local.py`).
A high-level layout:
```
rag-agent-workbench/
├── backend/                  # FastAPI app, core logic, routers, services, config
│   └── requirements.txt      # Backend dependencies
├── frontend/                 # Streamlit chatbot UI
├── docs/                     # Context, worklog, and design documentation
├── scripts/                  # Ingestion, smoke tests, benchmark, and docling helpers
├── requirements.txt          # Frontend / root-level dependencies
├── LICENSE
└── README.md
```
- Backend API & operations: see `backend/README.md` for:
  - Environment variables and configuration.
  - Endpoint catalogue (ingest, search, chat, metrics).
  - Hugging Face Spaces deployment notes.
  - LangSmith, Tavily, Groq, and Pinecone configuration.
- Architecture & work packages: see `docs/CONTEXT.md` for:
  - Overall architecture and design decisions.
  - Work package history (A/B/C, security + UI + ingestion).
  - Operational runbook (key rotation, toggling rate limiting/caching, diagnosing issues).
- Worklog: see `docs/WORKLOG.md` for a chronological summary of changes and key files per work package.
This project is licensed under the MIT License.
See the LICENSE file for details.
For questions, suggestions, or collaboration:
- Open an issue or discussion in this repository.
- Refer to the maintainers listed in project documentation or commit history.