A modern, responsive chat interface for interacting with PDF documents through AI. Upload any PDF and start asking questions about its content with our sleek dark-mode UI.
RAG PDF Chatbot is a Retrieval-Augmented Generation (RAG) chatbot built with FastAPI and Llama-2. It processes PDF documents and lets users interact with their content through a conversational interface, leveraging language models and embeddings to provide accurate, context-aware responses. The chatbot runs locally and completely offline, which improves security, and it supports any model file in GGUF format or integration with LMStudio for added flexibility. (A minimal sketch of the retrieval pipeline follows the feature list below.)
- Load and process PDF documents.
- Upload PDF files dynamically via the frontend.
- Generate embeddings using `sentence-transformers`.
- Use `Chroma` as a vector store for efficient retrieval.
- Integrate with Llama-2 for natural language understanding and generation. (Any other open-source model in GGUF format can be used, or the model can be replaced with LMStudio.)
- FastAPI backend with endpoints for chat, file upload, and health checks.
- Next.js-based frontend for user interaction.
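To make the flow above concrete, here is a minimal, self-contained sketch of the general RAG pattern (load, chunk, embed, store, retrieve, generate). It is illustrative only, not the exact code in `app.py`; the embedding model name, chunk size, sample file name, collection name, and prompt format are all assumptions.

```python
# Minimal RAG sketch: embed PDF chunks into Chroma, retrieve, answer with a GGUF model.
# Illustrative only -- model names, chunk size, and prompt format are assumptions.
import chromadb
from sentence_transformers import SentenceTransformer
from llama_cpp import Llama
from pypdf import PdfReader

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
llm = Llama(model_path="models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)
collection = chromadb.Client().create_collection("pdf_chunks")

# 1. Load the PDF and split it into fixed-size chunks.
text = "".join(page.extract_text() or "" for page in PdfReader("sample.pdf").pages)
chunks = [text[i:i + 500] for i in range(0, len(text), 500)]

# 2. Embed each chunk and store it in the vector store.
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

# 3. Retrieve the chunks most similar to the question and generate an answer.
question = "What is this document about?"
hits = collection.query(query_embeddings=embedder.encode([question]).tolist(), n_results=3)
context = "\n".join(hits["documents"][0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(llm(prompt, max_tokens=256)["choices"][0]["text"])
```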
- Python 3.10 or higher
- CUDA-enabled GPU (optional, for faster processing)
- Clone the repository:

  ```bash
  git clone https://github.com/Anshulgada/RAG-Chatbot.git
  cd RAG-Chatbot
  ```

- Install dependencies (using the `uv` package manager is recommended):

  ```bash
  uv venv
  uv sync
  ```

- Ensure the Llama-2 model file is placed in the `models/` directory (a quick load check is sketched after these steps):
  - File: `llama-2-7b-chat.Q4_K_M.gguf`

- Start the FastAPI server:

  ```bash
  uvicorn app:app --reload
  ```

- Navigate to the frontend directory and start the React app:

  ```bash
  cd frontend
  npm install
  npm run dev
  ```
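Before starting the server, you can optionally confirm that the model file loads. This is a quick sanity check with llama-cpp-python, not a required step; the `n_ctx` and `n_gpu_layers` values shown are illustrative, not the project's settings.

```python
# Optional sanity check: confirm llama-cpp-python can load the GGUF file.
# n_ctx and n_gpu_layers are illustrative values, not the project's settings.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if available; use 0 for CPU-only
)
print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])
```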
- Access the backend at `http://0.0.0.0:8000`.
- Use the `/upload` endpoint to upload a PDF file for processing.
- Use the `/chat` endpoint to send chat messages and receive responses (a scripted example follows this list).
- Open the frontend at `http://localhost:3000` for a user-friendly interface.
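For scripted use, the same endpoints can be called without the frontend. Below is a sketch using `requests`; the field names (`file`, `message`) are assumptions about the request schema, so check `app.py` for the exact shape.

```python
# Calling the backend directly (field names are assumed; see app.py for the real schema).
import requests

BASE = "http://localhost:8000"

# Upload a PDF for processing.
with open("sample.pdf", "rb") as f:
    print(requests.post(f"{BASE}/upload", files={"file": f}).json())

# Ask a question about the uploaded document.
resp = requests.post(f"{BASE}/chat", json={"message": "Summarize chapter one."})
print(resp.json())
```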
- File Upload: Upload a PDF file directly from the interface.
- Chat Interface: Type messages and receive responses in real-time.
- Responsive Design: Optimized for both light and dark modes.
- `POST /upload`: Upload a PDF file for processing.
- `POST /chat`: Send a chat message and receive a response.
- `GET /`: Health check endpoint.
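As a rough picture of how these routes fit together, here is a stripped-down FastAPI sketch. The handler bodies and the request schema are placeholders, not the actual `app.py` implementation.

```python
# Skeleton of the three endpoints (placeholder logic; see app.py for the real implementation).
from fastapi import FastAPI, File, UploadFile
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str  # assumed request schema

@app.get("/")
def health():
    return {"status": "ok"}

@app.post("/upload")
async def upload(file: UploadFile = File(...)):
    pdf_bytes = await file.read()  # the real app chunks, embeds, and indexes the PDF here
    return {"filename": file.filename, "size": len(pdf_bytes)}

@app.post("/chat")
def chat(req: ChatRequest):
    return {"response": f"(retrieved answer for: {req.message})"}  # RAG pipeline goes here
```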
- `app.py`: FastAPI backend implementation.
- `frontend/`: React-based frontend.
- `models/`: Directory for storing the Llama-2 model.
- `Harry Potter and the Sorcerers Stone.pdf`: Sample PDF for testing.
- `pyproject.toml`: Project dependencies and configuration.
For further details and updates on llama-cpp-python, refer to the llama-cpp-python GitHub repository and its documentation.
Before installing PyTorch, ensure that the CUDA Toolkit is downloaded and installed. The CUDA Toolkit version must match the CUDA version your PyTorch build targets. For example, if you install a PyTorch wheel built for CUDA 12.8 (`cu128`), the CUDA Toolkit should also be version 12.8.
- For the latest CUDA Toolkit, visit: NVIDIA CUDA Downloads
- If you need an older version of the CUDA Toolkit, visit: NVIDIA CUDA Toolkit Archive
At the time of writing, the latest CUDA Toolkit release is version 13, while the latest PyTorch wheels target CUDA 12.9. Ensure compatibility by downloading matching versions.
Torch should be installed based on your system configuration. For Windows machines using CUDA and Python, visit the PyTorch Get Started page to find the appropriate installation command.
For example, if you are using CUDA version cu128, the installation command would be:
```bash
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
```

Replace `cu128` with the CUDA version current at the time. Always refer to the PyTorch website for the latest instructions.
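After installing, a quick check confirms that PyTorch was built against the CUDA version you expect:

```python
# Verify the installed PyTorch build and whether a CUDA GPU is usable.
import torch

print(torch.__version__)           # e.g. "2.7.0+cu128" -- the suffix shows the CUDA build
print(torch.version.cuda)          # CUDA version the wheel was compiled against
print(torch.cuda.is_available())   # True if the driver and toolkit are set up correctly
```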
Anshul Gada
