
Retrieval-Augmented Generation (RAG) chatbot that answers questions from PDFs using Llama-2, with sentence-transformers embeddings for retrieval.


RAG PDF Chatbot

Frontend UI screenshot

A modern, responsive chat interface for interacting with PDF documents through AI. Upload any PDF and start asking questions about its content with our sleek dark-mode UI.

Overview

RAG PDF Chatbot is a Retrieval-Augmented Generation (RAG) chatbot built with FastAPI and Llama-2. It processes PDF documents and lets users interact with their content through a conversational interface, using embeddings and retrieval to produce accurate, context-aware responses. The chatbot runs locally and completely offline, which improves privacy, and it supports any model file in GGUF format or integration with LM Studio for added flexibility.

Features

  • Load and process PDF documents.
  • Upload PDF files dynamically via the frontend.
  • Generate embeddings using sentence-transformers.
  • Use Chroma as a vector store for efficient retrieval.
  • Integrate with Llama-2 for natural language understanding and generation. (Any other open-source model in GGUF format can be used, or LM Studio can be substituted.)
  • FastAPI backend with endpoints for chat, file upload, and health checks.
  • Next.js-based frontend for user interaction.
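
The features above follow the standard RAG flow: split the PDF into chunks, embed each chunk, and at question time retrieve the chunks most similar to the question to feed the model as context. A minimal sketch of that retrieval step, with a toy bag-of-words embedding standing in for sentence-transformers and a plain list standing in for Chroma:

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real app uses sentence-transformers.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the question; Chroma does this
    # with real vectors and an index instead of a full scan.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Harry Potter lives with the Dursleys at Privet Drive.",
    "Quidditch is played on broomsticks with four balls.",
    "Hogwarts letters arrive by owl post.",
]
print(retrieve("Where does Harry live?", chunks, k=1))
```

In the real app, the retrieved chunks are concatenated into the prompt sent to Llama-2, and Chroma persists the embedding vectors rather than recomputing them per query.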

Prerequisites

  • Python 3.10 or higher
  • CUDA-enabled GPU (optional, for faster processing)

Installation

  1. Clone the repository:

    git clone https://github.com/Anshulgada/RAG-Chatbot.git
    cd RAG-Chatbot
  2. Install dependencies (the uv package manager is recommended):

    uv venv
    uv sync
  3. Ensure the Llama-2 model file is placed in the models/ directory:

    • File: llama-2-7b-chat.Q4_K_M.gguf
  4. Start the FastAPI server:

    uvicorn app:app --reload
  5. Navigate to the frontend directory and start the Next.js app:

    cd frontend
    npm install
    npm run dev
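
Before starting the server in step 4, it is worth confirming that a GGUF model is actually present in models/, since a missing or misnamed file is an easy way to break startup. A small sanity-check sketch (the helper name is ours, not part of the project):

```python
from pathlib import Path

def find_gguf_models(models_dir: str) -> list[str]:
    # List any .gguf files so a missing or misnamed model is caught
    # before launching uvicorn. Returns [] if the directory is absent.
    return sorted(p.name for p in Path(models_dir).glob("*.gguf"))

models = find_gguf_models("models")
if not models:
    print("No .gguf model found in models/ -- download one first.")
else:
    print("Found:", ", ".join(models))
```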

Usage

  • Access the backend at http://localhost:8000.
  • Use the /upload endpoint to upload a PDF file for processing.
  • Use the /chat endpoint to send chat messages and receive responses.
  • Open the frontend at http://localhost:3000 for a user-friendly interface.

Frontend Features

  • File Upload: Upload a PDF file directly from the interface.
  • Chat Interface: Type messages and receive responses in real-time.
  • Responsive Design: Optimized for both light and dark modes.

Backend Endpoints

  • POST /upload: Upload a PDF file for processing.
  • POST /chat: Send a chat message and receive a response.
  • GET /: Health check endpoint.
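
These endpoints can be exercised from any HTTP client. A standard-library sketch that builds (but does not send) a /chat request — the JSON field name `message` is an assumption here, so check app.py for the actual request schema:

```python
import json
import urllib.request

BASE = "http://localhost:8000"

def build_chat_request(message: str) -> urllib.request.Request:
    # Construct a POST to /chat; pass the result to
    # urllib.request.urlopen(req) once the server is running.
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Who teaches Potions at Hogwarts?")
print(req.get_method(), req.full_url)
```

Uploading a PDF to /upload requires a multipart/form-data body, which is easier with a client library such as requests or httpx than with raw urllib.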

Project Structure

  • app.py: FastAPI backend implementation.
  • frontend/: Next.js frontend.
  • models/: Directory for storing the Llama-2 model.
  • Harry Potter and the Sorcerers Stone.pdf: Sample PDF for testing.
  • pyproject.toml: Project dependencies and configuration.

Additional Notes

Llama-CPP-Python

For further details and updates on llama-cpp-python, refer to its official repository and documentation.

CUDA Toolkit

Before installing PyTorch, ensure that the CUDA Toolkit is downloaded and installed. The toolkit version must match the CUDA build of the PyTorch wheel you install: for example, a cu128 wheel expects CUDA Toolkit 12.8.

At the time of writing, the latest CUDA Toolkit release is newer than the latest CUDA version PyTorch publishes wheels for, so check the PyTorch compatibility matrix before downloading either.

Torch Installation

Torch should be installed based on your system configuration. For Windows machines using CUDA and Python, visit the PyTorch Get Started page to find the appropriate installation command.

For example, if you are using CUDA version cu128, the installation command would be:

uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

Replace cu128 with the tag matching your installed CUDA Toolkit version. Always refer to the PyTorch website for the latest instructions.
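
The `cuXYZ` wheel tag simply encodes the CUDA version (cu128 is CUDA 12.8). A tiny helper for translating between the two forms when matching a wheel to your toolkit (the helper names are ours; after installation, `torch.version.cuda` reports the CUDA build your PyTorch was compiled against):

```python
def cuda_tag_to_version(tag: str) -> str:
    # "cu128" -> "12.8"; "cu118" -> "11.8". Last digit is the minor version.
    digits = tag.removeprefix("cu")
    return f"{digits[:-1]}.{digits[-1]}"

def version_to_cuda_tag(version: str) -> str:
    # "12.8" -> "cu128"
    major, minor = version.split(".")
    return f"cu{major}{minor}"

print(cuda_tag_to_version("cu128"))  # 12.8
print(version_to_cuda_tag("12.9"))   # cu129
```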

Author

Anshul Gada
