# RAG Chatbot

This repository contains a Retrieval-Augmented Generation (RAG) chatbot that leverages OpenAI's GPT models and Pinecone for semantic search and retrieval. The chatbot answers user queries by retrieving relevant documents, reranking them, and generating a response with a language model. It is built with a FastAPI backend and a Streamlit frontend.
## Table of Contents

- What is RAG?
- Features
- Project Structure
- Setup Instructions
- Environment Variables
- How to Run
- Evaluation
- Usage
- Model Details
- Contributing
- License
## What is RAG?

Retrieval-Augmented Generation (RAG) is a framework that combines information retrieval with generative models. Instead of relying solely on a language model's training data, RAG retrieves relevant documents from an external knowledge base (e.g., Pinecone) and uses them as context for generating responses. This approach improves the accuracy and relevance of responses, especially for domain-specific queries.
The pipeline works in three stages:

1. **Retrieval**: The chatbot first retrieves relevant document chunks using a vector similarity search in Pinecone, so the most contextually relevant data is available for response generation.
2. **Augmentation**: The retrieved document chunks are passed to the LLM as context, ensuring that the response is generated from actual document content rather than generic knowledge.
3. **Generation**: The LLM synthesizes a response from the retrieved context, keeping answers accurate and grounded in the uploaded documents.
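The retrieve-then-generate flow above can be sketched in a few lines of Python. This is a minimal, self-contained illustration: toy in-memory vectors stand in for a Pinecone index, and a templated string stands in for a real GPT call, so all names and embeddings here are hypothetical.

```python
import math

# Toy "index": (text, embedding) pairs standing in for a Pinecone index.
# Real embeddings would come from a model such as all-MiniLM-L6-v2.
DOCS = [
    ("RAG combines retrieval with generation.", [0.9, 0.1, 0.0]),
    ("Streamlit builds simple web frontends.",  [0.1, 0.9, 0.0]),
    ("FastAPI serves the backend API.",         [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, top_k=2):
    # Vector similarity search: rank documents by cosine similarity.
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

def generate(question, context_chunks):
    # Stand-in for an LLM call: a real system would build a prompt from
    # the question plus the retrieved chunks and send it to the model.
    context = " ".join(context_chunks)
    return f"Based on the documents: {context}"

query_vec = [1.0, 0.0, 0.1]  # pretend embedding of the user question
chunks = retrieve(query_vec)
answer = generate("What is RAG?", chunks)
```

In the real pipeline, `retrieve` is a Pinecone query, a reranking step reorders the hits, and `generate` is a chat-completion request with the chunks in the prompt.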
## Features

- **Document Retrieval**: Uses Pinecone to retrieve relevant documents based on user queries.
- **Reranking**: Reranks retrieved documents using a SentenceTransformer model for better relevance.
- **Generative Responses**: Generates responses using OpenAI's GPT-3.5-turbo model.
- **Frontend**: A user-friendly Streamlit interface for interacting with the chatbot.
- **Backend**: A FastAPI-based backend for handling queries and managing retrieval logic.
## Project Structure

```
├── backend/
│   ├── lambda_handler.py
│   └── requirements.txt
├── frontend/
│   ├── app.py
│   ├── query_handler.py
│   ├── data_ingestion.py
│   └── requirements.txt
├── evaluation/
│   └── evaluation.py
├── Dockerfile
├── env_template
└── README.md
```
- **Backend**: Handles query processing, document retrieval, and response generation.
- **Frontend**: Provides a web interface for users to interact with the chatbot.
- **Model**: Pretrained SentenceTransformer model for embedding and reranking.
- **Evaluation**: Contains scripts for evaluating the chatbot's performance.
## Setup Instructions

### Prerequisites

- Python 3.9 or higher
- Docker (optional, for containerized deployment)
- Pinecone account and API key
- OpenAI API key
### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/ajith_vernekar/rag-chatbot.git
   cd rag-chatbot
   ```

2. Set up the environment variables:

   - Copy the `env_template` file to `.env`:

     ```bash
     cp env_template .env
     ```

   - Fill in the required values in the `.env` file:

     ```
     OPENAI_API_KEY=<your_openai_api_key>
     PINECONE_API_KEY=<your_pinecone_api_key>
     PINECONE_ENVIRONMENT=<your_pinecone_environment>
     INDEX_NAME=<your_pinecone_index_name>
     BASE_URL=<backend_base_url>
     ```

3. Create a virtual environment and install dependencies:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r backend/requirements.txt
   pip install -r frontend/requirements.txt
   ```
## Environment Variables

The project requires the following environment variables:

- **OpenAI configuration**:
  - `OPENAI_API_KEY`: Your OpenAI API key for GPT models.
- **Pinecone configuration**:
  - `PINECONE_API_KEY`: Your Pinecone API key.
  - `PINECONE_ENVIRONMENT`: The Pinecone environment (e.g., `us-west1-gcp`).
  - `INDEX_NAME`: The name of the Pinecone index.
- **Backend configuration**:
  - `BASE_URL`: The public URL used to access the backend service.
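A small sketch of how the backend might read these settings at startup, failing fast when any are missing. This is an illustration, not the project's actual configuration code; the `load_config` helper is hypothetical.

```python
import os

# The variable names documented above.
REQUIRED = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENVIRONMENT",
            "INDEX_NAME", "BASE_URL"]

def load_config():
    """Read the required settings, raising early if any are missing."""
    missing = [name for name in REQUIRED if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED}
```

If you use `python-dotenv`, calling `load_dotenv()` before `load_config()` will populate the process environment from the `.env` file.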
## How to Run

### Run Locally

1. Navigate to the backend folder and start the backend server:

   ```bash
   cd backend
   uvicorn lambda_handler:app --host 0.0.0.0 --port 8000
   ```

2. Navigate to the frontend folder and start the Streamlit app:

   ```bash
   cd frontend
   streamlit run app.py
   ```

3. Open your browser and go to `http://localhost:8501`.

### Run with Docker

1. Build the Docker image:

   ```bash
   docker build -t rag-chatbot .
   ```

2. Run the Docker container:

   ```bash
   docker run -p 8080:8080 rag-chatbot
   ```

   The backend will be accessible at `http://localhost:8080`.
## Evaluation

The evaluation script assesses the performance of the RAG chatbot pipeline using metrics such as `context_recall`, `faithfulness`, and `answer_relevancy`.

1. **Set up environment variables**: Ensure the following variable is set in your `.env` file:

   - `OPENAI_API_KEY`: Your OpenAI API key.

2. **Install dependencies**:

   ```bash
   pip install -r requirements.txt
   pip install ragas
   ```

3. **Run the evaluation script** from the repository root, so the `evaluation` package is importable:

   ```bash
   python -m evaluation.evaluation
   ```

The results are saved to `evaluation/evaluation_results.csv` and also printed to the console.

### Metrics

- **Context Recall**: Measures how well the retrieved documents align with the context of the question.
- **Faithfulness**: Evaluates whether the generated answers are consistent with the retrieved documents.
- **Answer Relevancy**: Assesses the relevance of the generated answers to the questions.

The evaluation script processes a set of predefined questions and reference answers from the book *Atomic Habits*. It queries the RAG API, retrieves documents, and generates answers, which are then evaluated against the reference answers.
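To illustrate the shape of the saved results, here is a sketch of writing per-question scores to CSV. The column names and scores below are hypothetical examples, not the script's actual schema; the real values are computed by ragas.

```python
import csv
import io

# Hypothetical per-question results; the real script computes these with ragas.
rows = [
    {"question": "What is a habit loop?", "answer": "...",
     "context_recall": 0.91, "faithfulness": 0.88, "answer_relevancy": 0.93},
]

def write_results(rows, fh):
    # One CSV row per evaluated question, with a header row first.
    writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

buf = io.StringIO()  # in the real script this would be the CSV file on disk
write_results(rows, buf)
```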
### Troubleshooting

- **Validation errors**: Ensure that the `retrieved_documents` field in the API response is a list of strings.
- **API errors**: Check that `BASE_URL` and `OPENAI_API_KEY` are correctly configured in the `.env` file.
- **Dependencies**: Ensure all required libraries are installed and compatible with your Python version.
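The first check above can be automated with a small helper that validates a response before it reaches the evaluation code. The function name is hypothetical; only the `retrieved_documents` field comes from the troubleshooting notes.

```python
def validate_response(payload):
    """Check that a /query response has the shape the evaluation expects.

    Raises ValueError when retrieved_documents is missing or is not a
    list of strings, which is the validation error described above.
    """
    docs = payload.get("retrieved_documents")
    if not isinstance(docs, list) or not all(isinstance(d, str) for d in docs):
        raise ValueError("retrieved_documents must be a list of strings")
    return docs
```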
## Usage

### API Endpoints

- **Test endpoint**: `GET /`
- **Query endpoint**: `POST /query`
  - Request body:

    ```json
    {
      "user_input": "What is RAG?",
      "openai_api_key": "<your_openai_api_key>"
    }
    ```

  - Response:

    ```json
    {
      "response": "RAG stands for Retrieval-Augmented Generation..."
    }
    ```
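The query endpoint can be exercised from Python. The sketch below only builds and prints the request body; the commented-out lines show how one might send it with the `requests` library against a running backend.

```python
import json
# import requests  # uncomment to actually send the request

BASE_URL = "http://localhost:8000"  # or the deployed backend URL

def build_query(user_input, openai_api_key):
    # Request body matching the /query endpoint documented above.
    return {"user_input": user_input, "openai_api_key": openai_api_key}

payload = build_query("What is RAG?", "<your_openai_api_key>")
print(json.dumps(payload, indent=2))
# resp = requests.post(f"{BASE_URL}/query", json=payload, timeout=30)
# print(resp.json()["response"])
```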
### Streamlit Frontend

1. Enter your OpenAI API key in the sidebar.
2. Upload a document for indexing.
3. Ask questions about the uploaded document.
## Model Details

The project uses the `all-MiniLM-L6-v2` model from SentenceTransformers. This model maps sentences and paragraphs to a 384-dimensional dense vector space, making it suitable for tasks like semantic search and clustering.

- **Source**: Hugging Face Model Hub
- **Usage**:
  - For embedding: `SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')`
  - For fine-tuning: Refer to the training scripts in the model folder.
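As a self-contained illustration of what these 384-dimensional vectors enable, the sketch below substitutes a deterministic toy "embedding" for the real model and reranks candidate texts by cosine similarity, which is the same operation the chatbot's reranking step performs with real vectors. The toy embedding is purely illustrative and carries no semantic meaning; in the actual project, `model.encode()` produces the vectors.

```python
import hashlib
import math

DIM = 384  # all-MiniLM-L6-v2 output dimensionality

def toy_embed(text):
    """Deterministic stand-in for model.encode(); NOT semantically meaningful."""
    vec = []
    for i in range(DIM):
        h = hashlib.sha256(f"{i}:{text}".encode()).digest()
        vec.append(int.from_bytes(h[:4], "big") / 2**32 - 0.5)
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rerank(query, candidates):
    # Score every candidate against the query and sort best-first.
    q = toy_embed(query)
    return sorted(candidates, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
```

With real embeddings, texts that mean similar things land close together in the vector space, so the sort order reflects semantic relevance rather than string identity.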
## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a new branch:

   ```bash
   git checkout -b feature-name
   ```

3. Commit your changes:

   ```bash
   git commit -m "Add feature-name"
   ```

4. Push to the branch:

   ```bash
   git push origin feature-name
   ```

5. Open a pull request.
## License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
