This repository provides a Retrieval-Augmented Generation (RAG) pipeline for processing and utilizing RedBooks. The RedBooks are pre-converted into markdown files using the Python library docling. This pipeline uses ChromaDB for vector database storage and llama-cpp-python for Large Language Model (LLM) inference.
Before using this project, ensure you have the following dependencies installed:
- ChromaDB: A vector database for storing embeddings.
- llama-cpp-python: Python bindings for running LLaMA-based models locally.
For ppc64le you can use these commands to get ChromaDB, `llama-cpp-python`, and the other libraries:

```shell
micromamba create -n env python=3.10
micromamba install -c rocketce -c defaults pytorch-cpu scikit-learn pyyaml httptools onnxruntime "pandas<1.6.0" tokenizers
pip install -U --extra-index-url https://repo.fury.io/mgiessing --prefer-binary chromadb transformers psutil langchain sentence_transformers gradio==3.50.2 llama-cpp-python
```
On x86, install the libraries with pip; otherwise, install the remaining libraries with conda, using rocketce or defaults as the channel.
To convert the RedBooks into markdown files:
- Run the `converter_docling.py` script: `python converter_docling.py`
To generate the vector database from your markdown files:
- Run the `chromaDB_md.py` script: `python chromaDB_md.py`
- This will create a vector database in the `/db` directory. The database includes 5 collections, each corresponding to a markdown file.
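Before embedding, each markdown file has to be split into chunks. The actual logic lives in `chromaDB_md.py`; the following is only a minimal, stdlib-only sketch of a paragraph-based chunker, and the helper name `split_markdown` is hypothetical:

```python
# Illustrative sketch of the chunking step that precedes indexing.
# This is NOT the actual chromaDB_md.py implementation; split_markdown
# is a hypothetical helper shown for explanation only.

def split_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Split markdown text into paragraph-based chunks of bounded size."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be embedded and stored in one of the ChromaDB collections.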
To run the Large Language Model (LLM) with context retrieved from the vector database:
- Open `run_model.py` in your preferred text editor.
- Update the `model:path` variable to point to your GGUF model.
Execute the pipeline by running:

```shell
python run_model.py
```

This will start serving the Gradio UI over HTTP port 8082.
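Conceptually, the retrieval step ranks stored chunk embeddings by similarity to the query embedding and feeds the best matches to the LLM as context. ChromaDB performs this search in the real pipeline; the sketch below only illustrates the idea with cosine similarity over toy vectors (all names and values here are illustrative, not the repository's code):

```python
# Stdlib-only sketch of embedding-based retrieval: score stored chunk
# embeddings against a query embedding by cosine similarity and return
# the ids of the best-matching chunks. In the real pipeline ChromaDB
# performs this search internally.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda cid: cosine(query, index[cid]), reverse=True)
    return ranked[:k]
```

The retrieved chunks are then concatenated into the prompt that `run_model.py` sends to the model.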
Alternatively, this demo can be installed on a remote or local ppc64le RHEL host using the Ansible playbook in the ansible directory.
For possible configuration options, see the example inventory file.
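An Ansible inventory for such a playbook generally takes the following shape; the group name, host, and address below are placeholders, and the repository's example inventory file remains the authoritative reference for the actual variables:

```ini
; Hypothetical inventory sketch -- see the example inventory file in the
; ansible directory for the real configuration options.
[rag_demo]
demo-host ansible_host=192.0.2.10 ansible_user=root
```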
- `/db`: Contains the vector database with collections generated by ChromaDB.
- `chromaDB_md.py`: Script for creating the vector database.
- `run_model.py`: Script for running the RAG pipeline using the configured LLM.
- Ensure the RedBooks markdown files are in the expected format before running the pipeline.
- Make sure the GGUF model is compatible with `llama-cpp-python`.
If you would like to contribute to this project, feel free to fork the repository, make changes, and submit a pull request.
This project is licensed under the MIT License. Feel free to use, modify, and distribute this project.
Happy experimenting with the RAG Pipeline for RedBooks!