RAG Modulo is a Retrieval-Augmented Generation (RAG) solution that integrates various vector databases for efficient information retrieval and generation.
- Features
- Document Processing Flow
- Prerequisites
- Installation
- Usage
- Project Structure
- Configuration
- Testing
- CI/CD
- Contributing
- License
- Service-based architecture with clean separation of concerns
- Repository pattern for database operations
- Provider abstraction for LLM integration
- Dependency injection for better testability
- Asynchronous API for efficient operations
- Support for multiple vector databases (Elasticsearch, Milvus, Pinecone, Weaviate, ChromaDB)
- Flexible document processing for various formats (PDF, TXT, DOCX, XLSX)
- Customizable chunking strategies
- Configurable embedding models
- Separation of vector storage and metadata storage
- Multiple LLM provider support (WatsonX, OpenAI, Anthropic)
- Runtime provider configuration
- Template-based prompt management
- Error handling and recovery
- Concurrent request handling
- Comprehensive test suite with:
  - Unit tests for components
  - Integration tests for flows
  - Performance tests for scalability
  - Service-specific test suites
- Continuous Integration/Deployment:
  - Code quality checks
  - Performance monitoring
  - Security auditing
The following diagram illustrates how documents are processed in our RAG solution:
graph TD
A[User Uploads Document] --> B[DocumentProcessor]
B --> C{Document Type?}
C -->|PDF| D[PdfProcessor]
C -->|TXT| E[TxtProcessor]
C -->|DOCX| F[WordProcessor]
C -->|XLSX| G[ExcelProcessor]
D --> H[Extract Text, Tables, Images]
E --> I[Process Text]
F --> J[Extract Paragraphs]
G --> K[Extract Sheets and Data]
H --> L[Chunking]
I --> L
J --> L
K --> L
L --> M[Get Embeddings]
M --> N{Store Data}
N -->|Vector Data| O[VectorStore]
O --> P{Vector DB Type}
P -->|Milvus| Q[MilvusStore]
P -->|Elasticsearch| R[ElasticsearchStore]
P -->|Pinecone| S[PineconeStore]
P -->|Weaviate| T[WeaviateStore]
P -->|ChromaDB| U[ChromaDBStore]
N -->|Metadata| V[PostgreSQL]
V --> W[Repository Layer]
W --> X[Service Layer]
Explanation of the document processing flow:
- A user uploads a document to the system.
- The DocumentProcessor determines the type of document and routes it to the appropriate processor (PdfProcessor, TxtProcessor, WordProcessor, or ExcelProcessor).
- Each processor extracts the relevant content from the document.
- The extracted content goes through a chunking process to break it into manageable pieces.
- Embeddings are generated for the chunked content.
- The data is then stored in two places:
  - Vector data (embeddings) is stored in the VectorStore, which can be one of several backends (Milvus, Elasticsearch, Pinecone, Weaviate, or ChromaDB).
  - Metadata is stored in PostgreSQL and accessed through the Repository Layer and Service Layer.
This architecture allows for flexibility in choosing vector databases and ensures efficient storage and retrieval of both vector data and metadata.
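A minimal Python sketch of the routing and chunking steps follows. It illustrates the flow under stated assumptions and is not the project's DocumentProcessor: the extractor functions, the `EXTRACTORS` table, and the fixed-size character chunker with overlap are hypothetical choices, and only plain-text extraction is actually implemented.

```python
from pathlib import Path
from typing import Callable


def extract_txt(path: Path) -> str:
    """Plain-text 'processor': read the file as UTF-8."""
    return path.read_text(encoding="utf-8")


def extract_not_implemented(path: Path) -> str:
    """Placeholder for the PDF/DOCX/XLSX processors, which need third-party parsers."""
    raise NotImplementedError(f"no extractor wired up for {path.suffix} in this sketch")


# Route by file extension, mirroring the "Document Type?" branch in the diagram.
EXTRACTORS: dict[str, Callable[[Path], str]] = {
    ".txt": extract_txt,
    ".pdf": extract_not_implemented,
    ".docx": extract_not_implemented,
    ".xlsx": extract_not_implemented,
}


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of `chunk_size` that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Window starts advance by `step`; the last start is the first one whose
    # window reaches the end of the text.
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def process_document(path: Path) -> list[str]:
    """Pick an extractor by extension, extract the text, then chunk it."""
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        raise ValueError(f"unsupported document type: {path.suffix!r}")
    return chunk_text(extractor(path))
```

Overlapping windows keep text that straddles a boundary intact in at least one chunk; the project's configurable chunking strategies may split on tokens or sentences instead.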
- Python 3.11+
- Docker and Docker Compose
- Clone the repository:
  git clone https://github.com/manavgup/rag-modulo.git
  cd rag-modulo
- Set up your environment variables by copying the `.env.example` file:
  cp .env.example .env
  Then edit the `.env` file with your specific configuration.
- Make sure you have a container runtime installed (e.g., Docker or Podman).
- Build the app, then start the infrastructure services (databases, etc.) and the application containers (frontend, backend):
  make run-app
- Access the API at http://localhost:8000 and the frontend at http://localhost:3000.
rag-modulo/
├── .github/workflows/ci.yml # GitHub Actions workflow for build/test/publish
├── backend/ # Python backend application
│ ├── auth/ # Authentication code (e.g. OIDC)
│ ├── core/ # Config, exceptions, middleware
│ ├── rag_solution/ # Main application code
│ │ ├── data_ingestion/ # Data ingestion modules
│ │ ├── docs/ # Documentation files
│ │ ├── evaluation/ # Evaluation modules
│ │ ├── generation/ # Text generation modules
│ │ │ └── providers/ # LLM provider implementations
│ │ ├── models/ # Data models and schemas
│ │ ├── pipeline/ # RAG pipeline implementation
│ │ ├── query_rewriting/ # Query rewriting modules
│ │ ├── repository/ # Repository layer implementations
│ │ ├── retrieval/ # Data retrieval modules
│ │ ├── router/ # API route handlers
│ │ ├── schemas/ # Pydantic schemas
│ │ └── services/ # Service layer implementations
│ ├── tests/ # Test suite
│ │ ├── integration/ # Integration tests
│ │ ├── performance/ # Performance tests
│ │ ├── services/ # Service tests
│ │ └── README.md # Testing documentation
│ └── vectordbs/ # Vector database interfaces
├── webui/ # Frontend code
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── services/ # Frontend services
│ │ └── config/ # Frontend configuration
├── .env # Environment variables
├── docker-compose-infra.yml # Infrastructure services configuration
├── docker-compose.yml # Application services configuration
├── Makefile # Project management commands
├── requirements.txt # Project dependencies
└── README.md # Project documentation
Key architectural components:
- Service Layer:
  - Implements business logic
  - Manages transactions
  - Handles dependencies
  - Provides clean interfaces
- Repository Layer:
  - Data access abstraction
  - Database operations
  - Query optimization
  - Transaction management
- Provider System:
  - LLM provider abstraction
  - Multiple provider support
  - Configuration management
  - Error handling
- Test Organization:
  - Unit tests by component
  - Integration tests
  - Performance tests
  - Service-specific tests
The following diagram illustrates the OAuth 2.0 Authorization Code flow used in our application with IBM as the identity provider:
sequenceDiagram
participant User
participant Frontend
participant Backend
participant IBM_OIDC
participant Database
User->>Frontend: Clicks Login
Frontend->>Backend: GET /api/auth/login
Backend->>IBM_OIDC: Redirect to Authorization Endpoint
IBM_OIDC->>User: Present Login Page
User->>IBM_OIDC: Enter Credentials
IBM_OIDC->>Backend: Redirect with Authorization Code
Backend->>IBM_OIDC: POST /token (exchange code for tokens)
IBM_OIDC-->>Backend: Access Token & ID Token
Backend->>Backend: Parse ID Token
Backend->>Database: Get or Create User
Database-->>Backend: User Data
Backend->>Backend: Set Session Data
Backend->>Frontend: Redirect to Dashboard
Frontend->>Backend: GET /api/auth/session
Backend-->>Frontend: User Data
Frontend->>User: Display Authenticated UI
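The first leg of this flow (sending the user to the authorization endpoint) and the check performed when the code comes back can be sketched with the standard library alone. This is an illustration under assumptions, not the backend's auth module: the endpoint URL, client ID, redirect URI, and scopes are placeholders, and the token exchange (`POST /token`) is omitted. The `state` parameter ties the callback to the login that started it, which guards against cross-site request forgery.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(
    authorize_endpoint: str, client_id: str, redirect_uri: str
) -> tuple[str, str]:
    """Return (url, state) for the Authorization Code flow's first redirect."""
    # Unguessable value; store it in the session and compare it on callback.
    state = secrets.token_urlsafe(32)
    params = {
        "response_type": "code",  # request an authorization code, not tokens
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid profile email",  # "openid" makes this an OIDC request
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state


def validate_callback(callback_url: str, expected_state: str) -> str:
    """Verify the returned state and extract the authorization code."""
    query = parse_qs(urlparse(callback_url).query)
    received_state = query.get("state", [""])[0]
    # Constant-time comparison on bytes (works even for non-ASCII input).
    if not secrets.compare_digest(received_state.encode(), expected_state.encode()):
        raise ValueError("state mismatch: possible CSRF or stale login attempt")
    if "code" not in query:
        raise ValueError("callback did not include an authorization code")
    return query["code"][0]
```

The code returned by `validate_callback` is what the backend would then exchange at the token endpoint for the access and ID tokens.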
The system uses a layered configuration approach with both environment variables and runtime configuration through services.
Basic infrastructure settings:
# Database Configuration
VECTOR_DB=milvus # Vector database type
MILVUS_HOST=localhost # Vector DB host
MILVUS_PORT=19530 # Vector DB port
DB_HOST=localhost # PostgreSQL host
DB_PORT=5432 # PostgreSQL port
# LLM Provider Settings
WATSONX_INSTANCE_ID=your-id # WatsonX instance ID
WATSONX_APIKEY=your-key # WatsonX API key
OPENAI_API_KEY=your-key # OpenAI API key (optional)
ANTHROPIC_API_KEY=your-key # Anthropic API key (optional)
# Application Settings
EMBEDDING_MODEL=all-minilm-l6-v2 # Default embedding model
DATA_DIR=/path/to/data # Data directory
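A sketch of turning these `KEY=VALUE` lines into typed settings is shown below. It assumes the simple format above (one assignment per line, `#` comments, inline comments preceded by a space); the `InfraSettings` class and its defaults are hypothetical, and the backend's own settings loader may work differently.

```python
from collections.abc import Mapping
from dataclasses import dataclass


def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment such as "milvus # Vector database type".
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


@dataclass(frozen=True)
class InfraSettings:
    """Hypothetical typed view of the infrastructure variables shown above."""

    vector_db: str
    milvus_host: str
    milvus_port: int
    db_host: str
    db_port: int

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "InfraSettings":
        # Defaults mirror the example values in the snippet above.
        return cls(
            vector_db=env.get("VECTOR_DB", "milvus"),
            milvus_host=env.get("MILVUS_HOST", "localhost"),
            milvus_port=int(env.get("MILVUS_PORT", "19530")),
            db_host=env.get("DB_HOST", "localhost"),
            db_port=int(env.get("DB_PORT", "5432")),
        )
```

A value that legitimately contains ` #` would be truncated by this parser. Inside a container, the same settings could be read with `InfraSettings.from_env(os.environ)` instead of parsing the file.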
Runtime configuration through services:
- Provider Configuration:
  import os

  # Read secrets from the environment; a "${...}" string is not expanded in Python
  provider_config = ProviderConfigInput(
      provider="watsonx",
      api_key=os.environ["WATSONX_APIKEY"],
      project_id=os.environ["WATSONX_INSTANCE_ID"],
      active=True,
  )
  config_service.create_provider_config(provider_config)
- LLM Parameters:
  parameters = LLMParametersInput(
      name="default-params",
      provider="watsonx",
      model_id="granite-13b",
      temperature=0.7,
      max_new_tokens=1000,
  )
  parameters_service.create_parameters(parameters)
- Template Configuration:
  template = PromptTemplateInput(
      name="rag-query",
      provider="watsonx",
      template_type=PromptTemplateType.RAG_QUERY,
      template_format="Context:\n{context}\nQuestion:{question}",
  )
  template_service.create_template(template)
- Pipeline Configuration:
  # Assumes `provider` and `parameters` hold the saved records (with IDs) from
  # the steps above; these snippets do not show how they are retrieved.
  pipeline_config = PipelineConfigInput(
      name="default-pipeline",
      provider_id=provider.id,
      llm_parameters_id=parameters.id,
  )
  pipeline_service.create_pipeline_config(pipeline_config)
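The runtime pieces above (a provider, LLM parameters, and a prompt template) fit together roughly as sketched here. `LLMProvider`, `EchoProvider`, `LLMParameters`, and `run_rag_query` are hypothetical stand-ins for the project's provider classes; the sketch only shows the pattern of selecting a provider by name at runtime and filling `template_format` with `str.format`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMParameters:
    """Hypothetical mirror of the LLMParametersInput fields used above."""

    model_id: str
    temperature: float = 0.7
    max_new_tokens: int = 1000


class LLMProvider(ABC):
    """Interface each provider (WatsonX, OpenAI, Anthropic) would implement."""

    @abstractmethod
    def generate(self, prompt: str, params: LLMParameters) -> str: ...


class EchoProvider(LLMProvider):
    """Offline stand-in: returns the prompt instead of calling a model API."""

    def generate(self, prompt: str, params: LLMParameters) -> str:
        return prompt  # a real provider would send prompt + params to its API


# Registry keyed by the provider name stored in configuration.
PROVIDERS: dict[str, type[LLMProvider]] = {"echo": EchoProvider}


def run_rag_query(
    provider_name: str,
    template_format: str,
    params: LLMParameters,
    context: str,
    question: str,
) -> str:
    """Resolve the provider at runtime, render the template, and generate."""
    provider = PROVIDERS[provider_name]()
    prompt = template_format.format(context=context, question=question)
    return provider.generate(prompt, params)
```

Because `str.format` treats braces as placeholders, a template that needs literal `{` or `}` characters would have to double them.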
For detailed configuration options and examples, see:
The project includes a comprehensive test suite with unit tests, integration tests, and performance tests. For detailed information about testing, see Testing Documentation.
Run all tests:
make test
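- Unit tests:
  pytest backend/tests/
  (pytest recurses into subdirectories by default, so this path also collects the integration and performance suites; the CI pipeline runs unit tests with `pytest backend/tests/services/`)
- Integration tests:
  pytest backend/tests/integration/
- Performance tests:
  pytest backend/tests/performance/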
Generate coverage report:
pytest --cov=backend/rag_solution --cov-report=html
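A unit test for an asynchronous service can stub its dependencies with `unittest.mock.AsyncMock` and drive the coroutine with `asyncio.run`, so no async pytest plugin is needed. `SearchService`, its `retrieve`/`generate` collaborators, and the `top_k=3` argument are hypothetical; placed in a file named like `test_search_service.py`, these functions would be picked up by pytest's default `test_*` discovery.

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock


class SearchService:
    """Hypothetical async service with an injected retriever and LLM client."""

    def __init__(self, retriever: Any, llm: Any) -> None:
        self._retriever = retriever
        self._llm = llm

    async def answer(self, question: str) -> str:
        chunks = await self._retriever.retrieve(question, top_k=3)
        if not chunks:
            return "No relevant documents found."
        return await self._llm.generate("\n".join(chunks) + "\n" + question)


def test_answer_uses_retrieved_chunks() -> None:
    retriever = AsyncMock()
    retriever.retrieve.return_value = ["chunk one", "chunk two"]
    llm = AsyncMock()
    llm.generate.return_value = "stub answer"

    result = asyncio.run(SearchService(retriever, llm).answer("What is RAG?"))

    assert result == "stub answer"
    retriever.retrieve.assert_awaited_once_with("What is RAG?", top_k=3)
    llm.generate.assert_awaited_once_with("chunk one\nchunk two\nWhat is RAG?")


def test_answer_skips_llm_when_nothing_retrieved() -> None:
    retriever = AsyncMock()
    retriever.retrieve.return_value = []
    llm = AsyncMock()

    result = asyncio.run(SearchService(retriever, llm).answer("What is RAG?"))

    assert result == "No relevant documents found."
    llm.generate.assert_not_awaited()  # the LLM is never called without context
```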
The performance test suite includes:
- Throughput testing
- Latency testing
- Resource usage monitoring
- Stability testing
For detailed performance test configuration and execution, refer to the Testing Documentation.
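One way to express a latency check as a pytest test is sketched below. The 200 ms budget, the 50-iteration sample, and the `sum(range(...))` stand-in for a real query are placeholders; the project's actual thresholds are not specified here. `statistics.quantiles(..., n=20)` returns 19 cut points, the last of which is the 95th percentile.

```python
import statistics
import time
from typing import Callable


def measure_latencies(fn: Callable[[], object], iterations: int = 50) -> list[float]:
    """Call fn repeatedly and return per-call wall-clock latencies in milliseconds."""
    latencies: list[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def p95(latencies: list[float]) -> float:
    """95th percentile: the last of the 19 cut points that divide data into 20 groups."""
    return statistics.quantiles(latencies, n=20)[-1]


def test_query_latency_within_budget() -> None:
    budget_ms = 200.0  # hypothetical threshold; real limits belong in test config
    latencies = measure_latencies(lambda: sum(range(10_000)))  # stand-in for a query
    assert p95(latencies) < budget_ms
```

Checking a high percentile rather than the mean keeps occasional slow requests from being averaged away.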
The project uses GitHub Actions for continuous integration and deployment, with a focus on maintaining service quality and performance.
- Code Quality
  quality:
    steps:
      # --check / --check-only fail on problems instead of silently rewriting files
      - name: Code Formatting
        run: black --check backend/
      - name: Type Checking
        run: mypy backend/
      - name: Linting
        run: flake8 backend/
      - name: Import Sorting
        run: isort --check-only backend/
- Testing
  test:
    steps:
      - name: Unit Tests
        run: pytest backend/tests/services/
      - name: Integration Tests
        run: pytest backend/tests/integration/
      - name: Performance Tests
        run: |
          pytest backend/tests/performance/ \
            --html=performance-report.html
      - name: Coverage Report
        run: |
          pytest --cov=backend/rag_solution \
            --cov-report=xml \
            --cov-fail-under=80
- Security
  security:
    steps:
      - name: Dependency Scan
        run: safety check
      - name: SAST Analysis
        run: bandit -r backend/
      - name: Secret Detection
        run: detect-secrets scan
- Build & Deploy
  deploy:
    steps:
      - name: Build Images
        run: docker-compose build
      - name: Run Tests in Container
        run: docker-compose run test
      - name: Push Images
        run: docker-compose push
The pipeline enforces several quality gates:
- Code Quality
  - No formatting errors
  - No type checking errors
  - No linting violations
  - Proper import sorting
- Testing
  - All tests must pass
  - 80% minimum coverage
  - Performance tests within thresholds
  - No integration test failures
- Security
  - No critical vulnerabilities
  - No exposed secrets
  - Clean SAST scan
- Service Requirements
  - Service tests pass
  - API contracts validated
  - Configuration validated
  - Performance metrics met
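The 80% coverage gate is already enforced by `--cov-fail-under=80` in the test job. If a later job only has the `coverage.xml` artifact produced by `--cov-report=xml`, a small script like this hypothetical one could re-check it. It reads the root `line-rate` attribute of the Cobertura-format report, which counts lines only, so the figure may differ from pytest-cov's total when branch coverage is enabled.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(coverage_xml: str) -> float:
    """Read the overall line rate (0.0 to 1.0) from the report root and scale to percent."""
    root = ET.fromstring(coverage_xml)
    return float(root.attrib["line-rate"]) * 100


def check_coverage_gate(coverage_xml: str, minimum: float = 80.0) -> None:
    """Exit non-zero (failing the CI step) when coverage is below the gate."""
    percent = line_coverage_percent(coverage_xml)
    if percent < minimum:
        raise SystemExit(f"Coverage {percent:.1f}% is below the {minimum:.0f}% gate")
    print(f"Coverage gate passed: {percent:.1f}% >= {minimum:.0f}%")
```

In a pipeline step this might be invoked as `check_coverage_gate(Path("coverage.xml").read_text())`.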
For detailed CI/CD configuration, see:
Contributions are welcome! Please follow these guidelines when contributing to the project.
- Service Layer Architecture
  - Follow the service-based architecture pattern
  - Implement new features as services
  - Use dependency injection
  - Follow the repository pattern for data access
  - Document service interfaces
- Code Style
  - Use type hints
  - Write comprehensive docstrings
  - Follow PEP 8 guidelines
  - Use async/await where appropriate
  - Handle errors properly
- Testing Requirements
  - Write unit tests for services
  - Add integration tests for flows
  - Include performance tests for critical paths
  - Maintain test coverage above 80%
  - Document test scenarios
- Fork and Clone
  git clone https://github.com/yourusername/rag-modulo.git
  cd rag-modulo
- Set Up Development Environment
  # Create virtual environment
  python -m venv venv
  source venv/bin/activate  # or `venv\Scripts\activate` on Windows
  # Install dependencies
  pip install -r requirements.txt
  pip install -r requirements-dev.txt
- Create Feature Branch
  git checkout -b feature/YourFeature
- Development Workflow
  - Write tests first (TDD)
  - Implement feature
  - Run test suite
  - Update documentation
  - Run linters
- Testing
  # Run all tests
  pytest
  # Run specific test types
  pytest backend/tests/services/      # Service tests
  pytest backend/tests/integration/   # Integration tests
  pytest backend/tests/performance/   # Performance tests
  # Check coverage
  pytest --cov=backend/rag_solution
- Submit Changes
  - Push changes to your fork
  - Create pull request
  - Fill out PR template
  - Respond to reviews
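The Code Style guidelines (type hints, docstrings, async/await, explicit error handling) might look like this in a small function. `DocumentNotFoundError` and `fetch_document_title` are hypothetical, the dict stands in for a repository, and the Google-style docstring layout is an assumption since the guidelines do not name a convention.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class DocumentNotFoundError(Exception):
    """Raised when a requested document does not exist."""


async def fetch_document_title(document_id: int, store: dict[int, str]) -> str:
    """Return the title of a document.

    Args:
        document_id: Primary key of the document.
        store: Mapping of document IDs to titles (stand-in for a repository).

    Returns:
        The document title.

    Raises:
        DocumentNotFoundError: If no document has the given ID.
    """
    await asyncio.sleep(0)  # placeholder for an awaited database or network call
    try:
        return store[document_id]
    except KeyError as exc:
        logger.warning("Document %s not found", document_id)
        # Translate the low-level error into a domain error, keeping the cause.
        raise DocumentNotFoundError(f"document {document_id} not found") from exc
```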
When adding new features:
- Update service documentation
- Add configuration examples
- Update testing documentation
- Include performance considerations
- Document API changes
For detailed development guidelines, see:
This project is licensed under the MIT License - see the LICENSE file for details.