GraphRAG-rs

A high-performance Rust implementation of GraphRAG (Graph-based Retrieval Augmented Generation) for building knowledge graphs from documents and querying them with natural language.


Quick Start (30 seconds)

# Clone and build
git clone https://github.com/automataIA/graphrag-rs.git
cd graphrag-rs
cargo build --release

# Use a pre-configured template (multiple available!)
cp config.toml my_config.toml
# Or choose a specific template:
# cp config_tom_sawyer.toml my_config.toml
# cp config_complete.toml my_config.toml

# Edit the config to point to YOUR document:
# nano my_config.toml
# Change this line: input_document_path = "path/to/your/document.txt"
# Change this line: output_dir = "./output/your_project"

# Process your document and ask questions
cargo run --bin simple_cli my_config.toml "What is this document about?"

Config files explained:

  • input_document_path - Path to your text file to analyze
  • output_dir - Where GraphRAG saves the knowledge graph
  • Templates in root: config.toml, config_complete.toml, config_tom_sawyer.toml, config_example.toml
  • Pick one, copy it, change the document path, and you're ready!

Installation

Prerequisites

  • Rust 1.70 or later
  • (Optional) Ollama for local LLM support - Install Ollama

From Source

git clone https://github.com/automataIA/graphrag-rs.git
cd graphrag-rs
cargo build --release

# Optional: Install globally
cargo install --path .

Basic Usage

1. Simple API (One Line)

use graphrag_rs::simple;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let answer = simple::answer("Your document text", "Your question")?;
    println!("Answer: {}", answer);
    Ok(())
}

2. Stateful API (Multiple Queries)

use graphrag_rs::easy::SimpleGraphRAG;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graph = SimpleGraphRAG::from_text("Your document text")?;

    let answer1 = graph.ask("What is this about?")?;
    let answer2 = graph.ask("Who are the main characters?")?;

    println!("Answer 1: {}", answer1);
    println!("Answer 2: {}", answer2);
    Ok(())
}

3. Builder API (Configurable)

use graphrag_rs::{GraphRAG, ConfigPreset};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graphrag = GraphRAG::builder()
        .with_preset(ConfigPreset::Balanced)
        .auto_detect_llm()
        .build()?;

    graphrag.add_text("Your document")?;
    let answer = graphrag.ask("Your question")?;

    println!("Answer: {}", answer);
    Ok(())
}

4. CLI Usage

GraphRAG-rs provides two CLI tools:

Smart CLI (Recommended) - simple_cli

Automatically detects if the knowledge graph needs building and handles everything for you:

# Build the Smart CLI
cargo build --release --bin simple_cli

# Process document and answer question in one command
cargo run --bin simple_cli config.toml "What are the main themes?"

# Interactive mode - builds graph if needed, then waits for questions
cargo run --bin simple_cli config.toml

# How it works:
# 1. Loads your TOML configuration
# 2. Checks if knowledge graph exists
# 3. Builds graph if needed (shows progress)
# 4. Answers your question using Ollama
# 5. Saves results to output directory

Manual CLI - graphrag-rs

For advanced users who want full control:

# Build the manual CLI
cargo build --release

# Step 1: Build knowledge graph
./target/release/graphrag-rs config.toml build

# Step 2: Query the graph
./target/release/graphrag-rs config.toml query "Your question"

Configuration

Basic Configuration (config.toml)

The project includes several ready-to-use configuration templates:

Available Templates:

  • config.toml - Basic configuration for general use
  • config_complete.toml - Full configuration with all options
  • config_tom_sawyer.toml - Pre-configured for book processing
  • config_example.toml - Annotated template with explanations

Essential Configuration Fields:

[general]
# IMPORTANT: Change these two paths for your project!
input_document_path = "path/to/your/document.txt"  # Your document to process
output_dir = "./output/your_project"                # Where to save results

[pipeline]
chunk_size = 800        # Size of text chunks (adjust based on document type)
chunk_overlap = 200     # Overlap to preserve context between chunks

[ollama]
enabled = true
host = "http://localhost"
port = 11434
chat_model = "llama3.1:8b"           # LLM for text generation
embedding_model = "nomic-embed-text"  # Model for embeddings

Quick Setup:

  1. Copy a template: cp config_complete.toml my_project.toml
  2. Edit input_document_path to point to your document
  3. Edit output_dir to set where results are saved
  4. Run: cargo run --bin simple_cli my_project.toml

See config_example.toml for detailed explanations of all options.

Core Features

Knowledge Graph Construction

  • Entity Extraction: Automatically identifies people, places, organizations, and concepts (see the illustrative shapes below)
  • Relationship Discovery: Finds connections between entities
  • Hierarchical Organization: Creates multi-level document summaries
  • Incremental Updates: Real-time graph updates without full reprocessing
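
Conceptually, the extracted graph reduces to typed entities and weighted relationships between them. The shapes below are purely illustrative; the crate's actual types may differ:

// Illustrative shapes only, not the crate's actual types.
struct Entity {
    name: String,
    kind: EntityKind,
    mentions: Vec<usize>, // indices of the chunks where the entity appears
}

enum EntityKind {
    Person,
    Place,
    Organization,
    Concept,
}

struct Relationship {
    source: String,      // entity name
    target: String,      // entity name
    description: String, // e.g. "Tom is Aunt Polly's nephew"
    weight: f32,         // co-occurrence / confidence strength
}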

Intelligent Retrieval

  • Fast-GraphRAG Implementation: PageRank-based retrieval with 6x cost reduction
  • Personalized PageRank: Optimized query processing at inference time (sketched after this list)
  • Semantic Search: Find information using meaning, not just keywords
  • Hybrid Retrieval: Combines keyword, semantic, and graph-based search for best results
  • Context-Aware Answers: Generates responses based on document context
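
The PageRank bullets above are easy to make concrete. Below is a minimal, self-contained personalized PageRank power iteration over an adjacency list; it illustrates the technique only and is not the crate's internal implementation. Teleport mass is placed on the entities matched from the query, which keeps scores biased toward the query's neighborhood.

/// Personalized PageRank sketch (illustrative, not the crate's code).
/// `adj[u]` lists the out-neighbors of node `u`; `seeds` are the graph
/// nodes matched from the query, which receive all teleport mass.
fn personalized_pagerank(adj: &[Vec<usize>], seeds: &[usize], damping: f64, iters: usize) -> Vec<f64> {
    let n = adj.len();
    let mut teleport = vec![0.0; n];
    for &s in seeds {
        teleport[s] = 1.0 / seeds.len() as f64;
    }
    let mut rank = teleport.clone();
    for _ in 0..iters {
        let mut next = vec![0.0; n];
        for (u, nbrs) in adj.iter().enumerate() {
            if nbrs.is_empty() {
                // Dangling node: return its mass to the seed nodes.
                for (v, t) in next.iter_mut().zip(&teleport) {
                    *v += damping * rank[u] * t;
                }
            } else {
                let share = damping * rank[u] / nbrs.len() as f64;
                for &v in nbrs {
                    next[v] += share;
                }
            }
        }
        // Teleport step: jump back to the query seeds with probability 1 - damping.
        for (v, t) in next.iter_mut().zip(&teleport) {
            *v += (1.0 - damping) * t;
        }
        rank = next;
    }
    rank
}

fn main() {
    // Tiny 4-node graph: 0 -> 1, 1 -> 2, 2 -> 0, 3 -> 0; the query matched node 0.
    let adj = vec![vec![1], vec![2], vec![0], vec![0]];
    let scores = personalized_pagerank(&adj, &[0], 0.85, 30);
    println!("{scores:?}"); // node 0 and its neighborhood dominate
}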

Performance

  • LightRAG Integration: 6000x token reduction vs traditional GraphRAG
  • Parallel Processing: Utilizes all CPU cores for fast processing
  • Efficient Storage: Minimal memory footprint (<100MB for typical documents)
  • Fast Queries: Sub-second response times for most queries
  • Query Caching: Intelligent caching for repeated queries
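
Of these, query caching is the simplest to picture: memoize answers keyed by the normalized question text. A toy sketch, not the crate's implementation:

use std::collections::HashMap;

/// Toy query cache (illustrative): memoize answers by normalized question.
struct QueryCache {
    entries: HashMap<String, String>,
}

impl QueryCache {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the cached answer, or computes and stores it on a miss.
    fn get_or_compute(&mut self, question: &str, compute: impl FnOnce() -> String) -> &str {
        let key = question.trim().to_lowercase();
        self.entries.entry(key).or_insert_with(compute)
    }
}

fn main() {
    let mut cache = QueryCache::new();
    let first = cache.get_or_compute("What is this about?", || "expensive answer".to_string());
    println!("{first}");
    // Same question after normalization hits the cache; `compute` never runs.
    let second = cache.get_or_compute("  WHAT IS THIS ABOUT? ", || unreachable!());
    println!("{second}");
}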

Flexibility

  • Local LLM Support: Works with Ollama for private, offline processing
  • Configurable Pipeline: Adjust chunking, extraction, and retrieval parameters
  • Multiple APIs: Choose complexity level based on your needs
  • Modular Architecture: Swap components without affecting the system
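
The "swap components" point works through small trait interfaces. The trait below is a hypothetical illustration of the pattern, not the crate's actual abstraction:

/// Hypothetical interface, for illustration; the crate's real traits may differ.
trait LanguageModel {
    fn complete(&self, prompt: &str) -> Result<String, Box<dyn std::error::Error>>;
}

/// A mock backend, handy for tests or for running without Ollama.
struct MockLlm;

impl LanguageModel for MockLlm {
    fn complete(&self, _prompt: &str) -> Result<String, Box<dyn std::error::Error>> {
        Ok("mock answer".to_string())
    }
}

/// Code written against the trait accepts a mock or an Ollama-backed
/// implementation interchangeably.
fn answer_with(llm: &dyn LanguageModel, question: &str) -> Result<String, Box<dyn std::error::Error>> {
    llm.complete(question)
}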

Examples

Quick Example: Using Config Templates

# Example 1: Process a book using existing template
cp config_tom_sawyer.toml my_book_config.toml
# Edit my_book_config.toml:
#   input_document_path = "books/my_book.txt"
#   output_dir = "./output/my_book"
cargo run --bin simple_cli my_book_config.toml "Who are the main characters?"

# Example 2: Process a research paper
cp config.toml research_config.toml
# Edit research_config.toml:
#   input_document_path = "papers/research.txt"
#   output_dir = "./output/research"
#   chunk_size = 500  # Smaller chunks for technical content
cargo run --bin simple_cli research_config.toml "What is the main hypothesis?"

# Example 3: Process with full configuration
cp config_complete.toml advanced_config.toml
# Edit all the parameters you need in advanced_config.toml
cargo run --bin simple_cli advanced_config.toml

Process a Book

use graphrag_rs::{GraphRAG, Document};
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read document
    let content = fs::read_to_string("book.txt")?;

    // Create and configure GraphRAG
    let mut graphrag = GraphRAG::builder()
        .with_chunk_size(1000)
        .with_chunk_overlap(200)
        .build()?;

    // Process document
    let doc = Document::new("book", content);
    graphrag.add_document(doc)?;

    // Query
    let answer = graphrag.ask("What are the main themes?")?;
    println!("Answer: {}", answer);

    Ok(())
}

Use with Ollama

use graphrag_rs::{GraphRAG, OllamaConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure Ollama
    let ollama = OllamaConfig::new()
        .with_model("llama3.1:8b")
        .with_embedding_model("nomic-embed-text");

    // Create GraphRAG with Ollama
    let mut graphrag = GraphRAG::builder()
        .with_llm(ollama)
        .build()?;

    // Use as normal
    graphrag.add_text("Your document")?;
    let answer = graphrag.ask("Your question")?;

    Ok(())
}

Batch Processing

use graphrag_rs::GraphRAG;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graphrag = GraphRAG::new_default()?;

    // Process multiple documents
    for file in ["doc1.txt", "doc2.txt", "doc3.txt"] {
        let content = fs::read_to_string(file)?;
        graphrag.add_text(&content)?;
    }

    // Query across all documents
    let answer = graphrag.ask("What connects these documents?")?;
    println!("Answer: {}", answer);

    Ok(())
}

Technical Achievements

GraphRAG-rs implements cutting-edge 2024 research in retrieval-augmented generation:

  • Fast-GraphRAG: PageRank-based retrieval with 6x cost reduction compared to traditional GraphRAG
  • Incremental Updates: Zero-downtime real-time graph processing for dynamic documents
  • LightRAG Integration: Achieves 6000x token reduction through efficient dual-level retrieval
  • Personalized PageRank: Optimized query processing using PageRank at inference time
  • Hybrid Retrieval: Combines semantic, keyword, and graph-based search strategies (one common fusion method is sketched below)
  • LMCD Entity Linking: Advanced entity resolution with multiple matching algorithms
  • Trait-Based Architecture: 12+ core abstractions for maximum modularity
  • Memory-Safe Implementation: Leverages Rust's ownership system for reliability
  • Comprehensive Testing: 168+ test cases ensuring production readiness
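
For the hybrid retrieval point, one common fusion method is reciprocal rank fusion (RRF), sketched below. This illustrates the general technique; the crate may combine its retrievers differently.

use std::collections::HashMap;

/// Reciprocal rank fusion: merge ranked lists from keyword, semantic,
/// and graph retrievers into one score per result id. Illustrative only.
fn rrf(rankings: &[Vec<&str>], k: f32) -> Vec<(String, f32)> {
    let mut scores: HashMap<String, f32> = HashMap::new();
    for ranking in rankings {
        for (rank, id) in ranking.iter().enumerate() {
            // Items ranked high in any list accumulate more score.
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank as f32 + 1.0);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let keyword = vec!["e2", "e1", "e3"];
    let semantic = vec!["e1", "e3", "e2"];
    let graph = vec!["e1", "e2"];
    for (id, score) in rrf(&[keyword, semantic, graph], 60.0) {
        println!("{id}: {score:.4}");
    }
}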

Architecture Overview

GraphRAG-rs processes documents through a multi-stage pipeline:

Document → Chunking → Entity Extraction → Graph Building → Vector Index → Query Engine → Answer
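
The first stage is worth making concrete, since chunk_size and chunk_overlap appear throughout the configuration. A minimal character-based chunker might look like this (illustrative; the actual pipeline may split on token or sentence boundaries instead):

/// Minimal overlapping chunker (illustrative). Consecutive chunks share
/// `overlap` characters so context survives the chunk boundary.
fn chunk_text(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    // 1000 characters with chunk_size = 800 and overlap = 200 yields 2 chunks.
    let chunks = chunk_text(&"abcdefghij".repeat(100), 800, 200);
    println!("{} chunks", chunks.len());
}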

Key innovations:

  • Fast-GraphRAG approach for efficient retrieval
  • Incremental processing for real-time updates
  • Dual-level retrieval from LightRAG
  • PageRank-based relevance scoring

For detailed architecture information, see ARCHITECTURE.md.

API Reference

Core Types

// Main GraphRAG interface
pub struct GraphRAG { /* ... */ }

// Document representation
pub struct Document {
    pub id: String,
    pub content: String,
    pub metadata: HashMap<String, String>,
}

// Query results
pub struct QueryResult {
    pub answer: String,
    pub confidence: f32,
    pub sources: Vec<String>,
}

Main Methods

impl GraphRAG {
    // Create new instance
    pub fn new(config: Config) -> Result<Self>;

    // Add content
    pub fn add_document(&mut self, doc: Document) -> Result<()>;
    pub fn add_text(&mut self, text: &str) -> Result<()>;

    // Query
    pub fn ask(&self, question: &str) -> Result<String>;
    pub fn query(&self, question: &str) -> Result<QueryResult>;

    // Management
    pub fn clear(&mut self);
    pub fn save(&self, path: &str) -> Result<()>;
    pub fn load(&mut self, path: &str) -> Result<()>;
}
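
The richer query method returns sources and a confidence score alongside the answer. A short usage sketch based on the signatures above; it assumes Config implements Default, so construct your Config however your setup actually requires:

use graphrag_rs::{Config, GraphRAG};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // `Config::default()` is an assumption here; see the Configuration section.
    let mut graphrag = GraphRAG::new(Config::default())?;
    graphrag.add_text("Your document")?;

    // `query` returns metadata that `ask` omits.
    let result = graphrag.query("Your question")?;
    println!("Answer ({:.2} confidence): {}", result.confidence, result.answer);
    for source in &result.sources {
        println!("  source: {source}");
    }
    Ok(())
}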

Performance Tuning

Memory Optimization

[pipeline]
chunk_size = 500  # Smaller chunks use less memory

[performance]
max_entities_per_chunk = 10
enable_caching = false

Speed Optimization

[performance]
enable_parallel = true
num_threads = 8  # Adjust based on CPU cores
batch_size = 50

Accuracy Optimization

[pipeline]
chunk_overlap = 400  # Higher overlap preserves more context
min_confidence = 0.7
enable_reranking = true

Troubleshooting

Common Issues

Build fails with "rust version" error

# Update Rust
rustup update

Out of memory error

# Reduce chunk size in config.toml
chunk_size = 300
enable_parallel = false

Slow processing

# Enable parallel processing
enable_parallel = true
num_threads = 8

Ollama connection error

# Ensure Ollama is running
ollama serve

# Check if model is available
ollama list

Debug Mode

# Enable debug logging
RUST_LOG=debug cargo run --bin simple_cli config.toml

# Enable backtrace for errors
RUST_BACKTRACE=1 cargo run

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone repository
git clone https://github.com/automataIA/graphrag-rs.git
cd graphrag-rs

# Run tests
cargo test

# Run with debug info
RUST_LOG=debug cargo run

# Check code quality
cargo clippy
cargo fmt --check

FAQ

Q: What file formats are supported? A: Currently supports plain text (.txt) and markdown (.md). PDF support is planned.

Q: Can I use this without Ollama? A: Yes, the library includes a mock LLM for testing and can work with embeddings only.

Q: How much memory does it need? A: Typically under 100MB for documents up to 500k characters.

Q: Is it production ready? A: Yes, the core functionality is stable and well-tested.

Q: Can I use commercial LLMs? A: OpenAI support is planned. Currently works with Ollama's local models.

Roadmap

  • OpenAI API support
  • PDF document support
  • Web UI interface
  • Incremental index updates
  • Distributed processing
  • GPU acceleration for embeddings

License

MIT License - see LICENSE for details.

Acknowledgments

  • Microsoft GraphRAG for the original concept
  • Ollama for local LLM support
  • Rust community for excellent libraries

Built with Rust | Documentation | Report Issues
