A high-performance Rust implementation of GraphRAG (Graph-based Retrieval Augmented Generation) for building knowledge graphs from documents and querying them with natural language.
# Clone and build
git clone https://github.com/your-username/graphrag-rs.git
cd graphrag-rs
cargo build --release
# Use a pre-configured template (multiple available!)
cp config.toml my_config.toml
# Or choose a specific template:
# cp config_tom_sawyer.toml my_config.toml
# cp config_complete.toml my_config.toml
# Edit the config to point to YOUR document:
# nano my_config.toml
# Change this line: input_document_path = "path/to/your/document.txt"
# Change this line: output_dir = "./output/your_project"
# Process your document and ask questions
cargo run --bin simple_cli my_config.toml "What is this document about?"
Config files explained:
- `input_document_path` - Path to the text file you want to analyze
- `output_dir` - Where GraphRAG saves the knowledge graph
- Templates in root: `config.toml`, `config_complete.toml`, `config_tom_sawyer.toml`
- Pick one, copy it, change the document path, and you're ready!
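If you want to inspect those two fields from your own Rust code, here is a minimal standalone sketch using the `toml` and `serde` crates (with serde's `derive` feature enabled); the struct names are hypothetical, not graphrag-rs types:

```rust
use serde::Deserialize;

// Hypothetical structs mirroring the [general] section of a config file;
// add `toml` and `serde = { features = ["derive"] }` to Cargo.toml.
#[derive(Deserialize)]
struct GeneralSection {
    input_document_path: String,
    output_dir: String,
}

#[derive(Deserialize)]
struct ProjectConfig {
    general: GeneralSection,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("my_config.toml")?;
    let config: ProjectConfig = toml::from_str(&raw)?;
    println!("Document: {}", config.general.input_document_path);
    println!("Output:   {}", config.general.output_dir);
    Ok(())
}
```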
- Rust 1.70 or later
- (Optional) Ollama for local LLM support (see the Ollama installation guide)
git clone https://github.com/your-username/graphrag-rs.git
cd graphrag-rs
cargo build --release
# Optional: Install globally
cargo install --path .
use graphrag_rs::simple;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let answer = simple::answer("Your document text", "Your question")?;
    println!("Answer: {}", answer);
    Ok(())
}
use graphrag_rs::easy::SimpleGraphRAG;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graph = SimpleGraphRAG::from_text("Your document text")?;
    let answer1 = graph.ask("What is this about?")?;
    let answer2 = graph.ask("Who are the main characters?")?;
    println!("Answer 1: {}", answer1);
    println!("Answer 2: {}", answer2);
    Ok(())
}
use graphrag_rs::{GraphRAG, ConfigPreset};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graphrag = GraphRAG::builder()
        .with_preset(ConfigPreset::Balanced)
        .auto_detect_llm()
        .build()?;
    graphrag.add_text("Your document")?;
    let answer = graphrag.ask("Your question")?;
    println!("Answer: {}", answer);
    Ok(())
}
GraphRAG-rs provides two CLI tools:
Automatically detects if the knowledge graph needs building and handles everything for you:
# Build the Smart CLI
cargo build --release --bin simple_cli
# Process document and answer question in one command
cargo run --bin simple_cli config.toml "What are the main themes?"
# Interactive mode - builds graph if needed, then waits for questions
cargo run --bin simple_cli config.toml
# How it works:
# 1. Loads your TOML configuration
# 2. Checks if knowledge graph exists
# 3. Builds graph if needed (shows progress)
# 4. Answers your question using Ollama
# 5. Saves results to output directory
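In library terms, that flow corresponds roughly to the sketch below. It reuses the `new_default`, `load`, `save`, `add_text`, and `ask` calls from the API reference later in this README; the hard-coded paths and existence check are simplifications, not the CLI's actual code:

```rust
use graphrag_rs::GraphRAG;
use std::{fs, path::Path};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Load configuration (paths hard-coded here for brevity)
    let input = "books/my_book.txt";
    let output = "./output/my_book/graph.bin";

    let mut graphrag = GraphRAG::new_default()?;

    if Path::new(output).exists() {
        // 2. Knowledge graph already exists: reuse it
        graphrag.load(output)?;
    } else {
        // 3. Build the graph from the source document
        graphrag.add_text(&fs::read_to_string(input)?)?;
        // 5. Persist results to the output directory
        graphrag.save(output)?;
    }

    // 4. Answer the question (uses Ollama when configured)
    let answer = graphrag.ask("What are the main themes?")?;
    println!("{answer}");
    Ok(())
}
```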
For advanced users who want full control:
# Build the manual CLI
cargo build --release
# Step 1: Build knowledge graph
./target/release/graphrag-rs config.toml build
# Step 2: Query the graph
./target/release/graphrag-rs config.toml query "Your question"
The project includes several ready-to-use configuration templates:
Available Templates:
- `config.toml` - Basic configuration for general use
- `config_complete.toml` - Full configuration with all options
- `config_tom_sawyer.toml` - Pre-configured for book processing
- `config_example.toml` - Annotated template with explanations
Essential Configuration Fields:
[general]
# IMPORTANT: Change these two paths for your project!
input_document_path = "path/to/your/document.txt" # Your document to process
output_dir = "./output/your_project" # Where to save results
[pipeline]
chunk_size = 800 # Size of text chunks (adjust based on document type)
chunk_overlap = 200 # Overlap to preserve context between chunks
[ollama]
enabled = true
host = "http://localhost"
port = 11434
chat_model = "llama3.1:8b" # LLM for text generation
embedding_model = "nomic-embed-text" # Model for embeddings
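To see what `chunk_size` and `chunk_overlap` mean in practice: each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so consecutive chunks share `chunk_overlap` characters of context. The sketch below illustrates the idea; it is not the library's actual chunker (a real one must also respect UTF-8 and sentence boundaries):

```rust
// Illustrative chunking only, not the graphrag-rs implementation:
// consecutive chunks share `overlap` characters of context.
fn chunk_text(text: &str, size: usize, overlap: usize) -> Vec<&str> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let end = (start + size).min(text.len());
        // Assumes ASCII text; slicing mid-codepoint would panic on UTF-8.
        chunks.push(&text[start..end]);
        if end == text.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let text = "a".repeat(2000);
    let chunks = chunk_text(&text, 800, 200);
    // 2000 chars with size 800 and overlap 200 yields 3 chunks.
    println!("{} chunks, each sharing 200 chars with its neighbor", chunks.len());
}
```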
Quick Setup:
1. Copy a template: `cp config_complete.toml my_project.toml`
2. Edit `input_document_path` to point to your document
3. Edit `output_dir` to set where results are saved
4. Run: `cargo run --bin simple_cli my_project.toml`
See config_example.toml for detailed explanations of all options.
- Entity Extraction: Automatically identifies people, places, organizations, and concepts
- Relationship Discovery: Finds connections between entities
- Hierarchical Organization: Creates multi-level document summaries
- Incremental Updates: Real-time graph updates without full reprocessing
- Fast-GraphRAG Implementation: PageRank-based retrieval with 6x cost reduction
- Personalized PageRank: Optimized query processing at inference time
- Semantic Search: Find information using meaning, not just keywords
- Hybrid Retrieval: Combines keyword, semantic, and graph-based search for best results (see the sketch after this list)
- Context-Aware Answers: Generates responses based on document context
- LightRAG Integration: 6000x token reduction vs traditional GraphRAG
- Parallel Processing: Utilizes all CPU cores for fast processing
- Efficient Storage: Minimal memory footprint (<100MB for typical documents)
- Fast Queries: Sub-second response times for most queries
- Query Caching: Intelligent caching for repeated queries
- Local LLM Support: Works with Ollama for private, offline processing
- Configurable Pipeline: Adjust chunking, extraction, and retrieval parameters
- Multiple APIs: Choose complexity level based on your needs
- Modular Architecture: Swap components without affecting the system
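As a rough illustration of how hybrid retrieval can blend the three signals into one relevance score, here is a toy scoring function; the weights and normalization are arbitrary examples, not graphrag-rs defaults:

```rust
/// Illustrative only: blend three retrieval signals into one score.
/// Inputs are assumed normalized to [0, 1]; the weights are made up.
fn hybrid_score(keyword: f32, semantic: f32, graph: f32) -> f32 {
    0.2 * keyword + 0.5 * semantic + 0.3 * graph
}

fn main() {
    // A chunk that matches the query terms, is semantically close,
    // and sits on a well-connected entity scores highest.
    let score = hybrid_score(0.8, 0.9, 0.6);
    println!("combined relevance: {score:.2}");
}
```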
# Example 1: Process a book using existing template
cp config_tom_sawyer.toml my_book_config.toml
# Edit my_book_config.toml:
# input_document_path = "books/my_book.txt"
# output_dir = "./output/my_book"
cargo run --bin simple_cli my_book_config.toml "Who are the main characters?"
# Example 2: Process a research paper
cp config.toml research_config.toml
# Edit research_config.toml:
# input_document_path = "papers/research.txt"
# output_dir = "./output/research"
# chunk_size = 500 # Smaller chunks for technical content
cargo run --bin simple_cli research_config.toml "What is the main hypothesis?"
# Example 3: Process with full configuration
cp config_complete.toml advanced_config.toml
# Edit all the parameters you need in advanced_config.toml
cargo run --bin simple_cli advanced_config.toml
use graphrag_rs::{GraphRAG, Document};
use std::fs;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read document
    let content = fs::read_to_string("book.txt")?;

    // Create and configure GraphRAG
    let mut graphrag = GraphRAG::builder()
        .with_chunk_size(1000)
        .with_chunk_overlap(200)
        .build()?;

    // Process document
    let doc = Document::new("book", content);
    graphrag.add_document(doc)?;

    // Query
    let answer = graphrag.ask("What are the main themes?")?;
    println!("Answer: {}", answer);
    Ok(())
}
use graphrag_rs::{GraphRAG, OllamaConfig};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure Ollama
    let ollama = OllamaConfig::new()
        .with_model("llama3.1:8b")
        .with_embedding_model("nomic-embed-text");

    // Create GraphRAG with Ollama
    let mut graphrag = GraphRAG::builder()
        .with_llm(ollama)
        .build()?;

    // Use as normal
    graphrag.add_text("Your document")?;
    let answer = graphrag.ask("Your question")?;
    println!("Answer: {}", answer);
    Ok(())
}
use graphrag_rs::GraphRAG;
use std::fs;
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graphrag = GraphRAG::new_default()?;

    // Process multiple documents
    for file in ["doc1.txt", "doc2.txt", "doc3.txt"] {
        let content = fs::read_to_string(file)?;
        graphrag.add_text(&content)?;
    }

    // Query across all documents
    let answer = graphrag.ask("What connects these documents?")?;
    println!("Answer: {}", answer);
    Ok(())
}
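Using the `save` and `load` methods from the API reference below, you can build the graph once and reuse it across runs; the file name here is just an example:

```rust
use graphrag_rs::GraphRAG;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // First run: build the graph and persist it.
    let mut graphrag = GraphRAG::new_default()?;
    graphrag.add_text("Your document text")?;
    graphrag.save("./output/graph.bin")?;

    // Later run: reload instead of reprocessing the document.
    let mut restored = GraphRAG::new_default()?;
    restored.load("./output/graph.bin")?;
    let answer = restored.ask("Your question")?;
    println!("Answer: {answer}");
    Ok(())
}
```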
GraphRAG-rs implements cutting-edge 2024 research in retrieval-augmented generation:
- Fast-GraphRAG: PageRank-based retrieval with 6x cost reduction compared to traditional GraphRAG
- Incremental Updates: Zero-downtime real-time graph processing for dynamic documents
- LightRAG Integration: Achieves 6000x token reduction through efficient dual-level retrieval
- Personalized PageRank: Optimized query processing using PageRank at inference time
- Hybrid Retrieval: Combines semantic, keyword, and graph-based search strategies
- LMCD Entity Linking: Advanced entity resolution with multiple matching algorithms
- Trait-Based Architecture: 12+ core abstractions for maximum modularity
- Memory-Safe Implementation: Leverages Rust's ownership system for reliability
- Comprehensive Testing: 168+ test cases ensuring production readiness
GraphRAG-rs processes documents through a multi-stage pipeline:
Document → Chunking → Entity Extraction → Graph Building → Vector Index → Query Engine → Answer
Key innovations:
- Fast-GraphRAG approach for efficient retrieval
- Incremental processing for real-time updates
- Dual-level retrieval from LightRAG
- PageRank-based relevance scoring (see the sketch below)
For detailed architecture information, see ARCHITECTURE.md.
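To make the PageRank-based scoring concrete, here is a minimal personalized PageRank in plain Rust: the restart mass is concentrated on the nodes that match the query, so scores decay with graph distance from them. This is an educational sketch (dangling-node handling is simplified), not the library's implementation:

```rust
// Minimal personalized PageRank over an adjacency list.
fn personalized_pagerank(
    out_edges: &[Vec<usize>],
    seeds: &[usize], // nodes matching the query
    damping: f64,
    iterations: usize,
) -> Vec<f64> {
    let n = out_edges.len();
    let mut teleport = vec![0.0; n];
    for &s in seeds {
        teleport[s] = 1.0 / seeds.len() as f64; // restart mass on query nodes
    }
    let mut rank = teleport.clone();
    for _ in 0..iterations {
        let mut next: Vec<f64> = teleport.iter().map(|t| (1.0 - damping) * t).collect();
        for (u, edges) in out_edges.iter().enumerate() {
            if edges.is_empty() {
                continue; // dangling node: its mass is dropped for brevity
            }
            let share = damping * rank[u] / edges.len() as f64;
            for &v in edges {
                next[v] += share;
            }
        }
        rank = next;
    }
    rank
}

fn main() {
    // Tiny graph: 0 -> 1, 1 -> 2, 2 -> 0, with the query seeded at node 0.
    let graph = vec![vec![1], vec![2], vec![0]];
    let scores = personalized_pagerank(&graph, &[0], 0.85, 30);
    println!("{scores:?}"); // node 0 and its neighborhood rank highest
}
```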
use std::collections::HashMap;

// Main GraphRAG interface
pub struct GraphRAG { /* ... */ }

// Document representation
pub struct Document {
    pub id: String,
    pub content: String,
    pub metadata: HashMap<String, String>,
}

// Query results
pub struct QueryResult {
    pub answer: String,
    pub confidence: f32,
    pub sources: Vec<String>,
}

impl GraphRAG {
    // Create new instance
    pub fn new(config: Config) -> Result<Self>;

    // Add content
    pub fn add_document(&mut self, doc: Document) -> Result<()>;
    pub fn add_text(&mut self, text: &str) -> Result<()>;

    // Query
    pub fn ask(&self, question: &str) -> Result<String>;
    pub fn query(&self, question: &str) -> Result<QueryResult>;

    // Management
    pub fn clear(&mut self);
    pub fn save(&self, path: &str) -> Result<()>;
    pub fn load(&mut self, path: &str) -> Result<()>;
}
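Using `query` instead of `ask` exposes the confidence score and sources from `QueryResult`; for example (the 0.5 threshold is an arbitrary illustration):

```rust
use graphrag_rs::GraphRAG;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graphrag = GraphRAG::new_default()?;
    graphrag.add_text("Your document text")?;

    // `query` returns the structured QueryResult instead of a bare String.
    let result = graphrag.query("Your question")?;
    println!("Answer: {}", result.answer);
    println!("Confidence: {:.2}", result.confidence);
    for source in &result.sources {
        println!("  source: {source}");
    }
    // Arbitrary example threshold, not a library default:
    if result.confidence < 0.5 {
        println!("Low confidence; consider rephrasing the question.");
    }
    Ok(())
}
```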
# Low memory usage
[performance]
chunk_size = 500             # Smaller chunks use less memory
max_entities_per_chunk = 10
enable_caching = false

# Maximum speed
[performance]
enable_parallel = true
num_threads = 8              # Adjust based on CPU cores
batch_size = 50

# Best quality
[pipeline]
chunk_overlap = 400          # Higher overlap preserves more context
min_confidence = 0.7
enable_reranking = true
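When choosing `num_threads`, the standard library can report the available cores (`std::thread::available_parallelism`, stable since Rust 1.59); a one-off helper:

```rust
use std::thread;

fn main() {
    // Falls back to 1 if the core count cannot be determined.
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!("set num_threads = {cores} in [performance]");
}
```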
Build fails with "rust version" error
# Update Rust
rustup update
Out of memory error
# Reduce chunk size in config.toml
chunk_size = 300
enable_parallel = false
Slow processing
# Enable parallel processing
enable_parallel = true
num_threads = 8
Ollama connection error
# Ensure Ollama is running
ollama serve
# Check if model is available
ollama list
# Enable debug logging
RUST_LOG=debug cargo run --bin simple_cli config.toml
# Enable backtrace for errors
RUST_BACKTRACE=1 cargo run
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
# Clone repository
git clone https://github.com/your-username/graphrag-rs.git
cd graphrag-rs
# Run tests
cargo test
# Run with debug info
RUST_LOG=debug cargo run
# Check code quality
cargo clippy
cargo fmt --check
Q: What file formats are supported? A: Currently supports plain text (.txt) and markdown (.md). PDF support is planned.
Q: Can I use this without Ollama? A: Yes, the library includes a mock LLM for testing and can work with embeddings only.
Q: How much memory does it need? A: Typically under 100MB for documents up to 500k characters.
Q: Is it production ready? A: Yes, the core functionality is stable and well-tested.
Q: Can I use commercial LLMs? A: OpenAI support is planned. Currently works with Ollama's local models.
- OpenAI API support
- PDF document support
- Web UI interface
- Incremental index updates
- Distributed processing
- GPU acceleration for embeddings
MIT License - see LICENSE for details.
- Microsoft GraphRAG for the original concept
- Ollama for local LLM support
- Rust community for excellent libraries
Built with Rust | Documentation | Report Issues