A citation-grounded code comprehension system using hybrid retrieval (BM25 + dense embeddings), graph-based expansion, query expansion (RM3), cross-encoder reranking, and submodular/greedy context packing.
- Hybrid Retrieval: Combines BM25 sparse retrieval with dense semantic embeddings
- Query Expansion (RM3): Pseudo-relevance feedback for improved recall
- Cross-Encoder Reranking: Neural reranking for improved precision
- Graph-RAG: Neo4j-based code graph expansion for cross-file reasoning
- Submodular Packing: Diversity-aware context selection
- Citation Verification: Strict/loose citation validation against source code
- Developer Brief: Structured output with strategy, evidence, and citations
- Interactive Selection: Select repos/models interactively (e.g., "1-3,5" or "all")
- RQ Presets: Pre-configured experiments for research questions
- Progress Tracking: Real-time progress with ETA estimates
- Verbose Logging: Comprehensive colored output showing what's happening
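
To make the citation verification feature concrete, here is a minimal sketch of checking `[path.py:start-end]` citations against known source files. It is not the project's implementation: the names `parse_citations`, `verify_citation`, and the `file_line_counts` mapping are hypothetical, and the check shown (path is known, line range lies inside the file) is only one plausible reading of what citation validation involves.

```python
import re

# Matches citations such as [src/routing.py:10-42].
CITATION_RE = re.compile(r"\[([^\[\]:\s]+):(\d+)-(\d+)\]")


def parse_citations(answer):
    """Extract (path, start, end) triples from an answer string."""
    return [(p, int(s), int(e)) for p, s, e in CITATION_RE.findall(answer)]


def verify_citation(citation, file_line_counts):
    """Return True if the cited path is known and the range fits the file.

    file_line_counts is a hypothetical mapping of path -> number of lines.
    """
    path, start, end = citation
    total = file_line_counts.get(path)
    if total is None:
        return False
    return 1 <= start <= end <= total


answer = "Method checks happen in the rule matcher [src/routing.py:10-42]."
known_files = {"src/routing.py": 200}
for cit in parse_citations(answer):
    print(cit, verify_citation(cit, known_files))
```

A stricter check could also compare the cited lines with the retrieved chunk text; that step is left out here.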

- Python 3.10+
- LM Studio running at http://localhost:1234/v1 (OpenAI-compatible).
- Optional Neo4j 5 (for Graph-RAG):

docker run -d -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/VeryStrongPass123 neo4j:5

# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Clone sample repos (Flask + Werkzeug)
python driver.py --faiss \
--base-url http://localhost:1234/v1 --api-key lm-studio \
--model llama-3-groq-8b-tool-use

Or manually:
# Create staging directory and clone repos
mkdir -p third_party/preset_fw
git clone --depth 1 https://github.com/pallets/flask third_party/preset_fw/flask
git clone --depth 1 https://github.com/pallets/werkzeug third_party/preset_fw/werkzeug
# Build index
python cc_cli.py index --repo third_party/preset_fw --out fw_index.json
# Create dense embeddings
python cc_cli.py embed --index fw_index.json --out fw_dense.pkl
# Build FAISS index
python cc_cli.py faiss-build --dense fw_dense.pkl --out-dir fw_index_faiss

python cc_cli.py graph-load \
--index fw_index.json \
--repo third_party/preset_fw \
--neo4j-uri bolt://localhost:7687 \
--neo4j-user neo4j --neo4j-pass VeryStrongPass123 \
--wipe

Basic Query (Hybrid Retrieval):
python cc_cli.py ask \
--index fw_index.json --dense fw_dense.pkl --faiss-dir fw_index_faiss \
--retrieval hybrid \
--question "Where are HTTP method checks performed before routing? Cite [path.py:start-end]." \
--base-url http://localhost:1234/v1 --api-key lm-studio \
--model llama-3-groq-8b-tool-use --verify-citations strict

With Developer Brief and Auto-Citation:
python cc_cli.py ask \
--index fw_index.json --dense fw_dense.pkl --faiss-dir fw_index_faiss \
--retrieval hybrid --alpha 0.35 --beta 0.65 \
--k 24 --per-chunk-lines 80 --max-context-chars 10000 \
--strict-style --dev-brief --auto-cite-first \
--llm-timeout 90 --verbose --show-top 8 \
--question "Where are HTTP method checks handled before routing? Brief + bullets, then CITATION line." \
--base-url http://localhost:1234/v1 --api-key lm-studio \
--model llama-3-groq-8b-tool-use --verify-citations strict

With Path and Function Filters:
python cc_cli.py ask \
--index fw_index.json --dense fw_dense.pkl --faiss-dir fw_index_faiss \
--retrieval hybrid --alpha 0.35 --beta 0.65 \
--k 24 --per-chunk-lines 100 --max-context-chars 11000 \
--path-filter "werkzeug/src/werkzeug/utils.py$" \
--function-filter "(?i)append_slash_redirect" \
--strict-style --dev-brief --auto-cite-first --verbose \
--question "Show the exact lines that construct a trailing-slash canonical redirect (method-preserving). Brief + bullets, then CITATION." \
--base-url http://localhost:1234/v1 --api-key lm-studio \
--model llama-3-groq-8b-tool-use --verify-citations strict

With Graph Expansion:
python cc_cli.py ask \
--index fw_index.json --dense fw_dense.pkl --faiss-dir fw_index_faiss \
--retrieval hybrid --alpha 0.45 --beta 0.55 \
--k 28 --per-chunk-lines 80 --max-context-chars 11000 \
--graph-expand --graph-hops 1 --graph-seeds 4 --graph-neighbors 8 \
--graph-bonus 0.25 --graph-decay 0.6 \
--graph-timeout 3 --graph-exclude-regex "/tests?/|^tests?/|/test_|/docs?/|\\.rst$|/examples?/" \
--strict-style --dev-brief --auto-cite-first --verbose --show-top 10 \
--llm-timeout 90 \
--question "Where are HTTP method checks handled before routing? Include a brief + bullets, then CITATION line." \
--base-url http://localhost:1234/v1 --api-key lm-studio \
--model llama-3-groq-8b-tool-use --verify-citations strict

With RM3 Query Expansion:
python cc_cli.py ask \
--index fw_index.json --dense fw_dense.pkl --faiss-dir fw_index_faiss \
--retrieval hybrid --alpha 0.35 --beta 0.65 \
--k 24 --per-chunk-lines 80 --max-context-chars 10000 \
--qe rm3 --qe-fb-docs 10 --qe-fb-terms 10 \
--strict-style --dev-brief --auto-cite-first \
--question "Where are HTTP method checks handled before routing?" \
--base-url http://localhost:1234/v1 --api-key lm-studio \
--model llama-3-groq-8b-tool-use --verify-citations strict

With Cross-Encoder Reranking:
python cc_cli.py ask \
--index fw_index.json --dense fw_dense.pkl --faiss-dir fw_index_faiss \
--retrieval hybrid --alpha 0.35 --beta 0.65 \
--k 24 --per-chunk-lines 80 --max-context-chars 10000 \
--rerank --rerank-depth 100 \
--strict-style --dev-brief --auto-cite-first \
--question "Where are HTTP method checks handled before routing?" \
--base-url http://localhost:1234/v1 --api-key lm-studio \
--model llama-3-groq-8b-tool-use --verify-citations strict

With Greedy Packer:
python cc_cli.py ask \
--index fw_index.json --dense fw_dense.pkl --faiss-dir fw_index_faiss \
--retrieval hybrid \
--packer greedy --per-chunk-lines 60 --max-context-chars 9000 \
--strict-style --dev-brief --auto-cite-first \
--question "Where are HTTP method checks handled before routing?" \
--base-url http://localhost:1234/v1 --api-key lm-studio \
--model llama-3-groq-8b-tool-use --verify-citations strict

python run_experiment.py --list-rqs

Select repos and models interactively:
# RQ1: Hybrid vs Sparse vs Dense
python run_experiment.py --rq RQ1 --interactive \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq1 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ2: Graph-RAG Lift
python run_experiment.py --rq RQ2 --interactive \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq2 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ3: Packer & Budget Sensitivity
python run_experiment.py --rq RQ3 --interactive \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq3 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ4: Model Comparisons
python run_experiment.py --rq RQ4 --interactive \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq4 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ5: α/β & k Sweeps
python run_experiment.py --rq RQ5 --interactive \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq5 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ6: RM3 & Re-rank Ablation
python run_experiment.py --rq RQ6 --interactive \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq6 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ7: Evaluation Suite
python run_experiment.py --rq RQ7 --interactive \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq7 --base-url http://localhost:1234/v1 --api-key lm-studio

When prompted, you can select:
- all - Use all items
- 1-3,5 - Range plus individual (repos 1, 2, 3 and 5)
- 1,2,3 - Specific items
- Press Enter for default selection

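
The selection syntax above can be parsed with a few lines of Python. This is an illustrative sketch, not the code used by run_experiment.py: the function name `parse_selection` and its `default` parameter are hypothetical, and what the real prompt treats as the "default selection" is not specified here, so the caller supplies it.

```python
def parse_selection(text, n_items, default):
    """Turn input such as "1-3,5" or "all" into sorted 1-based indices.

    default is the list returned when the user just presses Enter
    (an assumption; the real default is not documented here).
    """
    text = text.strip().lower()
    if not text:
        return list(default)
    if text == "all":
        return list(range(1, n_items + 1))

    chosen = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"descending range: {part!r}")
            chosen.update(range(lo, hi + 1))
        else:
            chosen.add(int(part))

    out_of_range = sorted(i for i in chosen if not 1 <= i <= n_items)
    if out_of_range:
        raise ValueError(f"selection out of range: {out_of_range}")
    return sorted(chosen)


print(parse_selection("1-3,5", n_items=6, default=[1]))
```

Rejecting out-of-range indices early keeps a typo from silently running an experiment on the wrong repos.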
Run with minimal preset defaults (faster):
# RQ1 Quick: 3 repos, 1 model
python run_experiment.py --rq RQ1 --quick \
--repos-file repos_500.txt --per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq1 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ2 Quick: 2 repos, 1 model
python run_experiment.py --rq RQ2 --quick \
--repos-file repos_500.txt --per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq2 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ3 Quick: 2 repos, 1 model
python run_experiment.py --rq RQ3 --quick \
--repos-file repos_500.txt --per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq3 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ4 Quick: 1 repo, all models
python run_experiment.py --rq RQ4 --quick \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq4 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ5 Quick: 2 repos, 1 model
python run_experiment.py --rq RQ5 --quick \
--repos-file repos_500.txt --per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq5 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ6 Quick: 2 repos, 1 model
python run_experiment.py --rq RQ6 --quick \
--repos-file repos_500.txt --per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq6 --base-url http://localhost:1234/v1 --api-key lm-studio
# RQ7 Quick: 1 repo, 1 model
python run_experiment.py --rq RQ7 --quick \
--repos-file repos_500.txt --per-repo-questions suites/flask_qas.jsonl \
--workdir work_rq7 --base-url http://localhost:1234/v1 --api-key lm-studio

| RQ | Name | Description | Runs |
|---|---|---|---|
| RQ1 | Hybrid vs Sparse vs Dense | Compare retrieval modes | 3 |
| RQ2 | Graph-RAG Lift | With/without graph expansion | 2 |
| RQ3 | Packer & Budget | Packing strategies + context sizes | 4 |
| RQ4 | Model Comparisons | Compare LLM models | 1 |
| RQ5 | α/β & k Sweeps | Sensitivity analysis | 5 |
| RQ6 | RM3 & Re-rank Ablation | Query expansion + reranking | 4 |
| RQ7 | Auto vs Curated | Question suite comparison | 1 |
# Sparse retrieval
python run_experiment.py --retrieval sparse --faiss --rerank --qe none --graph \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_sparse --base-url http://localhost:1234/v1 --api-key lm-studio
# Dense retrieval
python run_experiment.py --retrieval dense --faiss --rerank --qe none --graph \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_dense --base-url http://localhost:1234/v1 --api-key lm-studio
# Hybrid retrieval
python run_experiment.py --retrieval hybrid --faiss --rerank --qe none --graph \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_hybrid --base-url http://localhost:1234/v1 --api-key lm-studio

# With graph expansion
python run_experiment.py --graph --graph-bonus 0.2 --graph-hops 2 --graph-decay 0.6 \
--retrieval hybrid --faiss --rerank --qe none \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_graph --base-url http://localhost:1234/v1 --api-key lm-studio
# Without graph expansion
python run_experiment.py --retrieval hybrid --faiss --rerank --qe none \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_nograph --base-url http://localhost:1234/v1 --api-key lm-studio

# Greedy packer with small budget
python run_experiment.py --retrieval hybrid --faiss --rerank --qe none --graph \
--packer greedy --per-chunk-lines 40 --max-context-chars 7000 \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_greedy_40_7k --base-url http://localhost:1234/v1 --api-key lm-studio
# Submodular packer with larger budget
python run_experiment.py --retrieval hybrid --faiss --rerank --qe none --graph \
--packer submodular --per-chunk-lines 80 --max-context-chars 12000 \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_submod_80_12k --base-url http://localhost:1234/v1 --api-key lm-studio

python run_experiment.py --retrieval hybrid --faiss --rerank --qe none --graph \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_models --base-url http://localhost:1234/v1 --api-key lm-studio

# Run grid from experiments.yaml
python run_grid.py --config experiments.yaml
# Aggregate results
python aggregate_results.py --workdir work --out-csv all_sensitivity.csv

# No QE, no rerank (baseline)
python run_experiment.py --retrieval hybrid --faiss --graph --qe none \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_noqe_norerank --base-url http://localhost:1234/v1 --api-key lm-studio
# With RM3 query expansion only
python run_experiment.py --retrieval hybrid --faiss --graph --qe rm3 \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_qe --base-url http://localhost:1234/v1 --api-key lm-studio
# With reranking only
python run_experiment.py --retrieval hybrid --faiss --graph --rerank --qe none \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_rerank --base-url http://localhost:1234/v1 --api-key lm-studio
# With both RM3 and reranking
python run_experiment.py --retrieval hybrid --faiss --graph --qe rm3 --rerank \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--per-repo-questions suites/flask_qas.jsonl \
--workdir work_qe_rerank --base-url http://localhost:1234/v1 --api-key lm-studio

python eval_plus.py --index fw_index.json --suite suites/flask_qas.jsonl \
--retrieval hybrid --use-faiss --faiss-dir fw_index_faiss --use-dense --dense fw_dense.pkl \
--base-url http://localhost:1234/v1 --api-key lm-studio --model llama-3-groq-8b-tool-use \
--out-csv curated_vs_auto.csv

# Aggregate results from all runs
python aggregate_results.py --workdir work --out-csv all_results.csv
# Generate appendix with methods and hyperparameters
python generate_appendix.py \
--repos-file repos_500.txt --models-file models_lmstudio_10.txt \
--experiments-config experiments.yaml --out appendix.md
# Generate plots
python plots.py --csv work/all_results.csv --out figs/plot.png

| Command | Description |
|---|---|
| index | Build an index.json from a repo |
| embed | Create dense vectors for index |
| faiss-build | Build FAISS index from dense.pkl |
| ask | Ask a question against the index |
| graph-load | Load a Neo4j code graph from index + repo |

| Argument | Description | Default |
|---|---|---|
| --retrieval | sparse, dense, or hybrid | hybrid |
| --alpha | Weight for sparse in hybrid | 0.3 |
| --beta | Weight for dense in hybrid | 0.7 |
| --k | Candidates to retrieve | 10 |
| --packer | submodular or greedy | submodular |
| --qe | Query expansion: none or rm3 | none |
| --rerank | Enable cross-encoder reranking | false |
| --graph-expand | Enable Neo4j graph expansion | false |
| --strict-style | Strict system prompt for citations | false |
| --dev-brief | Print developer brief | false |
| --auto-cite-first | Auto-append citation if missing | false |
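
To illustrate how --alpha and --beta weight the two retrievers in hybrid mode, here is a minimal sketch of weighted score fusion. It assumes each retriever's scores are min-max normalized before mixing; the normalization step, the function names `min_max` and `fuse`, and the sample scores are all assumptions made for illustration, and cc_cli.py may combine scores differently.

```python
def min_max(scores):
    """Rescale a {chunk_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def fuse(sparse, dense, alpha=0.3, beta=0.7, k=10):
    """Return the top-k (chunk_id, fused_score) pairs.

    fused = alpha * normalized sparse score + beta * normalized dense score.
    A chunk missing from one retriever contributes 0 for that side.
    """
    s, d = min_max(sparse), min_max(dense)
    fused = {
        cid: alpha * s.get(cid, 0.0) + beta * d.get(cid, 0.0)
        for cid in set(s) | set(d)
    }
    # Sort by descending score, breaking ties by id for stable output.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


bm25_scores = {"a": 12.0, "b": 6.0, "c": 0.0}    # made-up sparse scores
dense_scores = {"a": 0.2, "b": 0.9, "d": 0.55}   # made-up cosine scores
for chunk_id, score in fuse(bm25_scores, dense_scores):
    print(chunk_id, round(score, 3))
```

With the default weights, chunk "b" ranks first because it scores well on both sides, while "a" drops despite having the best sparse score.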

| Argument | Description | Default |
|---|---|---|
| --repos-file | File with repository URLs | - |
| --models-file | File with model names | - |
| --workdir | Output directory | - |
| --retrieval | sparse, dense, or hybrid | hybrid |
| --qe | Query expansion: none or rm3 | none |
| --packer | submodular or greedy | submodular |
| --rerank | Enable cross-encoder reranking | false |
| --graph | Enable graph expansion | false |
| --faiss | Enable FAISS indexing | false |

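
The difference between the two --packer choices can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's packer: the `Chunk` class, both function names, and the objective (a square root over per-file score mass, which is one common submodular diversity objective) are hypothetical. Scores are assumed to be non-negative, and the budget plays the role of --max-context-chars.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str
    text: str
    score: float  # assumed non-negative relevance score


def pack_greedy(chunks, max_chars):
    """Take chunks in descending score order while they fit the budget."""
    picked, used = [], 0
    for c in sorted(chunks, key=lambda c: -c.score):
        if used + len(c.text) <= max_chars:
            picked.append(c)
            used += len(c.text)
    return picked


def pack_diverse(chunks, max_chars):
    """Greedily maximize sum over files of sqrt(total picked score).

    The square root gives diminishing returns for repeated picks
    from the same file, which favors coverage across files.
    """
    picked, used, per_file = [], 0, {}
    remaining = list(chunks)

    def gain(c):
        prev = per_file.get(c.path, 0.0)
        return math.sqrt(prev + c.score) - math.sqrt(prev)

    while True:
        fitting = [c for c in remaining if used + len(c.text) <= max_chars]
        if not fitting:
            return picked
        best = max(fitting, key=gain)
        picked.append(best)
        used += len(best.text)
        per_file[best.path] = per_file.get(best.path, 0.0) + best.score
        remaining.remove(best)


chunks = [
    Chunk("a.py", "x" * 100, 0.9),
    Chunk("a.py", "y" * 100, 0.8),
    Chunk("b.py", "z" * 100, 0.5),
]
print([c.path for c in pack_greedy(chunks, 200)])
print([c.path for c in pack_diverse(chunks, 200)])
```

In this toy input the greedy packer spends the whole budget on a.py, while the diversity-aware variant gives the second slot to b.py.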
.
├── cc_cli.py # Main CLI tool
├── cc_index.py # Code indexing
├── cc_graph.py # Neo4j graph loading
├── run_experiment.py # Experiment runner (direct + YAML modes)
├── run_experiments.py # Wrapper for run_experiment.py
├── run_grid.py # Grid search runner
├── run_grid.sh # Shell-based grid runner
├── eval_plus.py # Evaluation suite runner
├── aggregate_results.py # Results aggregation
├── generate_questions.py # Question generation
├── generate_appendix.py # Appendix generation
├── plots.py # Visualization
├── multi_index.py # Multi-repo indexing
├── driver.py # Quick-start driver
├── experiments.yaml # Grid configuration
├── repos_500.txt # Repository list
├── models_lmstudio_10.txt # Model list
├── requirements.txt # Dependencies
├── suites/ # Question suites
│ └── flask_qas.jsonl
└── README.md

- Tighten with --path-filter and/or --function-filter
- Add --auto-cite-first
- Increase --k for more candidates

- Set --graph-timeout 3
- Reduce --graph-neighbors
- Add --graph-exclude-regex "/tests?/|/docs?/|/examples?/"
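
To see which paths an exclusion pattern drops, it can help to try it in Python first. The sketch below assumes the pattern is applied as an unanchored search over each file path, which is a guess about how --graph-exclude-regex is evaluated; `filter_paths` and the sample paths are hypothetical.

```python
import re

# The short pattern from the tip above, and the longer one from the
# graph expansion example earlier in this README.
SHORT = r"/tests?/|/docs?/|/examples?/"
LONG = r"/tests?/|^tests?/|/test_|/docs?/|\.rst$|/examples?/"


def filter_paths(paths, pattern):
    """Keep only the paths that the exclusion pattern does not match."""
    rx = re.compile(pattern)
    return [p for p in paths if not rx.search(p)]


paths = [
    "flask/src/flask/app.py",
    "flask/tests/test_app.py",
    "werkzeug/docs/index.rst",
    "tests/test_basic.py",
]
print(filter_paths(paths, SHORT))
print(filter_paths(paths, LONG))
```

Under this assumption the short pattern keeps tests/test_basic.py, because a top-level tests directory has no leading slash; the longer pattern covers that case with ^tests?/.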

- Use --llm-timeout 120
- Confirm LM Studio model context length and server health

export TOKENIZERS_PARALLELISM=false

- Ensure Neo4j is running: neo4j start
- Check credentials in --neo4j-uri, --neo4j-user, --neo4j-pass

MIT License