Benchmarks 7 retrieval strategies across 4 chunking methods and 5 embedding models on S&P 500 filings. Best configuration scores 4.50 / 5.00 on LLM-as-judge evaluation.
| Rank | Chunking | Embedding | Retrieval | Score | Latency |
|---|---|---|---|---|---|
| 🥇 1 | hybrid | BGE | hybrid_07 | 0.877 | 19,236ms |
| 🥈 2 | hybrid | MiniLM | hybrid_07 | 0.865 | 11,175ms ⚡ |
| 🥉 3 | recursive | BGE | hybrid_07 | 0.856 | 20,553ms |
| 4 | semantic | BGE | hybrid_05 | 0.826 | 16,666ms |
MiniLM with hybrid_07 retrieval offers the best latency-performance tradeoff: 42% lower latency than BGE (11,175 ms vs. 19,236 ms) for a 1.4% lower score (0.865 vs. 0.877).
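The two percentages follow directly from the first two rows of the ranking table. A quick check, using only values copied from that table (the helper name `relative_drop` is illustrative, not part of the repository):

```python
# Values copied from the ranking table (hybrid chunking, hybrid_07 retrieval).
bge = {"score": 0.877, "latency_ms": 19_236}
minilm = {"score": 0.865, "latency_ms": 11_175}


def relative_drop(baseline: float, value: float) -> float:
    """Percentage by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline * 100


latency_saving = relative_drop(bge["latency_ms"], minilm["latency_ms"])
score_drop = relative_drop(bge["score"], minilm["score"])
print(f"latency: {latency_saving:.1f}% lower, score: {score_drop:.1f}% lower")
# prints: latency: 41.9% lower, score: 1.4% lower
```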
| Dimension | Score |
|---|---|
| Relevance | 4.00 |
| Groundedness | 5.00 |
| Completeness | 4.00 |
| Coherence | 5.00 |
| Overall | 4.50 / 5.00 |
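The overall score is consistent with an unweighted mean of the four dimensions. The sketch below assumes that aggregation; the actual evaluator may weight dimensions differently, and the dictionary keys are illustrative.

```python
from statistics import mean

# Per-dimension judge scores copied from the table above.
judge_scores = {
    "relevance": 4.0,
    "groundedness": 5.0,
    "completeness": 4.0,
    "coherence": 5.0,
}

# Assumption: "Overall" is the plain average of the four dimensions.
overall = mean(judge_scores.values())
print(f"Overall: {overall:.2f} / 5.00")  # prints: Overall: 4.50 / 5.00
```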
FinSight RAG downloads SEC 10-K filings for S&P 500 companies, extracts Item 1A (Risk Factors), and runs them through a fully configurable chunking-embedding-retrieval pipeline. The system benchmarks 7 strategy combinations to find the optimal configuration for financial document QA, then runs end-to-end RAG with Llama 3.2 via the HuggingFace Inference API.
Results are evaluated using an LLM-as-judge scoring rubric across four quality dimensions. A Streamlit dashboard lets you query any processed ticker, inspect citations, and compare strategy performance interactively.
SEC EDGAR
    │
    ▼
data_pipeline/
├── sec_downloader.py      # pulls 10-K filings via EDGAR full-text search
├── item_extractor.py      # isolates Item 1A (Risk Factors) section
└── text_cleaner.py        # normalizes whitespace, removes boilerplate
    │
    ▼
chunking/ (4 strategies)
├── fixed_chunker.py       # fixed token windows (512 / 1000 tokens)
├── semantic_chunker.py    # sentence-boundary aware splits
├── recursive_chunker.py   # hierarchical splitting with overlap
└── hybrid_chunker.py      # semantic first, fixed fallback (BEST)
    │
    ▼
embedding/ (5 models)
├── BGEEmbedder            # BAAI/bge-small-en-v1.5 (384d) (BEST)
├── MiniLMEmbedder         # all-MiniLM-L6-v2 (384d)
├── FinBERTEmbedder        # ProsusAI/finbert (768d)
├── E5Embedder             # intfloat/e5-small-v2 (384d)
└── vector_store.py        # ChromaDB persistence
    │
    ▼
retrieval/ (3 modes)
├── dense_retriever.py     # vector similarity (ChromaDB)
├── sparse_retriever.py    # BM25 keyword match
└── hybrid_retriever.py    # RRF fusion, alpha ∈ {0.3, 0.5, 0.7, 0.9} (BEST: 0.7)
    │
    ▼
rag/
├── rag_pipeline.py        # orchestrates retrieve → build_context → generate
├── llm_client.py          # Llama 3.2 via HuggingFace Inference API
├── citation_manager.py    # tracks chunk provenance in generated answers
└── evaluator.py           # latency, coverage, token metrics
    │
    ▼
app/qa_app.py              # Streamlit dashboard
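The "semantic first, fixed fallback" idea behind the hybrid chunker can be sketched as follows. This is a minimal illustration, not the code in `hybrid_chunker.py`: the function name `hybrid_chunk`, the regex sentence splitter, and the use of whitespace-separated words as a stand-in for tokens are all assumptions.

```python
import re


def hybrid_chunk(text: str, max_tokens: int = 512) -> list[str]:
    """Pack whole sentences into chunks; cut oversized sentences into fixed windows."""
    # Naive sentence split on ., ! or ? followed by whitespace (assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []  # words of the chunk being built

    for sentence in sentences:
        words = sentence.split()  # whitespace words approximate tokens here
        if len(words) > max_tokens:
            # Fixed fallback: flush the open chunk, then window the long sentence.
            if current:
                chunks.append(" ".join(current))
                current = []
            for start in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[start:start + max_tokens]))
        elif len(current) + len(words) > max_tokens:
            # Semantic path: close the chunk at the sentence boundary.
            chunks.append(" ".join(current))
            current = words
        else:
            current.extend(words)

    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "A b c. D e f g h i j. K l."
print(hybrid_chunk(sample, max_tokens=4))
# prints: ['A b c.', 'D e f g', 'h i j.', 'K l.']
```

Sentences that fit are never split, so most chunks end on a sentence boundary; only a sentence longer than the budget is cut at fixed offsets.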
FinSight-RAG/
├── app/                          # Streamlit dashboard
├── chunking/                     # 4 chunking strategies
├── config/                       # All strategy config files
├── data_pipeline/                # SEC ingestion + cleaning
├── embedding/                    # 5 embedding models + ChromaDB
├── rag/                          # Pipeline, LLM client, evaluator
├── retrieval/                    # Dense, sparse, hybrid retrievers
├── scripts/                      # Experiment runner scripts
├── run_focused_evaluation.py     # Strategy benchmark runner
├── run_final_rag_evaluation.py   # End-to-end RAG evaluation
├── visualize_evaluation.py       # Chart + report generator
└── requirements.txt
Prerequisites: Python 3.10+, free HuggingFace account, ~5GB disk space.
# 1. Clone and install
git clone https://github.com/Darsh29/FinSight-RAG.git
cd FinSight-RAG
pip install -r requirements.txt
# 2. Set your HuggingFace token
echo "HF_TOKEN=your_token_here" > .env
Get a free token at huggingface.co/settings/tokens.
# 3. Download and process SEC filings
python scripts/run_data_pipeline.py
# Run for specific tickers only
python scripts/run_data_pipeline.py --tickers AAPL MSFT NVDA GOOGL
# 4. Benchmark all retrieval strategies
python run_focused_evaluation.py --max-tickers 10
# 5. Run full end-to-end RAG evaluation
python run_final_rag_evaluation.py --max-tickers 5
# 6. Generate result visualizations
python visualize_evaluation.py
# 7. Launch the interactive dashboard
streamlit run app/qa_app.py
All strategy switches live in config/ and take effect without touching pipeline code:
# config/chunking_config.py
ACTIVE_STRATEGY = "hybrid" # fixed_512 | fixed_1000 | semantic | recursive | hybrid
# config/embedding_config.py
ACTIVE_EMBEDDING = "bge" # bge | minilm | mpnet | finbert | e5
# config/retrieval_config.py
ACTIVE_RETRIEVAL = "hybrid_07" # dense_only | sparse_only | hybrid_03 | hybrid_05 | hybrid_07 | hybrid_09
The hybrid retriever uses Reciprocal Rank Fusion (RRF) to merge dense and sparse results. Alpha controls the dense-to-sparse weight ratio: hybrid_07 means 70% dense + 30% BM25.
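One way to combine RRF with an alpha weight is to scale each list's reciprocal-rank contribution, as sketched below. This is an assumption about how `hybrid_retriever.py` might work, not its actual code: the function names, the smoothing constant `k = 60` (the value commonly used for RRF), and the rule for deriving alpha from the strategy name are all illustrative.

```python
def alpha_from_name(strategy: str) -> float:
    """Map a strategy name such as 'hybrid_07' to its dense weight, 0.7 (assumed convention)."""
    return int(strategy.split("_")[1]) / 10


def weighted_rrf(dense_ids, sparse_ids, alpha=0.7, k=60):
    """Fuse two ranked ID lists; each hit contributes weight / (k + rank)."""
    scores = {}
    for weight, ranking in ((alpha, dense_ids), (1 - alpha, sparse_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


dense = ["c1", "c2", "c3"]   # hypothetical chunk IDs from vector search
sparse = ["c3", "c4", "c1"]  # hypothetical chunk IDs from BM25
print(weighted_rrf(dense, sparse, alpha=alpha_from_name("hybrid_07")))
# prints: ['c1', 'c3', 'c2', 'c4']
```

Because only ranks enter the formula, the dense similarity scores and BM25 scores never need to be normalized onto a common scale.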
- Hybrid chunking beats fixed windows on financial text with irregular section lengths
- BGE outperforms FinBERT despite FinBERT being finance-specific, likely due to BGE's larger training corpus
- RRF at alpha=0.7 consistently beats both dense-only and sparse-only retrieval across all chunking strategies
- MiniLM is the production choice: 42% faster than BGE with only 1.4% score drop
| Issue | Fix |
|---|---|
| `No cleaned data available` | Run `python scripts/run_data_pipeline.py` first |
| `HF_TOKEN not set` | Add token to `.env` file |
| HuggingFace API rate limits | Built-in exponential backoff handles this automatically |
| Out of memory during embedding | Reduce batch_size in config/embedding_config.py |
| ChromaDB collection not found | Re-run embedding pipeline for that strategy/model combo |
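For readers unfamiliar with the rate-limit handling mentioned in the table, exponential backoff generally looks like the sketch below. It is a generic illustration, not the project's implementation: `with_backoff`, its parameters, and the use of `RuntimeError` as a stand-in for a rate-limit error are assumptions.

```python
import random
import time


def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Retry `call`, doubling the wait after each failure, up to `max_retries` attempts."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for an HTTP 429 / rate-limit exception
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.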
MIT License. See LICENSE for details.