
Commit a860bd0

Add threshold optimizers and other improvements (#19)
* pin redisvl below 1.0.0
* add threshold optimizer classes
* update tests, docs, readme, and deps
* add makefile and poetry scripts
* update gitignore
* formatting and update gitignore
* clean up readme some
1 parent: bf923ea

28 files changed: +2407 −1248 lines

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -4,4 +4,6 @@ models/
 .python-version
 __pycache__/
 .env
-dist/
+dist/
+.pytest_cache
+.mypy_cache

Makefile

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+.PHONY: install format lint test clean redis-start redis-stop check-types check
+
+install:
+	poetry install --all-extras
+
+redis-start:
+	docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
+
+redis-stop:
+	docker stop redis-stack
+
+format:
+	poetry run format
+	poetry run sort-imports
+
+check-types:
+	poetry run check-mypy
+
+lint: format check-types
+
+test:
+	poetry run test
+
+check: lint test
+
+clean:
+	find . -type d -name "__pycache__" -exec rm -rf {} +
+	find . -type d -name ".pytest_cache" -exec rm -rf {} +
+	find . -type d -name ".mypy_cache" -exec rm -rf {} +
+	find . -type d -name ".coverage" -delete
+	find . -type d -name "htmlcov" -exec rm -rf {} +
+	find . -type d -name "dist" -exec rm -rf {} +
+	find . -type d -name "build" -exec rm -rf {} +
+	find . -type d -name "*.egg-info" -exec rm -rf {} +
+	find . -type d -name "_build" -exec rm -rf {} +

README.md

Lines changed: 170 additions & 20 deletions
@@ -1,25 +1,66 @@
 <div align="center">
-<div><img src="https://raw.githubusercontent.com/redis/redis-vl-python/main/docs/_static/Redis_Logo_Red_RGB.svg" style="width: 130px"> </div>
+
+<img src="https://raw.githubusercontent.com/redis/redis-vl-python/main/docs/_static/Redis_Logo_Red_RGB.svg" width="130">
+
+# 🚀 Redis Retrieval Optimizer
+
 
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 ![Language](https://img.shields.io/github/languages/top/redis-applied-ai/retrieval-optimizer)
 ![GitHub last commit](https://img.shields.io/github/last-commit/redis-applied-ai/retrieval-optimizer)
 
+---
+
+**Stop guessing. Start measuring.** 📊
+
+Transform your retrieval system from *"looks good to me"* to *"proven to perform"* with systematic benchmarking and Bayesian optimization.
+
 </div>
 
-# Retrieval Optimizer
+## ✨ What is Redis Retrieval Optimizer?
 
-The **Redis Retrieval Optimizer** is a framework for systematically measuring and improving retrieval performance for vector and hybrid search. The framework helps you select the best combination of embedding model, index type, and query settings for your specific use case.
+The **Retrieval Optimizer** is a powerful framework that takes the guesswork out of building high-performance search systems. Whether you're working with vector search, hybrid retrieval, or traditional text search, this tool helps you **scientifically measure and optimize** your search performance.
 
-To use the Retrieval Optimizer, you start with a labeled data set consisting of a corpus of texts, a set of natural language questions, and a collection of labels. You also define a set of search methods and embedding models to test against.
+### 🎯 Why You Need This
 
-The Retrieval Optimizer then lets you evaluate critical tradeoffs between **cost, speed, and latency**, helping you understand how different embedding models, retrieval strategies, and index configurations impact overall system performance. The tool's **Bayesian optimization** mode lets you fine-tune these index configurations. Ultimately, the tools let you implement **metrics-driven development** for your search applications — ensuring that decisions are grounded in data, not assumptions.
+- **🔬 Evidence-Based Decisions**: Replace assumptions with hard data about what actually works for your use case
+- **⚡ Performance Optimization**: Find the perfect balance between speed, accuracy, and cost
+- **🤖 Smart Automation**: Use Bayesian optimization to intelligently explore the configuration space
+- **📈 Comprehensive Metrics**: Track everything from latency to relevance with industry-standard evaluation metrics
+- **🔧 Easy Integration**: Works with your existing Redis setup and popular embedding models
 
-# Installation
+### 🚀 Key Features
 
-`pip install redis-retrieval-optimizer` | [pypi](https://pypi.org/project/redis-retrieval-optimizer/)
+| Feature | Description |
+|---------|-------------|
+| **🔄 Grid Search** | Systematically test different embedding models and retrieval strategies |
+| **🧠 Bayesian Optimization** | Intelligently fine-tune index configurations for optimal performance |
+| **📊 Rich Metrics** | Track recall, precision, NDCG, F1, query time, and indexing performance |
+| **🔍 Multiple Search Methods** | Compare BM25, vector search, hybrid, reranking, and weighted RRF |
+| **🎛️ Threshold Optimization** | Automatically tune semantic cache and router thresholds |
+| **🔌 Extensible Design** | Easily add custom search methods and data processors |
 
-# Example notebooks
+
+### 🎪 How It Works
+
+1. **📁 Prepare Your Data**: Bring your corpus, queries, and relevance labels
+2. **⚙️ Define Your Tests**: Choose embedding models and search methods to compare
+3. **🔬 Run Experiments**: Let the optimizer systematically test configurations
+4. **📈 Analyze Results**: Get detailed performance metrics and recommendations
+5. **🚀 Deploy Optimized Config**: Use the best-performing setup in production
+
+**The result?** A search system backed by data, not hunches.
+
+
+## 📦 Installation
+
+Install the optimizer from [PyPI](https://pypi.org/project/redis-retrieval-optimizer/):
+
+```bash
+pip install redis-retrieval-optimizer
+```
+
+## 📚 Example Notebooks
 
 For complete code examples, see the following notebooks:
 
@@ -30,7 +71,9 @@ For complete code examples, see the following notebooks:
 | Bayesian Optimization | [00_bayes_study.ipynb](https://github.com/redis-applied-ai/redis-retrieval-optimizer/blob/main/docs/examples/bayesian_optimization/00_bayes_study.ipynb) |
 | Embedding model comparison | [00_comparison.ipynb](https://github.com/redis-applied-ai/redis-retrieval-optimizer/blob/main/docs/examples/comparison/00_comparison.ipynb) |
 
-# Quick start
+---
+
+## 🚀 Quick Start
 
 The Retrieval Optimizer supports two *study* types: **Grid** and **Bayesian Optimization**. Each is suited to a different stage of building a high-quality search system.
 
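Both study types are driven the same way from Python: point a study runner at a YAML study config and collect the returned metrics. The sketch below is illustrative only; the `run_grid_study` import and the `metrics` return value appear later in this README, but the `config_path` keyword and the `bayes_study` module path are assumptions here, so see the linked notebooks for the exact calls.

```python
# Illustrative sketch only: argument names and the bayes module path are assumptions;
# the example notebooks show the exact signatures.
from redis_retrieval_optimizer.grid_study import run_grid_study
# from redis_retrieval_optimizer.bayes_study import run_bayes_study  # assumed module path

# Run a Grid study described by a YAML study config (keyword name assumed).
metrics = run_grid_study(config_path="grid_study_config.yaml")

# The returned metrics summarize each tested configuration
# (recall, precision, NDCG, F1, query time, ...).
print(metrics)
```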
@@ -44,7 +87,7 @@ Once you've identified a solid starting point, use Bayesian optimization to **fi
 
 ## Running a Grid study
 
-#### Define study config
+#### Study config
 ```yaml
 # paths to necessary data files
 corpus: "data/nfcorpus_corpus.json"
@@ -105,7 +148,7 @@ metrics = run_grid_study(
 ## Running a Bayesian optimization
 Selects the next best configuration to try based on a heuristic. This is good when it would take a very long time to test all possible configurations.
 
-#### Study config:
+#### Study config
 ```yaml
 # path to data files for easy read
 corpus: "data/nfcorpus_corpus.json"
@@ -194,11 +237,11 @@ metrics = run_bayes_study(
 | vector | hnsw | float32 | 100 | 50 | 8 | 0.002346 | 3.088 | 0.126233|
 | vector | hnsw | float32 | 100 | 50 | 16 | 0.001478 | 1.896 | 0.116203|
 
+---
 
+## 🔍 Search Methods
 
-# Search methods
-
-Below is a comprehensive table documenting the built-in search methods available in the Redis Retrieval Optimizer:
+Below is a comprehensive table documenting the built-in search methods available in the Retrieval Optimizer:
 
 | Method | Description | Use Case | Key Features |
 |--------|-------------|----------|--------------|
@@ -232,7 +275,113 @@ CUSTOM_SEARCH_METHOD_MAP = {
 }
 ```
 
-## Custom processors and search methods
+## 🎛️ Threshold Optimization
+
+The Retrieval Optimizer includes threshold optimization capabilities for RedisVL's **Semantic Cache** and **Semantic Router**. This feature helps you automatically tune distance thresholds to maximize performance metrics like F1 score, precision, or recall.
+
+### Cache Threshold Optimization
+
+Optimize thresholds for semantic caches to improve cache hit rates and relevance:
+
+```python
+from redis_retrieval_optimizer.threshold_optimization import CacheThresholdOptimizer
+from redisvl.extensions.cache.llm import SemanticCache
+
+# Create a semantic cache
+cache = SemanticCache(
+    name="my_cache",
+    redis_url="redis://localhost:6379",
+    distance_threshold=0.5  # Initial threshold
+)
+
+# Add some data to the cache
+paris_key = cache.store(
+    prompt="what is the capital of france?",
+    response="paris"
+)
+
+# Define test data for optimization
+test_data = [
+    {
+        "query": "What's the capital of France??",
+        "query_match": paris_key  # Expected cache hit
+    },
+    {
+        "query": "What's the capital of Britain?",
+        "query_match": ""  # Expected cache miss
+    }
+]
+
+# Optimize the threshold
+optimizer = CacheThresholdOptimizer(cache, test_data)
+optimizer.optimize()
+
+print(f"Optimized threshold: {cache.distance_threshold}")
+```
+
+### Router Threshold Optimization
+
+Optimize thresholds for semantic routers to improve routing accuracy:
+
+```python
+from redisvl.extensions.router import Route, SemanticRouter
+from redisvl.utils.vectorize import HFTextVectorizer
+from redis_retrieval_optimizer.threshold_optimization import RouterThresholdOptimizer
+
+# Define routes
+routes = [
+    Route(
+        name="greeting",
+        references=["hello", "hi"],
+        metadata={"type": "greeting"},
+        distance_threshold=0.5,
+    ),
+    Route(
+        name="farewell",
+        references=["bye", "goodbye"],
+        metadata={"type": "farewell"},
+        distance_threshold=0.5,
+    ),
+]
+
+# Create router
+router = SemanticRouter(
+    name="my-router",
+    vectorizer=HFTextVectorizer(),
+    routes=routes,
+    redis_url="redis://localhost:6379"
+)
+
+# Define test data
+test_data = [
+    {"query": "hello there", "query_match": "greeting"},
+    {"query": "goodbye", "query_match": "farewell"},
+    {"query": "hola", "query_match": "greeting"},  # Spanish
+]
+
+# Optimize route thresholds
+optimizer = RouterThresholdOptimizer(router, test_data)
+optimizer.optimize(max_iterations=20, search_step=0.1)
+
+print(f"Optimized thresholds: {router.route_thresholds}")
+```
+
+### Evaluation Metrics
+
+Threshold optimization supports multiple evaluation metrics:
+
+```python
+from redis_retrieval_optimizer.threshold_optimization import EvalMetric
+
+# Available metrics
+optimizer = CacheThresholdOptimizer(cache, test_data, eval_metric="f1")  # F1 score (default)
+optimizer = CacheThresholdOptimizer(cache, test_data, eval_metric="precision")  # Precision
+optimizer = CacheThresholdOptimizer(cache, test_data, eval_metric="recall")  # Recall
+```
+
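To make these `eval_metric` options concrete, here is a hand-worked toy example of what precision, recall, and F1 mean for the labeled `test_data` format shown in the cache example above (`query_match` holds the expected cache key, or an empty string for an expected miss). It is a sketch using only the standard metric definitions; it does not call any optimizer internals, and the `returned` keys are made up for illustration.

```python
# Toy, hand-rolled illustration of the eval_metric choices (f1 / precision / recall).
# Assumes the README's convention: "query_match" is the expected key, "" means expected miss.
test_data = [
    {"query": "What's the capital of France??", "query_match": "cache:paris"},  # expected hit
    {"query": "What's the capital of Britain?", "query_match": ""},             # expected miss
]

# Hypothetical keys the cache returned at some candidate threshold ("" would mean a miss).
returned = ["cache:paris", "cache:paris"]  # the second one is a spurious hit

tp = sum(1 for ex, got in zip(test_data, returned) if ex["query_match"] and got == ex["query_match"])
fp = sum(1 for ex, got in zip(test_data, returned) if got and got != ex["query_match"])
fn = sum(1 for ex, got in zip(test_data, returned) if ex["query_match"] and got != ex["query_match"])

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(precision, recall, f1)  # 0.5 1.0 0.666... at this hypothetical threshold
```

Per the description above, the optimizers tune the distance threshold(s) to maximize whichever of these metrics you select over the labeled examples.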
+For complete documentation and examples, see [docs/examples/threshold_optimization](docs/examples/threshold_optimization/).
+
+## 🔧 Custom Processors and Search Methods
 
 The Retrieval Optimizer is designed to be flexible and extensible. You can define your own **corpus processors** and **search methods** to support different data formats and retrieval techniques. This is especially useful when working with domain-specific data or testing out experimental search strategies.
 
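Before the detailed example below, here is a compact sketch of what those two extension points tend to look like. The `(corpus, emb_model)` processor signature and the `CUSTOM_SEARCH_METHOD_MAP` registry both appear in the surrounding diff context; everything else (the record fields, the `embed()` call, and the search-method signature) is an assumption for illustration, so defer to the car-corpus example that follows in the README for the real contract.

```python
# Sketch only: field names, the embed() call, and the search-method signature are assumptions.

def process_my_corpus(corpus, emb_model):
    """Corpus processor: turn raw documents into records ready to index."""
    processed = []
    for doc_id, doc in corpus.items():  # assumed: corpus keyed by document id
        processed.append({
            "_id": doc_id,
            "text": doc["text"],
            "vector": emb_model.embed(doc["text"], as_buffer=True),  # assumed embedder API
        })
    return processed


def my_search_method(*args, **kwargs):
    """Custom search method; its exact signature is defined by the framework."""
    raise NotImplementedError


# Register custom search methods by name so a study config can refer to them.
CUSTOM_SEARCH_METHOD_MAP = {
    "my_search": my_search_method,
}
```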
@@ -340,7 +489,7 @@ def process_car_corpus(corpus, emb_model):
 
 ### Running the custom study
 
-Once youve defined your search methods and processor, pass them into the study runner:
+Once you've defined your search methods and processor, pass them into the study runner:
 
 ```python
 from redis_retrieval_optimizer.grid_study import run_grid_study
@@ -366,15 +515,15 @@ metrics = run_grid_study(
 | basic_vector | sentence-transformers/all-MiniLM-L6-v2 | 0.002605 | 0.9 | 0.23 | 0.717676 |
 
 
-## Data requirements
+## 📊 Data Requirements
 
 To run a retrieval study, you need three key datasets: **queries**, **corpus**, and **qrels**. The framework is flexible—data can be in any shape as long as you provide custom processors to interpret it. But if you're just getting started, here's the expected format and some working examples to guide you.
 
 ---
 
 ### Corpus
 
-This is the full set of documents you'll be searching against. Its what gets indexed into Redis. The default assumption is that each document has a `text` field to search or embed, but you can customize this using a corpus processor.
+This is the full set of documents you'll be searching against. It's what gets indexed into Redis. The default assumption is that each document has a `text` field to search or embed, but you can customize this using a corpus processor.
 
 **General structure**:
 
@@ -423,7 +572,7 @@ These are the search inputs you'll evaluate against the corpus. Each query consi
 }
 ```
 
-> 💡 Using custom query metadata? Thats fine—just make sure your custom search method knows how to interpret it.
+> 💡 Using custom query metadata? That's fine—just make sure your custom search method knows how to interpret it.
 
 ---
 
@@ -461,7 +610,8 @@ Qrels define the relevance of documents to each query. They are required for eva
 
 > 🔍 Note: Relevance scores can be binary (`1` or `0`) for classification metrics or ranked (`2`, `1`, etc.) for ranking metrics like NDCG.
 
-# Contributing
+## 🤝 Contributing
+
 We love contributors if you have an addition follow this process:
 - Fork the repo
 - Make contribution

docs/api/threshold_optimizer.rst

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+********************
+Threshold Optimizers
+********************
+
+CacheThresholdOptimizer
+=======================
+
+.. _cachethresholdoptimizer_api:
+
+.. currentmodule:: redis_retrieval_optimizer.threshold_optimization.cache
+
+.. autoclass:: CacheThresholdOptimizer
+   :show-inheritance:
+   :members:
+
+
+RouterThresholdOptimizer
+========================
+
+.. _routerthresholdoptimizer_api:
+
+.. currentmodule:: redis_retrieval_optimizer.threshold_optimization.router
+
+.. autoclass:: RouterThresholdOptimizer
+   :show-inheritance:
+   :members:
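One small note on module paths: the README examples above import both optimizers directly from `redis_retrieval_optimizer.threshold_optimization`, while this RST documents them under the `cache` and `router` submodules. Assuming the subpackage re-exports the classes (which the README usage implies), both paths should resolve to the same objects:

```python
# Sanity check, assuming the subpackage re-exports the classes documented here.
from redis_retrieval_optimizer.threshold_optimization import CacheThresholdOptimizer
from redis_retrieval_optimizer.threshold_optimization.cache import (
    CacheThresholdOptimizer as CacheImpl,
)

assert CacheThresholdOptimizer is CacheImpl
```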
