Cachly semantic cache for LangChain — fleet-wide LLM response caching by meaning, not by exact string match. Zero extra dependencies.
Your users ask the same question dozens of different ways. Without semantic caching, every rephrasing hits your LLM and costs money. RedisCache only catches exact duplicates. RedisSemanticCache requires a local embedding model per instance.
langchain-cachly uses server-side similarity search — one managed service, shared across your whole fleet.
```bash
pip install langchain-cachly
```

```python
import langchain
from langchain_cachly import CachlySemanticCache

langchain.llm_cache = CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/YOUR_VECTOR_TOKEN",
    threshold=0.92,  # cosine similarity 0–1; 0.92 recommended
    ttl=86400,       # seconds; default 24 h
)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
llm.predict("What is semantic caching?")           # → LLM called, answer cached
llm.predict("Can you explain semantic caching?")   # → cache HIT, $0 LLM cost
```

Get your `CACHLY_VECTOR_URL` from cachly.dev — free tier available.
| Feature | RedisCache | RedisSemanticCache | CachlySemanticCache |
|---|---|---|---|
| Match type | Exact | Semantic (per-instance embeddings) | Semantic (cluster-wide) |
| Infra needed | Redis | Redis + embedding model | None (managed) |
| Embeddings | ❌ | Local model | Managed (server-side) |
| Cross-instance sharing | ❌ | ❌ | ✅ |
| Extra dependencies | redis | redis + embedding lib | None (stdlib only) |
| Cold start | Fast | Slow (model load) | Fast |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `vector_url` | `str` | required | Full Cachly semantic URL incl. token |
| `threshold` | `float` | `0.92` | Min cosine similarity for a cache hit |
| `ttl` | `int` | `86400` | TTL in seconds (24 h) |
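As a rough illustration of how these knobs trade off (the values below are illustrative, not recommendations from the library):

```python
from langchain_cachly import CachlySemanticCache

# Illustrative only: a stricter threshold trades hit-rate for precision,
# and a shorter TTL keeps answers fresher at the cost of more LLM calls.
strict_cache = CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/YOUR_VECTOR_TOKEN",
    threshold=0.97,  # only very close paraphrases count as hits
    ttl=3600,        # expire cached answers after one hour
)
```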
CachlySemanticCache implements the full BaseCache interface, including the async methods `alookup`, `aupdate`, and `aclear` via thread-pool delegation, so it works with LangChain's async chains out of the box.
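A minimal sketch of using the cache from async code, assuming the global cache has already been registered as in the quickstart above:

```python
import asyncio

from langchain_openai import ChatOpenAI

async def main() -> None:
    llm = ChatOpenAI(model="gpt-4o")
    # The first call populates the cache; the paraphrase below should hit it,
    # with alookup/aupdate running in a worker thread under the hood.
    first = await llm.ainvoke("What is semantic caching?")
    second = await llm.ainvoke("Explain semantic caching to me.")
    print(first.content)
    print(second.content)

asyncio.run(main())
```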
Network errors and HTTP failures return `None` (a cache miss) and are logged at WARNING level. The cache never raises in the hot path; your LLM calls always complete even if Cachly is temporarily unreachable.
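To see those warnings during development, enabling standard-library logging is enough (a sketch; the exact logger name the package uses isn't specified here, so this just raises visibility globally):

```python
import logging

# Surface WARNING-level messages (including cache failures) on the root logger.
logging.basicConfig(level=logging.WARNING)
```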
- cachly.dev — Free signup, dashboard, pricing
- Docs — Full integration docs
- GitHub