
# langchain-cachly

Cachly semantic cache for LangChain — fleet-wide LLM response caching by meaning, not by exact string match. Zero extra dependencies.

PyPI · Python 3.10+ · License: MIT


## The problem

Your users ask the same question dozens of different ways. Without semantic caching, every rephrasing hits your LLM and costs money. RedisCache only catches exact duplicates. RedisSemanticCache requires a local embedding model per instance.

langchain-cachly uses server-side similarity search — one managed service, shared across your whole fleet.

## Quick start

```bash
pip install langchain-cachly
```

```python
import langchain
from langchain_cachly import CachlySemanticCache

langchain.llm_cache = CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/YOUR_VECTOR_TOKEN",
    threshold=0.92,   # cosine similarity 0–1; 0.92 recommended
    ttl=86400,        # seconds; default 24 h
)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

llm.predict("What is semantic caching?")           # → LLM called, answer cached
llm.predict("Can you explain semantic caching?")   # → cache HIT, $0 LLM cost
```

Get your CACHLY_VECTOR_URL from cachly.dev — free tier available.
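If you would rather keep the token out of source control, one option is to read it from the environment. The `CACHLY_VECTOR_URL` variable name follows the line above; whether the library reads it automatically is not documented here, so this sketch passes it explicitly:

```python
import os

import langchain
from langchain_cachly import CachlySemanticCache

# Read the semantic-cache URL (including its token) from the environment,
# then pass it explicitly to the constructor shown in the Quick start.
langchain.llm_cache = CachlySemanticCache(
    vector_url=os.environ["CACHLY_VECTOR_URL"],
)
```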

## Why CachlySemanticCache?

| Feature | RedisCache | RedisSemanticCache | CachlySemanticCache |
|---|---|---|---|
| Match type | Exact | Per-instance embedding | Cluster-wide semantic |
| Infra needed | Redis | Redis + embedding model | None (managed) |
| Embeddings | — | Local model | Managed (server-side) |
| Cross-instance sharing | ❌ | ❌ | ✅ |
| Extra dependencies | `redis` | `redis` + embedding lib | None (stdlib only) |
| Cold start | Fast | Slow (model load) | Fast |

## Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| `vector_url` | `str` | required | Full Cachly semantic URL, including the token |
| `threshold` | `float` | `0.92` | Minimum cosine similarity for a cache hit |
| `ttl` | `int` | `86400` | Entry TTL in seconds (24 h) |
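To build intuition for `threshold`, here is a minimal, library-independent sketch of how a cosine-similarity cutoff behaves. The vectors below are made up, and Cachly computes embeddings server-side, so you never do this yourself:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for two phrasings of the same question.
v1 = [0.82, 0.41, 0.11]
v2 = [0.79, 0.47, 0.09]

sim = cosine_similarity(v1, v2)
print(f"similarity = {sim:.3f}")   # scores >= threshold (0.92) count as a cache hit
```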

## Async support

CachlySemanticCache implements the full `BaseCache` interface, including `alookup`, `aupdate`, and `aclear` via thread-pool delegation, so it is compatible with LangChain's async chains out of the box.
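As a usage sketch, assuming the Quick start setup above, the async LLM entry points hit the same shared cache:

```python
import asyncio

import langchain
from langchain_cachly import CachlySemanticCache
from langchain_openai import ChatOpenAI

langchain.llm_cache = CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/YOUR_VECTOR_TOKEN",
)
llm = ChatOpenAI(model="gpt-4o")

async def main() -> None:
    # First call goes to the LLM and populates the cache...
    await llm.apredict("What is semantic caching?")
    # ...a semantically similar rephrasing is served from the cache.
    await llm.apredict("Can you explain semantic caching?")

asyncio.run(main())
```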

## Fail-open design

Network errors and HTTP failures are treated as cache misses: the lookup returns `None` and the error is logged at `WARNING` level. The cache never raises in the hot path, so your LLM calls always complete even if Cachly is temporarily unreachable.
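This is the standard fail-open wrapper pattern. A minimal sketch using only the stdlib (the helper name is hypothetical, not the library's actual internals):

```python
import logging
import urllib.request
from typing import Optional

logger = logging.getLogger("langchain_cachly")

def _fail_open_lookup(url: str, timeout: float = 2.0) -> Optional[str]:
    """Fail-open lookup: any transport or HTTP error becomes a cache miss."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except Exception as exc:  # timeout, connection error, HTTP 4xx/5xx, ...
        logger.warning("Cachly lookup failed, treating as miss: %s", exc)
        return None  # cache miss; the caller falls through to the real LLM
```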

