Cachly semantic cache for LangChain — fleet-wide LLM response caching by meaning, not by exact string match. Zero extra dependencies.
Your users ask the same question dozens of different ways. Without semantic caching, every rephrasing hits your LLM and costs money. RedisCache only catches exact duplicates. RedisSemanticCache requires a local embedding model per instance.
langchain-cachly uses server-side similarity search — one managed service, shared across your whole fleet.
```bash
pip install langchain-cachly
```

```python
import langchain
from langchain_cachly import CachlySemanticCache

langchain.llm_cache = CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/YOUR_VECTOR_TOKEN",
    threshold=0.92,  # cosine similarity 0–1; 0.92 recommended
    ttl=86400,       # seconds; default 24 h
)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")
llm.predict("What is semantic caching?")           # → LLM called, answer cached
llm.predict("Can you explain semantic caching?")   # → cache HIT, $0 LLM cost
```

Get your `CACHLY_VECTOR_URL` from cachly.dev — free tier available.
| Feature | RedisCache | RedisSemanticCache | CachlySemanticCache |
|---|---|---|---|
| Match type | Exact | Semantic (per-instance embeddings) | Semantic (cluster-wide) |
| Infra needed | Redis | Redis + embedding model | None (managed) |
| Embeddings | ❌ | Local model | Managed (server-side) |
| Cross-instance sharing | ❌ | ❌ | ✅ |
| Extra dependencies | redis | redis + embedding lib | None (stdlib only) |
| Cold start | Fast | Slow (model load) | Fast |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `vector_url` | `str` | required | Full Cachly semantic URL incl. token |
| `threshold` | `float` | `0.92` | Min cosine similarity for a cache hit |
| `ttl` | `int` | `86400` | TTL in seconds (24 h) |
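As a rough illustration of how these knobs trade off (the values below are illustrative, not recommendations from the library):

```python
from langchain_cachly import CachlySemanticCache

# Illustrative only: a stricter threshold trades hit-rate for precision,
# and a shorter TTL keeps answers fresher at the cost of more LLM calls.
strict_cache = CachlySemanticCache(
    vector_url="https://api.cachly.dev/v1/sem/YOUR_VECTOR_TOKEN",
    threshold=0.97,  # only very close paraphrases count as hits
    ttl=3600,        # expire cached answers after one hour
)
```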
CachlySemanticCache implements the full BaseCache interface, including the async methods `alookup`, `aupdate`, and `aclear` via thread-pool delegation, so it works with LangChain's async chains out of the box.
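A minimal sketch of using the cache from async code, assuming the global cache has already been registered as in the quickstart above:

```python
import asyncio

from langchain_openai import ChatOpenAI

async def main() -> None:
    llm = ChatOpenAI(model="gpt-4o")
    # The first call populates the cache; the paraphrase below should hit it,
    # with alookup/aupdate running in a worker thread under the hood.
    first = await llm.ainvoke("What is semantic caching?")
    second = await llm.ainvoke("Explain semantic caching to me.")
    print(first.content)
    print(second.content)

asyncio.run(main())
```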
Network errors and HTTP failures return `None` (a cache miss) and are logged at WARNING level. The cache never raises in the hot path; your LLM calls always complete even if Cachly is temporarily unreachable.
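To see those warnings during development, enabling standard-library logging is enough (a sketch; the exact logger name the package uses isn't specified here, so this just raises visibility globally):

```python
import logging

# Surface WARNING-level messages (including cache failures) on the root logger.
logging.basicConfig(level=logging.WARNING)
```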
- cachly.dev — Free signup, dashboard, pricing
- Docs — Full integration docs
- GitHub