feat: add prompt_cache_key for cross-call cache routing #653

Open

san0808 wants to merge 6 commits into master from feat/prompt-cache-key-routing

Conversation

san0808 (Contributor) commented Apr 17, 2026

Summary

Adds `prompt_cache_key=agent_id` to all OpenAI and Azure LLM API call sites so that calls for the same agent route to the same GPU server, enabling cross-call cache hits on the system prompt.

Previously, each new call landed on a random server and started with a cold cache on the first turn. Now the first turn reuses the cached system prompt from previous calls for the same agent.

Caller-controlled opt-in: bolna sets `prompt_cache_key` only when `agent_id` is explicitly passed in kwargs. See https://github.com/bolna-ai/dashboard-backend/pull/2041 for the allowlist consumer.

What changed

| File | Change |
| --- | --- |
| `task_manager.py` | Forward `agent_id` to the LLM when the caller passes it in kwargs |
| `openai_llm.py` | `extra_body` with `prompt_cache_key` in streaming and non-streaming chat completions; pop and merge for the WebSocket path |
| `azure_llm.py` | Store `agent_id`; `extra_body` in streaming and non-streaming chat completions |
| `openai_base.py` | `extra_body` in `_build_responses_create_kwargs` and `_generate_responses` |
| `graph_agent.py`, `knowledgebase_agent.py` | Pass `agent_id` through to LLM initialization |

All changes are guarded by `if self.agent_id`, so they have no effect when `agent_id` is absent.

san0808 force-pushed the feat/prompt-cache-key-routing branch from 539cf92 to a54388c on April 20, 2026 at 16:27.
