Bug Fix: auto-inject prompt caching support for Gemini models#21881
RheagalFire wants to merge 1 commit into BerriAI:main
Conversation
Greptile Summary

This PR fixes a bug where message-level `cache_control` markers injected for Gemini models were silently ignored, so prompt/context caching never took effect.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/utils.py | Adds message-level cache_control detection in is_cached_message() before the existing content-list check. Clean, correct logic with proper type guards. |
| litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py | Adds minimum token count guard (1024) in both sync and async check_and_create_cache() methods. Minor import ordering issue. Logic is correct and avoids Gemini API errors for small cached content. |
| tests/test_litellm/integrations/test_anthropic_cache_control_hook.py | Two new tests verify the full flow for Gemini cache_control injection with both string and list content. Well-structured, no network calls. |
| tests/test_litellm/llms/vertex_ai/context_caching/test_vertex_ai_context_caching.py | Adds setup/teardown for mocking is_prompt_caching_valid_prompt and two new parameterized tests (sync + async) for the min-token skip behavior. No real network calls are made. |
| tests/test_litellm/test_utils.py | Three new unit tests for message-level cache_control detection in is_cached_message(), covering positive, wrong-type, and non-dict cases. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Proxy receives request with<br/>cache_control_injection_points] --> B[AnthropicCacheControlHook<br/>injects cache_control]
    B --> C{Message content type?}
    C -->|String content| D["Sets message-level<br/>cache_control = {type: ephemeral}"]
    C -->|List content| E["Sets cache_control on<br/>last content item"]
    D --> F["is_cached_message()"]
    E --> F
    F --> G{Detected as cached?}
    G -->|"Yes (NEW: message-level check)"| H["separate_cached_messages()<br/>splits messages"]
    G -->|No - was broken before| I[Cache silently skipped]
    H --> J{"is_prompt_caching_valid_prompt()<br/>(NEW: min 1024 tokens?)"}
    J -->|Below threshold| K[Skip caching gracefully]
    J -->|Above threshold| L[Proceed with Gemini<br/>context caching API]
    style D fill:#90EE90
    style J fill:#90EE90
    style K fill:#90EE90
```
Last reviewed commit: bf8d2a3
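For orientation, here is a minimal Python sketch of the injection branch in the flowchart above. The helper name `inject_cache_control` and the plain-dict message shape are illustrative assumptions, not the hook's actual API:

```python
from typing import Any, Dict


def inject_cache_control(message: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative only: mark a message as cacheable, mirroring the flowchart.

    String content (e.g. system prompts) gets a message-level marker;
    list content gets the marker on its last content item.
    """
    marker = {"type": "ephemeral"}
    content = message.get("content")
    if isinstance(content, str):
        # Message-level cache_control -- the case is_cached_message() previously missed
        message["cache_control"] = marker
    elif isinstance(content, list) and content and isinstance(content[-1], dict):
        # Content-list case: attach cache_control to the last item
        content[-1]["cache_control"] = marker
    return message
```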
```python
from litellm.caching.caching import Cache, LiteLLMCacheType
from litellm.constants import MINIMUM_PROMPT_CACHE_TOKEN_COUNT
from litellm.litellm_core_utils.litellm_logging import Logging
from litellm.llms.custom_httpx.http_handler import (
    AsyncHTTPHandler,
    HTTPHandler,
    get_async_httpx_client,
)
from litellm._logging import verbose_logger
from litellm.llms.openai.openai import AllMessageValues
from litellm.utils import is_prompt_caching_valid_prompt
```
Import ordering is inconsistent
The litellm._logging import is placed between the litellm.llms.* imports, breaking the alphabetical grouping. Consider moving it to be adjacent to the other litellm.* (non-llms) imports for consistency.
Suggested change:

```python
from litellm._logging import verbose_logger
from litellm.caching.caching import Cache, LiteLLMCacheType
from litellm.constants import MINIMUM_PROMPT_CACHE_TOKEN_COUNT
from litellm.litellm_core_utils.litellm_logging import Logging
from litellm.llms.custom_httpx.http_handler import (
    AsyncHTTPHandler,
    HTTPHandler,
    get_async_httpx_client,
)
from litellm.llms.openai.openai import AllMessageValues
from litellm.utils import is_prompt_caching_valid_prompt
```
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Relevant issues
Fixes #18519
Problem
`cache_control_injection_points` in the proxy YAML config worked for Anthropic models but was silently ignored for Gemini. The root cause: when the hook injected `cache_control` on string-content messages (common for system messages), it set `message["cache_control"]` at the message level. But `is_cached_message()` only checked for `cache_control` on content items inside a list, so message-level markers were never detected by the Gemini context caching pipeline.
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the `tests/litellm/` directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on `make test-unit`
- I have run a PR review with `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
- `litellm/utils.py`: Add message-level `cache_control` detection in `is_cached_message()` before the existing content-list check
- `litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py`: Add a minimum token count guard (1024) in both sync and async `check_and_create_cache()`, skipping caching gracefully instead of hitting a Gemini API error (see the sketch below)
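The guard in the second change is conceptually just a token-count threshold. A hedged sketch, assuming `litellm.token_counter` for counting and using an illustrative helper name (the real code calls `is_prompt_caching_valid_prompt`, which uses `MINIMUM_PROMPT_CACHE_TOKEN_COUNT`):

```python
from typing import Any, Dict, List

from litellm import token_counter  # counts tokens for a list of messages

MINIMUM_PROMPT_CACHE_TOKEN_COUNT = 1024  # Gemini's minimum size for context caching


def should_create_gemini_cache(model: str, cached_messages: List[Dict[str, Any]]) -> bool:
    """Illustrative guard: only create a Gemini context cache when the cached
    portion meets the minimum token count; otherwise skip gracefully."""
    if not cached_messages:
        return False
    return token_counter(model=model, messages=cached_messages) >= MINIMUM_PROMPT_CACHE_TOKEN_COUNT
```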
Test plan
- `pytest tests/test_litellm/test_utils.py::TestIsCachedMessage` - 12 pass (3 new)
- `pytest tests/test_litellm/integrations/test_anthropic_cache_control_hook.py` - 2 new Gemini tests pass
- `pytest tests/test_litellm/llms/vertex_ai/context_caching/test_vertex_ai_context_caching.py` - 72 pass (6 new), 0 regressions
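For reference, a minimal sketch of what the new message-level detection tests look like; the test names, exact assertions, and call signature of `is_cached_message()` here are assumptions for illustration, not copied from the PR:

```python
from litellm.utils import is_cached_message


def test_message_level_cache_control_detected():
    # String-content system message with a message-level marker (the Gemini case)
    message = {
        "role": "system",
        "content": "You are a helpful assistant.",
        "cache_control": {"type": "ephemeral"},
    }
    assert is_cached_message(message)


def test_wrong_cache_control_type_not_detected():
    # A marker with the wrong type should not be treated as cached
    message = {
        "role": "system",
        "content": "You are a helpful assistant.",
        "cache_control": {"type": "not-ephemeral"},
    }
    assert not is_cached_message(message)
```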