Bug Fix: auto-inject prompt caching support for Gemini models #21881

Open

RheagalFire wants to merge 1 commit into BerriAI:main from RheagalFire:feat/autoinject_gemini_cache

Conversation

RheagalFire (Contributor) commented Feb 22, 2026

Relevant issues

Fixes #18519

Problem

cache_control_injection_points in proxy YAML config worked for Anthropic models but was silently ignored for Gemini. The root cause: when the hook injected cache_control on
string-content messages (common for system messages), it set message["cache_control"] at the message level. But is_cached_message() only checked for cache_control on content
items inside a list — so message-level markers were never detected by the Gemini context caching pipeline.
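
For context, a proxy config along these lines is what was silently ignored for Gemini before this fix. This is a hedged sketch, not the reporter's exact config: the model alias, model string, and api_key are illustrative, while the cache_control_injection_points schema follows the documented Anthropic usage.

```yaml
# Hypothetical config.yaml snippet: the same injection-point syntax that
# already worked for Anthropic, pointed at a Gemini model.
model_list:
  - model_name: gemini-cached               # illustrative alias
    litellm_params:
      model: gemini/gemini-1.5-pro          # illustrative Gemini model
      api_key: os.environ/GEMINI_API_KEY
      cache_control_injection_points:
        - location: message                 # inject cache_control on a message
          role: system                      # target the system message
```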

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement; see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible; it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

  • litellm/utils.py: Add message-level cache_control detection in is_cached_message() before the existing content-list check (sketched after this list)
  • litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py: Add a minimum token count guard (1024) in both sync and async check_and_create_cache(), so caching is skipped gracefully instead of hitting a Gemini API error
  • Tests: 11 new tests across 3 files covering message-level detection, Gemini integration flow, and min-token skip behavior
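
A minimal sketch of the detection change described in the first bullet, simplified from the actual litellm/utils.py logic (the exact structure in the diff may differ):

```python
# Sketch only: message-level check added before the existing content-list check.
from typing import Any, Dict


def is_cached_message(message: Dict[str, Any]) -> bool:
    # NEW: the hook sets message["cache_control"] for string-content messages
    # (e.g. system prompts), so check for the marker at the message level first.
    cache_control = message.get("cache_control")
    if isinstance(cache_control, dict) and cache_control.get("type") == "ephemeral":
        return True

    # Existing behavior: look for cache_control on items inside a content list.
    content = message.get("content")
    if isinstance(content, list):
        for item in content:
            if (
                isinstance(item, dict)
                and isinstance(item.get("cache_control"), dict)
                and item["cache_control"].get("type") == "ephemeral"
            ):
                return True
    return False
```

The message-level check comes first because the hook only falls back to message-level injection when the content is a plain string and has no list items to attach a marker to.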

Test plan

  • pytest tests/test_litellm/test_utils.py::TestIsCachedMessage — 12 pass (3 new; a rough sketch of these cases follows after this list)
  • pytest tests/test_litellm/integrations/test_anthropic_cache_control_hook.py — 2 new Gemini tests pass
  • pytest tests/test_litellm/llms/vertex_ai/context_caching/test_vertex_ai_context_caching.py — 72 pass (6 new), 0 regressions
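
For reference, the new message-level detection cases likely look roughly like this. This is an illustrative sketch, not the PR's actual test code; only the covered scenarios (positive, wrong-type, non-dict) come from the description above.

```python
# Illustrative test sketch: message-level cache_control detection for
# positive, wrong-type, and non-dict marker values.
from litellm.utils import is_cached_message


def test_message_level_ephemeral_marker_detected():
    msg = {
        "role": "system",
        "content": "You are a helpful assistant.",
        "cache_control": {"type": "ephemeral"},
    }
    assert is_cached_message(msg)


def test_message_level_wrong_type_not_detected():
    msg = {
        "role": "system",
        "content": "You are a helpful assistant.",
        "cache_control": {"type": "not-ephemeral"},
    }
    assert not is_cached_message(msg)


def test_message_level_non_dict_value_not_detected():
    msg = {
        "role": "system",
        "content": "You are a helpful assistant.",
        "cache_control": "ephemeral",
    }
    assert not is_cached_message(msg)
```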


vercel bot commented Feb 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions          | Updated (UTC)
litellm | Ready      | Preview, Comment | Feb 22, 2026 8:20am

Request Review

RheagalFire (Contributor, Author) commented:

@greptileai


greptile-apps bot commented Feb 22, 2026

Greptile Summary

This PR fixes a bug where cache_control_injection_points in proxy YAML config was silently ignored for Gemini models (issue #18519). The root cause was that the AnthropicCacheControlHook sets message["cache_control"] at the message level for string-content messages, but is_cached_message() only checked for cache_control on content items inside a list.

  • Core fix in litellm/utils.py: Adds message-level cache_control detection in is_cached_message() before the existing content-list check, enabling Gemini's context caching pipeline to detect markers injected by the hook on string-content messages.
  • Min-token guard in vertex_ai_context_caching.py: Adds a minimum token count check (1024, the Gemini requirement) in both sync and async check_and_create_cache(), gracefully skipping caching instead of triggering a Gemini API error (see the sketch after this list).
  • Tests: 11 new tests across 3 files covering message-level detection, Gemini integration flow, and min-token skip behavior. All tests are properly mocked with no real network calls.
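
As a rough illustration of the min-token guard in the second bullet: the helper below is hypothetical (in the PR the guard sits inside the sync and async check_and_create_cache() paths), while MINIMUM_PROMPT_CACHE_TOKEN_COUNT and the 1024 threshold come from the diff itself.

```python
# Hypothetical helper (name is illustrative) showing the shape of the guard.
from typing import List

import litellm
from litellm.constants import MINIMUM_PROMPT_CACHE_TOKEN_COUNT  # 1024, per this PR


def should_attempt_context_cache(model: str, cached_messages: List[dict]) -> bool:
    if not cached_messages:
        return False
    token_count = litellm.token_counter(model=model, messages=cached_messages)
    if token_count < MINIMUM_PROMPT_CACHE_TOKEN_COUNT:
        # Below Gemini's minimum cacheable size: skip caching gracefully
        # instead of letting the Gemini caching API call fail.
        return False
    return True
```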

Confidence Score: 4/5

  • This PR is safe to merge — it adds detection for an already-existing message format and a graceful fallback guard with no behavioral regressions.
  • The core fix in is_cached_message() is minimal and correct, adding detection for message-level cache_control that was already being set by the hook. The min-token guard is a sensible defensive addition that prevents API errors. Tests are thorough and well-structured. Only a minor import ordering style issue was found.
  • No files require special attention. The changes in litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py have a minor import ordering issue but are functionally correct.

Important Files Changed

  • litellm/utils.py: Adds message-level cache_control detection in is_cached_message() before the existing content-list check. Clean, correct logic with proper type guards.
  • litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py: Adds the minimum token count guard (1024) in both sync and async check_and_create_cache() methods. Minor import ordering issue; the logic is correct and avoids Gemini API errors for small cached content.
  • tests/test_litellm/integrations/test_anthropic_cache_control_hook.py: Two new tests verify the full flow for Gemini cache_control injection with both string and list content. Well-structured, no network calls.
  • tests/test_litellm/llms/vertex_ai/context_caching/test_vertex_ai_context_caching.py: Adds setup/teardown for mocking is_prompt_caching_valid_prompt and two new parameterized tests (sync + async) for the min-token skip behavior. No real network calls are made.
  • tests/test_litellm/test_utils.py: Three new unit tests for message-level cache_control detection in is_cached_message(), covering positive, wrong-type, and non-dict cases.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Proxy receives request with<br/>cache_control_injection_points] --> B[AnthropicCacheControlHook<br/>injects cache_control]
    B --> C{Message content type?}
    C -->|String content| D["Sets message-level<br/>cache_control = {type: ephemeral}"]
    C -->|List content| E["Sets cache_control on<br/>last content item"]
    D --> F["is_cached_message()"]
    E --> F
    F --> G{Detected as cached?}
    G -->|"Yes (NEW: message-level check)"| H["separate_cached_messages()<br/>splits messages"]
    G -->|No - was broken before| I[Cache silently skipped]
    H --> J{"is_prompt_caching_valid_prompt()<br/>(NEW: min 1024 tokens?)"}
    J -->|Below threshold| K[Skip caching gracefully]
    J -->|Above threshold| L[Proceed with Gemini<br/>context caching API]

    style D fill:#90EE90
    style J fill:#90EE90
    style K fill:#90EE90

Last reviewed commit: bf8d2a3

greptile-apps bot (Contributor) left a comment

5 files reviewed, 1 comment

Comment on lines 6 to +16
from litellm.caching.caching import Cache, LiteLLMCacheType
from litellm.constants import MINIMUM_PROMPT_CACHE_TOKEN_COUNT
from litellm.litellm_core_utils.litellm_logging import Logging
from litellm.llms.custom_httpx.http_handler import (
    AsyncHTTPHandler,
    HTTPHandler,
    get_async_httpx_client,
)
from litellm._logging import verbose_logger
from litellm.llms.openai.openai import AllMessageValues
from litellm.utils import is_prompt_caching_valid_prompt

Import ordering is inconsistent
The litellm._logging import is placed between the litellm.llms.* imports, breaking the alphabetical grouping. Consider moving it to be adjacent to the other litellm.* (non-llms) imports for consistency.

Suggested change (move litellm._logging next to the other non-llms litellm.* imports):

from litellm._logging import verbose_logger
from litellm.caching.caching import Cache, LiteLLMCacheType
from litellm.constants import MINIMUM_PROMPT_CACHE_TOKEN_COUNT
from litellm.litellm_core_utils.litellm_logging import Logging
from litellm.llms.custom_httpx.http_handler import (
    AsyncHTTPHandler,
    HTTPHandler,
    get_async_httpx_client,
)
from litellm.llms.openai.openai import AllMessageValues
from litellm.utils import is_prompt_caching_valid_prompt

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


Development

Successfully merging this pull request may close this issue:

  • [Bug]: Auto-Inject Prompt Caching not supported for Gemini Models (#18519)