[FEATURE] New GenAI Model with Automatic system_prompt Caching Support #1686
# 🚀 [FEATURE] Add Google GenAI Model with Automatic `system_prompt` Caching Support

## Motivation

Every agent request includes the `system_prompt`, which is often the longest input segment. This prompt is re-sent for every `ActionStep`, leading to increased token usage and cost. Google GenAI's context caching can significantly reduce both cost and response time by avoiding re-sending long static prompts. This feature request proposes a new `Model` implementation that takes advantage of it.

## What's Implemented
- A new `GenAIModel` class that generates completions via the Google GenAI SDK.
- On each call to `generate()`, the model checks whether the current `system_prompt` is already cached.
- If not, it caches the `system_prompt` for the specified TTL (`cache_ttl`); see the sketch after this list.
- Tested with the `gemini-2.5-flash-lite` model.
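
Below is a minimal sketch of the proposed caching flow, assuming the `google-genai` Python SDK (`pip install google-genai`). The class name (`GenAIModel`) and the `cache_ttl` parameter come from this proposal; the method signatures and the in-memory bookkeeping are illustrative, not the final implementation. Note also that explicit caching only accepts prompts above a model-specific minimum token count, so very short system prompts may need a fallback path.

```python
from google import genai
from google.genai import types


class GenAIModel:
    """Sketch: generate via Google GenAI, caching the system prompt server-side."""

    def __init__(self, model_id: str = "gemini-2.5-flash-lite", cache_ttl: str = "3600s"):
        self.client = genai.Client()  # reads GOOGLE_API_KEY from the environment
        self.model_id = model_id
        self.cache_ttl = cache_ttl
        self._cache_name: str | None = None
        self._cached_prompt: str | None = None

    def _ensure_cache(self, system_prompt: str) -> str:
        # Create a server-side cache only when the system prompt changes;
        # otherwise reuse the existing cache entry.
        if self._cached_prompt != system_prompt:
            cache = self.client.caches.create(
                model=self.model_id,
                config=types.CreateCachedContentConfig(
                    system_instruction=system_prompt,
                    ttl=self.cache_ttl,  # e.g. "3600s"
                ),
            )
            self._cache_name = cache.name
            self._cached_prompt = system_prompt
        return self._cache_name

    def generate(self, system_prompt: str, user_message: str) -> str:
        # Reference the cached system prompt instead of re-sending it.
        cache_name = self._ensure_cache(system_prompt)
        response = self.client.models.generate_content(
            model=self.model_id,
            contents=user_message,
            config=types.GenerateContentConfig(cached_content=cache_name),
        )
        return response.text
```

One design note: keying the cache on the prompt string (rather than creating a new cache per call) means repeated `ActionStep`s with the same `system_prompt` hit the same cache entry until the TTL expires.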
## Example Usage
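
A hedged sketch of how the proposed model could be exercised, reusing the `GenAIModel` sketch above; the prompt strings are placeholders.

```python
model = GenAIModel(
    model_id="gemini-2.5-flash-lite",
    cache_ttl="3600s",  # keep the cached system_prompt alive for one hour
)

long_system_prompt = "You are a meticulous coding agent. <long tool and format instructions>"

# The first call creates the cache for the long system prompt; the second
# call with the same prompt only sends the short user message.
print(model.generate(long_system_prompt, "Summarize the repository layout."))
print(model.generate(long_system_prompt, "Now list the open TODO items."))
```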