🚀 [FEATURE] Add Google GenAI Model with Automatic system_prompt Caching Support

Motivation

  • Google GenAI is a leading provider of LLMs, offering not only the Gemini family but also models from Hugging Face, Anthropic, and OpenAI. Currently, SmolAgents lacks built-in support for interacting with Google GenAI services.
  • Alternatives like LiteLLM often lag behind in supporting newly released features, such as Google’s explicit context caching, and SmolAgents does not currently support the Google GenAI SDK natively. While a custom integration is possible, it introduces unnecessary complexity, redundancy, and effort.
  • In SmolAgents, each agent typically has a fixed system_prompt, which is often the longest input segment. This prompt is repeatedly sent for every ActionStep, leading to increased token usage and cost. Google GenAI’s context caching can significantly reduce both cost and response time by avoiding re-sending long static prompts.
  • Since caching APIs vary across providers (e.g., Google vs. Anthropic), it's logical to implement caching behavior in a provider-specific Model implementation.

What's Implemented

  • Introduces a new model class, GenAIModel, enabling generation via the Google GenAI SDK.
  • Authentication is supported using Google Credentials.
  • Adds support for optional explicit context caching:
    • If caching is disabled: the model behaves normally with no caching behavior.
    • If caching is enabled:
      • Upon calling generate(), the model checks if the current system_prompt is cached.
      • If not cached (or cache expired), it caches the system_prompt for the specified TTL (cache_ttl).
      • Cached prompts are then reused automatically in subsequent calls, improving efficiency and reducing costs (see the SDK-level sketch after this list).
  • Provides agent-level configuration for enabling/disabling context caching and specifying TTL, supporting complex multi-agent setups.
  • Adds support for Google’s Thinking feature (where applicable), such as with the gemini-2.5-flash-lite model.
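
For context, this is roughly what happens under the hood when cache_system_prompts=True, expressed as a minimal sketch against the google-genai SDK's explicit caching API; the project, location, prompt, and TTL values are illustrative, and the exact flow inside GenAIModel may differ:

from google import genai
from google.genai import types

# Vertex AI client (illustrative project/location)
client = genai.Client(vertexai=True, project="your-project-name", location="europe-west1")

# The agent's static system_prompt; note that explicit caching only accepts
# content above a model-specific minimum token count.
system_prompt = "You are a helpful coding agent. ..."

# 1. Cache the system prompt explicitly with a TTL (here 300 seconds).
cache = client.caches.create(
    model="gemini-2.5-flash-lite",
    config=types.CreateCachedContentConfig(
        system_instruction=system_prompt,
        ttl="300s",
    ),
)

# 2. Later generate() calls reference the cache instead of re-sending the prompt.
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="What is the current weather in Paris?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)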

Example Usage

from smolagents import CodeAgent, GenAIModel, DuckDuckGoSearchTool

google_credentials = {...}  # your Google credentials

# Initialize the GenAIModel with the gemini-2.5-flash-lite model
model = GenAIModel(
    vertex_project="your-project-name",
    vertex_location="europe-west1",
    credentials=google_credentials,
    model_id="gemini-2.5-flash-lite",
    custom_role_conversions=None,
    flatten_messages_as_text=None,
    thinking_budget=None,  # set this if the model supports 'thinking' (see the sketch after this example)
    cache_system_prompts=True,  # enable or disable context caching
    cache_ttl="300s"  # cache expiration time
)

# Create an agent using the model
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
)

# The system prompt will be automatically cached and reused for 300 seconds
result = agent.run("What is the current weather in Paris?")
print(result)
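
For models that support thinking, the thinking_budget argument above presumably maps onto the SDK's ThinkingConfig; here is a minimal sketch of the equivalent raw google-genai call, with the budget value and prompt purely illustrative:

from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="your-project-name", location="europe-west1")

# Allow up to 1024 internal "thinking" tokens before the final answer (illustrative budget).
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Plan the steps needed to find the current weather in Paris, then answer.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)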
