🚀 [FEATURE] Add Google GenAI Model with Automatic system_prompt Caching Support

Motivation

  • Google GenAI is a leading provider of LLMs, offering not only the Gemini family but also models from Hugging Face, Anthropic, and OpenAI. Currently, SmolAgents lacks built-in support for interacting with Google GenAI services.
  • Alternatives like LiteLLM often lag behind in supporting newly released features, such as Google’s explicit context caching, and SmolAgents does not currently support the Google GenAI SDK natively. While a custom integration is possible, it introduces unnecessary complexity, redundancy, and effort.
  • In SmolAgents, each agent typically has a fixed system_prompt, which is often the longest input segment. This prompt is repeatedly sent for every ActionStep, leading to increased token usage and cost. Google GenAI’s context caching can significantly reduce both cost and response time by avoiding re-sending long static prompts.
  • Since caching APIs vary across providers (e.g., Google vs. Anthropic), it's logical to implement caching behavior in a provider-specific Model implementation.

What's Implemented

  • Introduces a new model class, GenAIModel, enabling generation via the Google GenAI SDK.
  • Authentication is supported using Google Credentials.
  • Adds support for optional explicit context caching:
    • If caching is disabled: the model behaves normally with no caching behavior.
    • If caching is enabled:
      • Upon calling generate(), the model checks if the current system_prompt is cached.
      • If not cached (or cache expired), it caches the system_prompt for the specified TTL (cache_ttl).
      • Cached prompts are then reused automatically in subsequent calls, improving efficiency and reducing costs (see the SDK-level sketch after this list).
  • Provides agent-level configuration for enabling/disabling context caching and specifying TTL, supporting complex multi-agent setups.
  • Adds support for Google’s Thinking feature (where applicable), such as with the gemini-2.5-flash-lite model.
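
For context, this is roughly what happens under the hood when cache_system_prompts=True, expressed as a minimal sketch against the google-genai SDK's explicit caching API; the project, location, prompt, and TTL values are illustrative, and the exact flow inside GenAIModel may differ:

from google import genai
from google.genai import types

# Vertex AI client (illustrative project/location)
client = genai.Client(vertexai=True, project="your-project-name", location="europe-west1")

# The agent's static system_prompt; note that explicit caching only accepts
# content above a model-specific minimum token count.
system_prompt = "You are a helpful coding agent. ..."

# 1. Cache the system prompt explicitly with a TTL (here 300 seconds).
cache = client.caches.create(
    model="gemini-2.5-flash-lite",
    config=types.CreateCachedContentConfig(
        system_instruction=system_prompt,
        ttl="300s",
    ),
)

# 2. Later generate() calls reference the cache instead of re-sending the prompt.
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="What is the current weather in Paris?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)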

Example Usage

from smolagents import CodeAgent, GenAIModel, DuckDuckGoSearchTool

google_credentials = {...}  # your Google credentials

# Initialize the GenAIModel with the gemini-2.5-flash-lite model
model = GenAIModel(
    vertex_project="your-project-name",
    vertex_location="europe-west1",
    credentials=google_credentials,
    model_id="gemini-2.5-flash-lite",
    custom_role_conversions=None,
    flatten_messages_as_text=None,
    thinking_budget=None,  # set this if the model supports 'thinking' (see the sketch after this example)
    cache_system_prompts=True,  # enable or disable context caching
    cache_ttl="300s"  # cache expiration time
)

# Create an agent using the model
agent = CodeAgent(
    tools=[DuckDuckGoSearchTool()],
    model=model,
)

# The system prompt will be automatically cached and reused for 300 seconds
result = agent.run("What is the current weather in Paris?")
print(result)
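
For models that support thinking, the thinking_budget argument above presumably maps onto the SDK's ThinkingConfig; here is a minimal sketch of the equivalent raw google-genai call, with the budget value and prompt purely illustrative:

from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="your-project-name", location="europe-west1")

# Allow up to 1024 internal "thinking" tokens before the final answer (illustrative budget).
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Plan the steps needed to find the current weather in Paris, then answer.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)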
