This project demonstrates function calling with large language models, supporting both Google's Gemini API and local models (Gemma3) via Ollama. The Python script creates a simple chatbot that can retrieve the current weather for a given location by using a custom function, showcasing the flexibility of a strategy-based design.
This project uses the Strategy design pattern to handle the different methods of communicating with the Gemini API. This approach decouples the main application logic from the specific API client implementations, making the system cleaner and more extensible.
Date: July 21, 2025
The core logic is organized as follows:
- `multi-model-chatbot.py`: This is the single entry point for the application. It accepts a command-line argument to select which API client "strategy" to use (`genai` or `openai`). It contains the main chat loop, which is now agnostic to the underlying API library.
- `clients/` directory: This module contains the different strategies for communicating with the LLM.
  - `api_client.py`: An abstract base class that defines a common interface (`generate_content`, `get_function_call`, etc.) that all concrete clients must implement. A minimal sketch of this interface appears right after this list.
  - `genai_client.py`: A concrete strategy that implements the `ApiClient` interface using the `google-genai` library.
  - `openai_client.py`: A concrete strategy that implements the `ApiClient` interface using the `openai` library with Gemini's compatible endpoint.
  - `ollama_client.py`: A concrete strategy for running models locally using Ollama. It uses the `openai` library to connect to Ollama's OpenAI-compatible API endpoint. This client uses an advanced few-shot prompting strategy to ensure reliable function calling with smaller models like Gemma. See the "Advanced Function Calling with Local Models" section below for more details.
- `tools/` directory:
  - `weather_tool.py`: This module encapsulates the logic for the `get_current_weather` function. It now uses the Open-Meteo API for both geocoding (to get coordinates for a location) and for retrieving weather data. The function now returns a more detailed weather forecast, including temperature, wind speed, wind direction, and whether it is day or night. The `weathercode` is also translated into a human-readable description.
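For illustration only, a minimal sketch of what such an abstract base class might look like is shown below. The method names `generate_content` and `get_function_call` come from the description above; the exact signatures and any additional methods are assumptions.

```python
# Hypothetical sketch of the abstract interface in clients/api_client.py.
# Method names follow the project description; signatures are assumptions.
from abc import ABC, abstractmethod


class ApiClient(ABC):
    """Common interface that every concrete client strategy implements."""

    @abstractmethod
    def generate_content(self, prompt: str):
        """Send a prompt (plus any conversation history) to the model and return its raw response."""

    @abstractmethod
    def get_function_call(self, response):
        """Extract a function call (name and arguments) from a response, or return None."""
```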
When the chatbot runs, it uses the selected client to send the user's prompt to the Gemini model. The model can then issue a function call, which the main script executes via the `weather_tool` module.
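As a rough illustration of what that tool does, the sketch below calls Open-Meteo's geocoding and forecast endpoints with `requests`. The exact parameters, error handling, and return shape of the real `tools/weather_tool.py` may differ, and the weather-code mapping is abridged.

```python
# Illustrative sketch of a get_current_weather implementation backed by Open-Meteo.
# The real tools/weather_tool.py may structure its output differently.
import requests

WEATHER_CODES = {0: "clear sky", 3: "overcast", 61: "slight rain"}  # abridged WMO code mapping


def get_current_weather(location: str) -> dict:
    # Step 1: geocode the location name to coordinates.
    geo = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": location, "count": 1},
        timeout=10,
    ).json()["results"][0]

    # Step 2: fetch the current weather for those coordinates.
    current = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": geo["latitude"],
            "longitude": geo["longitude"],
            "current_weather": "true",
        },
        timeout=10,
    ).json()["current_weather"]

    return {
        "location": location,
        "temperature_c": current["temperature"],
        "wind_speed_kmh": current["windspeed"],
        "wind_direction_deg": current["winddirection"],
        "is_day": bool(current["is_day"]),
        "description": WEATHER_CODES.get(current["weathercode"], "unknown"),
    }
```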
A significant enhancement to this project is the implementation of a two-step process for function calls. This ensures that the chatbot not only executes the requested function but also provides a more natural and human-readable response based on the function's output.
This two-step process creates a more interactive and intuitive user experience. This workflow is a practical example of a pattern known as Retrieval Augmented Generation (RAG). While RAG is often associated with retrieving data from static documents, our implementation uses a live API call for retrieval. In this context, Function Calling is the mechanism that enables this specific, real-time implementation of the RAG pattern.
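A simplified outline of this two-step flow is sketched below. The helper names on the client (`send_function_result`, `get_text`) and the shape of the returned function call are hypothetical stand-ins for the common `ApiClient` interface, not the exact API of the real clients.

```python
# Hypothetical outline of the two-step function-calling loop.
# send_function_result and get_text are illustrative stand-ins.
from tools.weather_tool import get_current_weather


def handle_turn(client, user_input: str) -> str:
    # Step 1: the model decides whether a function call is needed.
    response = client.generate_content(user_input)
    call = client.get_function_call(response)

    if call is not None:
        # Execute the requested tool locally (the "retrieval" step).
        result = get_current_weather(**call["args"])
        # Step 2: feed the tool output back so the model can phrase a natural reply.
        response = client.send_function_result(call["name"], result)

    return client.get_text(response)
```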
To provide a more robust and user-friendly command-line experience, this project uses the `prompt-toolkit` library instead of Python's built-in `input()` function.
The standard `input()` function is very limited and cannot handle special terminal commands, such as those sent by arrow keys. This results in strange characters like `^[[A` appearing in the terminal when you try to use arrow keys to edit your input.
By integrating `prompt-toolkit`, the chatbot now features a much more powerful input prompt that supports:
- Cursor Navigation: Arrow keys (left, right) work as expected for editing text.
- Command History: You can cycle through your previous messages using the up and down arrow keys.
- Graceful Exit: Handles `Ctrl+C` and `Ctrl+D` without crashing.
This makes interacting with the chatbot a smoother and more intuitive experience.
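A minimal sketch of how such an input loop can be wired up with `prompt-toolkit` is shown below; the exact prompt text and exit behaviour in `multi-model-chatbot.py` may differ.

```python
# Minimal prompt-toolkit input loop with history and graceful exit.
# The real chat loop in multi-model-chatbot.py may differ in details.
from prompt_toolkit import PromptSession

session = PromptSession()  # keeps in-memory history, so up/down arrows recall earlier input

while True:
    try:
        user_input = session.prompt("You: ")
    except (KeyboardInterrupt, EOFError):  # Ctrl+C / Ctrl+D exit cleanly
        print("Goodbye!")
        break
    if user_input.strip():
        print(f"(would send to the model): {user_input}")
```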
- This project uses `uv` to manage dependencies.
- `ollama` is required to run the `ollama_client.py` implementation.
- `make` is used to run the chatbot in different modes.
The required Python libraries for the application are:
- `google-genai`: The official Python library for the Google AI SDK.
- `openai`: The library for the OpenAI API, used to connect to Gemini's OpenAI-compatible endpoint.
- `requests`: A simple, yet elegant, HTTP library used by the weather tool.
- `prompt-toolkit`: A powerful library for building interactive command-line interfaces.
A key feature of this project is the ability to run the chatbot against a model deployed locally on your machine.
- Install Ollama: First, you need to install Ollama. You can find the download instructions on their official website.
- Pull the Model: Once Ollama is running, you must pull the model used by our client. We are using the slim, open `gemma3:1b` model. Open your terminal and run:

  ```
  ollama run gemma3:1b
  ```

  You can find more information about the model here: https://ollama.com/library/gemma3
To run this script, you will need to set up your Gemini API key as an environment variable.
First, for the Gemini API, you'll need to set the `GEMINI_API_KEY`:

```
export GEMINI_API_KEY="YOUR_GEMINI_API_KEY"
```
Note: The Ollama client does not require any API keys, as it runs entirely on your local machine.
The Open-Meteo API, which is used for geocoding and weather data, is free to use without an API key. However, for higher usage, you can optionally add an `OPENMETEO_API_KEY`:

```
export OPENMETEO_API_KEY="YOUR_OPENMETEO_API_KEY"
```
You can run the chatbot using the main `multi-model-chatbot.py` script. Use the `--client` flag to specify which API library to use.
To run the `google-genai` client implementation:

```
make gemini-genai
```
To run the `ollama` client implementation against your local model:

```
make gemma-openai
```
To run the `openai` library client implementation:

```
make gemini-openai
```
If you do not provide a `--client` flag, it will default to using `genai`.
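The client selection can be handled with standard `argparse`. The sketch below is an assumption about how the flag might be wired up; the option names and help text in the real script may differ.

```python
# Hypothetical sketch of the --client flag handling in multi-model-chatbot.py.
import argparse

parser = argparse.ArgumentParser(description="Multi-model function-calling chatbot")
parser.add_argument(
    "--client",
    choices=["genai", "openai", "ollama"],  # assumed option names based on the clients above
    default="genai",
    help="Which API client strategy to use",
)
args = parser.parse_args()
print(f"Using the '{args.client}' client strategy")
```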
The core of this project is the function-calling feature of the Gemini model. This is implemented through the following components:
- `tools/weather_tool.py`: This module contains the implementation of the `get_current_weather` function and the detailed `WEATHER_TOOL_INSTRUCTIONS`.
- `clients/ollama_client.py`, `clients/genai_client.py` & `clients/openai_client.py`: Each client module contains its own library-specific `WEATHER_TOOL` dictionary. This dictionary defines the function for the model in the format required by its respective library (`google-genai` or `openai`).
This separation ensures that the tool's implementation is centralized, while the API-specific definitions live alongside the client logic.
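For reference, a `WEATHER_TOOL` definition in the OpenAI-style format might look roughly like the sketch below; the exact descriptions and parameter schema in the repository may differ.

```python
# Illustrative OpenAI-style tool definition; the real WEATHER_TOOL dictionaries
# in the client modules may use different descriptions or parameters.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'Berlin'",
                },
            },
            "required": ["location"],
        },
    },
}
```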
It is important to understand that the function calling mechanism for Gemma models is fundamentally different from models with native tool-use capabilities. Gemma models perform function calling through a structured prompting strategy. This means the model is instructed to generate a specific JSON output within its text response when a tool is needed, which the client code must then parse to execute the function. This prompt-based method requires more explicit guidance, which is why the few-shot strategy is so effective.
To solve the challenges of this approach, the `ollama_client.py` implements a powerful technique known as few-shot prompting. Instead of just providing a system prompt with instructions, we seed the model's conversation history with a complete, multi-step example of the desired interaction. This "teaches" the model the expected behavior through demonstration.
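Conceptually, the seeded history looks something like the sketch below. The actual system prompt and example turns in `clients/ollama_client.py` are more detailed; the message contents here are hypothetical.

```python
# Simplified sketch of few-shot seeding for prompt-based function calling.
# The real initial prompt in clients/ollama_client.py is more elaborate.
import json

FEW_SHOT_PROMPT = [
    {"role": "system", "content": (
        "When the user asks about the weather, reply ONLY with a JSON object "
        'of the form {"name": "get_current_weather", "args": {"location": "..."}}.'
    )},
    # Worked example: user question -> JSON function call -> tool result -> natural answer.
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": json.dumps(
        {"name": "get_current_weather", "args": {"location": "Paris"}}
    )},
    {"role": "user", "content": 'Function result: {"temperature_c": 18, "description": "overcast"}'},
    {"role": "assistant", "content": "It's currently 18 °C and overcast in Paris."},
]

# At runtime, the real conversation is appended after these seeded examples.
messages = FEW_SHOT_PROMPT + [{"role": "user", "content": "How is the weather in Tokyo?"}]
```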
To ensure efficient and scalable conversations, all API clients in this repository (`OpenAIClient`, `GenAIClient`, and `OllamaClient`) have been updated to use a Sliding Window memory strategy.
Previously, the entire conversation history was submitted with each new user request. This approach, while simple, leads to two major issues:
- High Token Consumption: As the conversation grows, the number of tokens sent to the model increases with every turn, leading to higher operational costs.
- Context Window Limits: Eventually, the conversation history can exceed the model's maximum context window, causing errors and an inability to continue the conversation.
The sliding window strategy addresses this by maintaining a fixed-size history of the most recent conversational turns. We use Python's `collections.deque` with a `maxlen` to automatically manage this.
- When a new message (from the user or the assistant) is added, it is appended to the history.
- If the history is full, the oldest message is automatically dropped.
- This ensures that the token count remains predictable and bounded, preventing context overload while keeping the most recent interactions fresh in the model's memory.
A window size of 10 recent messages is currently implemented across all clients.
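The mechanism itself is straightforward; a minimal sketch, assuming the window size of 10 described above:

```python
# Minimal sketch of the sliding-window history using collections.deque.
from collections import deque

WINDOW_SIZE = 10  # number of recent messages kept, as described above

history = deque(maxlen=WINDOW_SIZE)


def remember(role: str, content: str) -> None:
    # Appending to a full deque silently drops the oldest message.
    history.append({"role": role, "content": content})


for i in range(15):
    remember("user", f"message {i}")

print(len(history))           # 10 -- bounded, regardless of conversation length
print(history[0]["content"])  # "message 5" -- the oldest surviving message
```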
The `OllamaClient` uses a "few-shot" prompting strategy, which requires a static system prompt and examples to be present in every API call to guide the model's behavior. To accommodate this, its implementation of the sliding window is slightly different:
- The initial prompt (containing the system message and few-shot examples) is stored separately and is never dropped.
- The sliding window is applied only to the actual user/assistant conversation.
- At runtime, the final prompt is constructed by combining the static initial prompt with the dynamic, sliding conversation history.
This hybrid approach gives us the best of both worlds: the robust, guided behavior from few-shot prompting and the memory efficiency of a sliding window.
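A rough sketch of how the final prompt can be assembled under this hybrid scheme, reusing the hypothetical few-shot prefix and sliding window from the sketches above:

```python
# Sketch of the hybrid prompt assembly used conceptually by OllamaClient:
# a static few-shot prefix that is never dropped, plus a sliding conversation window.
from collections import deque

FEW_SHOT_PROMPT = [
    {"role": "system", "content": "...system prompt and few-shot examples (never dropped)..."},
]
conversation = deque(maxlen=10)  # only real user/assistant turns live here


def build_messages(user_input: str) -> list:
    conversation.append({"role": "user", "content": user_input})
    # Static prefix first, then the most recent conversational turns.
    return list(FEW_SHOT_PROMPT) + list(conversation)


messages = build_messages("What's the weather in Oslo?")
```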