fix(genai): restore accurate Gemini token usage reporting #1210
Why this matters
Teams running the Gemini and Vertex adapters today cannot trust the billable token numbers they receive. LangSmith traces, quota safeguards, and customer invoices all derive cost from incorrect totals because we drop cache/tool usage, lose modality details, and compute negative or inflated deltas during streaming. Several production users (issues #975, #940, #1011, #1053, #879) have reported that this blocks them from rolling out Gemini 2.x and multimodal workloads; this PR resolves those regressions end to end.
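The streaming-delta bug described above can be sketched as follows. This is an illustrative example, not the adapter's actual code: Gemini streaming chunks report cumulative `usage_metadata`, so each chunk's reported usage must be the cumulative count minus what was already emitted; passing raw cumulative numbers through inflates totals, and subtracting in the wrong order produces negatives. The function name and dict keys below are assumptions for illustration.

```python
# Hypothetical sketch of per-chunk usage accounting (illustrative names).
# Gemini streaming chunks carry *cumulative* token counts, so the delta for
# each chunk is "cumulative minus previously reported", clamped at zero so a
# quirky chunk can never yield a negative count.

def usage_delta(cumulative: dict, previously_reported: dict) -> dict:
    """Return the per-chunk usage delta, clamped at zero."""
    keys = ("input_tokens", "output_tokens", "total_tokens")
    return {
        k: max(cumulative.get(k, 0) - previously_reported.get(k, 0), 0)
        for k in keys
    }

# Example: the second chunk reports cumulative 10/7/17 after a first chunk
# already reported 10/3/13, so only 4 new output tokens should be emitted.
delta = usage_delta(
    {"input_tokens": 10, "output_tokens": 7, "total_tokens": 17},
    {"input_tokens": 10, "output_tokens": 3, "total_tokens": 13},
)
# delta == {"input_tokens": 0, "output_tokens": 4, "total_tokens": 4}
```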
What’s included

- Normalize `usage_metadata` so totals always equal inputs + outputs, and preserve cache/tool/modality detail dictionaries across both sync and streaming responses.
- `input_token_details` and modality enums flow through exactly as Google returns them.

Issues addressed
Fixes #975, fixes #940, fixes #1011, fixes #1053, fixes #879
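The invariant this PR restores can be sketched with a minimal check. The field names below mirror LangChain's `UsageMetadata` shape, but the helper and sample values are illustrative assumptions, not code from this PR:

```python
# Hypothetical consistency check (illustrative helper, not adapter code).
# A well-formed usage_metadata dict satisfies total == input + output, with
# cache and tool/reasoning tokens carried in the detail sub-dicts rather
# than double-counted into the top-level totals.

def is_consistent(usage: dict) -> bool:
    """True when the top-level totals add up."""
    return usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]

usage = {
    "input_tokens": 120,   # includes 40 cached prompt tokens, counted once
    "output_tokens": 30,
    "total_tokens": 150,
    "input_token_details": {"cache_read": 40},
    "output_token_details": {"reasoning": 5},
}
assert is_consistent(usage)
```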
Testing
GOOGLE_API_KEY=fake uv run pytest libs/genai/tests/unit_tests/test_chat_models.py
GOOGLE_API_KEY=fake VERTEXAI_LOCATION=us-central1 uv run pytest libs/vertexai/tests/unit_tests/test_usage_metadata.py