
Conversation

@parkerhancock commented Sep 24, 2025

Why this matters

Teams running the Gemini and Vertex adapters today cannot trust the billable token numbers they receive. LangSmith traces, quota safeguards, and customer invoices all derive cost from incorrect totals because we drop cache/tool usage, lose modality details, and compute negative or inflated deltas during streaming. Several production users (issues #975, #940, #1011, #1053, #879) have reported that this blocks them from rolling out Gemini 2.x and multimodal workloads; this PR resolves those regressions end to end.

What’s included

  • Normalize Gemini usage_metadata so totals always equal inputs + outputs, and preserve cache/tool/modality detail dictionaries across both sync and streaming responses.
  • Apply the same normalization to the Vertex adapter (mirroring #1010, the merged Anthropic fix "Add cache token support to ChatAnthropicVertex streaming responses") so input_token_details and modality enums flow through exactly as Google returns them.
  • Introduce a shared delta helper for streaming that prevents negative/duplicated counts and keeps tool-call prompts and reasoning tokens visible chunk by chunk; also harden tool-call argument coercion so Infinity/NaN payloads no longer break Cloud Build. A sketch of the normalization and delta logic follows this list.
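To make the first and third points concrete, here is a minimal sketch of the two ideas; the helper names (_normalize_usage, _usage_delta) and the exact field handling are illustrative assumptions, not the code in this PR:

    # Illustrative sketch only: names and field handling are assumptions.
    from typing import Any, Dict

    def _normalize_usage(raw: Dict[str, Any]) -> Dict[str, Any]:
        """Coerce Gemini usage_metadata into a consistent shape.

        Guarantees total_tokens == input_tokens + output_tokens and keeps
        cache detail instead of dropping it.
        """
        input_tokens = int(raw.get("prompt_token_count") or 0)
        output_tokens = int(raw.get("candidates_token_count") or 0)
        usage: Dict[str, Any] = {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
        }
        cache_read = raw.get("cached_content_token_count")
        if cache_read:
            usage["input_token_details"] = {"cache_read": int(cache_read)}
        return usage

    def _usage_delta(prev: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
        """Per-chunk usage for streaming.

        Gemini reports cumulative counts on each chunk; subtracting the
        previous cumulative totals (clamped at zero) avoids the negative or
        duplicated counts seen when chunks are summed naively.
        """
        keys = ("input_tokens", "output_tokens", "total_tokens")
        return {k: max(current.get(k, 0) - prev.get(k, 0), 0) for k in keys}

    # Two cumulative chunks yield non-negative per-chunk deltas.
    first = _normalize_usage({"prompt_token_count": 12, "candidates_token_count": 3})
    second = _normalize_usage({"prompt_token_count": 12, "candidates_token_count": 9})
    print(_usage_delta(first, second))
    # {'input_tokens': 0, 'output_tokens': 6, 'total_tokens': 6}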

Issues addressed

Fixes #975, fixes #940, fixes #1011, fixes #1053, fixes #879

Testing

  • GOOGLE_API_KEY=fake uv run pytest libs/genai/tests/unit_tests/test_chat_models.py
  • GOOGLE_API_KEY=fake VERTEXAI_LOCATION=us-central1 uv run pytest libs/vertexai/tests/unit_tests/test_usage_metadata.py

@parkerhancock changed the title from "Fix Gemini usage metadata handling" to "fix: preserve Gemini usage metadata across genai and vertex" on Sep 24, 2025
@parkerhancock force-pushed the fix-gemini-usage branch 2 times, most recently from 986ba5b to fe3f8f3 on September 24, 2025 at 17:28
@mdrxy (Collaborator) commented Sep 24, 2025

Thank you - will investigate as soon as able

@parkerhancock (Author) commented Sep 25, 2025

Updated the branch to latest main, fixed Gemini tool-call argument serialization so Infinity/NaN responses no longer break parsing, and cleaned up the local lint (mypy) failure. The failing Cloud Build run was caused by proto tool-call args containing Infinity; the new _coerce_function_call_args path normalizes those values and we added a regression test to cover it. Lint passes with make lint locally; Google Cloud Build should rerun clean now.
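For reviewers, a minimal sketch of what that coercion could look like; the real _coerce_function_call_args in this PR may differ (mapping non-finite floats to None is an assumption here, not necessarily the sentinel the PR uses):

    import math
    from typing import Any

    def _coerce_function_call_args(value: Any) -> Any:
        """Recursively replace non-JSON-serializable float values.

        Gemini can return proto tool-call args containing Infinity/NaN, which
        breaks downstream JSON parsing; map them to None (assumed sentinel).
        """
        if isinstance(value, float) and not math.isfinite(value):
            return None
        if isinstance(value, dict):
            return {k: _coerce_function_call_args(v) for k, v in value.items()}
        if isinstance(value, (list, tuple)):
            return [_coerce_function_call_args(v) for v in value]
        return value

    print(_coerce_function_call_args({"x": float("inf"), "ys": [1.0, float("nan")]}))
    # {'x': None, 'ys': [1.0, None]}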

@parkerhancock changed the title from "fix: preserve Gemini usage metadata across genai and vertex" to "fix(genai,vertexai): restore accurate Gemini token usage reporting" on Sep 29, 2025
@parkerhancock changed the title from "fix(genai,vertexai): restore accurate Gemini token usage reporting" to "fix(genai): restore accurate Gemini token usage reporting" on Sep 29, 2025