
fix: return null stop_reason for content_filter#1994

Open
steebchen wants to merge 3 commits into main from fix/anthropic-stop-reason-content-filter

Conversation

@steebchen
Member

@steebchen steebchen commented Apr 8, 2026

Summary

  • determineStopReason was returning "end_turn" as the default for any unrecognised finish reason, including "content_filter"
  • This misled Anthropic API clients into thinking a content-filtered response completed normally
  • Now returns null for content_filter and any unknown finish reason, matching Anthropic's API behaviour
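The resulting mapping can be sketched as a standalone function (mirroring the determineStopReason signature in apps/gateway/src/anthropic/anthropic.ts; treat this as an illustration, not the exact source):

```typescript
// Maps an OpenAI-style finish_reason onto an Anthropic stop_reason.
// "content_filter" and any unrecognized value now yield null instead
// of the old "end_turn" default.
type AnthropicStopReason =
  | "end_turn"
  | "max_tokens"
  | "stop_sequence"
  | "tool_use"
  | null;

function determineStopReason(
  finishReason: string | undefined,
): AnthropicStopReason {
  switch (finishReason) {
    case "stop":
      return "end_turn";
    case "length":
      return "max_tokens";
    case "tool_calls":
      return "tool_use";
    default:
      // content_filter and unknown reasons: no confident mapping
      return null;
  }
}
```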

Test plan

  • Send a request that triggers content filtering through the Anthropic-compatible endpoint and verify stop_reason is null
  • Verify normal completions still return "end_turn"
  • Verify that length still maps to "max_tokens" and tool_calls to "tool_use"

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for prompt caching with cache_control directives on system and message content blocks.
    • Enhanced token usage tracking to include cache_creation_input_tokens and cache_read_input_tokens metrics in API responses.
    • Prompt caching now supported across OpenAI-compatible and native provider endpoints.
  • Tests

    • Added comprehensive end-to-end tests for prompt caching behavior validation.

romel lauron and others added 3 commits April 9, 2026 01:40
Anthropic and Bedrock prompt caching pass-through was unreliable: client-
supplied cache_control markers were partially handled, cache token usage
was inconsistently surfaced through the openai-compat and native /v1/messages
paths, and there was no e2e coverage for streaming or bedrock.

This change:

- Honors caller-supplied cache_control on text content parts in
  /v1/chat/completions (new optional schema field) and forwards them
  verbatim to Anthropic, mapping to cachePoint blocks for Bedrock. Falls
  back to the existing length-based heuristic when no marker is provided.
- Preserves cache_control on system + message text blocks coming through
  the native /v1/messages endpoint, and surfaces cache_creation_input_tokens
  / cache_read_input_tokens on responses (always emitted, set to 0 when
  inapplicable, matching Anthropic's actual API).
- Surfaces cache_creation_tokens alongside cached_tokens in
  prompt_tokens_details on the openai-compat response, including streaming
  chunks via a new normalizeAnthropicUsage helper.
- Strips cache_control from text parts when routing to non-Anthropic /
  non-Bedrock providers so OpenAI/Google/etc. don't receive an unknown
  field.
- Adds end-to-end tests covering: native /v1/messages with explicit
  cache_control, openai-compat for both Anthropic and Bedrock, streaming
  for both, and explicit cache_control on /v1/chat/completions. Each
  asserts cached_tokens > 0 after a retry-with-backoff (Anthropic prompt
  cache writes are eventually consistent), and where applicable asserts
  the per-token cached cost is strictly less than the per-token uncached
  cost within the same response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make cache_creation_input_tokens and cache_read_input_tokens optional
with a default of 0 in anthropicResponseSchema. Anthropic emits these
on caching-supported models today, but a non-optional schema would fail
validation if an older Claude model, a beta endpoint, or a future API
change ever omits them — turning a graceful "no caching info" into a 500.

The downstream conversion code already handles 0 correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
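As a concrete illustration of the new optional cache_control field on /v1/chat/completions text parts described in the commit message above (the model id and prompt text below are hypothetical placeholders):

```typescript
// A text content part carrying an explicit cache_control marker, per the
// schema extension described above. Field names follow the commit message;
// the model id and text are illustrative placeholders.
const cachedSystemPart = {
  type: "text",
  text: "A long, stable system prompt worth caching...",
  cache_control: { type: "ephemeral" },
} as const;

// Request body for the OpenAI-compatible /v1/chat/completions endpoint.
const body = {
  model: "claude-sonnet-example", // placeholder model id
  messages: [
    { role: "system", content: [cachedSystemPart] },
    { role: "user", content: "Hello" },
  ],
};
```

Per the commit message, Anthropic receives this marker verbatim, Bedrock gets it mapped to a cachePoint block, and any other provider has the field stripped before dispatch.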
Copilot AI review requested due to automatic review settings April 8, 2026 18:30
@coderabbitai
Contributor

coderabbitai bot commented Apr 8, 2026

Walkthrough

Added Anthropic prompt caching support across the gateway by introducing cache_creation_input_tokens and cache_read_input_tokens token tracking, extending the request schema to accept cache_control markers on text content blocks, and implementing provider-specific logic to transform and propagate cache-related fields through request/response pipelines while maintaining compatibility with OpenAI-format endpoints.

Changes

Cohort / File(s) Summary
Anthropic Gateway Integration
apps/gateway/src/anthropic/anthropic.ts
Extended usage schema with cache_creation_input_tokens and cache_read_input_tokens fields. Reworked token accounting: streaming now recomputes input_tokens via max(0, prompt_tokens - cached_tokens - cache_creation_tokens); non-streaming derives cache metrics from prompt_tokens_details. Modified system/prompt message transformation to preserve cache_control as per-block markers instead of flattening. Changed determineStopReason to explicitly map content_filter to null and return null (instead of "end_turn") for unrecognized finish reasons.
Schema and Request Handling
apps/gateway/src/chat/schemas/completions.ts
Extended message content schema for type: "text" blocks to include optional cache_control: { type: "ephemeral" } property.
Token Usage Extraction
apps/gateway/src/chat/tools/extract-token-usage.ts, apps/gateway/src/chat/tools/parse-provider-response.ts
Added cacheCreationTokens field extraction. For Anthropic, populated from usage.cache_creation_input_tokens; for AWS Bedrock, populated from usage.cacheWriteInputTokens. Adjusted prompt-token calculations to account for cache creation tokens in addition to existing cache-read tokens.
Response Transformation
apps/gateway/src/chat/tools/transform-response-to-openai.ts, apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
Updated transformResponseToOpenai to accept and propagate cacheCreationTokens parameter. Conditional prompt_tokens_details emission now depends on either cache reads or cache creation. Added normalizeAnthropicUsage helper to convert Anthropic streaming chunks to OpenAI-compatible usage shape with cached/creation token details.
Chat Route Handling
apps/gateway/src/chat/chat.ts
Introduced cacheCreationTokens tracking and propagated it through streaming/non-streaming response transformations. Updated final streaming usage chunk construction to include prompt_tokens_details when cache activity is present, with conditional cache_creation_tokens emission.
Request Body Preparation
packages/actions/src/prepare-request-body.ts
Added provider-specific preprocessing: strips Anthropic-style cache_control from non-Anthropic/Bedrock providers. For Anthropic, preserves per-block cache_control when explicitly set; otherwise applies length-based heuristic. For AWS Bedrock, maps cache_control to cachePoint markers (bounded by max limit).
E2E Tests
apps/gateway/src/chat-prompt-caching.e2e.ts, apps/gateway/src/native-anthropic-cache.e2e.ts
Enhanced existing cache test with retry loop and exponential backoff to await cache population; validates cached_tokens > 0 before success. Added comprehensive new test suite validating Anthropic native caching and OpenAI-compat routing: covers native /v1/messages, OpenAI-compat /v1/chat/completions via Anthropic and Bedrock, streaming variants, and cache token cost discounting.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Title check (⚠️ Warning): The title focuses on a single stop_reason fix (content_filter), but the changeset predominantly implements Anthropic/Bedrock prompt caching support across multiple files with 896 lines of changes; the stop_reason fix is a minor part. Resolution: use a title reflecting the main scope, such as "feat: add Anthropic/Bedrock prompt caching support with cache_control", and treat the stop_reason fix as a secondary change or a separate PR.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.



Copilot AI left a comment


Pull request overview

This PR adjusts the Anthropic-compatible gateway behavior around stop reasons and prompt caching/usage reporting, aiming to better match Anthropic semantics while improving cache observability across OpenAI-compatible endpoints.

Changes:

  • Update Anthropic /v1/messages stop_reason mapping to return null for content_filter and unknown finish reasons.
  • Add/propagate prompt caching controls and usage accounting (including cache read/write token breakdown) through request preparation and OpenAI-compatible responses/streams.
  • Add and update e2e coverage for Anthropic/Bedrock prompt caching and cached-token surfacing (including retry/backoff for eventual consistency).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
packages/actions/src/prepare-request-body.ts Preserves/strips cache_control depending on provider; forwards explicit cache markers for Anthropic/Bedrock.
apps/gateway/src/native-anthropic-cache.e2e.ts New e2e suite validating cache_control forwarding and cached-token usage surfacing across endpoints and streaming.
apps/gateway/src/chat/tools/transform-streaming-to-openai.ts Normalizes Anthropic streaming usage into OpenAI-style usage with cache details.
apps/gateway/src/chat/tools/transform-response-to-openai.ts Extends OpenAI usage shaping to optionally include cache creation token details.
apps/gateway/src/chat/tools/parse-provider-response.ts Extracts cacheCreationTokens alongside cached read tokens for Anthropic/Bedrock.
apps/gateway/src/chat/tools/extract-token-usage.ts Extracts cacheCreationTokens in streaming usage parsing for Anthropic/Bedrock.
apps/gateway/src/chat/schemas/completions.ts Allows cache_control on OpenAI-compat text content parts.
apps/gateway/src/chat/chat.ts Threads cacheCreationTokens through streaming aggregation and final OpenAI response transformation.
apps/gateway/src/chat-prompt-caching.e2e.ts Adds retry/backoff to reduce flakiness from eventual cache consistency.
apps/gateway/src/anthropic/anthropic.ts Preserves cache markers when translating Anthropic requests; fixes stop_reason mapping; includes cache token fields in responses.


Comment on lines 884 to 898

 function determineStopReason(
 	finishReason: string | undefined,
 ): "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | null {
 	switch (finishReason) {
 		case "stop":
 			return "end_turn";
 		case "length":
 			return "max_tokens";
 		case "tool_calls":
 			return "tool_use";
+		case "content_filter":
+			return null;
 		default:
-			return "end_turn";
+			return null;
 	}

Copilot AI Apr 8, 2026


PR metadata (title/description) focuses on stop_reason/content_filter, but this diff also introduces substantial prompt-caching behavior changes (cache_control forwarding/stripping), new cache token accounting (cache_creation_tokens), and a large new e2e suite. Please update the PR title/description to reflect the full scope, or split the caching/usage work into a separate PR so reviewers can assess each change independently.

Copilot uses AI. Check for mistakes.
Comment on lines 7109 to +7118

 	),
-	...(cachedTokens !== null && {
+	...((cachedTokens !== null ||
+		(cacheCreationTokens !== null &&
+			cacheCreationTokens > 0)) && {
 		prompt_tokens_details: {
-			cached_tokens: cachedTokens,
+			cached_tokens: cachedTokens ?? 0,
+			...(cacheCreationTokens !== null &&
+				cacheCreationTokens > 0 && {
+					cache_creation_tokens: cacheCreationTokens,
+				}),

Copilot AI Apr 8, 2026


This adds prompt_tokens_details.cache_creation_tokens to the streamed usage payload. The OpenAPI response schema in this file currently documents prompt_tokens_details as only { cached_tokens }, so the API docs/types will be out of sync with actual responses. Either extend the response schema to allow optional cache_creation_tokens (and propagate to any client types), or avoid emitting this new key in the OpenAI-compatible response.

Comment on lines 102 to 133

 function buildUsageObject(
 	promptTokens: number | null,
 	completionTokens: number | null,
 	totalTokens: number | null,
 	reasoningTokens: number | null,
 	cachedTokens: number | null,
 	costs: CostData | null,
 	showUpgradeMessage = false,
+	cacheCreationTokens: number | null = null,
 ) {
+	const hasCacheRead = cachedTokens !== null;
+	const hasCacheCreation =
+		cacheCreationTokens !== null && cacheCreationTokens > 0;
 	return {
 		prompt_tokens: Math.max(1, promptTokens ?? 1),
 		completion_tokens: completionTokens ?? 0,
 		total_tokens: (() => {
 			const fallbackTotal =
 				(promptTokens ?? 0) + (completionTokens ?? 0) + (reasoningTokens ?? 0);
 			return Math.max(1, totalTokens ?? fallbackTotal);
 		})(),
 		...(reasoningTokens !== null && {
 			reasoning_tokens: reasoningTokens,
 		}),
-		...(cachedTokens !== null && {
+		...((hasCacheRead || hasCacheCreation) && {
 			prompt_tokens_details: {
-				cached_tokens: cachedTokens,
+				cached_tokens: cachedTokens ?? 0,
+				...(hasCacheCreation && {
+					cache_creation_tokens: cacheCreationTokens,
+				}),
 			},
 		}),

Copilot AI Apr 8, 2026


buildUsageObject now conditionally emits prompt_tokens_details.cache_creation_tokens, which is new externally-visible response surface area. Please add/extend unit tests in transform-response-to-openai.spec.ts to cover the cases: (1) cache read only, (2) cache creation only, (3) both, and assert the exact shape of usage.prompt_tokens_details so regressions don’t silently change the API payload.

Comment on lines +12 to +33
function normalizeAnthropicUsage(usage: any): any {
if (!usage || typeof usage !== "object") {
return null;
}
const inputTokens = usage.input_tokens ?? 0;
const cacheCreation = usage.cache_creation_input_tokens ?? 0;
const cacheRead = usage.cache_read_input_tokens ?? 0;
const outputTokens = usage.output_tokens ?? 0;
const promptTokens = inputTokens + cacheCreation + cacheRead;
const hasCacheInfo = cacheRead > 0 || cacheCreation > 0;
return {
prompt_tokens: promptTokens,
completion_tokens: outputTokens,
total_tokens: promptTokens + outputTokens,
...(hasCacheInfo && {
prompt_tokens_details: {
cached_tokens: cacheRead,
...(cacheCreation > 0 && { cache_creation_tokens: cacheCreation }),
},
}),
};
}

Copilot AI Apr 8, 2026


normalizeAnthropicUsage changes the shape of usage for Anthropic streaming chunks (now emitting OpenAI-style prompt_tokens* fields and optional prompt_tokens_details.cache_creation_tokens). Please add unit tests in transform-streaming-to-openai.spec.ts covering at least one chunk with cache_read/cache_creation fields to ensure the normalized usage totals and the cache detail fields remain correct.

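A unit test along the lines the reviewer asks for might look like this. The helper below is a self-contained copy of the normalizeAnthropicUsage code quoted above (with a concrete parameter type in place of any), so the expected totals can be checked directly:

```typescript
// Self-contained copy of the normalizeAnthropicUsage helper quoted above,
// typed concretely so its arithmetic can be asserted.
interface AnthropicUsage {
  input_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
  output_tokens?: number;
}

function normalizeAnthropicUsage(usage: AnthropicUsage | null): {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: {
    cached_tokens: number;
    cache_creation_tokens?: number;
  };
} | null {
  if (!usage || typeof usage !== "object") {
    return null;
  }
  const inputTokens = usage.input_tokens ?? 0;
  const cacheCreation = usage.cache_creation_input_tokens ?? 0;
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const outputTokens = usage.output_tokens ?? 0;
  // Anthropic reports cached/creation tokens separately; OpenAI-style
  // prompt_tokens is the sum of all three input buckets.
  const promptTokens = inputTokens + cacheCreation + cacheRead;
  const hasCacheInfo = cacheRead > 0 || cacheCreation > 0;
  return {
    prompt_tokens: promptTokens,
    completion_tokens: outputTokens,
    total_tokens: promptTokens + outputTokens,
    ...(hasCacheInfo && {
      prompt_tokens_details: {
        cached_tokens: cacheRead,
        ...(cacheCreation > 0 && { cache_creation_tokens: cacheCreation }),
      },
    }),
  };
}

// Cache write: creation tokens count toward prompt_tokens.
const writeCase = normalizeAnthropicUsage({
  input_tokens: 5,
  cache_creation_input_tokens: 100,
  output_tokens: 7,
});

// Cache hit: read tokens surface as cached_tokens.
const readCase = normalizeAnthropicUsage({
  input_tokens: 5,
  cache_read_input_tokens: 200,
  output_tokens: 3,
});
```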
Comment on lines +196 to +200
// Sanity: input_tokens should be the *non-cached* input tokens, not
// the total. The cached portion lives in cache_read_input_tokens.
expect(second.json.usage.input_tokens).toBeLessThan(
second.json.usage.cache_read_input_tokens,
);

Copilot AI Apr 8, 2026


This assertion input_tokens < cache_read_input_tokens isn’t guaranteed by Anthropic’s semantics (uncached input can be larger than cached input depending on prompt composition), which can make this test flaky across tokenizers/model changes. Consider removing this inequality check or replacing it with a more robust invariant (e.g., only assert cache_read_input_tokens > 0 and that input_tokens is non-negative / stable across retries).

Suggested change
// Sanity: input_tokens should be the *non-cached* input tokens, not
// the total. The cached portion lives in cache_read_input_tokens.
expect(second.json.usage.input_tokens).toBeLessThan(
second.json.usage.cache_read_input_tokens,
);
// Sanity: input_tokens should reflect the non-cached portion only, so
// just verify it is a valid non-negative count without assuming any
// ordering relationship to cache_read_input_tokens.
expect(typeof second.json.usage.input_tokens).toBe("number");
expect(second.json.usage.input_tokens).toBeGreaterThanOrEqual(0);

Comment on lines +776 to +806
// Strip Anthropic-style cache_control markers from text content parts when
// the resolved provider doesn't natively understand them. The Anthropic and
// AWS Bedrock branches below transform/forward cache_control on their own;
// every other provider receives the raw `processedMessages` and would
// otherwise pass an unknown field through to OpenAI/Google/etc., risking a
// 400 from strict providers and confusing logs from lenient ones.
const providerHandlesCacheControl =
usedProvider === "anthropic" || usedProvider === "aws-bedrock";
if (!providerHandlesCacheControl) {
processedMessages = processedMessages.map((m) => {
if (!Array.isArray(m.content)) {
return m;
}
let mutated = false;
const newContent = m.content.map((part) => {
const asRecord = part as unknown as Record<string, unknown>;
if (
asRecord &&
typeof asRecord === "object" &&
asRecord.type === "text" &&
asRecord.cache_control !== undefined
) {
mutated = true;
const { cache_control: _ignored, ...rest } = asRecord;
return rest as unknown as typeof part;
}
return part;
});
return mutated ? { ...m, content: newContent } : m;
});
}

Copilot AI Apr 8, 2026


New behavior strips cache_control for non-Anthropic/Bedrock providers and preserves per-block cache_control markers for Anthropic system messages when explicitly set. There are existing unit tests for heuristic cache_control, but none covering (a) explicit cache_control passthrough on system blocks, or (b) stripping cache_control when routing to other providers. Please add tests in prepare-request-body.spec.ts to lock in these behaviors.

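A test locking in the stripping behavior could be sketched as follows. The function below is a minimal stand-in for the stripping logic quoted above, with illustrative message shapes, not the prepare-request-body implementation itself:

```typescript
// Minimal stand-in for the cache_control stripping quoted above: remove
// the marker from text parts unless the provider handles it natively.
// Message/part shapes here are illustrative, not the gateway's types.
interface TextPart {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface Message {
  role: string;
  content: string | TextPart[];
}

function stripCacheControl(messages: Message[], provider: string): Message[] {
  if (provider === "anthropic" || provider === "aws-bedrock") {
    return messages; // these providers transform/forward the marker themselves
  }
  return messages.map((m) => {
    if (!Array.isArray(m.content)) {
      return m;
    }
    return {
      ...m,
      // Drop cache_control via rest destructuring; other fields pass through.
      content: m.content.map(({ cache_control: _ignored, ...rest }) => rest),
    };
  });
}

function firstTextPart(m: Message): TextPart {
  if (!Array.isArray(m.content)) throw new Error("expected array content");
  return m.content[0];
}

const messages: Message[] = [
  {
    role: "system",
    content: [
      { type: "text", text: "cached prompt", cache_control: { type: "ephemeral" } },
    ],
  },
];

const forOpenAI = stripCacheControl(messages, "openai");
const forAnthropic = stripCacheControl(messages, "anthropic");
```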
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
apps/gateway/src/chat/tools/transform-response-to-openai.ts (1)

112-133: Consider using cachedTokens > 0 for hasCacheRead check.

The current logic hasCacheRead = cachedTokens !== null will emit prompt_tokens_details even when cachedTokens is explicitly 0 (no cache read occurred). This differs from hasCacheCreation which checks > 0.

For consistency and to avoid emitting empty cache details, consider:

-	const hasCacheRead = cachedTokens !== null;
+	const hasCacheRead = cachedTokens !== null && cachedTokens > 0;

However, if the intent is to always include prompt_tokens_details when caching is enabled (even with zero reads), the current behavior is acceptable.

apps/gateway/src/native-anthropic-cache.e2e.ts (1)

29-33: Please replace the new any-typed helpers with small response shapes.

The added helpers carry parsed JSON, usage, and SSE chunks as any, which turns off type checking in the exact assertions this file is meant to protect. A tiny CacheUsage / StreamingChunk shape would be enough here.

As per coding guidelines, **/*.{ts,tsx}: Never use any or as any in TypeScript unless absolutely necessary.

Also applies to: 65-66, 90-95, 349-349, 425-425


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e1f5951e-a098-46ce-a579-543163d3ea4a

📥 Commits

Reviewing files that changed from the base of the PR and between 6b064a1 and 8bb49d7.

📒 Files selected for processing (10)
  • apps/gateway/src/anthropic/anthropic.ts
  • apps/gateway/src/chat-prompt-caching.e2e.ts
  • apps/gateway/src/chat/chat.ts
  • apps/gateway/src/chat/schemas/completions.ts
  • apps/gateway/src/chat/tools/extract-token-usage.ts
  • apps/gateway/src/chat/tools/parse-provider-response.ts
  • apps/gateway/src/chat/tools/transform-response-to-openai.ts
  • apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
  • apps/gateway/src/native-anthropic-cache.e2e.ts
  • packages/actions/src/prepare-request-body.ts

Comment on lines +7110 to +7118

 ...((cachedTokens !== null ||
 	(cacheCreationTokens !== null &&
 		cacheCreationTokens > 0)) && {
 	prompt_tokens_details: {
-		cached_tokens: cachedTokens,
+		cached_tokens: cachedTokens ?? 0,
 		...(cacheCreationTokens !== null &&
 			cacheCreationTokens > 0 && {
 				cache_creation_tokens: cacheCreationTokens,
 			}),
Contributor


⚠️ Potential issue | 🟠 Major

Mirror prompt_tokens_details into the [DONE] fast path.

This block only runs when !doneSent, but the normal streaming happy path already emits a final usage chunk at Lines 5829-5876 and then sets doneSent = true at Lines 5912-5918. That means streams finishing through the upstream [DONE] sentinel still omit cache_creation_tokens/cached_tokens from their last usage payload.

🧩 Suggested fix
 									const finalUsageChunk = {
 										id: `chatcmpl-${Date.now()}`,
 										object: "chat.completion.chunk",
 										created: Math.floor(Date.now() / 1000),
 										model: usedModel,
 										choices: [
 											{
 												index: 0,
 												delta: {},
 												finish_reason: null,
 											},
 										],
 										usage: {
 											prompt_tokens: Math.max(
 												1,
 												streamingCosts.promptTokens ?? finalPromptTokens ?? 1,
 											),
 											completion_tokens:
 												streamingCosts.completionTokens ??
 												finalCompletionTokens ??
 												0,
 											total_tokens: Math.max(
 												1,
 												(streamingCosts.promptTokens ??
 													finalPromptTokens ??
 													0) +
 													(streamingCosts.completionTokens ??
 														finalCompletionTokens ??
 														0) +
 													(reasoningTokens ?? 0),
 											),
+											...((cachedTokens !== null ||
+												(cacheCreationTokens !== null &&
+													cacheCreationTokens > 0)) && {
+												prompt_tokens_details: {
+													cached_tokens: cachedTokens ?? 0,
+													...(cacheCreationTokens !== null &&
+														cacheCreationTokens > 0 && {
+															cache_creation_tokens: cacheCreationTokens,
+														}),
+												},
+											}),
 											...(shouldIncludeCosts && {
 												cost_usd_total: streamingCosts.totalCost,
 												cost_usd_input: streamingCosts.inputCost,
 												cost_usd_output: streamingCosts.outputCost,

Comment on lines +8911 to 8912
cacheCreationTokens,
);
Contributor


⚠️ Potential issue | 🟡 Minor

Update the 200 response schema for cache_creation_tokens.

Passing this into transformResponseToOpenai(...) makes the non-streaming JSON payload expose a new field, but the declared schema at Lines 833-837 still only documents cached_tokens. That leaves generated clients and schema-based validation out of sync with the actual response.

📝 Suggested schema update
 							prompt_tokens_details: z
 								.object({
 									cached_tokens: z.number(),
+									cache_creation_tokens: z.number().optional(),
 								})
 								.optional(),

Comment on lines +372 to +374
usage: {
prompt_tokens_details: { cached_tokens: cachedTokens },
...usageChunk?.usage,
Contributor


⚠️ Potential issue | 🟠 Major

Don't overwrite the synthesized cached_tokens value.

...usageChunk?.usage comes after prompt_tokens_details, so any prompt_tokens_details on the last SSE chunk replaces the max cachedTokens you just accumulated. That puts both streaming tests back at the mercy of the final chunk.

🛠️ Proposed fix
				return {
					status: res.status,
					json: {
						usage: {
-							prompt_tokens_details: { cached_tokens: cachedTokens },
-							...usageChunk?.usage,
+							...usageChunk?.usage,
+							prompt_tokens_details: {
+								...(usageChunk?.usage?.prompt_tokens_details ?? {}),
+								cached_tokens: cachedTokens,
+							},
						},
					},
				};

Also applies to: 447-449

