fix: return null stop_reason for content_filter #1994
Conversation
Anthropic and Bedrock prompt caching pass-through was unreliable: client-supplied cache_control markers were partially handled, cache token usage was inconsistently surfaced through the openai-compat and native /v1/messages paths, and there was no e2e coverage for streaming or Bedrock. This change:

- Honors caller-supplied cache_control on text content parts in /v1/chat/completions (new optional schema field) and forwards them verbatim to Anthropic, mapping to cachePoint blocks for Bedrock. Falls back to the existing length-based heuristic when no marker is provided.
- Preserves cache_control on system + message text blocks coming through the native /v1/messages endpoint, and surfaces cache_creation_input_tokens / cache_read_input_tokens on responses (always emitted, set to 0 when inapplicable, matching Anthropic's actual API).
- Surfaces cache_creation_tokens alongside cached_tokens in prompt_tokens_details on the openai-compat response, including streaming chunks via a new normalizeAnthropicUsage helper.
- Strips cache_control from text parts when routing to non-Anthropic / non-Bedrock providers so OpenAI/Google/etc. don't receive an unknown field.
- Adds end-to-end tests covering: native /v1/messages with explicit cache_control, openai-compat for both Anthropic and Bedrock, streaming for both, and explicit cache_control on /v1/chat/completions. Each asserts cached_tokens > 0 after a retry-with-backoff (Anthropic prompt cache writes are eventually consistent), and where applicable asserts that the per-token cached cost is strictly less than the per-token uncached cost within the same response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make cache_creation_input_tokens and cache_read_input_tokens optional with a default of 0 in anthropicResponseSchema. Anthropic emits these on caching-supported models today, but a non-optional schema would fail validation if an older Claude model, a beta endpoint, or a future API change ever omits them — turning a graceful "no caching info" into a 500. The downstream conversion code already handles 0 correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
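The failure mode being avoided can be sketched in isolation. This is an illustrative normalizer, not the repo's actual `anthropicResponseSchema`; it only demonstrates why absent cache fields should default to 0 instead of failing validation:

```typescript
// Wire shape of Anthropic usage: cache fields may be absent on older
// models, beta endpoints, or future API revisions, so they are optional.
interface AnthropicUsageWire {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Defaulting absent cache fields to 0 turns "no caching info" into a
// well-formed usage object; a required field here would instead reject
// the whole response (the graceful-into-500 failure the commit describes).
function withCacheDefaults(u: AnthropicUsageWire) {
  return {
    ...u,
    cache_creation_input_tokens: u.cache_creation_input_tokens ?? 0,
    cache_read_input_tokens: u.cache_read_input_tokens ?? 0,
  };
}

// A response from a model that emits no caching info at all:
const legacy = withCacheDefaults({ input_tokens: 12, output_tokens: 3 });
// legacy.cache_creation_input_tokens === 0, legacy.cache_read_input_tokens === 0
```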
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Walkthrough
Added Anthropic prompt caching support across the gateway.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Pull request overview
This PR adjusts the Anthropic-compatible gateway behavior around stop reasons and prompt caching/usage reporting, aiming to better match Anthropic semantics while improving cache observability across OpenAI-compatible endpoints.
Changes:
- Update the Anthropic `/v1/messages` `stop_reason` mapping to return `null` for `content_filter` and unknown finish reasons.
- Add/propagate prompt caching controls and usage accounting (including cache read/write token breakdown) through request preparation and OpenAI-compatible responses/streams.
- Add and update e2e coverage for Anthropic/Bedrock prompt caching and cached-token surfacing (including retry/backoff for eventual consistency).
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/actions/src/prepare-request-body.ts | Preserves/strips cache_control depending on provider; forwards explicit cache markers for Anthropic/Bedrock. |
| apps/gateway/src/native-anthropic-cache.e2e.ts | New e2e suite validating cache_control forwarding and cached-token usage surfacing across endpoints and streaming. |
| apps/gateway/src/chat/tools/transform-streaming-to-openai.ts | Normalizes Anthropic streaming usage into OpenAI-style usage with cache details. |
| apps/gateway/src/chat/tools/transform-response-to-openai.ts | Extends OpenAI usage shaping to optionally include cache creation token details. |
| apps/gateway/src/chat/tools/parse-provider-response.ts | Extracts cacheCreationTokens alongside cached read tokens for Anthropic/Bedrock. |
| apps/gateway/src/chat/tools/extract-token-usage.ts | Extracts cacheCreationTokens in streaming usage parsing for Anthropic/Bedrock. |
| apps/gateway/src/chat/schemas/completions.ts | Allows cache_control on OpenAI-compat text content parts. |
| apps/gateway/src/chat/chat.ts | Threads cacheCreationTokens through streaming aggregation and final OpenAI response transformation. |
| apps/gateway/src/chat-prompt-caching.e2e.ts | Adds retry/backoff to reduce flakiness from eventual cache consistency. |
| apps/gateway/src/anthropic/anthropic.ts | Preserves cache markers when translating Anthropic requests; fixes stop_reason mapping; includes cache token fields in responses. |
```diff
 function determineStopReason(
   finishReason: string | undefined,
 ): "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | null {
   switch (finishReason) {
     case "stop":
       return "end_turn";
     case "length":
       return "max_tokens";
     case "tool_calls":
       return "tool_use";
     case "content_filter":
       return null;
     default:
-      return "end_turn";
+      return null;
   }
```
PR metadata (title/description) focuses on stop_reason/content_filter, but this diff also introduces substantial prompt-caching behavior changes (cache_control forwarding/stripping), new cache token accounting (cache_creation_tokens), and a large new e2e suite. Please update the PR title/description to reflect the full scope, or split the caching/usage work into a separate PR so reviewers can assess each change independently.
```diff
         ),
-        ...(cachedTokens !== null && {
+        ...((cachedTokens !== null ||
+          (cacheCreationTokens !== null &&
+            cacheCreationTokens > 0)) && {
           prompt_tokens_details: {
-            cached_tokens: cachedTokens,
+            cached_tokens: cachedTokens ?? 0,
+            ...(cacheCreationTokens !== null &&
+              cacheCreationTokens > 0 && {
+                cache_creation_tokens: cacheCreationTokens,
+              }),
```
This adds prompt_tokens_details.cache_creation_tokens to the streamed usage payload. The OpenAPI response schema in this file currently documents prompt_tokens_details as only { cached_tokens }, so the API docs/types will be out of sync with actual responses. Either extend the response schema to allow optional cache_creation_tokens (and propagate to any client types), or avoid emitting this new key in the OpenAI-compatible response.
```diff
 function buildUsageObject(
   promptTokens: number | null,
   completionTokens: number | null,
   totalTokens: number | null,
   reasoningTokens: number | null,
   cachedTokens: number | null,
   costs: CostData | null,
   showUpgradeMessage = false,
+  cacheCreationTokens: number | null = null,
 ) {
+  const hasCacheRead = cachedTokens !== null;
+  const hasCacheCreation =
+    cacheCreationTokens !== null && cacheCreationTokens > 0;
   return {
     prompt_tokens: Math.max(1, promptTokens ?? 1),
     completion_tokens: completionTokens ?? 0,
     total_tokens: (() => {
       const fallbackTotal =
         (promptTokens ?? 0) + (completionTokens ?? 0) + (reasoningTokens ?? 0);
       return Math.max(1, totalTokens ?? fallbackTotal);
     })(),
     ...(reasoningTokens !== null && {
       reasoning_tokens: reasoningTokens,
     }),
-    ...(cachedTokens !== null && {
+    ...((hasCacheRead || hasCacheCreation) && {
       prompt_tokens_details: {
-        cached_tokens: cachedTokens,
+        cached_tokens: cachedTokens ?? 0,
+        ...(hasCacheCreation && {
+          cache_creation_tokens: cacheCreationTokens,
+        }),
       },
     }),
```
buildUsageObject now conditionally emits prompt_tokens_details.cache_creation_tokens, which is new externally-visible response surface area. Please add/extend unit tests in transform-response-to-openai.spec.ts to cover the cases: (1) cache read only, (2) cache creation only, (3) both, and assert the exact shape of usage.prompt_tokens_details so regressions don’t silently change the API payload.
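A minimal sketch of those three cases, assuming a simplified `cacheDetails` helper that re-states only the `prompt_tokens_details` branch from the diff above (plain assertions stand in for the real vitest spec):

```typescript
// Simplified re-statement of the prompt_tokens_details branch shown in the
// diff; the real buildUsageObject also computes token totals and costs.
function cacheDetails(
  cachedTokens: number | null,
  cacheCreationTokens: number | null,
): {
  prompt_tokens_details?: {
    cached_tokens: number;
    cache_creation_tokens?: number;
  };
} {
  const hasCacheRead = cachedTokens !== null;
  const hasCacheCreation =
    cacheCreationTokens !== null && cacheCreationTokens > 0;
  return {
    ...((hasCacheRead || hasCacheCreation) && {
      prompt_tokens_details: {
        cached_tokens: cachedTokens ?? 0,
        ...(cacheCreationTokens !== null &&
          cacheCreationTokens > 0 && {
            cache_creation_tokens: cacheCreationTokens,
          }),
      },
    }),
  };
}

// (1) cache read only: details present, no cache_creation_tokens key
const readOnly = cacheDetails(128, null);
// (2) cache creation only: cached_tokens falls back to 0
const createOnly = cacheDetails(null, 256);
// (3) both buckets populated
const both = cacheDetails(128, 256);
```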
```typescript
function normalizeAnthropicUsage(usage: any): any {
  if (!usage || typeof usage !== "object") {
    return null;
  }
  const inputTokens = usage.input_tokens ?? 0;
  const cacheCreation = usage.cache_creation_input_tokens ?? 0;
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const outputTokens = usage.output_tokens ?? 0;
  const promptTokens = inputTokens + cacheCreation + cacheRead;
  const hasCacheInfo = cacheRead > 0 || cacheCreation > 0;
  return {
    prompt_tokens: promptTokens,
    completion_tokens: outputTokens,
    total_tokens: promptTokens + outputTokens,
    ...(hasCacheInfo && {
      prompt_tokens_details: {
        cached_tokens: cacheRead,
        ...(cacheCreation > 0 && { cache_creation_tokens: cacheCreation }),
      },
    }),
  };
}
```
normalizeAnthropicUsage changes the shape of usage for Anthropic streaming chunks (now emitting OpenAI-style prompt_tokens* fields and optional prompt_tokens_details.cache_creation_tokens). Please add unit tests in transform-streaming-to-openai.spec.ts covering at least one chunk with cache_read/cache_creation fields to ensure the normalized usage totals and the cache detail fields remain correct.
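Such a test can be sketched by exercising the helper exactly as shown above; the interface and explicit return type are added here for illustration in place of `any`:

```typescript
// Incoming Anthropic streaming usage fields (all optional on the wire).
interface AnthropicStreamUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface NormalizedUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: {
    cached_tokens: number;
    cache_creation_tokens?: number;
  };
}

// Same logic as the helper in the diff: prompt_tokens is the sum of all
// three input buckets, and cache details appear only when non-zero.
function normalizeAnthropicUsage(
  usage: AnthropicStreamUsage | null,
): NormalizedUsage | null {
  if (!usage || typeof usage !== "object") {
    return null;
  }
  const inputTokens = usage.input_tokens ?? 0;
  const cacheCreation = usage.cache_creation_input_tokens ?? 0;
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const outputTokens = usage.output_tokens ?? 0;
  const promptTokens = inputTokens + cacheCreation + cacheRead;
  const hasCacheInfo = cacheRead > 0 || cacheCreation > 0;
  return {
    prompt_tokens: promptTokens,
    completion_tokens: outputTokens,
    total_tokens: promptTokens + outputTokens,
    ...(hasCacheInfo && {
      prompt_tokens_details: {
        cached_tokens: cacheRead,
        ...(cacheCreation > 0 && { cache_creation_tokens: cacheCreation }),
      },
    }),
  };
}

// A chunk carrying both cache read and cache creation fields: the prompt
// total must include all three input buckets (10 + 5 + 7 = 22).
const normalized = normalizeAnthropicUsage({
  input_tokens: 10,
  cache_read_input_tokens: 5,
  cache_creation_input_tokens: 7,
  output_tokens: 3,
});
```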
```typescript
// Sanity: input_tokens should be the *non-cached* input tokens, not
// the total. The cached portion lives in cache_read_input_tokens.
expect(second.json.usage.input_tokens).toBeLessThan(
  second.json.usage.cache_read_input_tokens,
);
```
This assertion input_tokens < cache_read_input_tokens isn’t guaranteed by Anthropic’s semantics (uncached input can be larger than cached input depending on prompt composition), which can make this test flaky across tokenizers/model changes. Consider removing this inequality check or replacing it with a more robust invariant (e.g., only assert cache_read_input_tokens > 0 and that input_tokens is non-negative / stable across retries).
```diff
-// Sanity: input_tokens should be the *non-cached* input tokens, not
-// the total. The cached portion lives in cache_read_input_tokens.
-expect(second.json.usage.input_tokens).toBeLessThan(
-  second.json.usage.cache_read_input_tokens,
-);
+// Sanity: input_tokens should reflect the non-cached portion only, so
+// just verify it is a valid non-negative count without assuming any
+// ordering relationship to cache_read_input_tokens.
+expect(typeof second.json.usage.input_tokens).toBe("number");
+expect(second.json.usage.input_tokens).toBeGreaterThanOrEqual(0);
```
```typescript
// Strip Anthropic-style cache_control markers from text content parts when
// the resolved provider doesn't natively understand them. The Anthropic and
// AWS Bedrock branches below transform/forward cache_control on their own;
// every other provider receives the raw `processedMessages` and would
// otherwise pass an unknown field through to OpenAI/Google/etc., risking a
// 400 from strict providers and confusing logs from lenient ones.
const providerHandlesCacheControl =
  usedProvider === "anthropic" || usedProvider === "aws-bedrock";
if (!providerHandlesCacheControl) {
  processedMessages = processedMessages.map((m) => {
    if (!Array.isArray(m.content)) {
      return m;
    }
    let mutated = false;
    const newContent = m.content.map((part) => {
      const asRecord = part as unknown as Record<string, unknown>;
      if (
        asRecord &&
        typeof asRecord === "object" &&
        asRecord.type === "text" &&
        asRecord.cache_control !== undefined
      ) {
        mutated = true;
        const { cache_control: _ignored, ...rest } = asRecord;
        return rest as unknown as typeof part;
      }
      return part;
    });
    return mutated ? { ...m, content: newContent } : m;
  });
}
```
New behavior strips cache_control for non-Anthropic/Bedrock providers and preserves per-block cache_control markers for Anthropic system messages when explicitly set. There are existing unit tests for heuristic cache_control, but none covering (a) explicit cache_control passthrough on system blocks, or (b) stripping cache_control when routing to other providers. Please add tests in prepare-request-body.spec.ts to lock in these behaviors.
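A sketch of the stripping case, re-stating the loop above against simplified message types (names are illustrative; the real spec would exercise the request-preparation code itself):

```typescript
// Simplified message/part shapes for illustration only.
type TextPart = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};
type Message = { role: string; content: string | TextPart[] };

// Same idea as the gateway's stripping loop: drop cache_control from text
// parts, leave string content and unmarked parts alone, never mutate input.
function stripCacheControl(messages: Message[]): Message[] {
  return messages.map((m) => {
    if (!Array.isArray(m.content)) return m;
    const newContent = m.content.map((part) => {
      if (part.type === "text" && part.cache_control !== undefined) {
        const { cache_control: _ignored, ...rest } = part;
        return rest;
      }
      return part;
    });
    return { ...m, content: newContent };
  });
}

const input: Message[] = [
  {
    role: "system",
    content: [
      { type: "text", text: "big prompt", cache_control: { type: "ephemeral" } },
    ],
  },
  { role: "user", content: "hi" },
];

// Routing to a non-Anthropic provider: the marker must be gone.
const stripped = stripCacheControl(input);
```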
Actionable comments posted: 3
🧹 Nitpick comments (2)
apps/gateway/src/chat/tools/transform-response-to-openai.ts (1)
112-133: Consider using `cachedTokens > 0` for the `hasCacheRead` check.

The current logic `hasCacheRead = cachedTokens !== null` will emit `prompt_tokens_details` even when `cachedTokens` is explicitly `0` (no cache read occurred). This differs from `hasCacheCreation`, which checks `> 0`. For consistency and to avoid emitting empty cache details, consider:

```diff
-  const hasCacheRead = cachedTokens !== null;
+  const hasCacheRead = cachedTokens !== null && cachedTokens > 0;
```

However, if the intent is to always include `prompt_tokens_details` when caching is enabled (even with zero reads), the current behavior is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/gateway/src/chat/tools/transform-response-to-openai.ts` around lines 112-133, the current check uses `hasCacheRead = cachedTokens !== null`, which will treat `cachedTokens === 0` as a cache read and cause `prompt_tokens_details` to be emitted; change the predicate to require a positive token count (e.g. `hasCacheRead` should use `cachedTokens > 0`) so `prompt_tokens_details` is only included when there are cached tokens, keeping `hasCacheCreation` (`cacheCreationTokens > 0`) consistent with `cachedTokens`; update any logic referencing `hasCacheRead` and ensure `prompt_tokens_details` still includes `cached_tokens` and conditionally `cache_creation_tokens` as before.

apps/gateway/src/native-anthropic-cache.e2e.ts (1)

29-33: Please replace the new `any`-typed helpers with small response shapes.

The added helpers carry parsed JSON, usage, and SSE chunks as `any`, which turns off type checking in the exact assertions this file is meant to protect. A tiny `CacheUsage`/`StreamingChunk` shape would be enough here.

As per coding guidelines, `**/*.{ts,tsx}`: Never use `any` or `as any` in TypeScript unless absolutely necessary.

Also applies to: 65-66, 90-95, 349-349, 425-425
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/gateway/src/native-anthropic-cache.e2e.ts` around lines 29 - 33, The helpers and handlers in this test use loose any types (e.g., the sendUntilCacheRead return shape and variables like last) so replace those any usages with small explicit interfaces (e.g., define CacheUsage { cachedCount?: number; total?: number } and StreamingChunk { type: "chunk" | "done"; data?: string } or similar minimal shapes) and update function signatures (sendUntilCacheRead and any other helpers that carry parsed JSON, usage, or SSE chunks) and local variables (like last) to use these concrete types instead of any; ensure the Promise return types and properties (json, usage, chunks) use the new types and adjust assertions accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/gateway/src/chat/chat.ts`:
- Around line 8911-8912: The 200 response schema currently only documents
cached_tokens but the actual non-streaming JSON from
transformResponseToOpenai(...) also returns cache_creation_tokens; update the
OpenAPI/response schema that defines the 200 payload (the object that lists
cached_tokens) to include cache_creation_tokens with the correct type/shape so
generated clients and validation match the runtime output, and ensure any
mention of cached_tokens/cache_creation_tokens in the response schema aligns
with the keys emitted by transformResponseToOpenai.
- Around line 7110-7118: The final usage payload emitted in the "[DONE]" fast
path omits prompt_tokens_details (cached_tokens and cache_creation_tokens)
because that block only runs when !doneSent; update the code that constructs the
final "[DONE]"/usage chunk (the fast-path that sets doneSent = true) to mirror
the same prompt_tokens_details structure used elsewhere: include cachedTokens
(or 0) and, when cacheCreationTokens !== null && cacheCreationTokens > 0,
include cache_creation_tokens. Locate the code paths handling the "[DONE]"
sentinel and the variables prompt_tokens_details, cachedTokens,
cacheCreationTokens and add the same conditional insertion so the final usage
chunk matches the streaming happy path.
In `@apps/gateway/src/native-anthropic-cache.e2e.ts`:
- Around line 372-374: The current construction of the usage object places
...usageChunk?.usage after prompt_tokens_details, causing any
prompt_tokens_details from the last SSE chunk to overwrite the accumulated
cachedTokens; change the merge order so the chunk usage is spread first and then
you explicitly set prompt_tokens_details with cached_tokens: cachedTokens (or
merge usageChunk?.usage?.prompt_tokens_details into a new object and then set
cached_tokens to cachedTokens) to ensure cachedTokens is never overwritten;
update both occurrences that reference usageChunk, prompt_tokens_details, and
cachedTokens (the shown block and the one at the other location noted)
accordingly.
---
Nitpick comments:
In `@apps/gateway/src/chat/tools/transform-response-to-openai.ts`:
- Around line 112-133: The current check uses hasCacheRead = cachedTokens !==
null which will treat cachedTokens === 0 as a cache read and cause
prompt_tokens_details to be emitted; change the predicate to require a positive
token count (e.g. hasCacheRead should use cachedTokens > 0) so
prompt_tokens_details is only included when there are cached tokens, keeping
hasCacheCreation (cacheCreationTokens > 0) consistent with cachedTokens; update
any logic referencing hasCacheRead and ensure prompt_tokens_details still
includes cached_tokens and conditionally cache_creation_tokens as before.
In `@apps/gateway/src/native-anthropic-cache.e2e.ts`:
- Around line 29-33: The helpers and handlers in this test use loose any types
(e.g., the sendUntilCacheRead return shape and variables like last) so replace
those any usages with small explicit interfaces (e.g., define CacheUsage {
cachedCount?: number; total?: number } and StreamingChunk { type: "chunk" |
"done"; data?: string } or similar minimal shapes) and update function
signatures (sendUntilCacheRead and any other helpers that carry parsed JSON,
usage, or SSE chunks) and local variables (like last) to use these concrete
types instead of any; ensure the Promise return types and properties (json,
usage, chunks) use the new types and adjust assertions accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e1f5951e-a098-46ce-a579-543163d3ea4a
📒 Files selected for processing (10)
- apps/gateway/src/anthropic/anthropic.ts
- apps/gateway/src/chat-prompt-caching.e2e.ts
- apps/gateway/src/chat/chat.ts
- apps/gateway/src/chat/schemas/completions.ts
- apps/gateway/src/chat/tools/extract-token-usage.ts
- apps/gateway/src/chat/tools/parse-provider-response.ts
- apps/gateway/src/chat/tools/transform-response-to-openai.ts
- apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
- apps/gateway/src/native-anthropic-cache.e2e.ts
- packages/actions/src/prepare-request-body.ts
```diff
+        ...((cachedTokens !== null ||
+          (cacheCreationTokens !== null &&
+            cacheCreationTokens > 0)) && {
           prompt_tokens_details: {
-            cached_tokens: cachedTokens,
+            cached_tokens: cachedTokens ?? 0,
+            ...(cacheCreationTokens !== null &&
+              cacheCreationTokens > 0 && {
+                cache_creation_tokens: cacheCreationTokens,
+              }),
```
Mirror prompt_tokens_details into the [DONE] fast path.
This block only runs when !doneSent, but the normal streaming happy path already emits a final usage chunk at Lines 5829-5876 and then sets doneSent = true at Lines 5912-5918. That means streams finishing through the upstream [DONE] sentinel still omit cache_creation_tokens/cached_tokens from their last usage payload.
🧩 Suggested fix

```diff
 const finalUsageChunk = {
   id: `chatcmpl-${Date.now()}`,
   object: "chat.completion.chunk",
   created: Math.floor(Date.now() / 1000),
   model: usedModel,
   choices: [
     {
       index: 0,
       delta: {},
       finish_reason: null,
     },
   ],
   usage: {
     prompt_tokens: Math.max(
       1,
       streamingCosts.promptTokens ?? finalPromptTokens ?? 1,
     ),
     completion_tokens:
       streamingCosts.completionTokens ??
       finalCompletionTokens ??
       0,
     total_tokens: Math.max(
       1,
       (streamingCosts.promptTokens ??
         finalPromptTokens ??
         0) +
         (streamingCosts.completionTokens ??
           finalCompletionTokens ??
           0) +
         (reasoningTokens ?? 0),
     ),
+    ...((cachedTokens !== null ||
+      (cacheCreationTokens !== null &&
+        cacheCreationTokens > 0)) && {
+      prompt_tokens_details: {
+        cached_tokens: cachedTokens ?? 0,
+        ...(cacheCreationTokens !== null &&
+          cacheCreationTokens > 0 && {
+            cache_creation_tokens: cacheCreationTokens,
+          }),
+      },
+    }),
     ...(shouldIncludeCosts && {
       cost_usd_total: streamingCosts.totalCost,
       cost_usd_input: streamingCosts.inputCost,
       cost_usd_output: streamingCosts.outputCost,
```
Verify each finding against the current code and only fix it if needed.
In `@apps/gateway/src/chat/chat.ts` around lines 7110 - 7118, The final usage
payload emitted in the "[DONE]" fast path omits prompt_tokens_details
(cached_tokens and cache_creation_tokens) because that block only runs when
!doneSent; update the code that constructs the final "[DONE]"/usage chunk (the
fast-path that sets doneSent = true) to mirror the same prompt_tokens_details
structure used elsewhere: include cachedTokens (or 0) and, when
cacheCreationTokens !== null && cacheCreationTokens > 0, include
cache_creation_tokens. Locate the code paths handling the "[DONE]" sentinel and
the variables prompt_tokens_details, cachedTokens, cacheCreationTokens and add
the same conditional insertion so the final usage chunk matches the streaming
happy path.
```typescript
  cacheCreationTokens,
);
```
Update the 200 response schema for cache_creation_tokens.
Passing this into transformResponseToOpenai(...) makes the non-streaming JSON payload expose a new field, but the declared schema at Lines 833-837 still only documents cached_tokens. That leaves generated clients and schema-based validation out of sync with the actual response.
📝 Suggested schema update

```diff
 prompt_tokens_details: z
   .object({
     cached_tokens: z.number(),
+    cache_creation_tokens: z.number().optional(),
   })
   .optional(),
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-  cacheCreationTokens,
-);
+prompt_tokens_details: z
+  .object({
+    cached_tokens: z.number(),
+    cache_creation_tokens: z.number().optional(),
+  })
+  .optional(),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/gateway/src/chat/chat.ts` around lines 8911 - 8912, The 200 response
schema currently only documents cached_tokens but the actual non-streaming JSON
from transformResponseToOpenai(...) also returns cache_creation_tokens; update
the OpenAPI/response schema that defines the 200 payload (the object that lists
cached_tokens) to include cache_creation_tokens with the correct type/shape so
generated clients and validation match the runtime output, and ensure any
mention of cached_tokens/cache_creation_tokens in the response schema aligns
with the keys emitted by transformResponseToOpenai.
```typescript
      usage: {
        prompt_tokens_details: { cached_tokens: cachedTokens },
        ...usageChunk?.usage,
```
Don't overwrite the synthesized cached_tokens value.
...usageChunk?.usage comes after prompt_tokens_details, so any prompt_tokens_details on the last SSE chunk replaces the max cachedTokens you just accumulated. That puts both streaming tests back at the mercy of the final chunk.
🛠️ Proposed fix

```diff
   return {
     status: res.status,
     json: {
       usage: {
-        prompt_tokens_details: { cached_tokens: cachedTokens },
-        ...usageChunk?.usage,
+        ...usageChunk?.usage,
+        prompt_tokens_details: {
+          ...(usageChunk?.usage?.prompt_tokens_details ?? {}),
+          cached_tokens: cachedTokens,
+        },
       },
     },
   };
```

Also applies to: 447-449
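The underlying pitfall is plain object-spread ordering, where later spreads win. A standalone illustration:

```typescript
// The accumulated value we want to keep (the max cached_tokens seen so far).
const accumulated = { prompt_tokens_details: { cached_tokens: 42 } };
// The last SSE chunk's usage, which may carry its own (zero) cache details.
const lastChunkUsage = {
  prompt_tokens: 10,
  prompt_tokens_details: { cached_tokens: 0 },
};

// Later spread properties overwrite earlier ones, so the accumulated
// value is clobbered when the chunk usage is spread last:
const wrong = { ...accumulated, ...lastChunkUsage };
// wrong.prompt_tokens_details.cached_tokens === 0  (clobbered)

// Spreading the chunk first and assigning the field explicitly afterwards
// guarantees the accumulated value survives:
const right = {
  ...lastChunkUsage,
  prompt_tokens_details: {
    ...lastChunkUsage.prompt_tokens_details,
    cached_tokens: 42, // explicit value always wins
  },
};
// right.prompt_tokens_details.cached_tokens === 42
```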
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/gateway/src/native-anthropic-cache.e2e.ts` around lines 372 - 374, The
current construction of the usage object places ...usageChunk?.usage after
prompt_tokens_details, causing any prompt_tokens_details from the last SSE chunk
to overwrite the accumulated cachedTokens; change the merge order so the chunk
usage is spread first and then you explicitly set prompt_tokens_details with
cached_tokens: cachedTokens (or merge usageChunk?.usage?.prompt_tokens_details
into a new object and then set cached_tokens to cachedTokens) to ensure
cachedTokens is never overwritten; update both occurrences that reference
usageChunk, prompt_tokens_details, and cachedTokens (the shown block and the one
at the other location noted) accordingly.
Summary
- `determineStopReason` was returning `"end_turn"` as the default for any unrecognised finish reason, including `"content_filter"`.
- It now returns `null` for `content_filter` and any unknown finish reason, matching Anthropic's API behaviour.

Test plan
- `content_filter` → `stop_reason` is `null` instead of `"end_turn"`.
- `length` → `"max_tokens"` and `tool_calls` → `"tool_use"` still work.

🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
- Support for `cache_control` directives on system and message content blocks.
- `cache_creation_input_tokens` and `cache_read_input_tokens` metrics in API responses.

Tests