
fix: return null stop_reason for content_filter#1994

Open
steebchen wants to merge 3 commits into main from fix/anthropic-stop-reason-content-filter

Conversation

@steebchen
Member

@steebchen steebchen commented Apr 8, 2026

Summary

  • determineStopReason was returning "end_turn" as the default for any unrecognised finish reason, including "content_filter"
  • This misled Anthropic API clients into thinking a content-filtered response completed normally
  • Now returns null for content_filter and any unknown finish reason, matching Anthropic's API behaviour
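The resulting mapping can be sketched as a standalone function (mirroring the determineStopReason signature in apps/gateway/src/anthropic/anthropic.ts; treat this as an illustration, not the exact source):

```typescript
// Maps an OpenAI-style finish_reason onto an Anthropic stop_reason.
// "content_filter" and any unrecognized value now yield null instead
// of the old "end_turn" default.
type AnthropicStopReason =
  | "end_turn"
  | "max_tokens"
  | "stop_sequence"
  | "tool_use"
  | null;

function determineStopReason(
  finishReason: string | undefined,
): AnthropicStopReason {
  switch (finishReason) {
    case "stop":
      return "end_turn";
    case "length":
      return "max_tokens";
    case "tool_calls":
      return "tool_use";
    default:
      // content_filter and unknown reasons: no confident mapping
      return null;
  }
}
```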

Test plan

  • Send a request that triggers content filtering through the Anthropic-compatible endpoint and verify stop_reason is null
  • Verify normal completions still return "end_turn"
  • Verify that length still maps to "max_tokens" and tool_calls to "tool_use"

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for prompt caching with cache_control directives on system and message content blocks.
    • Enhanced token usage tracking to include cache_creation_input_tokens and cache_read_input_tokens metrics in API responses.
    • Prompt caching now supported across OpenAI-compatible and native provider endpoints.
  • Tests

    • Added comprehensive end-to-end tests for prompt caching behavior validation.

romel lauron and others added 3 commits April 9, 2026 01:40
Anthropic and Bedrock prompt caching pass-through was unreliable: client-
supplied cache_control markers were partially handled, cache token usage
was inconsistently surfaced through the openai-compat and native /v1/messages
paths, and there was no e2e coverage for streaming or bedrock.

This change:

- Honors caller-supplied cache_control on text content parts in
  /v1/chat/completions (new optional schema field) and forwards them
  verbatim to Anthropic, mapping to cachePoint blocks for Bedrock. Falls
  back to the existing length-based heuristic when no marker is provided.
- Preserves cache_control on system + message text blocks coming through
  the native /v1/messages endpoint, and surfaces cache_creation_input_tokens
  / cache_read_input_tokens on responses (always emitted, set to 0 when
  inapplicable, matching Anthropic's actual API).
- Surfaces cache_creation_tokens alongside cached_tokens in
  prompt_tokens_details on the openai-compat response, including streaming
  chunks via a new normalizeAnthropicUsage helper.
- Strips cache_control from text parts when routing to non-Anthropic /
  non-Bedrock providers so OpenAI/Google/etc. don't receive an unknown
  field.
- Adds end-to-end tests covering: native /v1/messages with explicit
  cache_control, openai-compat for both Anthropic and Bedrock, streaming
  for both, and explicit cache_control on /v1/chat/completions. Each
  asserts cached_tokens > 0 after a retry-with-backoff (Anthropic prompt
  cache writes are eventually consistent), and where applicable asserts
  the per-token cached cost is strictly less than the per-token uncached
  cost within the same response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Make cache_creation_input_tokens and cache_read_input_tokens optional
with a default of 0 in anthropicResponseSchema. Anthropic emits these
on caching-supported models today, but a non-optional schema would fail
validation if an older Claude model, a beta endpoint, or a future API
change ever omits them — turning a graceful "no caching info" into a 500.

The downstream conversion code already handles 0 correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
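As a concrete illustration of the new optional cache_control field on /v1/chat/completions text parts described in the commit message above (the model id and prompt text below are hypothetical placeholders):

```typescript
// A text content part carrying an explicit cache_control marker, per the
// schema extension described above. Field names follow the commit message;
// the model id and text are illustrative placeholders.
const cachedSystemPart = {
  type: "text",
  text: "A long, stable system prompt worth caching...",
  cache_control: { type: "ephemeral" },
} as const;

// Request body for the OpenAI-compatible /v1/chat/completions endpoint.
const body = {
  model: "claude-sonnet-example", // placeholder model id
  messages: [
    { role: "system", content: [cachedSystemPart] },
    { role: "user", content: "Hello" },
  ],
};
```

Per the commit message, Anthropic receives this marker verbatim, Bedrock gets it mapped to a cachePoint block, and any other provider has the field stripped before dispatch.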
Copilot AI review requested due to automatic review settings April 8, 2026 18:30
@coderabbitai
Contributor

coderabbitai bot commented Apr 8, 2026

Walkthrough

Added Anthropic prompt caching support across the gateway by introducing cache_creation_input_tokens and cache_read_input_tokens token tracking, extending the request schema to accept cache_control markers on text content blocks, and implementing provider-specific logic to transform and propagate cache-related fields through request/response pipelines while maintaining compatibility with OpenAI-format endpoints.

Changes

Cohort / File(s) Summary
Anthropic Gateway Integration
apps/gateway/src/anthropic/anthropic.ts
Extended usage schema with cache_creation_input_tokens and cache_read_input_tokens fields. Reworked token accounting: streaming now recomputes input_tokens via max(0, prompt_tokens - cached_tokens - cache_creation_tokens); non-streaming derives cache metrics from prompt_tokens_details. Modified system/prompt message transformation to preserve cache_control as per-block markers instead of flattening. Changed determineStopReason to explicitly map content_filter to null and return null (instead of "end_turn") for unrecognized finish reasons.
Schema and Request Handling
apps/gateway/src/chat/schemas/completions.ts
Extended message content schema for type: "text" blocks to include optional cache_control: { type: "ephemeral" } property.
Token Usage Extraction
apps/gateway/src/chat/tools/extract-token-usage.ts, apps/gateway/src/chat/tools/parse-provider-response.ts
Added cacheCreationTokens field extraction. For Anthropic, populated from usage.cache_creation_input_tokens; for AWS Bedrock, populated from usage.cacheWriteInputTokens. Adjusted prompt-token calculations to account for cache creation tokens in addition to existing cache-read tokens.
Response Transformation
apps/gateway/src/chat/tools/transform-response-to-openai.ts, apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
Updated transformResponseToOpenai to accept and propagate cacheCreationTokens parameter. Conditional prompt_tokens_details emission now depends on either cache reads or cache creation. Added normalizeAnthropicUsage helper to convert Anthropic streaming chunks to OpenAI-compatible usage shape with cached/creation token details.
Chat Route Handling
apps/gateway/src/chat/chat.ts
Introduced cacheCreationTokens tracking and propagated it through streaming/non-streaming response transformations. Updated final streaming usage chunk construction to include prompt_tokens_details when cache activity is present, with conditional cache_creation_tokens emission.
Request Body Preparation
packages/actions/src/prepare-request-body.ts
Added provider-specific preprocessing: strips Anthropic-style cache_control from non-Anthropic/Bedrock providers. For Anthropic, preserves per-block cache_control when explicitly set; otherwise applies length-based heuristic. For AWS Bedrock, maps cache_control to cachePoint markers (bounded by max limit).
E2E Tests
apps/gateway/src/chat-prompt-caching.e2e.ts, apps/gateway/src/native-anthropic-cache.e2e.ts
Enhanced existing cache test with retry loop and exponential backoff to await cache population; validates cached_tokens > 0 before success. Added comprehensive new test suite validating Anthropic native caching and OpenAI-compat routing: covers native /v1/messages, OpenAI-compat /v1/chat/completions via Anthropic and Bedrock, streaming variants, and cache token cost discounting.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

  • Title check (⚠️ Warning): The title focuses on a single stop_reason fix (content_filter), but the changeset predominantly implements Anthropic/Bedrock prompt caching support across multiple files with 896 lines of changes; the stop_reason fix is a minor part. Resolution: use a title reflecting the main scope, such as "feat: add Anthropic/Bedrock prompt caching support with cache_control", and treat the stop_reason fix as a secondary change or a separate PR.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 66.67%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.



Copilot AI left a comment


Pull request overview

This PR adjusts the Anthropic-compatible gateway behavior around stop reasons and prompt caching/usage reporting, aiming to better match Anthropic semantics while improving cache observability across OpenAI-compatible endpoints.

Changes:

  • Update Anthropic /v1/messages stop_reason mapping to return null for content_filter and unknown finish reasons.
  • Add/propagate prompt caching controls and usage accounting (including cache read/write token breakdown) through request preparation and OpenAI-compatible responses/streams.
  • Add and update e2e coverage for Anthropic/Bedrock prompt caching and cached-token surfacing (including retry/backoff for eventual consistency).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
packages/actions/src/prepare-request-body.ts Preserves/strips cache_control depending on provider; forwards explicit cache markers for Anthropic/Bedrock.
apps/gateway/src/native-anthropic-cache.e2e.ts New e2e suite validating cache_control forwarding and cached-token usage surfacing across endpoints and streaming.
apps/gateway/src/chat/tools/transform-streaming-to-openai.ts Normalizes Anthropic streaming usage into OpenAI-style usage with cache details.
apps/gateway/src/chat/tools/transform-response-to-openai.ts Extends OpenAI usage shaping to optionally include cache creation token details.
apps/gateway/src/chat/tools/parse-provider-response.ts Extracts cacheCreationTokens alongside cached read tokens for Anthropic/Bedrock.
apps/gateway/src/chat/tools/extract-token-usage.ts Extracts cacheCreationTokens in streaming usage parsing for Anthropic/Bedrock.
apps/gateway/src/chat/schemas/completions.ts Allows cache_control on OpenAI-compat text content parts.
apps/gateway/src/chat/chat.ts Threads cacheCreationTokens through streaming aggregation and final OpenAI response transformation.
apps/gateway/src/chat-prompt-caching.e2e.ts Adds retry/backoff to reduce flakiness from eventual cache consistency.
apps/gateway/src/anthropic/anthropic.ts Preserves cache markers when translating Anthropic requests; fixes stop_reason mapping; includes cache token fields in responses.


Comment on lines 884 to 898

 function determineStopReason(
 	finishReason: string | undefined,
 ): "end_turn" | "max_tokens" | "stop_sequence" | "tool_use" | null {
 	switch (finishReason) {
 		case "stop":
 			return "end_turn";
 		case "length":
 			return "max_tokens";
 		case "tool_calls":
 			return "tool_use";
+		case "content_filter":
+			return null;
 		default:
-			return "end_turn";
+			return null;
 	}

Copilot AI Apr 8, 2026


PR metadata (title/description) focuses on stop_reason/content_filter, but this diff also introduces substantial prompt-caching behavior changes (cache_control forwarding/stripping), new cache token accounting (cache_creation_tokens), and a large new e2e suite. Please update the PR title/description to reflect the full scope, or split the caching/usage work into a separate PR so reviewers can assess each change independently.

Copilot uses AI. Check for mistakes.
Comment on lines 7109 to +7118

 	),
-	...(cachedTokens !== null && {
+	...((cachedTokens !== null ||
+		(cacheCreationTokens !== null &&
+			cacheCreationTokens > 0)) && {
 		prompt_tokens_details: {
-			cached_tokens: cachedTokens,
+			cached_tokens: cachedTokens ?? 0,
+			...(cacheCreationTokens !== null &&
+				cacheCreationTokens > 0 && {
+					cache_creation_tokens: cacheCreationTokens,
+				}),

Copilot AI Apr 8, 2026


This adds prompt_tokens_details.cache_creation_tokens to the streamed usage payload. The OpenAPI response schema in this file currently documents prompt_tokens_details as only { cached_tokens }, so the API docs/types will be out of sync with actual responses. Either extend the response schema to allow optional cache_creation_tokens (and propagate to any client types), or avoid emitting this new key in the OpenAI-compatible response.

Comment on lines 102 to 133

 function buildUsageObject(
 	promptTokens: number | null,
 	completionTokens: number | null,
 	totalTokens: number | null,
 	reasoningTokens: number | null,
 	cachedTokens: number | null,
 	costs: CostData | null,
 	showUpgradeMessage = false,
+	cacheCreationTokens: number | null = null,
 ) {
+	const hasCacheRead = cachedTokens !== null;
+	const hasCacheCreation =
+		cacheCreationTokens !== null && cacheCreationTokens > 0;
 	return {
 		prompt_tokens: Math.max(1, promptTokens ?? 1),
 		completion_tokens: completionTokens ?? 0,
 		total_tokens: (() => {
 			const fallbackTotal =
 				(promptTokens ?? 0) + (completionTokens ?? 0) + (reasoningTokens ?? 0);
 			return Math.max(1, totalTokens ?? fallbackTotal);
 		})(),
 		...(reasoningTokens !== null && {
 			reasoning_tokens: reasoningTokens,
 		}),
-		...(cachedTokens !== null && {
+		...((hasCacheRead || hasCacheCreation) && {
 			prompt_tokens_details: {
-				cached_tokens: cachedTokens,
+				cached_tokens: cachedTokens ?? 0,
+				...(hasCacheCreation && {
+					cache_creation_tokens: cacheCreationTokens,
+				}),
 			},
 		}),

Copilot AI Apr 8, 2026


buildUsageObject now conditionally emits prompt_tokens_details.cache_creation_tokens, which is new externally-visible response surface area. Please add/extend unit tests in transform-response-to-openai.spec.ts to cover the cases: (1) cache read only, (2) cache creation only, (3) both, and assert the exact shape of usage.prompt_tokens_details so regressions don’t silently change the API payload.

Comment on lines +12 to +33
function normalizeAnthropicUsage(usage: any): any {
if (!usage || typeof usage !== "object") {
return null;
}
const inputTokens = usage.input_tokens ?? 0;
const cacheCreation = usage.cache_creation_input_tokens ?? 0;
const cacheRead = usage.cache_read_input_tokens ?? 0;
const outputTokens = usage.output_tokens ?? 0;
const promptTokens = inputTokens + cacheCreation + cacheRead;
const hasCacheInfo = cacheRead > 0 || cacheCreation > 0;
return {
prompt_tokens: promptTokens,
completion_tokens: outputTokens,
total_tokens: promptTokens + outputTokens,
...(hasCacheInfo && {
prompt_tokens_details: {
cached_tokens: cacheRead,
...(cacheCreation > 0 && { cache_creation_tokens: cacheCreation }),
},
}),
};
}

Copilot AI Apr 8, 2026


normalizeAnthropicUsage changes the shape of usage for Anthropic streaming chunks (now emitting OpenAI-style prompt_tokens* fields and optional prompt_tokens_details.cache_creation_tokens). Please add unit tests in transform-streaming-to-openai.spec.ts covering at least one chunk with cache_read/cache_creation fields to ensure the normalized usage totals and the cache detail fields remain correct.

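A unit test along the lines the reviewer asks for might look like this. The helper below is a self-contained copy of the normalizeAnthropicUsage code quoted above (with a concrete parameter type in place of any), so the expected totals can be checked directly:

```typescript
// Self-contained copy of the normalizeAnthropicUsage helper quoted above,
// typed concretely so its arithmetic can be asserted.
interface AnthropicUsage {
  input_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
  output_tokens?: number;
}

function normalizeAnthropicUsage(usage: AnthropicUsage | null): {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: {
    cached_tokens: number;
    cache_creation_tokens?: number;
  };
} | null {
  if (!usage || typeof usage !== "object") {
    return null;
  }
  const inputTokens = usage.input_tokens ?? 0;
  const cacheCreation = usage.cache_creation_input_tokens ?? 0;
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  const outputTokens = usage.output_tokens ?? 0;
  // Anthropic reports cached/creation tokens separately; OpenAI-style
  // prompt_tokens is the sum of all three input buckets.
  const promptTokens = inputTokens + cacheCreation + cacheRead;
  const hasCacheInfo = cacheRead > 0 || cacheCreation > 0;
  return {
    prompt_tokens: promptTokens,
    completion_tokens: outputTokens,
    total_tokens: promptTokens + outputTokens,
    ...(hasCacheInfo && {
      prompt_tokens_details: {
        cached_tokens: cacheRead,
        ...(cacheCreation > 0 && { cache_creation_tokens: cacheCreation }),
      },
    }),
  };
}

// Cache write: creation tokens count toward prompt_tokens.
const writeCase = normalizeAnthropicUsage({
  input_tokens: 5,
  cache_creation_input_tokens: 100,
  output_tokens: 7,
});

// Cache hit: read tokens surface as cached_tokens.
const readCase = normalizeAnthropicUsage({
  input_tokens: 5,
  cache_read_input_tokens: 200,
  output_tokens: 3,
});
```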
Comment on lines +196 to +200
// Sanity: input_tokens should be the *non-cached* input tokens, not
// the total. The cached portion lives in cache_read_input_tokens.
expect(second.json.usage.input_tokens).toBeLessThan(
second.json.usage.cache_read_input_tokens,
);

Copilot AI Apr 8, 2026


This assertion input_tokens < cache_read_input_tokens isn’t guaranteed by Anthropic’s semantics (uncached input can be larger than cached input depending on prompt composition), which can make this test flaky across tokenizers/model changes. Consider removing this inequality check or replacing it with a more robust invariant (e.g., only assert cache_read_input_tokens > 0 and that input_tokens is non-negative / stable across retries).

Suggested change
// Sanity: input_tokens should be the *non-cached* input tokens, not
// the total. The cached portion lives in cache_read_input_tokens.
expect(second.json.usage.input_tokens).toBeLessThan(
second.json.usage.cache_read_input_tokens,
);
// Sanity: input_tokens should reflect the non-cached portion only, so
// just verify it is a valid non-negative count without assuming any
// ordering relationship to cache_read_input_tokens.
expect(typeof second.json.usage.input_tokens).toBe("number");
expect(second.json.usage.input_tokens).toBeGreaterThanOrEqual(0);

Comment on lines +776 to +806
// Strip Anthropic-style cache_control markers from text content parts when
// the resolved provider doesn't natively understand them. The Anthropic and
// AWS Bedrock branches below transform/forward cache_control on their own;
// every other provider receives the raw `processedMessages` and would
// otherwise pass an unknown field through to OpenAI/Google/etc., risking a
// 400 from strict providers and confusing logs from lenient ones.
const providerHandlesCacheControl =
usedProvider === "anthropic" || usedProvider === "aws-bedrock";
if (!providerHandlesCacheControl) {
processedMessages = processedMessages.map((m) => {
if (!Array.isArray(m.content)) {
return m;
}
let mutated = false;
const newContent = m.content.map((part) => {
const asRecord = part as unknown as Record<string, unknown>;
if (
asRecord &&
typeof asRecord === "object" &&
asRecord.type === "text" &&
asRecord.cache_control !== undefined
) {
mutated = true;
const { cache_control: _ignored, ...rest } = asRecord;
return rest as unknown as typeof part;
}
return part;
});
return mutated ? { ...m, content: newContent } : m;
});
}

Copilot AI Apr 8, 2026


New behavior strips cache_control for non-Anthropic/Bedrock providers and preserves per-block cache_control markers for Anthropic system messages when explicitly set. There are existing unit tests for heuristic cache_control, but none covering (a) explicit cache_control passthrough on system blocks, or (b) stripping cache_control when routing to other providers. Please add tests in prepare-request-body.spec.ts to lock in these behaviors.

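A test locking in the stripping behavior could be sketched as follows. The function below is a minimal stand-in for the stripping logic quoted above, with illustrative message shapes, not the prepare-request-body implementation itself:

```typescript
// Minimal stand-in for the cache_control stripping quoted above: remove
// the marker from text parts unless the provider handles it natively.
// Message/part shapes here are illustrative, not the gateway's types.
interface TextPart {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface Message {
  role: string;
  content: string | TextPart[];
}

function stripCacheControl(messages: Message[], provider: string): Message[] {
  if (provider === "anthropic" || provider === "aws-bedrock") {
    return messages; // these providers transform/forward the marker themselves
  }
  return messages.map((m) => {
    if (!Array.isArray(m.content)) {
      return m;
    }
    return {
      ...m,
      // Drop cache_control via rest destructuring; other fields pass through.
      content: m.content.map(({ cache_control: _ignored, ...rest }) => rest),
    };
  });
}

function firstTextPart(m: Message): TextPart {
  if (!Array.isArray(m.content)) throw new Error("expected array content");
  return m.content[0];
}

const messages: Message[] = [
  {
    role: "system",
    content: [
      { type: "text", text: "cached prompt", cache_control: { type: "ephemeral" } },
    ],
  },
];

const forOpenAI = stripCacheControl(messages, "openai");
const forAnthropic = stripCacheControl(messages, "anthropic");
```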
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
apps/gateway/src/chat/tools/transform-response-to-openai.ts (1)

112-133: Consider using cachedTokens > 0 for hasCacheRead check.

The current logic hasCacheRead = cachedTokens !== null will emit prompt_tokens_details even when cachedTokens is explicitly 0 (no cache read occurred). This differs from hasCacheCreation which checks > 0.

For consistency and to avoid emitting empty cache details, consider:

-	const hasCacheRead = cachedTokens !== null;
+	const hasCacheRead = cachedTokens !== null && cachedTokens > 0;

However, if the intent is to always include prompt_tokens_details when caching is enabled (even with zero reads), the current behavior is acceptable.

apps/gateway/src/native-anthropic-cache.e2e.ts (1)

29-33: Please replace the new any-typed helpers with small response shapes.

The added helpers carry parsed JSON, usage, and SSE chunks as any, which turns off type checking in the exact assertions this file is meant to protect. A tiny CacheUsage / StreamingChunk shape would be enough here.

As per coding guidelines, **/*.{ts,tsx}: Never use any or as any in TypeScript unless absolutely necessary.

Also applies to: 65-66, 90-95, 349-349, 425-425


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e1f5951e-a098-46ce-a579-543163d3ea4a

📥 Commits

Reviewing files that changed from the base of the PR and between 6b064a1 and 8bb49d7.

📒 Files selected for processing (10)
  • apps/gateway/src/anthropic/anthropic.ts
  • apps/gateway/src/chat-prompt-caching.e2e.ts
  • apps/gateway/src/chat/chat.ts
  • apps/gateway/src/chat/schemas/completions.ts
  • apps/gateway/src/chat/tools/extract-token-usage.ts
  • apps/gateway/src/chat/tools/parse-provider-response.ts
  • apps/gateway/src/chat/tools/transform-response-to-openai.ts
  • apps/gateway/src/chat/tools/transform-streaming-to-openai.ts
  • apps/gateway/src/native-anthropic-cache.e2e.ts
  • packages/actions/src/prepare-request-body.ts

Comment on lines +7110 to +7118

 ...((cachedTokens !== null ||
 	(cacheCreationTokens !== null &&
 		cacheCreationTokens > 0)) && {
 	prompt_tokens_details: {
-		cached_tokens: cachedTokens,
+		cached_tokens: cachedTokens ?? 0,
 		...(cacheCreationTokens !== null &&
 			cacheCreationTokens > 0 && {
 				cache_creation_tokens: cacheCreationTokens,
 			}),
Contributor


⚠️ Potential issue | 🟠 Major

Mirror prompt_tokens_details into the [DONE] fast path.

This block only runs when !doneSent, but the normal streaming happy path already emits a final usage chunk at Lines 5829-5876 and then sets doneSent = true at Lines 5912-5918. That means streams finishing through the upstream [DONE] sentinel still omit cache_creation_tokens/cached_tokens from their last usage payload.

🧩 Suggested fix
 									const finalUsageChunk = {
 										id: `chatcmpl-${Date.now()}`,
 										object: "chat.completion.chunk",
 										created: Math.floor(Date.now() / 1000),
 										model: usedModel,
 										choices: [
 											{
 												index: 0,
 												delta: {},
 												finish_reason: null,
 											},
 										],
 										usage: {
 											prompt_tokens: Math.max(
 												1,
 												streamingCosts.promptTokens ?? finalPromptTokens ?? 1,
 											),
 											completion_tokens:
 												streamingCosts.completionTokens ??
 												finalCompletionTokens ??
 												0,
 											total_tokens: Math.max(
 												1,
 												(streamingCosts.promptTokens ??
 													finalPromptTokens ??
 													0) +
 													(streamingCosts.completionTokens ??
 														finalCompletionTokens ??
 														0) +
 													(reasoningTokens ?? 0),
 											),
+											...((cachedTokens !== null ||
+												(cacheCreationTokens !== null &&
+													cacheCreationTokens > 0)) && {
+												prompt_tokens_details: {
+													cached_tokens: cachedTokens ?? 0,
+													...(cacheCreationTokens !== null &&
+														cacheCreationTokens > 0 && {
+															cache_creation_tokens: cacheCreationTokens,
+														}),
+												},
+											}),
 											...(shouldIncludeCosts && {
 												cost_usd_total: streamingCosts.totalCost,
 												cost_usd_input: streamingCosts.inputCost,
 												cost_usd_output: streamingCosts.outputCost,

Comment on lines +8911 to 8912
cacheCreationTokens,
);
Contributor


⚠️ Potential issue | 🟡 Minor

Update the 200 response schema for cache_creation_tokens.

Passing this into transformResponseToOpenai(...) makes the non-streaming JSON payload expose a new field, but the declared schema at Lines 833-837 still only documents cached_tokens. That leaves generated clients and schema-based validation out of sync with the actual response.

📝 Suggested schema update
 							prompt_tokens_details: z
 								.object({
 									cached_tokens: z.number(),
+									cache_creation_tokens: z.number().optional(),
 								})
 								.optional(),

Comment on lines +372 to +374
usage: {
prompt_tokens_details: { cached_tokens: cachedTokens },
...usageChunk?.usage,
Contributor


⚠️ Potential issue | 🟠 Major

Don't overwrite the synthesized cached_tokens value.

...usageChunk?.usage comes after prompt_tokens_details, so any prompt_tokens_details on the last SSE chunk replaces the max cachedTokens you just accumulated. That puts both streaming tests back at the mercy of the final chunk.

🛠️ Proposed fix
				return {
					status: res.status,
					json: {
						usage: {
-							prompt_tokens_details: { cached_tokens: cachedTokens },
-							...usageChunk?.usage,
+							...usageChunk?.usage,
+							prompt_tokens_details: {
+								...(usageChunk?.usage?.prompt_tokens_details ?? {}),
+								cached_tokens: cachedTokens,
+							},
						},
					},
				};

Also applies to: 447-449

