Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
**Walkthrough**: Adds a new Vitest end-to-end test suite for `/v1/responses`.

**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    Client->>Gateway: POST /v1/responses (input, tools?, headers)
```

**Estimated code review effort**: 🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
🧹 Nitpick comments (2)
apps/gateway/src/responses.e2e.ts (2)
68-70: Remove the empty smoke test to keep signal-to-noise high.
`test("empty")` doesn't validate behavior and adds runtime/maintenance noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/gateway/src/responses.e2e.ts` around lines 68 - 70, Remove the useless smoke test named test("empty") from apps/gateway/src/responses.e2e.ts: locate the test block with the literal test("empty", ...) and delete it (or replace it with a meaningful assertion for responses e2e if coverage is required) so the test suite no longer contains a no-op test that adds noise.
112-245: Extend log assertions to multi-turn and tool-call flows.

Only the single-turn test asserts persisted logging. Adding `validateLogByRequestId` for both requests in the multi-turn and tool-call flows would make the new coverage more complete for `/v1/responses`.

Proposed diff
```diff
@@
   expect(secondRes.status).toBe(200);
   const text = getOutputText(secondJson);
   expect(text.toLowerCase()).toContain("ada");
+  await validateLogByRequestId(firstRequestId);
+  await validateLogByRequestId(secondRequestId);
   },
 );
@@
   expect(
     finalText.includes("sunny") ||
       finalText.includes("72") ||
       finalText.includes("weather"),
   ).toBe(true);
+  await validateLogByRequestId(firstRequestId);
+  await validateLogByRequestId(secondRequestId);
   },
 );
});
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/gateway/src/responses.e2e.ts` around lines 112 - 245, The tests "responses multi-turn $model" and "responses tool calls $model" currently only assert persisted logging for the final request; update both tests to call validateLogByRequestId for each requestId (firstRequestId and secondRequestId) after receiving and asserting the corresponding response. Specifically, in the multi-turn test add validateLogByRequestId(firstRequestId) after asserting firstRes and validateLogByRequestId(secondRequestId) after asserting secondRes; in the tool-call test add validateLogByRequestId(firstRequestId) after the firstRes assertions and validateLogByRequestId(secondRequestId) after the secondRes assertions. Use the existing helpers generateTestRequestId, postResponses and validateLogByRequestId to locate where to insert these checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 3fe31f2d-ec6d-47dd-8be8-eed0825e6c4e
📒 Files selected for processing (1)
apps/gateway/src/responses.e2e.ts
🧹 Nitpick comments (2)
apps/gateway/src/responses.e2e.ts (2)
84-86: Remove the placeholder `empty` test.

This assertion doesn't validate gateway behavior and adds noise to E2E results.
🧹 Proposed cleanup
```diff
-  test("empty", () => {
-    expect(true).toBe(true);
-  });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/gateway/src/responses.e2e.ts` around lines 84 - 86, Remove the placeholder test named "empty" in responses.e2e.ts: locate the test("empty", ...) block and delete it so the E2E suite contains only meaningful assertions; ensure no other tests depend on this placeholder and run the test suite to confirm no references remain to the "empty" test.
128-170: Add log-persistence assertions to multi-turn and tool-call flows too.

Single-turn already verifies logging; applying the same check here will catch regressions in chained/tool execution paths.
📌 Proposed additions
```diff
   expect(secondRes.status).toBe(200);
   const text = getOutputText(secondJson);
   expect(text.toLowerCase()).toContain("ada");
+
+  await validateLogByRequestId(firstRequestId);
+  await validateLogByRequestId(secondRequestId);
   },
 );
@@
   expect(
     finalText.includes("sunny") ||
       finalText.includes("72") ||
       finalText.includes("weather"),
   ).toBe(true);
+
+  await validateLogByRequestId(firstRequestId);
+  await validateLogByRequestId(secondRequestId);
   },
 );
```

Also applies to: 173-260
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/gateway/src/responses.e2e.ts` around lines 128 - 170, The multi-turn test (test.each over responsesTestModels) is missing the same log-persistence assertions used in the single-turn test; add the identical checks after the first and second response handling (i.e., after postResponses calls for firstRes and secondRes) to verify the request/response were persisted to logs. Locate the multi-turn flow around test.each, postResponses, generateTestRequestId and getOutputText and mirror the single-turn assertions (check stored log entries or API log endpoint and verify fields like request_id/response_id, model, input/output text) so both turns and tool-call flows assert persisted logs just like the single-turn test. Ensure the same helper/assertion utilities used in single-turn are reused to keep behavior consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 042a7285-a41f-4b11-aed4-b0e49867bca5
📒 Files selected for processing (1)
apps/gateway/src/responses.e2e.ts
Two bugs in the Responses API conversion layer caused multi-turn tool-call flows to fail on strict providers (deepseek, aws-bedrock, kimi, novita, alibaba) with "assistant message with tool_calls must be followed by tool messages":

1. convert-chat-to-responses emitted an empty `message` output item when providers returned `content: ""` alongside `tool_calls`. On replay this became a stray assistant message that separated the `tool_calls` from the tool result.
2. convert-responses-to-chat split `tool_calls` and the accompanying assistant text into two separate chat messages. They must live on the same assistant message in chat completions; otherwise the tool result no longer immediately follows the `tool_calls` assistant message.

Also denylists bytedance/gpt-oss-120b in the e2e tool-call test: that adapter does not emit stable `tool_call` ids across requests, which is a separate provider-level issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
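The two conversion fixes described above can be illustrated with a small self-contained sketch. The types and the `chatToResponses` / `responsesToChat` functions below are hypothetical simplified stand-ins for the gateway's real converters, not the actual implementation:

```typescript
// Simplified, hypothetical shapes -- stand-ins for the gateway's real
// conversion-layer types.
type ToolCall = { id: string; name: string; arguments: string };
type ChatMessage = {
  role: "assistant";
  content: string;
  tool_calls?: ToolCall[];
};
type OutputItem =
  | { type: "message"; text: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string };

// Fix 1 (sketch): only emit a `message` item when the assistant produced
// real text. `content: ""` alongside tool_calls must NOT become an item,
// or replay inserts a stray assistant message between the tool_calls and
// the tool result.
function chatToResponses(msg: ChatMessage): OutputItem[] {
  const items: OutputItem[] = [];
  if (msg.content.trim().length > 0) {
    items.push({ type: "message", text: msg.content });
  }
  for (const tc of msg.tool_calls ?? []) {
    items.push({
      type: "function_call",
      call_id: tc.id,
      name: tc.name,
      arguments: tc.arguments,
    });
  }
  return items;
}

// Fix 2 (sketch): fold assistant text and tool_calls back into ONE chat
// message, so the tool result still immediately follows the assistant
// message that carries the tool_calls.
function responsesToChat(items: OutputItem[]): ChatMessage {
  const msg: ChatMessage = { role: "assistant", content: "" };
  for (const item of items) {
    if (item.type === "message") {
      msg.content = item.text;
    } else {
      (msg.tool_calls ??= []).push({
        id: item.call_id,
        name: item.name,
        arguments: item.arguments,
      });
    }
  }
  return msg;
}
```

A round trip through these two functions preserves the strict ordering the providers enforce, which is exactly the property the reviewer suggests pinning with a unit test.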
Actionable comments posted: 1
🧹 Nitpick comments (2)
apps/gateway/src/responses.e2e.ts (1)
41-43: Filter the denylist before sampling one model per provider.

With the current order, a denylisted first model removes that provider from this suite entirely instead of falling back to another eligible model later in `toolCallModels`. Filtering first keeps the intended "one eligible model per provider" behavior stable as the list evolves.

♻️ Possible change
```diff
-const responsesToolCallModels = oneModelPerProvider(toolCallModels).filter(
-  (m) => !TOOL_CALL_DENYLIST.has(m.model),
-);
+const responsesToolCallModels = oneModelPerProvider(
+  toolCallModels.filter((m) => !TOOL_CALL_DENYLIST.has(m.model)),
+);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/gateway/src/responses.e2e.ts` around lines 41 - 43, The current code calls oneModelPerProvider(toolCallModels) then filters out denylisted models, which can drop a provider entirely if its first sampled model is denylisted; instead filter toolCallModels by TOOL_CALL_DENYLIST first and then call oneModelPerProvider so each provider is sampled from eligible models. Update the expression that constructs responsesToolCallModels to pass oneModelPerProvider a pre-filtered array (use toolCallModels.filter(m => !TOOL_CALL_DENYLIST.has(m.model))) so the sampling happens only over allowed models.

apps/gateway/src/responses/tools/convert-chat-to-responses.ts (1)
115-125: Please pin this branch with a direct regression test.

The code change looks right, but the new E2E suite does not guarantee a provider will emit the exact `content: ""` + `tool_calls` shape that broke replay here. A small unit test around `convertChatResponseToResponses()` would make this regression deterministic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/gateway/src/responses/tools/convert-chat-to-responses.ts` around lines 115 - 125, Add a deterministic unit test to pin this regression by exercising convertChatResponseToResponses(): construct a chat response where an assistant message has content: "" (or whitespace) alongside tool_calls and assert that the resulting responses do NOT include an empty assistant message item (i.e., the empty/whitespace content is skipped) and that tool_calls and tool result linkage (previous_response_id behavior) remain intact; ensure the test covers both empty string and whitespace-only content cases so future E2E changes cannot reintroduce the bug.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ba29803f-3353-4fbb-8758-a931027e95ec
📒 Files selected for processing (3)
- apps/gateway/src/responses.e2e.ts
- apps/gateway/src/responses/tools/convert-chat-to-responses.ts
- apps/gateway/src/responses/tools/convert-responses-to-chat.ts
✅ Files skipped from review due to trivial changes (1)
- apps/gateway/src/responses/tools/convert-responses-to-chat.ts
```typescript
expect(firstRes.status).toBe(200);
const fnCall = getFunctionCall(firstJson);
expect(fnCall).toBeDefined();
expect(fnCall?.name).toBe("get_weather");
expect(typeof fnCall?.call_id).toBe("string");
expect(typeof fnCall?.arguments).toBe("string");
const parsedArgs = JSON.parse(fnCall?.arguments ?? "{}");
expect(typeof parsedArgs.city).toBe("string");
expect(parsedArgs.city.toLowerCase()).toContain("san francisco");

const secondRequestId = generateTestRequestId();
const secondRes = await postResponses(
  {
    model,
    previous_response_id: firstJson.id,
    input: [
      {
        type: "function_call_output",
        call_id: fnCall?.call_id,
        output: "72F and sunny",
```
Assert the replay IDs before sending turn two.
Right now this test only proves the identifiers are typed as strings. If `firstJson.id` or `fnCall.call_id` is empty/missing, the second request no longer validates the replay path as strongly as the test name suggests. Please assert both values are non-empty before using them.
🧪 Possible change

```diff
   expect(firstRes.status).toBe(200);
+  expect(typeof firstJson.id).toBe("string");
+  expect(firstJson.id.startsWith("resp_")).toBe(true);
   const fnCall = getFunctionCall(firstJson);
   expect(fnCall).toBeDefined();
   expect(fnCall?.name).toBe("get_weather");
   expect(typeof fnCall?.call_id).toBe("string");
+  expect(fnCall?.call_id).toBeTruthy();
   expect(typeof fnCall?.arguments).toBe("string");
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```typescript
expect(firstRes.status).toBe(200);
expect(typeof firstJson.id).toBe("string");
expect(firstJson.id.startsWith("resp_")).toBe(true);
const fnCall = getFunctionCall(firstJson);
expect(fnCall).toBeDefined();
expect(fnCall?.name).toBe("get_weather");
expect(typeof fnCall?.call_id).toBe("string");
expect(fnCall?.call_id).toBeTruthy();
expect(typeof fnCall?.arguments).toBe("string");
const parsedArgs = JSON.parse(fnCall?.arguments ?? "{}");
expect(typeof parsedArgs.city).toBe("string");
expect(parsedArgs.city.toLowerCase()).toContain("san francisco");

const secondRequestId = generateTestRequestId();
const secondRes = await postResponses(
  {
    model,
    previous_response_id: firstJson.id,
    input: [
      {
        type: "function_call_output",
        call_id: fnCall?.call_id,
        output: "72F and sunny",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/gateway/src/responses.e2e.ts` around lines 227 - 246, The test currently
only checks types but not that identifiers are present; before calling
generateTestRequestId() and postResponses(...) assert that firstJson.id and
fnCall.call_id are non-empty (e.g., use expect(firstJson.id).toBeTruthy() and
expect(fnCall?.call_id).toBeTruthy()) so the replay path is validated; add these
assertions just after obtaining fnCall from getFunctionCall(...) and before
using those values in the second request to ensure the test fails if the IDs are
missing.
Summary

- Adds an end-to-end test suite for `POST /v1/responses` (the OpenAI-compatible Responses API) covering `openai/gpt-4o-mini` and `anthropic/claude-haiku-4-5`:
  - Single-turn: asserts a `resp_*` id, output text, usage tokens, log persisted
  - Multi-turn: uses `previous_response_id` to verify the gateway reconstructs prior context (model recalls a name across turns)
  - Tool calls: after the model emits a `function_call`, we send back a `function_call_output` via `previous_response_id` and assert the final answer uses the tool result

Test plan
- Picked up by the existing `**/*.e2e.ts` glob (no workflow changes needed)

🤖 Generated with Claude Code
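The tool-call round trip above can be sketched as a pure helper that builds the second-turn request body from the first response. The shapes and `buildToolResultRequest` are illustrative stand-ins, not helpers from the suite:

```typescript
// Simplified response shape -- illustrative only, not the suite's real types.
type FunctionCallItem = {
  type: "function_call";
  call_id: string;
  name: string;
  arguments: string;
};
type ResponsesResult = { id: string; output: FunctionCallItem[] };

// Build the follow-up /v1/responses body that replays a tool result via
// previous_response_id, mirroring what the e2e tool-call test sends.
function buildToolResultRequest(
  model: string,
  first: ResponsesResult,
  toolOutput: string,
) {
  const fnCall = first.output.find((o) => o.type === "function_call");
  if (!first.id || !fnCall?.call_id) {
    // Echoes the review's point: fail loudly if the replay ids are missing,
    // rather than sending an ill-formed second turn.
    throw new Error("missing response id or tool call_id for replay");
  }
  return {
    model,
    previous_response_id: first.id,
    input: [
      {
        type: "function_call_output",
        call_id: fnCall.call_id,
        output: toolOutput,
      },
    ],
  };
}
```

Making the helper throw on missing ids is one way to get the "assert the replay IDs before sending turn two" behavior the review asks for without scattering extra expectations through every test.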
Summary by CodeRabbit