chat: fix Responses API tool-chaining, context limits, and streaming correctness #1080
Conversation
- Fix streaming tool continuations: track `currentLegResponseId` from `StreamingResponseCreatedUpdate` and clone options with that ID before calling `ExecuteFunctionCallAsync`, so the continuation correctly references the response that produced the function call (not the outer conversation root).
- Fix non-streaming loop token waste: after each tool-call response, advance `PreviousResponseId` to the new `responseId` and reset `responseItems` to `[]` so subsequent iterations only send new tool outputs — not the growing full history. This prevents quadratic token re-processing.
- Remove redundant `functionCallItem` from continuation inputs: when using `previous_response_id` chaining, the function call is already stored server-side in the referenced response; only the `FunctionCallOutputResponseItem` is new content.
- Add a `ConversationContextLimitExceededException` domain exception and an `IsContextLengthError` helper to detect context window overflow from `ClientResultException` (HTTP 400 with `context_length_exceeded` text).
- Handle context limit in `ChatController`: both `/api/chat/message` and `/api/chat/stream` return 400 with errorCode `context_limit_exceeded`.
- Add a `CloneOptionsWithPreviousResponseId` helper to avoid mutating shared options across concurrent streaming legs.
- Add a `ParseToolArguments` helper to deduplicate JSON argument parsing in both streaming and non-streaming paths.
- Fix OPENAI001 pragma: extend the disable region to cover the full loop body, including the `ClientResult<OpenAIResponse>` declaration.
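The second bullet's token-waste fix can be sketched as follows. This is a hypothetical simulation, not the real service code: `FakeResponse`, `Send`, and the item counts stand in for the Azure OpenAI Responses client, and each input item is counted as one "token" to show why resetting the input list after advancing `PreviousResponseId` keeps transmitted input linear rather than quadratic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the SDK response type (not the real Azure OpenAI type).
public record FakeResponse(string Id, List<string> ToolCalls);

public static class ToolLoopDemo
{
    // Returns the total number of input items transmitted across all legs.
    public static int RunLoop()
    {
        int itemsSent = 0;
        int leg = 0;

        // Pretend server: prior legs are stored server-side, so with
        // previous_response_id chaining only NEW items are transmitted.
        // The first two legs request a tool call; the third is the final answer.
        FakeResponse Send(List<string> items, string? previousResponseId)
        {
            itemsSent += items.Count;
            leg++;
            return new FakeResponse($"resp_{leg}",
                leg < 3 ? new List<string> { "tool_call" } : new List<string>());
        }

        var responseItems = new List<string> { "system", "user" };
        string? previousResponseId = null;

        while (true)
        {
            var response = Send(responseItems, previousResponseId);
            if (response.ToolCalls.Count == 0) break;   // final answer, done

            previousResponseId = response.Id;       // chain off the latest response
            responseItems = new List<string>();     // send only NEW items next leg
            foreach (var call in response.ToolCalls)
                responseItems.Add($"output_of_{call}");
        }

        return itemsSent; // 2 + 1 + 1 = 4, not the quadratic 2 + 3 + 4 = 9
    }

    public static void Main() => Console.WriteLine(ToolLoopDemo.RunLoop());
}
```

Without the reset, each leg would resend the full accumulated history, so the cost of an N-tool conversation grows with the square of N.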
- C1 — Two-phase streaming tool call pattern: buffer all function calls during the stream, execute them sequentially after the stream completes, then make ONE continuation with all outputs. Previously, N parallel tool calls created N forked conversation branches.
- C2 — Context length errors in streaming path: added a `RethrowContextLengthErrors` helper wrapper that puts `try/catch` only around `MoveNextAsync`, keeping `yield return` outside. C# error CS1626 prohibits `yield return` inside `try/catch`, so the wrapper is the only valid pattern for exception remapping in async iterators.
- H1 — Restore the stream-write body in the second catch in `StreamMessage`: the SSE error event was lost in a prior edit.
- H2 — Clone all relevant `ResponseCreationOptions` fields in `CloneOptionsWithPreviousResponseId`: `EndUserId`, `MaxOutputTokenCount`, `TextOptions`, `TruncationMode`, `ParallelToolCallsEnabled`, `StoredOutputEnabled`, `ToolChoice`, `Temperature`, `TopP`, `ServiceTier`, `Metadata`.
- H3 — `IsContextLengthError` now parses the structured JSON error code via `TryExtractErrorCode`: handles status 413 and the `token_limit_exceeded` and `context_length_exceeded` codes. Falls back to message substring match.
- Bonus: wire `EndUserId` into `CreateResponseOptionsAsync` (the SDK supports it). Add a `LogMcpToolArgumentParseError` logger message.
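The CS1626 constraint in C2 can be illustrated with plain types. This is a sketch under stated assumptions: `ContextLimitException`, the message-based `when` filter, and the `int`-yielding source are illustrative stand-ins for the real `ClientResultException`, domain exception, and SDK streaming updates. The key shape is that `try/catch` wraps only `MoveNextAsync`, while `yield return` stays outside it, which is exactly what the compiler requires of an async iterator.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StreamRemapDemo
{
    // Stand-in for the real domain exception.
    public sealed class ContextLimitException : Exception
    {
        public ContextLimitException(string message) : base(message) { }
    }

    public static async IAsyncEnumerable<int> RethrowContextLengthErrors(
        IAsyncEnumerable<int> source)
    {
        await using var enumerator = source.GetAsyncEnumerator();
        while (true)
        {
            bool hasNext;
            try
            {
                // Only the advance is guarded; CS1626 forbids yielding in here.
                hasNext = await enumerator.MoveNextAsync();
            }
            catch (InvalidOperationException ex)
                when (ex.Message.Contains("context_length_exceeded"))
            {
                // Remap the transport-level failure to a domain exception.
                throw new ContextLimitException(
                    "conversation exceeded the context window");
            }
            if (!hasNext) yield break;
            yield return enumerator.Current; // legal: outside any try/catch
        }
    }

    // A source that yields two updates, then fails like an overflowing request.
    public static async IAsyncEnumerable<int> FailingSource()
    {
        yield return 1;
        yield return 2;
        await Task.Yield();
        throw new InvalidOperationException("HTTP 400: context_length_exceeded");
    }
}
```

Items already yielded before the failure still reach the consumer; only the terminal exception is remapped.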
- `strictModeEnabled: false` for MCP tool registration — external MCP schemas are not guaranteed to satisfy OpenAI strict-mode constraints (all properties required, `additionalProperties: false` everywhere). A single non-conforming schema with `strictModeEnabled: true` would cause a 400 at registration time for ALL tools in the request.
- Guard `ParseToolArguments` in the non-streaming path — a `JsonException` on malformed model-generated arguments would previously bubble up as a 500. Now caught identically to the streaming path: logs a warning and sends an error `FunctionCallOutputResponseItem` so the model can recover without aborting the loop.
- Guard against a null `currentLegResponseId` before the tool-call continuation — if the API never emits `StreamingResponseCreatedUpdate`, proceeding with `PreviousResponseId = null` causes a confusing 400 from the API. Now throws `InvalidOperationException` with a diagnostic message.
- Log the original `ConversationContextLimitExceededException` in the mid-stream handler — it was previously wrapped in a new `InvalidOperationException`, losing the stack trace, `PreviousResponseId`, and inner exception.
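The `ParseToolArguments` guard above can be sketched with `System.Text.Json`. The helper name follows the PR description, but the return shape (a plain dictionary, with `null` signaling malformed input) and `BuildToolOutput` are illustrative stand-ins for the SDK's response-item types.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ToolArgumentParsing
{
    // Returns the parsed arguments, or null when the model emitted malformed JSON.
    public static Dictionary<string, JsonElement>? ParseToolArguments(string rawArguments)
    {
        try
        {
            return JsonSerializer.Deserialize<Dictionary<string, JsonElement>>(rawArguments);
        }
        catch (JsonException)
        {
            // Malformed model output: don't abort the tool loop with a 500.
            return null;
        }
    }

    // Stand-in for building the FunctionCallOutputResponseItem content:
    // on parse failure, return an error payload the model can see and retry from.
    public static string BuildToolOutput(string rawArguments)
    {
        var args = ParseToolArguments(rawArguments);
        return args is null
            ? "error: tool arguments were not valid JSON"
            : $"parsed {args.Count} argument(s)";
    }
}
```

Sending the error text back as a tool output keeps the loop alive, letting the model re-emit corrected arguments instead of surfacing an internal server error to the client.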
Replace bare `catch` with `catch (Exception)` in two best-effort error-handling sites, with clarifying comments explaining why a broad catch is correct.
- `TryExtractErrorCode`: Azure SDK response internals are outside our control; a narrow typed-catch list risks a latent crash in production error handling. `catch (Exception)` is intentional.
- `StreamMessage` SSE error-write guards (both): Kestrel throws `IOException` (`ConnectionResetException`), `OperationCanceledException`, or `ObjectDisposedException` on an abrupt client disconnect. The bot's `OperationCanceledException`-only suggestion would miss `IOException` and cause an unhandled exception that masks the original error. Both catch blocks updated (the bot only flagged one; the other is identical).
Pull request overview
Updates the Essential C# chat integration to align more closely with the Azure OpenAI Responses API’s tool-calling and streaming behaviors, focusing on correctness under tool-chaining, context-limit failures, and mid-stream error signaling.
Changes:
- Reworked streaming tool-call handling to buffer tool calls and submit a single continuation with all tool outputs (avoids branched conversations).
- Added a domain exception for context-window overflows and mapped Responses API context-limit failures (streaming + non-streaming) to that exception.
- Improved controller behavior for context-limit errors (400 before response start; SSE error event mid-stream) and ensured responseId ownership is recorded for each streaming leg.
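The buffer-then-continue reshaping in the first bullet can be sketched with stand-in types. This is illustrative only: `FunctionCall`, the `object` update stream, and the string outputs replace the SDK's streaming-update and response-item types, and the stream is synchronous here for brevity.

```csharp
using System;
using System.Collections.Generic;

public static class TwoPhaseDemo
{
    // Stand-in for the SDK's function-call response item.
    public record FunctionCall(string Name, string Arguments);

    public static List<string> HandleStream(
        IEnumerable<object> streamUpdates,
        Func<FunctionCall, string> executeTool)
    {
        // Phase 1: consume the stream, buffering tool calls instead of
        // firing a continuation per call (which forked the conversation).
        var buffered = new List<FunctionCall>();
        foreach (var update in streamUpdates)
        {
            if (update is FunctionCall call)
                buffered.Add(call);
        }

        // Phase 2: execute sequentially after the stream completes and
        // collect every output for a SINGLE continuation request.
        var continuationInputs = new List<string>();
        foreach (var call in buffered)
            continuationInputs.Add(executeTool(call));
        return continuationInputs;
    }
}
```

Because all outputs travel in one continuation, the model sees every tool result on a single conversation branch, matching the non-streaming behavior.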
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| EssentialCSharp.Chat.Shared/Services/AIChatService.cs | Fixes streaming tool-call continuation logic; adds context-limit error detection/remapping; improves continuation option cloning and tool-argument parsing robustness. |
| EssentialCSharp.Chat.Shared/Services/ConversationContextLimitExceededException.cs | Introduces a domain exception used to signal conversation context-window overflow (optionally includes previous response id). |
| EssentialCSharp.Web/Controllers/ChatController.cs | Handles the new domain exception in both non-streaming and streaming endpoints, including mid-stream SSE error writes and logging. |
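The controller-side mapping described in the table can be sketched as a plain function. This is a hypothetical reduction: the real code lives in `ChatController` and returns ASP.NET Core results (400 responses and SSE error events), while here the mapping is shown as exception to status code plus error code, with the exception class shape inferred from the PR description.

```csharp
using System;

// Domain exception shape per the PR description: carries the previous response id.
public sealed class ConversationContextLimitExceededException : Exception
{
    public string? PreviousResponseId { get; }

    public ConversationContextLimitExceededException(
        string message, string? previousResponseId = null, Exception? inner = null)
        : base(message, inner)
        => PreviousResponseId = previousResponseId;
}

public static class ChatErrorMapping
{
    // Map a failure to the HTTP status and errorCode the endpoints return.
    public static (int StatusCode, string ErrorCode) Map(Exception ex) =>
        ex is ConversationContextLimitExceededException
            ? (400, "context_limit_exceeded")   // client should start a new conversation
            : (500, "internal_error");
}
```

Keeping the mapping in one place means `/api/chat/message` and `/api/chat/stream` agree on the error code the front end keys off.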
```csharp
catch (Exception)
{
    // Best-effort extraction inside an error handler — catch all to guarantee we never
    // throw from error-parsing logic. The Azure SDK response internals are outside our
    // control and the exception surface can change across SDK versions.
}
```
Keeping `catch (Exception)` here intentionally — this is a documented architectural decision, not a code smell.
Two reasons the proposed typed list can't safely replace it:
- `FormatException` is factually incorrect — `JsonDocument.Parse()` throws `JsonException` (not `FormatException`), `TryGetProperty()` returns a `bool` without throwing, and `GetString()` throws `InvalidOperationException`. `FormatException` would be dead code, which undermines confidence in the rest of the list.
- The list is inherently incomplete — `ex.GetRawResponse()` reaches into Azure SDK internals whose exception surface can change across SDK versions (as our comment notes). The realistic surface includes at minimum `JsonException`, `InvalidOperationException`, `ObjectDisposedException`, and potentially `ArgumentException` on certain buffer edge cases — but that's exactly the problem: we can't enumerate it reliably. A typed list that looks complete but isn't is more dangerous than an honest broad catch.
The method's contract is "never throw from inside an error handler." `catch (Exception)` is the correct tool for that contract.
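The extraction this thread defends can be sketched with `System.Text.Json`. The JSON envelope (`{"error":{"code":...}}`) matches the shape the Azure OpenAI service returns, but the helper names and the exact code list follow this PR's description rather than any SDK API; the raw body is taken as a string here instead of being pulled from `ex.GetRawResponse()`.

```csharp
using System;
using System.Text.Json;

public static class ErrorCodeExtraction
{
    public static bool TryExtractErrorCode(string rawBody, out string? code)
    {
        code = null;
        try
        {
            using var doc = JsonDocument.Parse(rawBody);
            if (doc.RootElement.TryGetProperty("error", out var error) &&
                error.TryGetProperty("code", out var codeElement) &&
                codeElement.ValueKind == JsonValueKind.String)
            {
                code = codeElement.GetString();
                return code is not null;
            }
        }
        catch (Exception)
        {
            // Best-effort parsing inside an error handler: never throw from here.
        }
        return false;
    }

    public static bool IsContextLengthError(int status, string rawBody) =>
        status == 413
        || (TryExtractErrorCode(rawBody, out var code)
            && code is "context_length_exceeded" or "token_limit_exceeded")
        || rawBody.Contains("context_length_exceeded"); // substring fallback
}
```

Checking the structured `error.code` first keeps detection robust when the human-readable message wording changes, with the substring match retained only as a last resort.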
```csharp
catch (Exception)
{
    // Best-effort write to an already-streaming response. Kestrel can throw
    // IOException (connection reset), OperationCanceledException, or
    // ObjectDisposedException on abrupt client disconnect — swallow all to
    // avoid masking the original exception.
}
```
Keeping `catch (Exception)` here intentionally — the `when` filter would introduce a concrete regression risk in this pattern.
This inner try/catch sits inside an outer catch block that has already logged the original exception. If the `when` filter doesn't match (e.g., because Kestrel introduces a new transport exception type in a future ASP.NET Core version — `ConnectionAbortedException` and `HttpProtocolException` are both real types in newer ASP.NET Core), the unhandled exception propagates out of the outer catch block. In C#, that unwinds the stack with the new exception, potentially:
- Losing the SSE response mid-frame
- Creating confusing telemetry where the original (already-logged) exception is overshadowed
"Best-effort" here means unconditional: any write failure must be swallowed. The comment enumerates `IOException`, `OperationCanceledException`, and `ObjectDisposedException` as examples of what Kestrel throws, not as a complete contract. That's the right level of documentation. The `when` filter would enforce a false completeness claim.
```csharp
catch (Exception)
{
    // Best-effort write to an already-streaming response. Kestrel can throw
    // IOException (connection reset), OperationCanceledException, or
    // ObjectDisposedException on abrupt client disconnect — swallow all to
    // avoid masking the original exception.
}
```
Same reasoning as the other SSE block above — keeping `catch (Exception)` intentionally.
This pattern appears in both mid-stream catch blocks. If the `when` filter misses an exception type (e.g., `HttpProtocolException`, `ConnectionAbortedException`, or any other type ASP.NET Core may throw on connection teardown in future versions), the exception would propagate out of the outer `catch (Exception ex)` block, masking the original error that was already logged. The best-effort semantics require unconditional swallow. `catch (Exception)` is correct here.
Summary
Three rounds of review against the Azure OpenAI Responses API docs (using Claude Opus 4.6 and GPT-5.5 as parallel review subagents) identified and fixed several correctness bugs and robustness gaps in the AI chat implementation.
Critical fixes
C1 — Parallel streaming tool calls created forked conversation branches
Previously, `ProcessStreamingUpdatesAsync` fired a separate API continuation per tool call inline during the stream. With N parallel tool calls this created N independent conversation branches — each continuation only saw its own tool result, not the others.

Fix: Two-phase pattern — buffer all `FunctionCallResponseItem`s during the stream, execute them sequentially after the stream completes, then make one continuation with all outputs. Matches the non-streaming behavior.

C2 — Context length errors not caught in streaming path

A `ClientResultException` from a context-window overflow propagates through `await foreach`, but C# prohibits `yield return` inside a `try/catch` block (CS1626).

Fix: Added a `RethrowContextLengthErrors` helper that puts `try/catch` only around `MoveNextAsync`, with `yield return` outside — the only CS1626-safe pattern for exception remapping in async iterators.

High fixes
H1 — SSE error event lost on mid-stream exception
The second `catch (Exception ex)` block in `StreamMessage` was missing its `try { await Response.WriteAsync(...) } catch {}` body, so clients received no signal when the stream was interrupted.

H2 — `CloneOptionsWithPreviousResponseId` copied only 3 of 14 fields

Tool-call continuation legs inherited `Instructions`, `PreviousResponseId`, and `Tools` — but not `EndUserId`, `Temperature`, `TopP`, `TruncationMode`, `MaxOutputTokenCount`, `TextOptions`, `ParallelToolCallsEnabled`, `StoredOutputEnabled`, `ToolChoice`, `ServiceTier`, or `Metadata`. Continuations were silently using defaults for all of those.

H3 — Context length detection relied on substring matching only

Fix: `IsContextLengthError` now parses the structured JSON error body via `TryExtractErrorCode` and checks the `error.code` field first (`context_length_exceeded`, `token_limit_exceeded`). Also handles HTTP 413. Falls back to message substring matching.

Medium fixes (from second review pass)
- `ParseToolArguments` unguarded in non-streaming path — a `JsonException` on malformed model-generated arguments would bubble up as a 500. Now caught identically to the streaming path, returning an error `FunctionCallOutputResponseItem` so the model can recover.
- `strictModeEnabled: false` for MCP tool registration — the MCP .NET SDK generates schemas that don't satisfy OpenAI strict-mode constraints (optional params not in `required`, no `additionalProperties: false`, nullable union types). A single non-conforming schema with `strictModeEnabled: true` causes a 400 that blocks all tools.
- `currentLegResponseId` null guard — if the API never emits `StreamingResponseCreatedUpdate`, the continuation would fire with `PreviousResponseId = null`, causing a confusing 400. Now throws `InvalidOperationException` with a diagnostic message.
- Mid-stream handler now logs the original `ConversationContextLimitExceededException` — it was previously wrapped in a new `InvalidOperationException`, discarding the original stack trace and `PreviousResponseId`.
- `EndUserId` wired up — `CreateResponseOptionsAsync` now sets `options.EndUserId` (the SDK exposes it; the previous code discarded it). Enables Azure OpenAI abuse monitoring and Defender prompt-shield correlation.
- `LogMcpToolArgumentParseError` logger message added.
- `ConversationContextLimitExceededException` domain exception with `PreviousResponseId` property.

Files changed
- EssentialCSharp.Chat.Shared/Services/AIChatService.cs
- EssentialCSharp.Chat.Shared/Services/ConversationContextLimitExceededException.cs
- EssentialCSharp.Web/Controllers/ChatController.cs