Skip to content

fix: replace goroutine+channel with DoDeadline in MakeRequestWithContext#1925

Open
davidrudduck wants to merge 3 commits intomaximhq:mainfrom
davidrudduck:fix-goroutine-leak
Open

fix: replace goroutine+channel with DoDeadline in MakeRequestWithContext#1925
davidrudduck wants to merge 3 commits intomaximhq:mainfrom
davidrudduck:fix-goroutine-leak

Conversation

@davidrudduck
Copy link
Contributor

Summary

Fixes a goroutine leak and potential use-after-free in MakeRequestWithContext by replacing the goroutine+channel pattern with client.DoDeadline.

The previous implementation spawned a goroutine calling client.Do(req, resp) and used select to race it against ctx.Done(). When the context cancelled first, the goroutine continued running and could write to resp after the caller had already returned — causing use-after-free of the response buffer.

Changes

  • Replace client.Do in a goroutine with client.DoDeadline(req, resp, deadline) — synchronous, no goroutine needed
  • Derive deadline from ctx.Deadline(), falling back to a 5-minute safety-net timeout
  • Preserve all existing error classification (timeout, cancellation, DNS, network errors)
  • Add ctx.Err() check after DoDeadline returns to detect in-flight cancellation

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

(cd core && go build ./... && go vet ./providers/utils/...)

The fix is in the request execution path — all existing provider tests exercise this code:

make test-core PROVIDER=openai

To verify goroutine leak is fixed, send requests with short timeouts and confirm goroutine count stabilizes (previously it would grow unbounded under cancellation).

Screenshots/Recordings

N/A

Breaking changes

  • Yes
  • No

Security considerations

Eliminates use-after-free of fasthttp response buffers when contexts cancel during in-flight requests.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

Fixes goroutine leak and potential use-after-free when contexts are
cancelled before the fasthttp request completes. The previous pattern
spawned a goroutine that continued writing to resp after the caller
returned, which could corrupt the response buffer.

Uses client.DoDeadline with the context's deadline instead. Falls back
to a 5-minute safety-net timeout when no deadline is set (rare in
practice since callers always set deadlines via NetworkConfig).
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Improved error handling and reporting for network request failures, timeouts, and request cancellations.
    • Enhanced request timing measurement accuracy.

Walkthrough

Replaces goroutine-based request execution with a deadline-aware approach: uses client.DoDeadline when the context has a deadline, otherwise falls back to client.Do. Always reports latency and maps request failures to structured BifrostError types (cancellation, timeouts, network, or request errors).

Changes

Cohort / File(s) Summary
Request execution & error mapping
core/providers/utils/utils.go
Removes in-flight goroutine + errChan pattern. Uses context-derived deadlines with client.DoDeadline or client.Do. Computes latency immediately after request. Reclassifies errors into RequestCancelled, timeouts, network errors (DNS/OpError/net.Error), or provider request errors while preserving the public function signature and always returning latency.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through code where goroutines ran free,
Now deadlines guide each careful plea,
Latency counted, errors sorted with cheer,
No more stray channels, the path is clear,
A rabbit nods—this flow feels right to me! 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately and specifically describes the main change: replacing a goroutine+channel pattern with DoDeadline in MakeRequestWithContext, which is the core fix in the changeset.
Description check ✅ Passed The PR description follows the template structure with all major sections completed: summary (explaining the bug and solution), changes (listing key modifications), type of change (bug fix selected), affected areas (Core Go selected), testing instructions, breaking changes (marked No), security considerations, and checklist verification.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

// defaultRequestTimeout is a safety-net fallback used only when the context has no deadline.
// In normal operation, callers always set a deadline via NetworkConfig.DefaultRequestTimeoutInSeconds
// (default 30s), so this 5-minute ceiling should rarely be reached.
const defaultRequestTimeout = 5 * time.Minute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont need to/want to hardcode this. As we have a few users who have 10 mins as timeout in some cases. Can we nuke this somehow?

@akshaydeo
Copy link
Contributor

❤️ for the PR @davidrudduck

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@core/providers/utils/utils.go`:
- Around line 125-146: The two cancellation pathways in the error handling (the
ctx.Err() branch and the errors.Is(err, context.Canceled) branch) are confusing
and set differing IsBifrostError values; either consolidate into a single
canonical cancellation handling or add a clarifying comment and ensure both
branches set the same IsBifrostError value consistent with orchestrator retry
semantics (see BifrostError and executeRequestWithRetries); also change the
message built with fmt.Sprintf("Request cancelled or timed out by context: %v",
ctx.Err()) to a lowercase, punctuation-free error string per Go conventions
(e.g., "request cancelled or timed out by context: %v").

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 91af49c4-aeec-450d-9400-55d16f9fd31a

📥 Commits

Reviewing files that changed from the base of the PR and between 9353a23 and e5f9755.

📒 Files selected for processing (1)
  • core/providers/utils/utils.go

Comment on lines +125 to +146
if err != nil {
// Check if the context was cancelled while the request was in flight
if ctx.Err() != nil {
return latency, &schemas.BifrostError{
IsBifrostError: true,
Error: &schemas.ErrorField{
Type: schemas.Ptr(schemas.RequestCancelled),
Message: fmt.Sprintf("Request cancelled or timed out by context: %v", ctx.Err()),
Error: ctx.Err(),
},
}
// Check for DNS lookup and network errors after timeout checks
var opErr *net.OpError
var dnsErr *net.DNSError
if errors.As(err, &opErr) || errors.As(err, &dnsErr) {
return latency, &schemas.BifrostError{
IsBifrostError: false,
Error: &schemas.ErrorField{
Message: schemas.ErrProviderNetworkError,
Error: err,
},
}
}
if errors.Is(err, context.Canceled) {
return latency, &schemas.BifrostError{
IsBifrostError: false,
Error: &schemas.ErrorField{
Type: schemas.Ptr(schemas.RequestCancelled),
Message: schemas.ErrRequestCancelled,
Error: err,
},
}
// The HTTP request itself failed (e.g., connection error, fasthttp timeout).
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential redundancy and inconsistent IsBifrostError values.

Two observations:

  1. Redundant cancellation check: If ctx.Err() != nil (line 127) returns a RequestCancelled error, the subsequent errors.Is(err, context.Canceled) check (line 137) may be unreachable for the same scenario. However, there's a subtle difference: ctx.Err() catches context cancellation that happened during the request, while errors.Is(err, context.Canceled) catches when fasthttp itself returns context.Canceled (e.g., from a wrapped context). If this distinction is intentional, consider adding a comment to clarify.

  2. IsBifrostError inconsistency: Line 129 sets IsBifrostError: true while line 139 sets IsBifrostError: false for similar cancellation scenarios. Per the BifrostError semantics and the orchestrator's retry logic in executeRequestWithRetries, this affects how errors are processed. Verify this is intentional—typically cancellation errors should have consistent classification.

  3. Error message capitalization: Line 132's message "Request cancelled or timed out by context: %v" starts with uppercase. Go convention requires error strings to be lowercase with no trailing punctuation.

Suggested fix for error message capitalization
 			Error: &schemas.ErrorField{
 				Type:    schemas.Ptr(schemas.RequestCancelled),
-				Message: fmt.Sprintf("Request cancelled or timed out by context: %v", ctx.Err()),
+				Message: fmt.Sprintf("request cancelled or timed out by context: %v", ctx.Err()),
 				Error:   ctx.Err(),
 			},

As per coding guidelines: "Error strings must be lowercase with no trailing punctuation per Go conventions."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err != nil {
// Check if the context was cancelled while the request was in flight
if ctx.Err() != nil {
return latency, &schemas.BifrostError{
IsBifrostError: true,
Error: &schemas.ErrorField{
Type: schemas.Ptr(schemas.RequestCancelled),
Message: fmt.Sprintf("Request cancelled or timed out by context: %v", ctx.Err()),
Error: ctx.Err(),
},
}
// Check for DNS lookup and network errors after timeout checks
var opErr *net.OpError
var dnsErr *net.DNSError
if errors.As(err, &opErr) || errors.As(err, &dnsErr) {
return latency, &schemas.BifrostError{
IsBifrostError: false,
Error: &schemas.ErrorField{
Message: schemas.ErrProviderNetworkError,
Error: err,
},
}
}
if errors.Is(err, context.Canceled) {
return latency, &schemas.BifrostError{
IsBifrostError: false,
Error: &schemas.ErrorField{
Type: schemas.Ptr(schemas.RequestCancelled),
Message: schemas.ErrRequestCancelled,
Error: err,
},
}
// The HTTP request itself failed (e.g., connection error, fasthttp timeout).
}
if err != nil {
// Check if the context was cancelled while the request was in flight
if ctx.Err() != nil {
return latency, &schemas.BifrostError{
IsBifrostError: true,
Error: &schemas.ErrorField{
Type: schemas.Ptr(schemas.RequestCancelled),
Message: fmt.Sprintf("request cancelled or timed out by context: %v", ctx.Err()),
Error: ctx.Err(),
},
}
}
if errors.Is(err, context.Canceled) {
return latency, &schemas.BifrostError{
IsBifrostError: false,
Error: &schemas.ErrorField{
Type: schemas.Ptr(schemas.RequestCancelled),
Message: schemas.ErrRequestCancelled,
Error: err,
},
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/utils/utils.go` around lines 125 - 146, The two cancellation
pathways in the error handling (the ctx.Err() branch and the errors.Is(err,
context.Canceled) branch) are confusing and set differing IsBifrostError values;
either consolidate into a single canonical cancellation handling or add a
clarifying comment and ensure both branches set the same IsBifrostError value
consistent with orchestrator retry semantics (see BifrostError and
executeRequestWithRetries); also change the message built with
fmt.Sprintf("Request cancelled or timed out by context: %v", ctx.Err()) to a
lowercase, punctuation-free error string per Go conventions (e.g., "request
cancelled or timed out by context: %v").

Per maintainer feedback, remove the 5-minute hardcoded timeout.
When the context has a deadline, use DoDeadline (goroutine-safe).
When no deadline is set, fall back to client.Do which respects
the client's own timeout configuration.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
core/providers/utils/utils.go (1)

118-137: Clarify the intentional distinction between the two cancellation paths.

The two cancellation branches have different IsBifrostError values:

  • Line 120: IsBifrostError: true — context was directly cancelled/timed out
  • Line 130: IsBifrostError: false — error chain contains context.Canceled but ctx.Err() is nil

Both paths set Type: RequestCancelled, so retry and fallback logic will behave identically (break retry loop, skip fallbacks). However, the IsBifrostError distinction is subtle and could cause confusion during debugging.

Consider adding a brief comment explaining when the second path (errors.Is(err, context.Canceled) with ctx.Err() == nil) would be reached—e.g., if a nested/derived context was cancelled while the outer context remains valid.

📝 Suggested clarifying comment
 		if ctx.Err() != nil {
 			return latency, &schemas.BifrostError{
 				IsBifrostError: true,
 				Error: &schemas.ErrorField{
 					Type:    schemas.Ptr(schemas.RequestCancelled),
 					Message: fmt.Sprintf("request cancelled or timed out by context: %v", ctx.Err()),
 					Error:   ctx.Err(),
 				},
 			}
 		}
+		// This branch handles edge cases where the error chain contains context.Canceled
+		// but the parent context is still valid (e.g., a nested context was cancelled).
+		// IsBifrostError is false since this may indicate a transient provider-side issue.
 		if errors.Is(err, context.Canceled) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core/providers/utils/utils.go` around lines 118 - 137, Add a brief clarifying
comment above the second cancellation branch (the block that checks
errors.Is(err, context.Canceled)) explaining why IsBifrostError is set to false
there versus true for the ctx.Err() check: note that the first branch (ctx.Err()
!= nil) is hit when the current context itself was cancelled/timed out, while
the second branch is reached when the error chain contains context.Canceled
(e.g., a nested/derived context or upstream operation was cancelled) even though
the outer ctx remains non-nil; reference the symbols ctx.Err(), errors.Is(err,
context.Canceled), IsBifrostError and schemas.BifrostError in the comment so
future readers understand the intent and retry/fallback behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@core/providers/utils/utils.go`:
- Around line 118-137: Add a brief clarifying comment above the second
cancellation branch (the block that checks errors.Is(err, context.Canceled))
explaining why IsBifrostError is set to false there versus true for the
ctx.Err() check: note that the first branch (ctx.Err() != nil) is hit when the
current context itself was cancelled/timed out, while the second branch is
reached when the error chain contains context.Canceled (e.g., a nested/derived
context or upstream operation was cancelled) even though the outer ctx remains
non-nil; reference the symbols ctx.Err(), errors.Is(err, context.Canceled),
IsBifrostError and schemas.BifrostError in the comment so future readers
understand the intent and retry/fallback behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8561dd1e-30ee-49dc-9c85-58e53ed000c6

📥 Commits

Reviewing files that changed from the base of the PR and between e5f9755 and 35b1c44.

📒 Files selected for processing (1)
  • core/providers/utils/utils.go

@davidrudduck
Copy link
Contributor Author

Updated per your feedback — removed the hardcoded defaultRequestTimeout entirely. When the context carries a deadline (the normal path via NetworkConfig.DefaultRequestTimeoutInSeconds), it uses DoDeadline. When no deadline is set, it falls back to client.Do which respects the client's own timeout config. No arbitrary ceilings imposed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants