@shiavm006

This PR implements support for configuring timeouts specifically for gRPC streaming requests in BackendTrafficPolicy, resolving the issue where gRPC streaming connections were limited to a hardcoded 15-second timeout.

Issue being fixed #5446

Problem

  • gRPC streaming calls are limited to a hardcoded 15-second timeout
  • There is no way to configure different timeouts for streaming and unary gRPC calls
  • Users are forced to use complex EnvoyPatchPolicy workarounds

Solution

  • New StreamTimeout field in HTTPTimeout struct
  • For gRPC routes (HTTP/2): uses StreamTimeout if set, falls back to RequestTimeout
  • For HTTP routes: always uses RequestTimeout (no change)
  • Set to "0s" to disable timeout for infinite streaming

Testing Done

  • Unit tests pass (make test)
  • Linting passes (make lint)
  • Code generation works (make generate)
  • CRD generation works (make manifests)
  • Helm charts validate correctly
  • XDS translation produces correct Envoy config

Add StreamTimeout field to HTTPTimeout struct in BackendTrafficPolicy to enable
configuration of timeouts specifically for gRPC streaming requests.

Key changes:
1. Add StreamTimeout field to api/v1alpha1/timeout_types.go
2. Add StreamTimeout field to internal/ir/xds.go HTTPTimeout
3. Update buildClusterSettingsTimeout to process StreamTimeout
4. Update XDS translation to use StreamTimeout for gRPC routes (IsHTTP2)
5. Add getEffectiveTimeout function to prioritize StreamTimeout for gRPC routes

When StreamTimeout is set to 0s, timeouts are disabled for streaming requests.
This resolves the 15-second timeout limitation for gRPC streaming calls.

Signed-off-by: Shivam Mittal <[email protected]>

Update CRDs, Helm charts, and other generated files to include
the new StreamTimeout field in HTTPTimeout structs.

Signed-off-by: Shivam Mittal <[email protected]>
Comment on lines +48 to +55

// StreamTimeout is the timeout for streaming requests. This timeout does not apply to non-streaming requests.
// When set to "0s", the timeout is disabled for streaming requests, allowing them to run indefinitely.
// This is particularly useful for gRPC streaming calls.
// Default: inherited from RequestTimeout.
//
// +optional
StreamTimeout *gwapiv1.Duration `json:"streamTimeout,omitempty"`
Contributor

RequestTimeout doesn't currently get evaluated for GRPCRoutes at all. What do you think about merging the two and treating RequestTimeout like this for GRPCRoutes?

Author

I think keeping RequestTimeout and StreamTimeout separate is clearer for users, since unary and streaming gRPC calls often need different timeouts. Merging them could cause confusion or accidental misconfiguration. Explicit fields make the API more intuitive and safer for production use.

Contributor

From the docs I'm a bit confused as to where the different timeouts intersect. It seems like the overall Route Timeout (which for HTTPRoutes gets implemented via RequestTimeout) takes precedence over the Stream Timeout value, but that might be unrelated to your change here.

https://www.envoyproxy.io/docs/envoy/latest/faq/configuration/timeouts#route-timeouts

The route idle_timeout allows overriding of the HTTP connection manager stream_idle_timeout and does the same thing.

I agree they should stay separate, thanks!

@arkodg
Contributor

arkodg commented Jul 15, 2025

This doesn't look right.
Here's the definition in Envoy:
https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-routeaction-maxstreamduration

IMO we should set grpc_timeout_header_max to 0 by default for GRPCRoute so that it respects the grpc-timeout header: https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md

and if we are introducing a new streamTimeout it should be used to set grpc_timeout_header_max

related kubernetes-sigs/gateway-api#3219 (comment)

@codecov

codecov bot commented Jul 16, 2025

Codecov Report

❌ Patch coverage is 72.00000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.01%. Comparing base (a78ff3e) to head (35d8b67).
⚠️ Report is 232 commits behind head on main.

Files with missing lines | Patch % | Lines
internal/gatewayapi/clustersettings.go | 22.22% | 6 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6508      +/-   ##
==========================================
- Coverage   71.05%   71.01%   -0.04%     
==========================================
  Files         225      225              
  Lines       38992    39015      +23     
==========================================
+ Hits        27705    27707       +2     
- Misses       9684     9704      +20     
- Partials     1603     1604       +1     

@shiavm006
Author

This doesn't look right. Here's the definition in Envoy: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-routeaction-maxstreamduration

IMO we should set grpc_timeout_header_max to 0 by default for GRPCRoute so that it respects the grpc-timeout header: https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md

and if we are introducing a new streamTimeout it should be used to set grpc_timeout_header_max

related kubernetes-sigs/gateway-api#3219 (comment)

I tried to implement this, but the grpc_timeout_header_max field is not available in the current version of the Envoy Go API (go-control-plane). I updated to the latest version, but the field is still missing from the generated RouteAction struct.

@jukie
Contributor

jukie commented Jul 25, 2025

grpc_timeout_header_max only appears to be used if the request also includes a timeout header so stream_idle_timeout seems like the correct place. It would still be a good idea to configure grpc_timeout_header_max alongside but I don't feel strongly about whether that's implemented in this PR or a follow-up. Are there any negatives to only having stream_idle_timeout set without grpc_timeout_header_max?

grpc_timeout_header_max (Duration) If present, and the request contains a grpc-timeout header, use that value as the max_stream_duration, but limit the applied timeout to the maximum value specified here. If set to 0, the grpc-timeout header is used without modification.

@jukie jukie self-requested a review July 25, 2025 01:22
@jukie
Contributor

jukie commented Jul 25, 2025

/retest

Comment on lines +366 to +380
func getEffectiveTimeout(httpRoute *ir.HTTPRoute) *metav1.Duration {
	// For gRPC routes (IsHTTP2), check if streaming timeout is configured
	if httpRoute.IsHTTP2 &&
		httpRoute.Traffic != nil &&
		httpRoute.Traffic.Timeout != nil &&
		httpRoute.Traffic.Timeout.HTTP != nil &&
		httpRoute.Traffic.Timeout.HTTP.StreamTimeout != nil {
		// StreamTimeout takes precedence for gRPC/HTTP2 routes
		return httpRoute.Traffic.Timeout.HTTP.StreamTimeout
	}

	// Fall back to regular request timeout for non-gRPC routes or when no StreamTimeout is configured
	return getEffectiveRequestTimeout(httpRoute)
}

Contributor

@jukie jukie Jul 25, 2025

Could you add a testdata example for this? I'm fairly confident that getEffectiveRequestTimeout already doesn't run for GRPCRoutes so am curious if this is being skipped too.

Edit: Confirmed that BTP requestTimeout gets skipped for GRPCRoutes currently. Not expecting you to fix requestTimeout in this PR but I think your new field here will get skipped as well.

Contributor

I was so confidently incorrect... ignore me but would still like to include testdata

}
}

func TestGetEffectiveTimeout(t *testing.T) {
Contributor

This is great and fixes code coverage CI but please also add to internal/xds/translator/testdata

@jukie
Contributor

jukie commented Jul 25, 2025

/retest

@jukie jukie self-requested a review July 25, 2025 17:56
@jukie jukie requested a review from a team July 25, 2025 17:59
@github-actions

github-actions bot commented Sep 3, 2025

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

@github-actions github-actions bot added stale and removed stale labels Sep 3, 2025
@github-actions

github-actions bot commented Oct 4, 2025

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. Please feel free to give a status update now, ping for review, when it's ready. Thank you for your contributions!

Development

Successfully merging this pull request may close these issues.

Set timeout for streaming rpc linked to a GRPCRoute

3 participants