grpc: introduce ErrRetriesExhausted to wrap retry failures #8894

Open
arjan-bal wants to merge 2 commits into grpc:master from arjan-bal:wrap-retry-error

Conversation

Contributor

@arjan-bal arjan-bal commented Feb 10, 2026

Fixes: #7023

This change wraps the RPC error returned when the maximum retry limit is reached. It introduces a custom error type that carries the number of attempts made alongside the original error.

RELEASE NOTES:

  • grpc: Add ErrRetriesExhausted error type that wraps retry failures.

@arjan-bal arjan-bal added this to the 1.80 Release milestone Feb 10, 2026
@arjan-bal arjan-bal added the Type: Feature New features or improvements in behavior label Feb 10, 2026
@arjan-bal
Contributor Author

/gemini review

codecov bot commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.29%. Comparing base (d7b3f93) to head (5cb347e).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8894      +/-   ##
==========================================
- Coverage   83.13%   80.29%   -2.85%     
==========================================
  Files         414      416       +2     
  Lines       32821    33523     +702     
==========================================
- Hits        27286    26917     -369     
- Misses       4098     4677     +579     
- Partials     1437     1929     +492     
Files with missing lines Coverage Δ
stream.go 61.77% <100.00%> (-20.11%) ⬇️

... and 36 files with indirect coverage changes

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces ErrRetriesExhausted to provide more context when an RPC fails after exhausting all retry attempts. The idea is good, as it improves error diagnostics for clients.

However, the implementation of error wrapping in stream.go uses fmt.Errorf with two %w verbs, which is invalid in Go and will lead to a compile error. I've left a critical comment with details on this issue and suggestions for a fix. The core challenge is to create an error that both is identifiable as ErrRetriesExhausted and preserves the original gRPC status, which requires careful error wrapping.

The test changes are good, especially the switch to errors.Is and the new test case for ErrRetriesExhausted which helped identify the design issue.

Comment on lines +757 to +761
return false, fmt.Errorf("stopped after %d attempts: %w: %w",
	cs.numRetries+1,
	ErrRetriesExhausted,
	err,
)
Contributor

critical

Using two %w verbs in fmt.Errorf is invalid and will cause a compilation error with standard Go compilers. The Go documentation states: 'It is also invalid to use %w multiple times in a single Errorf call.'

Beyond the compilation error, there's a design challenge. The tests require that the resulting error e satisfies both errors.Is(e, ErrRetriesExhausted) and that status.Code(e) returns the original error's code. This is difficult to achieve when ErrRetriesExhausted is a simple error from errors.New because it cannot wrap the original error.

To satisfy both requirements, a solution would be to use a custom error type that wraps the original err and also implements an Is method to identify as ErrRetriesExhausted.

For example:

// unexported custom error type
type exhaustedError struct {
    err error
    attempts int
}

func (e *exhaustedError) Error() string {
    return fmt.Sprintf("stopped after %d attempts: %v: %v", e.attempts, ErrRetriesExhausted, e.err)
}

func (e *exhaustedError) Unwrap() error {
    return e.err
}

func (e *exhaustedError) Is(target error) bool {
    return target == ErrRetriesExhausted
}

Then you could construct the error as &exhaustedError{err: err, attempts: cs.numRetries + 1}.

Contributor Author

@arjan-bal arjan-bal Feb 10, 2026

> Using two %w verbs in fmt.Errorf is invalid and will cause a compilation error with standard Go compilers. The Go documentation states: 'It is also invalid to use %w multiple times in a single Errorf call.'

While older versions of Go restricted %w to a single instance, using multiple %w verbs in fmt.Errorf has been valid since Go 1.20. Here is what the doc states:

> If there is more than one %w verb, the returned error will implement an Unwrap method returning a []error containing all the %w operands in the order they appear in the arguments.

As a result, the error e satisfies errors.Is(e, ErrRetriesExhausted), while status.Code(e) correctly returns the original error's code. Both behaviors are covered by tests.

@arjan-bal arjan-bal self-assigned this Feb 12, 2026
@easwars easwars removed their assignment Feb 13, 2026
Contributor Author

@arjan-bal arjan-bal left a comment

Addressed the comments. Also added an experimental notice on the new error type and test cases to verify the wrapped error isn't present when retries are disabled.

RecvMsg(m any) error
}

// ErrRetriesExhausted is returned when an operation exceeds its configured
Contributor

Nit: This is not any arbitrary operation. This only applies to RPCs, right? Can we make that more explicit here, since this will be part of the API?

"RetryableStatusCodes": [ "UNAVAILABLE" ]
}
}]}`),
grpc.WithDisableRetry()); err != nil {
Contributor

I think this falls under go/go-style/decisions#indentation-confusion.

Can we initialize the dial options in a separate slice and pass them here, so that the ss.Start(....) call fits on a single line?

}
_, err = stream.Recv()
if err == nil {
t.Fatalf("client: Recv() = <nil>, <nil>; want <nil>, error")
Contributor

I know we have many error strings like this, but they should remain a relic of the past. This error message is not very readable. Something like "stream.Recv() succeeded when expected to fail" would be more readable, here and elsewhere where this applies.

}
}

func (s) TestRetryNotConfigured(t *testing.T) {
Contributor

The test logic here and in the above test seems identical. The only difference is the dial options. Can we make it a table-driven test instead?
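
A rough sketch of the table-driven shape, under heavy assumptions. Everything here is a stand-in: callRPC is a hypothetical helper that only simulates a failing RPC with or without retry wrapping, and the case names and fields are illustrative. In the real test each case would carry its own dial options and run as a t.Run subtest.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRetriesExhausted mirrors the sentinel proposed in this PR.
var ErrRetriesExhausted = errors.New("retries exhausted")

// callRPC is a hypothetical stand-in for starting the stub server with a
// case's dial options and making one failing RPC. It only simulates the
// two outcomes the tests distinguish.
func callRPC(retriesEnabled bool) error {
	base := errors.New("rpc failed: UNAVAILABLE")
	if !retriesEnabled {
		return base
	}
	return fmt.Errorf("stopped after %d attempts: %w: %w", 4, ErrRetriesExhausted, base)
}

// testCase captures what differs between the otherwise identical tests.
type testCase struct {
	name           string
	retriesEnabled bool
	wantExhausted  bool
}

// runCases applies the shared logic to every case and collects failure
// messages. A real test would call t.Errorf inside t.Run instead.
func runCases(cases []testCase) []string {
	var failures []string
	for _, tc := range cases {
		err := callRPC(tc.retriesEnabled)
		if err == nil {
			failures = append(failures, tc.name+": RPC succeeded when expected to fail")
			continue
		}
		if got := errors.Is(err, ErrRetriesExhausted); got != tc.wantExhausted {
			failures = append(failures, fmt.Sprintf("%s: errors.Is(err, ErrRetriesExhausted) = %v; want %v", tc.name, got, tc.wantExhausted))
		}
	}
	return failures
}

func main() {
	cases := []testCase{
		{name: "retry_disabled", retriesEnabled: false, wantExhausted: false},
		{name: "retry_not_configured", retriesEnabled: false, wantExhausted: false},
		{name: "retry_enabled", retriesEnabled: true, wantExhausted: true},
	}
	for _, f := range runCases(cases) {
		fmt.Println("FAIL:", f)
	}
}
```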

&testpb.Empty{},
}
s := status.New(codes.Canceled, "inner canceled")
sWithDetails, err := s.WithDetails(details...)
Contributor

Nit: We could just inline the details proto here in the call to s.WithDetails and get rid of the slice and the unpacking here.

Comment on lines +322 to +330
if got := st.Code(); got != tc.wantCode {
	t.Errorf("st.Code() = %v; want %v", got, tc.wantCode)
}
if got := st.Message(); got != tc.wantMessage {
	t.Errorf("st.Message() = %q; want %q", got, tc.wantMessage)
}
if got := len(st.Details()); got != tc.wantDetails {
	t.Errorf("len(st.Details()) = %v; want %v", got, tc.wantDetails)
}
Contributor

Should we have a wantStatus in the test table and perform a full struct comparison here, instead of comparing the individual fields? See go/go-style/decisions#compare-full-structures.

@easwars easwars assigned arjan-bal and unassigned easwars Feb 19, 2026
Labels

Type: Feature New features or improvements in behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

retry: Status should indicate more details when retries are enabled and an RPC fails

2 participants