feat: add dynamic backoff #54

OrangeFlag · 2025-01-17T23:03:18Z

Hi,
First of all, thank you for this excellent library! It has been incredibly useful in my projects.

I’m submitting this pull request to add a feature that enables dynamic backoff support. In some cases, the called service may return a Retry-After header and expects us to wait for the specified time before retrying. However, this header is not always present. Additionally, there are scenarios where I’d like to set a zero backoff for certain errors, allowing an immediate retry.

This pull request introduces the following modifications:

Added a new error constructor, ErrWithBackoff, to specify dynamic backoff durations.
Updated RunFn to handle ErrWithBackoff by dynamically setting the backoff duration based on the error field.
Added a test to verify the dynamic backoff logic.

I believe this feature will make the library more flexible and useful in various retry scenarios.
Thank you for considering this contribution!

eapache

Thanks for your contribution! I've left some minor comments in the code.

It's been a while at this point since I've done much work in go, so I'm not as familiar with conventions around error-handling and typing anymore. Is it usual to define a new wrapping error class to handle cases like this? My first instinct would have been to define an interface for types that have a Backoff() time.Duration method instead, and let user errors implement that interface if they wish. But I don't know what actually ends up being easier, or what the current conventions are.

retrier/classifier_test.go

eapache · 2025-01-18T16:32:34Z

retrier/retrier_test.go

+		t.Error("run wrong number of times")
+	}
+
+	if time.Since(st) < 500*time.Millisecond {


I have tried to avoid doing time-based assertions in the test suite, they tend to be quite flaky.

Yes, the test results may be flaky, but I couldn't think of any other way to check 🤔

Currently, I don't assert on how long it actually sleeps (I trust go's timer to work) and I simply unit-test the return value of calcSleep (and then the bigger tests just assert on the number of times that the method gets retried, which should be deterministic).

Perhaps for this PR you could pass err into calcSleep and move the logic into that method, so you can test that logic in a unit test of that method without any time-based assertions?

eapache · 2025-01-18T16:33:10Z

retrier/retrier_test.go

+	r := New([]time.Duration{0, 10 * time.Millisecond}, nil)
+	st := time.Now()
+
+	err := r.Run(genWork([]error{ErrWithBackoff(errFoo, 500*time.Millisecond)}))


Prefer to swap these times (so the retrier defaults to 500ms and the error overrides it to 10ms), so the test runs that much faster.

eapache · 2025-01-18T16:34:46Z

retrier/errors.go

+	backoff time.Duration
+}
+
+func ErrWithBackoff(err error, backoff time.Duration) error {


Please add godocs for any new public types/methods/etc.

eapache · 2025-01-21T21:00:53Z

retrier/retrier.go

+			var err *errWithBackoff
+			var backoff time.Duration
+			if errors.As(ret, &err) {
+				backoff = err.backoff


If the response gives us a backoff value, does it still make sense to add some jitter to that? I'm not honestly sure...

I agree with eapache's point.
I don't think Jitter should be unnecessarily considered by the client when given by Retry-After.
The client should add the Jitter because it is the same program or could be executed at the same time, and the Retry-After reverses the timing decision.

tukeJonny · 2025-02-23T02:26:27Z

@OrangeFlag @eapache

Sorry to jump in but let me explain further.
As a fan of go-resiliency's retry packages, I am curious to offer my opinion.

I think it's strange to assume that the use of Retry-After, etc. will be based on the existing mechanism for generating backoffs.

What I mean is that in cases where an error occurs and the Retry-After returns, it is unlikely that there will be a mix of cases where it returns and cases where it does not return, so I think the point is that Retry-After (like, dynamically generated values) should be clearly segregated as a compliant back-off.
Also, the current backoffs are mismatched with the basic design of pre-generated values.

The current design you are proposing confuses the two and is quite implicit, making the existing backoff a design that “lies”.

The work functions that Run and RunCtx receive would be very difficult to modify in the current situation.
So, why not have two functions, RunWithDynamicBackoff and RunCtxWithDynamicBackoff?
In this case, we would rely on a “dynamic” backoff to be executed in the RunX function that would generate a backoff for each call.

I would like to solicit input from others on the name, but I think it is a reasonably significant change to have an inherently dynamic treatment enter into a static treatment.

I wouldn't be interrupting PR like this if I could discuss it first in an Issue, but I'm sorry if I'm stirring up the discussion.

tukeJonny · 2025-02-23T02:46:59Z

Thanks for your contribution! I've left some minor comments in the code.

It's been a while at this point since I've done much work in go, so I'm not as familiar with conventions around error-handling and typing anymore. Is it usual to define a new wrapping error class to handle cases like this? My first instinct would have been to define an interface for types that have a Backoff() time.Duration method instead, and let user errors implement that interface if they wish. But I don't know what actually ends up being easier, or what the current conventions are.

I also think that this current code is just screwing with errors and should ideally be in the form of defining a Backoff interface.

Errors should be used for handling as values, but using them as a substitute for messaging seems pretty forced.

The above my suggestions were made to avoid destructive changes.
Also, if static and dynamic behaviors are mixed, it will be easy to introduce bugs and also difficult for users to understand, so why not prepare them in a way that they can be segregated? This is the reason why i suggested to create a separate form for the two.

eapache · 2025-02-23T20:52:44Z

Hmm, to be honest it's the constructor (New) which takes the static list of backoff values. If you wanted to design a proper dynamic retry interface, it would involve a new constructor as well (either taking a function reference in the constructor, or leaving it up to the work function). But if you're baking more logic into the work function so it can return the backoff value, then does the classifier make sense either, or should that also just be a value the work function returns directly? Or should the classifier be responsible for calculating the dynamic backoff? Lots of questions / lots of options, maybe best designed as a fresh type.

@OrangeFlag I'm still open to some version of this feature, but I think it is a good point to consider the interface a little more carefully.

tukeJonny · 2025-02-23T22:13:45Z

@eapache

Thanks for the reply.
It seems like a reasonable idea to me as well, that the backoff should be known at the constructor.
The “processing” is passed by function reference (work) when processing, but the “how to define the backoff value” is something that the retry executor must know.

On the other hand, I recognize that the Classifier is there to determine success or failure and to determine the next action to take.
In this PR proposal, we are talking about how to generate backoffs, assuming a situation where the client is not in a position to determine backoffs, so isn't this a different story? I believe that this is the case.
Even if the work function were to return a backoff value, would it not change whether the next action taken is a retry or not, and would it not only take longer to wait?

I believe it is excessive for the Classifier to have the responsibility of generating backoffs.
If we do that, users will always have to pay attention to the Classifier when they are concerned about handling backoffs or extending them, and we will have a situation where there are multiple concerns.
The reverse is also true.

Against the scale, this PR seems a little too brave.
How about we reopen the discussion with an Issue?
In particular, what is needed in the design to separate constructors and how to incorporate them.

eapache · 2025-02-23T22:34:58Z

Yes, if somebody is interested in working on it, then that larger discussion should be happening in a different issue or PR at this point - the ideas being floated here are much bigger than Ilya initially proposed.

eapache · 2025-02-23T22:43:06Z

I'm going to leave this PR open for the time being - this is a reasonably small extension of the existing implementation that doesn't really get in the way of existing users. While an ideal dynamic retrier probably has a fairly different interface, I don't want to let the perfect be the enemy of the good; If somebody wants to tackle the bigger problem, that's great, but if they just want to address my existing more specific comments, I'm not opposed to merging this anyway.

yandzee · 2025-03-23T07:14:52Z

Hey everyone!
I got here randomly while was checking if rate limiting module is there and read the entire discussion :)

Thx @eapache for this library, I'm currently using it.

My understanding on the topic, is that its not the responsibility of running function to decide on backoff value, in the same way as its not the responsibility of Retrier to calcSleep. So I would move all such logic into the interface like:

type Backoff interface {
    Duration(attempt int, err error) time.Duration
}

This approach would allow one to build their own backoff by chaining the existing one with custom error checking function. I could make a PR for that if you want guys.

wdyt?

eapache · 2025-03-23T14:54:17Z

I think something like that interface could make sense, yes. As discussed though, switching from []time.Duration to something like that starts to impact the constructor and overall design of the type, which makes it not a very straight-forward addition, perhaps even warranting a whole new type/implementation. Happy to look at PRs if there is interest.

This PR is no longer the right place to be having this discussion though. Please file a ticket for any more discussions if needed.

feat: add dynamic backoff

c18ad48

eapache reviewed Jan 18, 2025

View reviewed changes

eapache reviewed Jan 21, 2025

View reviewed changes

feat: add dynamic backoff #54

Are you sure you want to change the base?

feat: add dynamic backoff #54

Conversation

OrangeFlag commented Jan 17, 2025

Uh oh!

eapache left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

eapache Jan 18, 2025

Choose a reason for hiding this comment

Uh oh!

OrangeFlag Jan 20, 2025

Choose a reason for hiding this comment

Uh oh!

eapache Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

eapache Jan 18, 2025

Choose a reason for hiding this comment

Uh oh!

eapache Jan 18, 2025

Choose a reason for hiding this comment

Uh oh!

eapache Jan 21, 2025

Choose a reason for hiding this comment

Uh oh!

tukeJonny Feb 23, 2025

Choose a reason for hiding this comment

Uh oh!

tukeJonny commented Feb 23, 2025

Uh oh!

tukeJonny commented Feb 23, 2025

Uh oh!

eapache commented Feb 23, 2025

Uh oh!

tukeJonny commented Feb 23, 2025

Uh oh!

eapache commented Feb 23, 2025

Uh oh!

eapache commented Feb 23, 2025

Uh oh!

yandzee commented Mar 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eapache commented Mar 23, 2025

Uh oh!

Uh oh!

yandzee commented Mar 23, 2025 •

edited

Loading