-
Notifications
You must be signed in to change notification settings - Fork 146
feat: add dynamic backoff #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! I've left some minor comments in the code.
It's been a while at this point since I've done much work in go, so I'm not as familiar with conventions around error-handling and typing anymore. Is it usual to define a new wrapping error class to handle cases like this? My first instinct would have been to define an interface for types that have a Backoff() time.Duration
method instead, and let user errors implement that interface if they wish. But I don't know what actually ends up being easier, or what the current conventions are.
t.Error("run wrong number of times") | ||
} | ||
|
||
if time.Since(st) < 500*time.Millisecond { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to avoid doing time-based assertions in the test suite, they tend to be quite flaky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the test results may be flaky, but I couldn't think of any other way to check 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I don't assert on how long it actually sleeps (I trust go's timer to work) and I simply unit-test the return value of calcSleep
(and then the bigger tests just assert on the number of times that the method gets retried, which should be deterministic).
Perhaps for this PR you could pass err
into calcSleep
and move the logic into that method, so you can test that logic in a unit test of that method without any time-based assertions?
r := New([]time.Duration{0, 10 * time.Millisecond}, nil) | ||
st := time.Now() | ||
|
||
err := r.Run(genWork([]error{ErrWithBackoff(errFoo, 500*time.Millisecond)})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to swap these times (so the retrier defaults to 500ms and the error overrides it to 10ms), so the test runs that much faster.
backoff time.Duration | ||
} | ||
|
||
func ErrWithBackoff(err error, backoff time.Duration) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add godocs for any new public types/methods/etc.
var err *errWithBackoff | ||
var backoff time.Duration | ||
if errors.As(ret, &err) { | ||
backoff = err.backoff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the response gives us a backoff value, does it still make sense to add some jitter to that? I'm not honestly sure...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with eapache's point.
I don't think Jitter should be unnecessarily considered by the client when given by Retry-After.
The client should add the Jitter because it is the same program or could be executed at the same time, and the Retry-After reverses the timing decision.
Sorry to jump in but let me explain further. I think it's strange to assume that the use of Retry-After, etc. will be based on the existing mechanism for generating backoffs. What I mean is that in cases where an error occurs and the Retry-After returns, it is unlikely that there will be a mix of cases where it returns and cases where it does not return, so I think the point is that Retry-After (like, dynamically generated values) should be clearly segregated as a compliant back-off. The current design you are proposing confuses the two and is quite implicit, making the existing backoff a design that “lies”. The work functions that Run and RunCtx receive would be very difficult to modify in the current situation. I would like to solicit input from others on the name, but I think it is a reasonably significant change to have an inherently dynamic treatment enter into a static treatment. I wouldn't be interrupting PR like this if I could discuss it first in an Issue, but I'm sorry if I'm stirring up the discussion. |
I also think that this current code is just screwing with errors and should ideally be in the form of defining a Backoff interface. Errors should be used for handling as values, but using them as a substitute for messaging seems pretty forced. The above my suggestions were made to avoid destructive changes. |
Hmm, to be honest it's the constructor ( @OrangeFlag I'm still open to some version of this feature, but I think it is a good point to consider the interface a little more carefully. |
Thanks for the reply. On the other hand, I recognize that the Classifier is there to determine success or failure and to determine the next action to take. I believe it is excessive for the Classifier to have the responsibility of generating backoffs. Against the scale, this PR seems a little too brave. |
Yes, if somebody is interested in working on it, then that larger discussion should be happening in a different issue or PR at this point - the ideas being floated here are much bigger than Ilya initially proposed. |
I'm going to leave this PR open for the time being - this is a reasonably small extension of the existing implementation that doesn't really get in the way of existing users. While an ideal dynamic retrier probably has a fairly different interface, I don't want to let the perfect be the enemy of the good; If somebody wants to tackle the bigger problem, that's great, but if they just want to address my existing more specific comments, I'm not opposed to merging this anyway. |
Hey everyone! Thx @eapache for this library, I'm currently using it. My understanding on the topic, is that its not the responsibility of running function to decide on backoff value, in the same way as its not the responsibility of type Backoff interface {
Duration(attempt int, err error) time.Duration
} This approach would allow one to build their own backoff by chaining the existing one with custom error checking function. I could make a PR for that if you want guys. wdyt? |
I think something like that interface could make sense, yes. As discussed though, switching from This PR is no longer the right place to be having this discussion though. Please file a ticket for any more discussions if needed. |
Hi,
First of all, thank you for this excellent library! It has been incredibly useful in my projects.
I’m submitting this pull request to add a feature that enables dynamic backoff support. In some cases, the called service may return a
Retry-After
header and expects us to wait for the specified time before retrying. However, this header is not always present. Additionally, there are scenarios where I’d like to set a zero backoff for certain errors, allowing an immediate retry.This pull request introduces the following modifications:
ErrWithBackoff
, to specify dynamic backoff durations.RunFn
to handleErrWithBackoff
by dynamically setting the backoff duration based on the error field.I believe this feature will make the library more flexible and useful in various retry scenarios.
Thank you for considering this contribution!