fix: improve exponential backoff calculation #3437

Closed. Wants to merge 1 commit.

stanhu commented Jul 17, 2025

Given the default `MinRetryBackoff` of 8 ms, the previous implementation
calculated the backoff duration in two steps:

1. Calculate the exponential value:

```
d := minBackoff << uint(retry)
// Retry 0: d = 8ms
// Retry 1: d = 16ms
// Retry 2: d = 32ms
// Retry 3: d = 64ms
```

2. Replace it with a constant base plus random jitter:

```
d = minBackoff + time.Duration(rand.Int63n(int64(d)))
// Retry 0: d = 8ms + random(0, 8ms)  = 8-16ms
// Retry 1: d = 8ms + random(0, 16ms) = 8-24ms
// Retry 2: d = 8ms + random(0, 32ms) = 8-40ms
// Retry 3: d = 8ms + random(0, 64ms) = 8-72ms
```

The average delays show that this isn't really exponential growth:

```
Retry 0: avg = 8 + 4   = 12ms
Retry 1: avg = 8 + 8   = 16ms  (1.33x growth)
Retry 2: avg = 8 + 16  = 24ms  (1.5x growth)
Retry 3: avg = 8 + 32  = 40ms  (1.67x growth)
```

In other words, the base stays constant and only the jitter range grows exponentially:

```
delay = constant_base + random(0, exponential_range)
```
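Writing m for `MinRetryBackoff` and r for the retry number, and treating the jitter as uniform (ignoring the nanosecond granularity of `rand.Int63n`), the expected delay and the growth ratio between consecutive retries work out as:

```math
\mathbb{E}[\text{delay}_r] = m + \frac{m \cdot 2^r}{2} = m\left(1 + 2^{r-1}\right),
\qquad
\frac{\mathbb{E}[\text{delay}_{r+1}]}{\mathbb{E}[\text{delay}_r]} = \frac{1 + 2^{r}}{1 + 2^{r-1}}
```

For m = 8 ms this gives 12, 16, 24, and 40 ms and ratios of 4/3, 3/2, and 5/3, matching the table above; the constant m term keeps the ratio below 2, and it only approaches 2 as r grows.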

As described in
https://aws.amazon.com/ko/blogs/architecture/exponential-backoff-and-jitter/,
we want:

```
d := minBackoff << uint(retry)
d += time.Duration(rand.Int63n(int64(d)))
// Retry 0: d = 8ms + random(0, 8ms)  = 8-16ms
// Retry 1: d = 16ms + random(0, 16ms) = 16-32ms
// Retry 2: d = 32ms + random(0, 32ms) = 32-64ms
// Retry 3: d = 64ms + random(0, 64ms) = 64-128ms
```

Note that even with this change, Redis Cluster may still not have enough
breathing room for cluster discovery. `MaxRetries`, `MinRetryBackoff`,
and `MaxRetryBackoff` likely still need to be tweaked.

Relates to redis#2046

stanhu force-pushed the sh-fix-exponential-backoff branch from d815882 to 71e9b8f on July 17, 2025 07:09

stanhu (Author) commented Jul 17, 2025

Eh, maybe this isn't totally right: https://aws.amazon.com/ko/blogs/architecture/exponential-backoff-and-jitter/

stanhu closed this Jul 17, 2025