Commit a00a063

comments pt 5
1 parent 1b32e00 commit a00a063

source/client-backpressure/client-backpressure.md

Lines changed: 68 additions & 25 deletions
@@ -10,6 +10,12 @@ ______________________________________________________________________
1010
This specification adds the ability for drivers to automatically retry requests that fail due to server overload errors
1111
while applying backpressure to avoid further overloading the server.
1212

13+
The retry behaviors defined in this specification are separate from and complementary to the retry behaviors defined in
14+
the [Retryable Reads](../retryable-reads/retryable-reads.md) and
15+
[Retryable Writes](../retryable-writes/retryable-writes.md) specifications. This specification expands retry support to
16+
all commands when specific server overload conditions are encountered, regardless of whether the command would normally
17+
be retryable under those specifications.
18+
1319
## META
1420

1521
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
@@ -64,8 +70,8 @@ For example, when a request exceeds the ingress request rate limit, the followin
6470
}
6571
```
6672

67-
Note that an error is not guaranteed to contain both the `SystemOverloadedError` and the `RetryableError` labels, if it
68-
contains one of them.
73+
Note that an error is not guaranteed to contain both the `SystemOverloadedError` and the `RetryableError` labels just
74+
because it contains one of them.
6975

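As an illustration of the note above, a driver might test for each label independently rather than inferring one from the other. This is a minimal sketch, not the specification's method: the helper names are hypothetical, it assumes the server error document carries its labels in an `errorLabels` array, and requiring both labels before an overload retry is an assumption that this excerpt does not state in full.

```python
# Hypothetical helpers; the names are illustrative and not defined by the specification.
SYSTEM_OVERLOADED = "SystemOverloadedError"
RETRYABLE = "RetryableError"


def error_labels(error_doc):
    """Return the set of labels on a server error document (empty if none)."""
    return set(error_doc.get("errorLabels", []))


def is_retryable_overload_error(error_doc):
    """Assumption: an overload retry requires BOTH labels to be present."""
    labels = error_labels(error_doc)
    return SYSTEM_OVERLOADED in labels and RETRYABLE in labels
```

An error carrying only `SystemOverloadedError` is therefore not treated as retryable in this sketch, because neither label implies the other.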
7076
#### Goodput
7177

@@ -76,6 +82,24 @@ See [goodput](https://en.wikipedia.org/wiki/Goodput).
7682

7783
### Requirements for Client Backpressure
7884

85+
#### Driver mechanisms subject to the retry policy
86+
87+
Commands sent by the driver to the server are subject to the retry policy defined in this specification unless the
88+
command is included in the exceptions below.
89+
90+
Driver commands not subject to the overload retry policy:
91+
92+
- [monitoring commands](../server-discovery-and-monitoring/server-monitoring.md#monitoring) and
93+
[round-trip time pingers](../server-discovery-and-monitoring/server-monitoring.md#measuring-rtt) (see
94+
[Why not apply the overload retry policy to monitoring and RTT connections?](./client-backpressure.md#why-not-apply-the-overload-retry-policy-to-monitoring-and-rtt-connections))
95+
- commands executed during [authentication](../auth/auth.md) (see
96+
[Why not apply the overload policy to authentication commands or reauthentication commands?](./client-backpressure.md#why-not-apply-the-overload-policy-to-authentication-commands-or-reauthentication-commands))
97+
98+
Note: Drivers communicate with [mongocryptd](../client-side-encryption/client-side-encryption.md#mongocryptd) using the
99+
driver's `runCommand()` API. Consequently, drivers will implicitly apply the retry policy to communication with
100+
mongocryptd, although in practice the retry policy would never be used because mongocryptd connections are not
101+
authenticated.
102+
79103
#### Overload retry policy
80104

81105
This specification expands the driver's retry ability to all commands if the error indicates that it is a retryable
@@ -112,9 +136,10 @@ rules:
112136
- `BASE_BACKOFF` is constant 100ms.
113137
- `MAX_BACKOFF` is 10000ms.
114138
- This results in delays of 100ms, 200ms, 400ms, 800ms, and 1600ms before accounting for jitter.
115-
8. If the request is eligible for retry (as outlined in step 4), the client MUST add the previously used server's
116-
address to the list of deprioritized server addresses for server selection.
117-
9. If the request is eligible for retry (as outlined in step 4) and is a retryable write:
139+
8. If the request is eligible for retry (as outlined in step 5), the client MUST add the previously used server's
140+
address to the list of deprioritized server addresses for
141+
[server selection](../server-selection/server-selection.md).
142+
9. If the request is eligible for retry (as outlined in step 5) and is a retryable write:
118143
1. If the command is a part of a transaction, the instructions for command modification on retry for commands in
119144
transactions MUST be followed, as outlined in the
120145
[transactions](../transactions/transactions.md#interaction-with-retryable-writes) specification.
@@ -126,20 +151,6 @@ rules:
126151
[retryable writes](../retryable-writes/retryable-writes.md) and the
127152
[transactions](../transactions/transactions.md) specifications.
128153

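The backoff constants listed in the rules above can be checked with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, the doubling formula and the limit of five retries are inferred from the listed delays, and the jitter formula (scaling the delay by a uniform random factor in [0, 1)) is an assumption because this excerpt does not show it.

```python
import random

BASE_BACKOFF_MS = 100     # from the rules above
MAX_BACKOFF_MS = 10_000   # from the rules above
MAX_RETRIES = 5           # assumption: inferred from the five delays listed


def backoff_before_jitter_ms(attempt):
    """Exponential backoff for a 1-based retry attempt, capped at MAX_BACKOFF_MS."""
    return min(MAX_BACKOFF_MS, BASE_BACKOFF_MS * 2 ** (attempt - 1))


def backoff_with_jitter_ms(attempt, rng=random.random):
    """Assumption: jitter scales the delay by a uniform factor in [0, 1)."""
    return rng() * backoff_before_jitter_ms(attempt)


# Delays before jitter for retry attempts 1 through 5.
delays = [backoff_before_jitter_ms(n) for n in range(1, MAX_RETRIES + 1)]
```

With these constants the cap of 10000ms is never reached within five attempts; it would only take effect from the eighth attempt onward (100ms * 2^7 = 12800ms).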
129-
##### Relevant driver processes
130-
131-
The retry policy defined above is only relevant for commands sent on authenticated connections, which
132-
133-
- any user-facing API which wraps a server command (i.e., a CRUD command or runCommand)
134-
- cursors and change streams (including getMores and killCursors)
135-
- APIs which might perform multiple operations internally (such rewrapManyDataKey(), which performs a find() and a bulk
136-
update)
137-
138-
Driver processes not subject to the overload retry policy include commands executed on unauthenticated connections:
139-
140-
- monitoring commands and round-trip time pingers
141-
- commands executed during authentication (i.e., `saslStart`)
142-
143154
#### Interaction with Other Retry Policies
144155

145156
The retry policy in this specification is separate from the other retry policies defined in the
@@ -157,8 +168,8 @@ specifications. Drivers MUST ensure:
157168

158169
The following pseudocode demonstrates the unified retry policy, combining the overload retry policy defined in this
159170
specification with the retry policies from [Retryable Reads](../retryable-reads/retryable-reads.md) and
160-
[Retryable Writes](../retryable-writes/retryable-writes.md). For brevity, some error handling details such as the
161-
handling of "NoWritesPerformed" are omitted.
171+
[Retryable Writes](../retryable-writes/retryable-writes.md). For brevity, some interactions with other specs are not
172+
included, such as error handling with `NoWritesPerformed` labels.
162173

163174
```python
164175
# Note: the values below have been scaled down by a factor of 1000 because
@@ -230,6 +241,10 @@ the token bucket will limit retry attempts during a prolonged overload.
230241

231242
The token bucket capacity is set to 1000 for consistency with the server.
232243

244+
Each MongoClient instance MUST have its own token bucket. The token bucket MUST be created when the MongoClient is
245+
initialized and exist for the lifetime of the MongoClient. Drivers MUST ensure the token bucket implementation is
246+
thread-safe as it may be accessed concurrently by multiple operations.
247+
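The ownership and thread-safety requirements above might be sketched as follows. This is only an illustration under stated assumptions: `RetryTokenCounter` and `ExampleClient` are hypothetical names, a lock stands in for an atomic counter, and the amounts consumed or deposited per operation are not taken from the specification.

```python
import threading


class RetryTokenCounter:
    """Hypothetical lock-protected token counter (not the spec's pseudocode)."""

    def __init__(self, capacity=1000):
        self._capacity = capacity
        self._tokens = capacity
        self._lock = threading.Lock()

    def consume(self, n=1):
        """Take n tokens if available; return whether the take succeeded."""
        with self._lock:
            if self._tokens >= n:
                self._tokens -= n
                return True
            return False

    def deposit(self, n=1):
        """Return n tokens, never exceeding the capacity."""
        with self._lock:
            self._tokens = min(self._capacity, self._tokens + n)


class ExampleClient:
    """Hypothetical stand-in for a MongoClient: one bucket per instance."""

    def __init__(self):
        # Created at initialization and kept for the client's lifetime.
        self.retry_tokens = RetryTokenCounter()
```

Because the counter is an instance attribute, two clients in the same process never share tokens, and the lock keeps concurrent operations on one client from over-spending the bucket.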
233248
#### Pseudocode
234249

235250
The token bucket is implemented via a thread safe counter. For languages without atomics, this can be implemented via a
@@ -263,9 +278,10 @@ class TokenBucket:
263278

264279
#### Handshake changes
265280

266-
Drivers conforming to this spec MUST add `“backpressure”: True` to the connection handshake. This flag allows the server
267-
to identify clients which do and do not support backpressure. Currently, this flag is unused but in the future the
268-
server may offer different rate limiting behavior for clients that do not support backpressure.
281+
Drivers conforming to this spec MUST add `"backpressure": True` to the
282+
[connection handshake](../mongodb-handshake/handshake.rst). This flag allows the server to identify clients which do and
283+
do not support backpressure. Currently, this flag is unused but in the future the server may offer different rate
284+
limiting behavior for clients that do not support backpressure.
269285

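The handshake change above could look like the following sketch. It assumes the flag is a top-level field of the handshake command document, which this excerpt does not specify; `build_handshake_command` and the sample metadata are hypothetical.

```python
def build_handshake_command(base_command):
    """Return a copy of the handshake command with the backpressure flag set.

    Assumption: the flag is a top-level field of the command document.
    """
    command = dict(base_command)  # leave the caller's document unmodified
    command["backpressure"] = True
    return command


# Hypothetical handshake document; the metadata values are placeholders.
handshake = build_handshake_command(
    {"hello": 1, "client": {"driver": {"name": "example-driver", "version": "0.0.0"}}}
)
```

Copying the base document keeps the helper free of side effects, so the same base command can be reused for connections that must not advertise the flag.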
270286
#### Implementation notes
271287

@@ -299,7 +315,7 @@ since a server is reselected for a retry attempt.
299315
### Backwards Compatibility
300316

301317
The server's rate limiting can introduce higher error rates than previously would have been exposed to users under
302-
periods of extreme server overload. The increased error rates is a tradeoff: given the choice between an overloaded
318+
periods of extreme server overload. The increased error rate is a tradeoff: given the choice between an overloaded
303319
server (potential crash), or at minimum dramatically slower query execution time and a stable but lowered throughput
304320
with higher error rate as the server load sheds, we have chosen the latter.
305321

@@ -368,6 +384,33 @@ specifications with load-shedding behavior:
368384
times. The approach chosen allows for additional retries in scenarios where a non-overload error fails on a retry
369385
with an overload error.
370386

387+
### Why not apply the overload retry policy to monitoring and RTT connections?
388+
389+
The ingress request rate limiter only applies to authenticated connections. Neither the
390+
[monitoring connection](../server-discovery-and-monitoring/server-monitoring.md#monitoring) nor the
391+
[RTT pinger](../server-discovery-and-monitoring/server-monitoring.md#measuring-rtt) uses authentication, and consequently
392+
will not encounter ingress operation rate limiter errors.
393+
394+
It is conceivable that a driver attempting to establish a monitoring connection or RTT connection could encounter the
395+
ingress connection rate limiter. However, in these scenarios, the driver already behaves in an appropriate manner.
396+
397+
If an error is encountered, both the RTT connections and monitoring connections already retry.
398+
399+
- The RTT pinger retries indefinitely until the monitor is reset.
400+
- Monitoring failures will mark the server unknown, which will reset the monitor, triggering another monitoring request.
401+
402+
Under most circumstances, both monitoring and RTT connections wait at least `minHeartbeatFrequencyMS` between `hello`
403+
commands, ensuring delays between retries. The notable exception is monitoring connections retrying network errors
404+
without waiting for `minHeartbeatFrequencyMS`, which is acceptable since re-establishing monitoring is the driver's top
405+
priority when a monitoring connection disconnects.
406+
407+
### Why not apply the overload policy to authentication commands or reauthentication commands?
408+
409+
The ingress request rate limiter only applies to authenticated connections. The server does not consider a connection to
410+
be authenticated until after the authentication workflow has completed, and during reauthentication a connection is not
411+
considered authenticated by the server. So, authentication and reauthentication commands will not hit the ingress
412+
operation rate limiter.
413+
371414
## Changelog
372415

373416
- 2026-01-09: Initial version.
