Refactor ClusterPubSub to reuse NodesManager-managed connections. Refactor ClusterPubSub to reuse NodesManager-managed connections#4037
Open
petyaslavova wants to merge 1 commit intomasterfrom
Conversation
…actor ClusterPubSub to reuse NodesManager-managed connections
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Refactor ClusterPubSub to reuse NodesManager-managed connections
Previously, both sync and async ClusterPubSub constructed detached
ConnectionPool instances from
connection_kwargswhenever a targetnode was supplied or resolved. This bypassed NodesManager's authoritative
pool, so credential refreshes, maintenance events, and connection limits
applied to the cluster did not propagate to the PubSub socket.
Sync (redis/cluster.py):
obtain the pool via
cluster.get_redis_connection(node).connection_pool.Async (redis/asyncio/cluster.py):
_ClusterNodePoolAdapterthat exposes a ClusterNode as aConnectionPool-compatible object (get_connection / release / get_encoder).
target node with the adapter instead of building a standalone pool.
keyspace_notifications.pyre-imports the adapter fromredis.asyncio.clusterto preserve the existing public re-export.Tests:
onlyclusterfrom tests/test_asyncio/test_cluster.pyand apply it per class so mock-based unit tests can run in standalone CI.
TestClusterPubSubWithMocks(sync + async) under@pytest.mark.fixed_clientcovering connection extraction, pool reuse,and shard routing on mocks.
Analysis:
original divergence, the resource model, and the chosen fix.
Fix PubSub.aclose() hang by disconnecting with nowait=True (#3941)
PubSub.aclose() called
self.connection.disconnect()with the defaultnowait=False, which awaitsStreamWriter.wait_closed(). When aconcurrent reader task (e.g. a background
pubsub.run()task or aget_message(block=True)call) still holds the transport, the writercannot finish closing and
aclose()hangs indefinitely — made permanentby
socket_connect_timeoutdefaulting toNone.Pass
nowait=TrueinPubSub.aclose()to match every other internaldisconnect()call site inredis/asyncio/connection.pyand thePython-3.13 GC fallback path from #3856. The reader task now observes
a normal
ConnectionErroron the torn-down socket, which is alreadythe documented contract for a pubsub whose transport is closed.
No sync mirror is required:
redis.client.Connection.disconnect()hasno
wait_closed()analogue and cannot deadlock.ClusterPubSub.aclose()delegates tosuper().aclose()and picks upthe fix transparently.
Add
TestPubSubAcloseWithMocks(fixed_client) with two regressions:connection.disconnect(nowait=True)is called and thefull cleanup sequence (deregister callback, pool release, clear
self.connection) runs;disconnectthat hangs forever onnowait=Falseand verifies
aclose()still returns under a bounded timeout.Fixes #3941
Note
Medium Risk
Medium risk because it changes how cluster pubsub acquires/releases connections (sync + asyncio) and modifies async pubsub shutdown semantics, which can affect connection reuse and cleanup behavior under concurrency.
Overview
Cluster pubsub connection management is refactored to stop creating detached pools. Async
ClusterPubSubnow wraps aClusterNodewith a new internal_ClusterNodePoolAdapterand uses it for both primary and sharded pubsub connections, ensuring pubsub sockets participate in the node’s own connection budgeting/maintenance behavior; the adapter is re-used byasyncio/keyspace_notifications.Sync
ClusterPubSubis tightened for lazy node materialization and safer cleanup. Sharded pubsubs are created viacluster.get_redis_connection(node).pubsub(...)(instead ofnode.redis_connection), anddisconnect()now tolerates shard pubsubs whoseconnectionwas never created.Async pubsub shutdown is made non-blocking under concurrent readers.
PubSub.aclose()now callsconnection.disconnect(nowait=True)to avoid deadlocks waiting onStreamWriter.wait_closed().Tests/docs are updated. Adds mock-based unit tests for sync/async
ClusterPubSubpool/adapter usage and disconnect behavior, adds regressions for theaclose()hang, and adjusts asyncio cluster tests to applyonlyclusterper-class so mock tests can run without a live cluster.Reviewed by Cursor Bugbot for commit 353c9ea. Bugbot is set up for automated code reviews on this repo. Configure here.