
Conversation

@jakobht (Member) commented Nov 11, 2025

What changed?
Implemented the WatchNamespaceState streaming RPC endpoint for the shard distributor service, including a pub/sub mechanism for real-time assignment change notifications.

Why?
The WatchNamespaceState endpoint was previously unimplemented. This enables executors and spectators to receive real-time updates about shard assignment changes without polling, improving responsiveness and reducing load on the storage layer.

How did you test it?
Added unit tests for the handler's streaming behavior and the pub/sub mechanism.

Potential risks
Low - this is a new feature in an experimental service. The pub/sub implementation includes non-blocking publish to prevent slow subscribers from blocking the system.

Release notes
N/A - shard distributor is experimental

Documentation Changes
None required
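The pub/sub mechanism described above can be sketched roughly as follows. This is a minimal illustration of the pattern (buffered subscriber channels with a non-blocking publish), not the PR's actual API; the names `pubSub`, `subscribe`, and `publish` and the `string` payload type are placeholders for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// pubSub is a minimal sketch of a publisher with per-subscriber channels.
type pubSub struct {
	mu   sync.Mutex
	subs map[int]chan string
	next int
}

func newPubSub() *pubSub {
	return &pubSub{subs: make(map[int]chan string)}
}

// subscribe registers a buffered channel and returns it together with an
// unsubscribe function that removes the registration.
func (p *pubSub) subscribe() (<-chan string, func()) {
	p.mu.Lock()
	defer p.mu.Unlock()
	id := p.next
	p.next++
	ch := make(chan string, 1)
	p.subs[id] = ch
	return ch, func() {
		p.mu.Lock()
		defer p.mu.Unlock()
		delete(p.subs, id)
	}
}

// publish sends to every subscriber without blocking: a subscriber that is
// not reading fast enough simply misses this update.
func (p *pubSub) publish(state string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, ch := range p.subs {
		select {
		case ch <- state:
		default: // slow subscriber; skip
		}
	}
}

func main() {
	ps := newPubSub()
	ch, cancel := ps.subscribe()
	defer cancel()
	ps.publish("assignment-v1")
	fmt.Println(<-ch) // prints "assignment-v1"
}
```

The non-blocking `default` branch is what keeps a slow subscriber from stalling publishers, at the cost of dropped updates (the staleness concern discussed in the review below).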

Signed-off-by: Jakob Haahr Taankvist <[email protected]>
```go
// Stream subsequent updates
for {
	select {
	case <-server.Context().Done():
```
Contributor:

What if we stop the shard distributor? Is that implicitly handled with the server context?

Member Author:

Good question - I assume so - it's the only context available, at least.

Contributor:

I think we can test this by shutting down the shard distributor and checking that the canaries are not hanging but connect to a new stream :)

```go
select {
case sub <- state:
default:
	// Subscriber is not reading fast enough, skip this update
```
Contributor:

Should we retry? We call refresh and then publish only when there are changes. Say no changes happen: some subscribers will have stale info until the next change. Are we fine with that?

Member Author:

I think that is a good point - maybe we can send a reconciliation message every 1s?

Contributor:

Yes, this is a good idea.

Member Author:

Will do a follow-up PR.

```go
// subscribe returns a channel that receives executor state updates.
func (p *executorStatePubSub) subscribe(ctx context.Context) (<-chan map[*store.ShardOwner][]string, func()) {
	ch := make(chan map[*store.ShardOwner][]string)
	uniqueID := uuid.New().String()
```
Contributor:

Thinking out loud: should we return the subscription ID for debug purposes?

Member Author:

I don't see the value, but maybe if you elaborate a bit?

Contributor:

In case of issues with a subscription, we only have the subscription ID stored on the SD side, so we don't know which instance is not receiving updates. We can work out which namespace is impacted, but maybe that is too wide. I am wondering whether we should prepend the caller instance to this UID, for example.

Member Author:

I'll do a follow-up PR to add a spectator ID so we can make this connection.
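The spectator-ID idea could be as simple as embedding the caller's identity in the subscription ID, so SD-side logs about a subscription can be tied back to a specific subscriber instance. A hypothetical sketch (the `spectatorID + "/" + random` format and both function names are made up for illustration, not the follow-up PR's design):

```go
package main

import (
	"fmt"
	"strings"
)

// subscriptionID combines the caller-supplied spectator ID with a random
// component, so the ID alone identifies which instance subscribed.
func subscriptionID(spectatorID, random string) string {
	return spectatorID + "/" + random
}

// spectatorOf recovers the spectator ID back out of a subscription ID.
func spectatorOf(subID string) string {
	if i := strings.IndexByte(subID, '/'); i >= 0 {
		return subID[:i]
	}
	return ""
}

func main() {
	id := subscriptionID("cadence-canary-host-1", "b1946ac9")
	fmt.Println(id)            // prints "cadence-canary-host-1/b1946ac9"
	fmt.Println(spectatorOf(id)) // prints "cadence-canary-host-1"
}
```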

@jakobht jakobht merged commit 57f0d8d into cadence-workflow:master Nov 12, 2025
42 checks passed