
Conversation

@WHOIM1205

Overview

Fixed a critical race condition in the MVCC watcher lifecycle where concurrent watcher cancellation and victim processing could trigger a fatal panic in the etcd server.

Under realistic high-load scenarios with slow watchers, a watcher may transiently exist outside of all watcher groups due to non-atomic state transitions. The existing code treated this state as impossible and crashed the server.

This PR makes the watcher cancellation logic resilient to this transient state by retrying until lifecycle convergence, preventing process termination.


The Issue

In server/storage/mvcc/watchable_store.go, the cancelWatcher function assumes the following invariant:

If a watcher is not present in synced or unsynced, it must be present in a victim batch.
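To make the invariant concrete, here is a minimal toy model of the check (stand-in types and maps, not the etcd source; the real function has more branches and metric bookkeeping):

package mvcctoy

import "sync"

// Toy model of the watcher lifecycle; illustrative only.
type watcher struct{ victim bool }

type store struct {
    mu       sync.Mutex
    synced   map[*watcher]bool // watchers caught up with the store
    unsynced map[*watcher]bool // watchers still catching up
    victims  map[*watcher]bool // slow watchers parked for later delivery
}

// cancelWatcher encodes the invariant the way the current code does:
// a watcher absent from both groups is assumed to be a victim,
// on pain of panic.
func (s *store) cancelWatcher(w *watcher) {
    s.mu.Lock()
    defer s.mu.Unlock()
    if s.unsynced[w] {
        delete(s.unsynced, w)
        return
    }
    if s.synced[w] {
        delete(s.synced, w)
        return
    }
    if !w.victim {
        panic("watcher not victim but not in watch groups")
    }
    delete(s.victims, w)
}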

During concurrent execution of:

  • victim processing (syncWatchers / moveVictims)
  • watcher cancellation (cancelWatcher)

this invariant does not always hold.

A watcher may be removed from the victim batch and have wa.victim cleared while a concurrent cancellation is in progress. When cancelWatcher observes this transient state, it panics with:

panic: watcher not victim but not in watch groups

This panic crashes the etcd process and disrupts clients such as Kubernetes API servers.
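One possible interleaving that opens the window (illustrative; goroutine A is victim processing, goroutine B is the canceling client):

  1. A pops the victim batch containing the watcher off the victims list.
  2. A clears wa.victim before re-adding the watcher to synced or unsynced.
  3. B runs cancelWatcher: the watcher is in neither group and wa.victim is false.
  4. The invariant check fires and the process panics, even though A was about to restore the watcher to a group.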


Fix Applied

  • File: server/storage/mvcc/watchable_store.go
  • Updated cancelWatcher to retry when a watcher is transiently in-flight between lifecycle states instead of panicking.
  • The fix preserves existing watcher semantics and only affects the invalid error path.

This aligns cancelWatcher behavior with other watcher lifecycle code paths that already tolerate concurrent state transitions.
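In terms of the toy model sketched above under The Issue, the change has roughly this shape (a sketch of the approach, not the actual diff; it replaces the earlier toy cancelWatcher and additionally needs the time import):

// Retry-based cancellation: on the transient "in no group, not a
// victim" state, drop the lock, back off briefly, and re-check
// instead of panicking.
func (s *store) cancelWatcher(w *watcher) {
    for {
        s.mu.Lock()
        if s.unsynced[w] || s.synced[w] {
            delete(s.unsynced, w)
            delete(s.synced, w)
            s.mu.Unlock()
            return // normal cancellation path
        }
        if w.victim {
            delete(s.victims, w)
            s.mu.Unlock()
            return // cancelled straight out of its victim batch
        }
        // Transient state: the watcher is in flight between lifecycle
        // states; wait for victim processing to finish moving it.
        s.mu.Unlock()
        time.Sleep(time.Millisecond)
    }
}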


Testing & Verification

  • Added a new regression test TestCancelWatcherDuringVictimProcessing in:
    • server/storage/mvcc/watchable_store_test.go
  • The test exercises concurrent watcher cancellation and victim processing to reproduce the race condition.
  • Verified that the etcd server no longer panics under this scenario.

Verified locally with:

go test ./server/storage/mvcc -run TestCancelWatcherDuringVictimProcessing -count=20
go test ./server/storage/mvcc
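The test itself is not reproduced in this thread; against the toy model above, a regression test of the same shape (hypothetical, with testing and sync imported) might look like:

// Move a watcher out of its victim batch while concurrently canceling
// it. With the retry loop in place this must terminate without panicking.
func TestCancelWatcherDuringVictimProcessing(t *testing.T) {
    for i := 0; i < 1000; i++ {
        w := &watcher{victim: true}
        s := &store{
            synced:   map[*watcher]bool{},
            unsynced: map[*watcher]bool{},
            victims:  map[*watcher]bool{w: true},
        }

        var wg sync.WaitGroup
        wg.Add(2)
        go func() { // stand-in for victim processing (moveVictims)
            defer wg.Done()
            s.mu.Lock()
            delete(s.victims, w)
            w.victim = false // the transient window opens here...
            s.mu.Unlock()
            s.mu.Lock()
            s.synced[w] = true // ...and closes here
            s.mu.Unlock()
        }()
        go func() {
            defer wg.Done()
            s.cancelWatcher(w) // must not panic inside the window
        }()
        wg.Wait()
    }
}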


Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: WHOIM1205
Once this PR has been reviewed and has the lgtm label, please assign spzala for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @WHOIM1205. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@WHOIM1205
Author

hey @serathius

This fixes a real race in the watcher lifecycle where cancelWatcher could observe a transient state during concurrent victim processing and panic.

The panic relies on an invariant that does not always hold under concurrent execution. Retrying until the watcher lifecycle converges avoids crashing the server while preserving existing behavior.

A regression test is included to exercise the concurrent cancellation path.

@serathius serathius closed this Jan 26, 2026
