Fix ConfState divergence on crash during Raft ConfChange recovery #21190

WHOIM1205 · 2026-01-25T19:16:53Z

Fix Raft ConfState Divergence After Crash During ConfChange Application

Summary

This PR fixes a critical crash-recovery bug in etcd where Raft’s in-memory ConfState can diverge from the backend-persisted ConfState if the process crashes while applying a membership change.

The issue occurs because ConfState persistence is not atomic with ApplyConfChange(). On restart, etcd trusts the backend ConfState, which may be stale, and this leads to incorrect cluster membership after recovery.

This PR makes the WAL the single source of truth for rebuilding ConfState during bootstrap, eliminating this inconsistency.

Problem Description

When applying a ConfChange, etcd performs the following steps:

1. Updates Raft’s in-memory ConfState via ApplyConfChange()
2. Marks backend ConfState as dirty using SetConfState()
3. Persists it later during the backend transaction commit

If etcd crashes after step (1) but before the backend transaction commits, the system enters an inconsistent state:

Raft state (rebuilt from WAL) reflects the membership change
Backend ConfState remains stale
Bootstrap logic trusts the backend ConfState

This violates the invariant that Raft state and persisted cluster membership metadata must be consistent after recovery.
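A minimal Go sketch of the crash window described above. The types `ConfState` and `backend` and the helpers `removeVoter`, `applyRemove`, and `equal` are hypothetical simplifications (voters only, with an in-memory dirty buffer standing in for the backend batch transaction), not etcd's real API. When `crash` is set, the uncommitted buffer is dropped between steps (2) and (3). The returned ConfState stands for what Raft rebuilds from the WAL on restart, since the ConfChange entry is already durable there.

```go
package main

import "fmt"

// ConfState is a hypothetical, simplified stand-in for raft's ConfState
// that models only voter IDs.
type ConfState struct {
	Voters []uint64
}

// backend models a store whose writes sit in an uncommitted buffer until Commit.
type backend struct {
	committed ConfState  // survives a crash
	dirty     *ConfState // lost on crash
}

func (b *backend) SetConfState(cs ConfState) { b.dirty = &cs }

func (b *backend) Commit() {
	if b.dirty != nil {
		b.committed = *b.dirty
		b.dirty = nil
	}
}

// removeVoter returns a copy of cs without id.
func removeVoter(cs ConfState, id uint64) ConfState {
	var out ConfState
	for _, v := range cs.Voters {
		if v != id {
			out.Voters = append(out.Voters, v)
		}
	}
	return out
}

// equal compares voter lists element by element (order-sensitive).
func equal(a, b ConfState) bool {
	if len(a.Voters) != len(b.Voters) {
		return false
	}
	for i := range a.Voters {
		if a.Voters[i] != b.Voters[i] {
			return false
		}
	}
	return true
}

// applyRemove performs steps (1)-(3); with crash=true the process "dies"
// between steps (2) and (3).
func applyRemove(raftCS ConfState, be *backend, id uint64, crash bool) ConfState {
	raftCS = removeVoter(raftCS, id) // (1) in-memory ConfState updated
	be.SetConfState(raftCS)          // (2) backend marked dirty
	if crash {
		be.dirty = nil // the uncommitted buffer does not survive the crash
		return raftCS
	}
	be.Commit() // (3) persisted with the backend transaction
	return raftCS
}

func main() {
	be := &backend{committed: ConfState{Voters: []uint64{1, 2, 3}}}
	raftCS := applyRemove(ConfState{Voters: []uint64{1, 2, 3}}, be, 3, true)
	fmt.Println("raft:", raftCS.Voters, "backend:", be.committed.Voters) // raft: [1 2] backend: [1 2 3]
	fmt.Println("diverged:", !equal(raftCS, be.committed))              // diverged: true
}
```

Without the crash, step (3) runs and both views agree; the divergence exists only inside the window between SetConfState() and the commit.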

Why This Is Critical

Can cause incorrect cluster membership after restart
Leads to broken quorum calculations and failed leader elections
Can block all writes indefinitely
Impacts production Kubernetes control planes
Failure mode is silent and difficult to diagnose
Often requires manual intervention to recover
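To illustrate why disagreeing membership views break quorum calculations, the sketch below computes the majority size as floor(n/2)+1 and checks whether a set of acknowledgements forms a majority of a given voter set. `quorum` and `hasQuorum` are hypothetical helpers for the arithmetic only, not raft's quorum package. After node 3 is removed, both the correct view {1, 2} and the stale view {1, 2, 3} need 2 votes, yet they disagree on whether acks from {1, 3} are enough, because the stale view still counts node 3.

```go
package main

import "fmt"

// quorum returns the majority size for a voter set of size n.
func quorum(n int) int { return n/2 + 1 }

// hasQuorum reports whether acks from members of voters reach a majority;
// acks from IDs outside the voter set are ignored.
func hasQuorum(voters, acks []uint64) bool {
	members := make(map[uint64]bool, len(voters))
	for _, v := range voters {
		members[v] = true
	}
	n := 0
	for _, a := range acks {
		if members[a] {
			n++
		}
	}
	return n >= quorum(len(voters))
}

func main() {
	actual := []uint64{1, 2}    // after ConfChangeRemoveNode(3) was applied
	stale := []uint64{1, 2, 3}  // what a stale backend still records
	acks := []uint64{1, 3}      // acknowledgements received
	fmt.Println(quorum(len(actual)), quorum(len(stale)))         // 2 2
	fmt.Println(hasQuorum(actual, acks), hasQuorum(stale, acks)) // false true
}
```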

Root Cause

Backend ConfState is treated as authoritative during bootstrap
WAL already contains all committed ConfChange entries
No reconciliation exists between WAL-derived state and backend metadata
A crash between ApplyConfChange() and backend commit leaves persisted state stale

Fix Overview

Rebuild ConfState from WAL during bootstrap and reconcile backend state if necessary.

Key changes

Reconstruct ConfState by replaying committed ConfChange entries from WAL
Compare the rebuilt ConfState with the backend ConfState
If a mismatch is detected:
- Log a warning
- Persist the corrected ConfState to the backend before starting Raft
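The key changes above can be sketched in Go as a replay step followed by a reconcile step. This is a simplified illustration under stated assumptions, not the PR's code: `ChangeType`, `ConfChange`, `ConfState`, `rebuildConfState`, `equalConfState`, and `reconcile` are hypothetical names, only voters are modeled (no learners), the base ConfState is assumed to come from the latest snapshot, entries are assumed sorted by index, and `persist` stands in for whatever backend write the real bootstrap would perform.

```go
package main

import (
	"fmt"
	"log"
	"sort"
)

// Hypothetical simplifications of raft's types; only voters are modeled.
type ChangeType int

const (
	AddNode ChangeType = iota
	RemoveNode
)

type ConfChange struct {
	Index  uint64 // raft log index of the entry
	Type   ChangeType
	NodeID uint64
}

type ConfState struct{ Voters []uint64 }

// rebuildConfState starts from base and replays ConfChange entries whose
// index is <= commitIndex. Entries are assumed to be sorted by index.
func rebuildConfState(base ConfState, entries []ConfChange, commitIndex uint64) ConfState {
	set := make(map[uint64]bool, len(base.Voters))
	for _, id := range base.Voters {
		set[id] = true
	}
	for _, e := range entries {
		if e.Index > commitIndex {
			break // the uncommitted tail is not applied
		}
		switch e.Type {
		case AddNode:
			set[e.NodeID] = true
		case RemoveNode:
			delete(set, e.NodeID)
		}
	}
	var out ConfState
	for id := range set {
		out.Voters = append(out.Voters, id)
	}
	sort.Slice(out.Voters, func(i, j int) bool { return out.Voters[i] < out.Voters[j] })
	return out
}

// equalConfState compares voter sets independent of order.
func equalConfState(a, b ConfState) bool {
	if len(a.Voters) != len(b.Voters) {
		return false
	}
	x := append([]uint64(nil), a.Voters...)
	y := append([]uint64(nil), b.Voters...)
	sort.Slice(x, func(i, j int) bool { return x[i] < x[j] })
	sort.Slice(y, func(i, j int) bool { return y[i] < y[j] })
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// reconcile treats the WAL-derived state as authoritative: on mismatch it
// logs a warning and persists it, before Raft would be started.
func reconcile(walCS, backendCS ConfState, persist func(ConfState)) ConfState {
	if !equalConfState(walCS, backendCS) {
		log.Printf("warning: backend ConfState %v is stale, WAL gives %v; overwriting", backendCS.Voters, walCS.Voters)
		persist(walCS)
	}
	return walCS
}

func main() {
	base := ConfState{Voters: []uint64{1, 2, 3}}
	entries := []ConfChange{{Index: 10, Type: RemoveNode, NodeID: 3}}
	walCS := rebuildConfState(base, entries, 10)
	stale := ConfState{Voters: []uint64{1, 2, 3}} // backend after the crash
	var persisted ConfState
	reconcile(walCS, stale, func(cs ConfState) { persisted = cs })
	fmt.Println(persisted.Voters) // [1 2]
}
```

When the two states already agree, `reconcile` performs no write, which matches the claim that normal operation is unaffected.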

This guarantees crash-safe and deterministic recovery without changing Raft semantics.

Steps to Reproduce (Before Fix)

Start a 3-node etcd cluster
Remove a member using a ConfChangeRemoveNode
Crash etcd after ApplyConfChange() but before the backend transaction commits
Restart the node
Observe:
- Removed member reappears, or
- Leader election fails due to incorrect quorum size

Verification (After Fix)

ConfState is rebuilt from WAL on startup
Backend ConfState is corrected automatically
Cluster membership is consistent
Leader election succeeds and writes are accepted

Tests Added

Unit test for WAL-based ConfState reconstruction
Integration test simulating a crash during ConfChange application using gofail
Ensures backend and Raft ConfState match after recovery
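The integration test relies on gofail failpoints. As a stdlib-only stand-in, the sketch below models the failpoint as a package-level hook (`beforeCommit`) that panics to simulate the crash, recovers, then runs a WAL-first bootstrap and checks that the backend and the rebuilt voter set match. Every name here (`node`, `applyRemove`, `bootstrap`, `crashDuringRemove`, `copySet`, `sameSet`) is hypothetical, and the WAL is reduced to a list of removed node IDs; this is not gofail's API or etcd's test harness.

```go
package main

import "fmt"

// node is a hypothetical, minimal model of one etcd member.
type node struct {
	wal     []uint64        // IDs removed by ConfChange entries already durable in the WAL
	backend map[uint64]bool // committed backend ConfState (voters only)
}

// beforeCommit stands in for a gofail failpoint; it is a no-op unless armed.
var beforeCommit = func() {}

func copySet(s map[uint64]bool) map[uint64]bool {
	out := make(map[uint64]bool, len(s))
	for k, v := range s {
		out[k] = v
	}
	return out
}

func sameSet(a, b map[uint64]bool) bool {
	if len(a) != len(b) {
		return false
	}
	for k := range a {
		if !b[k] {
			return false
		}
	}
	return true
}

// applyRemove appends the entry to the WAL, stages the backend update, then commits it.
func (n *node) applyRemove(id uint64) {
	n.wal = append(n.wal, id)
	pending := copySet(n.backend)
	delete(pending, id)
	beforeCommit() // simulated crash point between staging and commit
	n.backend = pending
}

// bootstrap rebuilds voters from the initial membership plus the WAL and
// overwrites the backend copy if it disagrees.
func (n *node) bootstrap(initial []uint64) map[uint64]bool {
	voters := make(map[uint64]bool, len(initial))
	for _, id := range initial {
		voters[id] = true
	}
	for _, id := range n.wal {
		delete(voters, id)
	}
	if !sameSet(voters, n.backend) {
		n.backend = copySet(voters)
	}
	return voters
}

// crashDuringRemove arms the failpoint, runs applyRemove, and recovers from the simulated crash.
func crashDuringRemove(n *node, id uint64) (crashed bool) {
	beforeCommit = func() { panic("simulated crash before backend commit") }
	defer func() {
		beforeCommit = func() {}
		crashed = recover() != nil
	}()
	n.applyRemove(id)
	return false
}

func main() {
	n := &node{backend: map[uint64]bool{1: true, 2: true, 3: true}}
	fmt.Println("crashed:", crashDuringRemove(n, 3))          // crashed: true
	fmt.Println("backend still lists 3:", n.backend[3])       // backend still lists 3: true
	voters := n.bootstrap([]uint64{1, 2, 3})
	fmt.Println("match:", sameSet(voters, n.backend), len(voters)) // match: true 2
}
```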

Impact

Prevents cluster membership divergence after crashes
Eliminates quorum deadlocks and split-brain scenarios
Improves etcd reliability under failure
No behavior change during normal operation

Notes for Reviewers

WAL is treated as the authoritative source of Raft state
Fix is isolated to bootstrap logic
No changes to the Raft state machine or apply path
Safe for backporting

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>

k8s-ci-robot · 2026-01-25T19:16:58Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: WHOIM1205
Once this PR has been reviewed and has the lgtm label, please assign fuweid for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2026-01-25T19:17:03Z

Hi @WHOIM1205. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

WHOIM1205 · 2026-01-25T19:17:26Z

hey @serathius
This fixes a crash-recovery edge case where ConfState could diverge if etcd crashes between ApplyConfChange() and backend commit. The fix makes WAL authoritative during bootstrap and reconciles backend state if needed.
Happy to adjust or add more tests if required.

serathius · 2026-01-26T10:33:21Z

https://github.com/kubernetes/community/blob/88bcd83f56e8b15637a8b422e2bf290c922640c3/contributors/guide/pull-requests.md#ai-guidance

fix: persist confState atomically in applyConfChange

0f3996a

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>

k8s-ci-robot added the needs-ok-to-test label Jan 25, 2026

k8s-ci-robot added the size/L label Jan 25, 2026

serathius closed this Jan 26, 2026
