kvserver: use expiration lease in TestLeaseTransferReplicatesLocks #164312
wenyihu6 wants to merge 1 commit into cockroachdb:master
  defer tc.Stopper().Stop(ctx)

- scratch := tc.ScratchRange(t)
+ scratch := tc.ScratchRangeWithExpirationLease(t)
I didn't follow the explanation for why this is a fix. Typically, when we try to prevent leadership loss in tests, we increase the election timeout:
cockroach/pkg/kv/kvserver/client_raft_test.go
Lines 1895 to 1898 in 402cb79
Thanks for the pointer! My understanding was that the prior fix of using expiration leases would sidestep the uncooperative lease transfer that can happen with leader leases, but I wasn't sure it was the best approach. Agreed that the fix you proposed provides better coverage. Updated.
When we set the election timeout very high, isn't there a chance that an initial election stalemates, and then we're stuck unavailable because the timer for the next attempt never fires?
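To make the concern concrete, here is a minimal sketch of a tick-driven election timer. It is a toy model, not the raft package's implementation: the `follower` type, its fields, and `campaignsAfter` are hypothetical names, and the randomized timeout that real Raft uses is left out. It only shows that a very high timeout suppresses timeout-based campaigns and, by the same mechanism, any timeout-based retry after a failed first election.

```go
package main

import "fmt"

// follower is a hypothetical, simplified election timer driven by ticks.
// It is an illustration only and does not mirror the real raft package.
type follower struct {
	electionTimeoutTicks int // ticks without a leader before campaigning
	elapsed              int // ticks since the last leader contact or campaign
	campaigns            int // number of timeout-based campaigns started
}

// tick advances the timer by one tick. Hearing from a leader resets the
// timer; otherwise the follower campaigns once the timeout is reached.
func (f *follower) tick(heardFromLeader bool) {
	if heardFromLeader {
		f.elapsed = 0
		return
	}
	f.elapsed++
	if f.elapsed >= f.electionTimeoutTicks {
		f.elapsed = 0
		f.campaigns++
	}
}

// campaignsAfter counts timeout-based campaigns over n leaderless ticks.
func campaignsAfter(timeoutTicks, n int) int {
	f := &follower{electionTimeoutTicks: timeoutTicks}
	for i := 0; i < n; i++ {
		f.tick(false)
	}
	return f.campaigns
}

func main() {
	// A short timeout retries repeatedly during a leaderless period.
	fmt.Println("timeout=10:", campaignsAfter(10, 100))
	// A very high timeout never fires within the same period, so a
	// stalemated first election is not retried by the timer.
	fmt.Println("timeout=1000000:", campaignsAfter(1000000, 100))
}
```

In this model the second call starts no campaigns at all, which is the desired effect for avoiding spurious elections and also the reason a stalemated initial election would not recover on its own.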
The comment linked above references one such case: DisablePreCampaignStoreLivenessCheck fixes the case where an election can't succeed due to a missing store liveness signal [1]. Did you mean that bit, or some other scenario(s)?
We should likely use that second knob too. The UX of this is a bit unfortunate.
There is also a problem: if store liveness blips long enough, we lose the leader permanently in this test.
This test was flaky because it used a regular scratch range, which could get a leader lease under metamorphic testing. With leader leases, the lease follows Raft leadership, so a Raft election (common under race builds) causes an uncooperative lease change that clears the lock table without exporting unreplicated locks. The subsequent cooperative transfer then finds no locks to export, causing the metric assertion to fail.
This commit sets RaftElectionTimeoutTicks high in the test to prevent timeout-based elections.
Fixes: cockroachdb#164262
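The failure sequence described above can be sketched with a toy model. This is only an illustration under stated assumptions: `rangeState`, its fields, and the `run` helper are hypothetical and do not correspond to CockroachDB's lock table or metrics code.

```go
package main

import "fmt"

// rangeState is a hypothetical stand-in for a leaseholder's in-memory lock
// table plus the metric the test asserts on. Not CockroachDB's real types.
type rangeState struct {
	unreplicatedLocks int // locks held only in the leaseholder's memory
	locksExported     int // metric: locks exported during lease transfers
}

// cooperativeTransfer models a planned lease transfer: the outgoing
// leaseholder exports its unreplicated locks before handing off.
func (r *rangeState) cooperativeTransfer() {
	r.locksExported += r.unreplicatedLocks
	r.unreplicatedLocks = 0
}

// uncooperativeChange models a lease change that follows a Raft election:
// the lock table is cleared and nothing is exported.
func (r *rangeState) uncooperativeChange() {
	r.unreplicatedLocks = 0
}

// run acquires n unreplicated locks, optionally injects an election-driven
// lease change, performs the cooperative transfer, and returns the metric.
func run(n int, electionFirst bool) int {
	r := &rangeState{unreplicatedLocks: n}
	if electionFirst {
		r.uncooperativeChange()
	}
	r.cooperativeTransfer()
	return r.locksExported
}

func main() {
	fmt.Println("no election:", run(3, false))   // prints "no election: 3"
	fmt.Println("election first:", run(3, true)) // prints "election first: 0"
}
```

In the second case the cooperative transfer still runs, but the metric stays at zero, which matches the assertion failure the description attributes to the flake.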
This test was flaky because it used a regular scratch range, which
could get a leader lease under metamorphic testing. With leader
leases, the lease follows Raft leadership — so a Raft election
(common under race builds) causes an uncooperative lease change
that clears the lock table without exporting unreplicated locks.
The subsequent cooperative transfer then finds no locks to export,
causing the metric assertion to fail.
This commit switches to ScratchRangeWithExpirationLease to avoid
this race.
Fixes: #164262