Skip to content

kvserver: use expiration lease in TestLeaseTransferReplicatesLocks#164312

Open
wenyihu6 wants to merge 1 commit intocockroachdb:masterfrom
wenyihu6:res
Open

kvserver: use expiration lease in TestLeaseTransferReplicatesLocks#164312
wenyihu6 wants to merge 1 commit intocockroachdb:masterfrom
wenyihu6:res

Conversation

@wenyihu6
Copy link
Contributor

This test was flaky because it used a regular scratch range, which
could get a leader lease under metamorphic testing. With leader
leases, the lease follows Raft leadership — so a Raft election
(common under race builds) causes an uncooperative lease change
that clears the lock table without exporting unreplicated locks.
The subsequent cooperative transfer then finds no locks to export,
causing the metric assertion to fail.

This commit switches to ScratchRangeWithExpirationLease to avoid
this race.

Fixes: #164262

@wenyihu6 wenyihu6 requested a review from a team as a code owner February 24, 2026 23:13
@trunk-io
Copy link
Contributor

trunk-io bot commented Feb 24, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@wenyihu6 wenyihu6 requested a review from stevendanna February 24, 2026 23:13
@cockroach-teamcity
Copy link
Member

This change is Reviewable

defer tc.Stopper().Stop(ctx)

scratch := tc.ScratchRange(t)
scratch := tc.ScratchRangeWithExpirationLease(t)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't follow the explanation for why this is a fix. Typically, when we try to prevent leadership loss in tests, we increase the election timeout:

// Suppress timeout-based elections to avoid leadership changes in ways
// this test doesn't expect. See DisablePreCampaignStoreLivenessCheck,
// which counteracts this knob's effect on leader leases.
RaftElectionTimeoutTicks: 1000000,

Copy link
Contributor Author

@wenyihu6 wenyihu6 Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer! My understanding was that the prior fix of using expiration leases would sidestep the uncooperative lease transfer with leader lease but wasn't sure if it was the best approach. Agreed that the fix you proposed provides better coverage. Updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we set the election timeout very high, isn't there a chance that an initial election stalemates, and then we're stuck unavailable because the timer for the next attempt never fires?

Copy link
Collaborator

@pav-kv pav-kv Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment linked above references one such case: DisablePreCampaignStoreLivenessCheck fixes the case when election can't succeed due to missing store liveness signal [1]. Did you mean that bit, or some other scenario(s)?

We should likely use that second knob too. The UX of this is a bit unfortunate.

Copy link
Collaborator

@pav-kv pav-kv Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a problem: if store liveness blips long enough, we lose the leader permanently in this test.

This test was flaky because it used a regular scratch range, which
could get a leader lease under metamorphic testing. With leader
leases, the lease follows Raft leadership — so a Raft election
(common under race builds) causes an uncooperative lease change
that clears the lock table without exporting unreplicated locks.
The subsequent cooperative transfer then finds no locks to export,
causing the metric assertion to fail.

This commit sets RaftElectionTimeoutTicks high in the test to
prevent timeout based election.

Fixes: cockroachdb#164262
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kv/kvserver: TestLeaseTransferReplicatesLocks failed

4 participants