
Conversation

@ywangd ywangd (Member) commented Jul 30, 2025

This is the same failure as observed in #129644, for which the original fix #129680 did not really work because of the ordering of checks: the shutdown marker is removed only after the cluster passes the ready check, so that new shards can be allocated, but the cluster cannot pass the ready check until those shards are allocated. Hence the circular dependency. In hindsight, there is no need to put a shutdown record on every node. It is only needed on the node that upgrades last, to prevent the snapshot from completing during the upgrade. This PR does exactly that, which ensures there are always two nodes available to host new shards.

Resolves: #132135
Resolves: #132136
Resolves: #132137
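
To make the mechanism concrete, here is a minimal, hypothetical sketch of the approach (not the actual test change; the class, the method names, and the node id are invented for illustration): a shutdown marker is registered via the node shutdown API only on the node that upgrades last, which holds snapshot completion during that upgrade while leaving the other two nodes free to host new shards.

```java
import java.io.IOException;

import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

/**
 * Hypothetical sketch of the fix described above: instead of registering a
 * shutdown marker on every node, register one only on the node that upgrades
 * last. All names here are invented; the real change lives in the
 * Elasticsearch rolling-upgrade test suite.
 */
final class LastNodeShutdownMarkerSketch {

    /**
     * Marks a single node for restart via the node shutdown API. While the
     * marker is present, snapshots do not finalize, which is what the test
     * needs while the last node upgrades. The other nodes are never marked,
     * so new shards always have two eligible hosts and the ready check can
     * still pass.
     */
    static void markLastUpgradingNodeForRestart(RestClient client, String lastNodeId) throws IOException {
        Request request = new Request("PUT", "/_nodes/" + lastNodeId + "/shutdown");
        request.setJsonEntity("""
            {
              "type": "restart",
              "reason": "hold snapshot completion while the last node upgrades"
            }""");
        Response response = client.performRequest(request);
        assert response.getStatusLine().getStatusCode() == 200;
    }

    /**
     * Removes the marker once the upgrade has finished, allowing the
     * snapshot to complete without racing the ready check.
     */
    static void clearShutdownMarker(RestClient client, String lastNodeId) throws IOException {
        client.performRequest(new Request("DELETE", "/_nodes/" + lastNodeId + "/shutdown"));
    }
}
```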

@ywangd ywangd requested a review from nicktindall July 30, 2025 08:18
@ywangd ywangd added the >test, :Distributed Coordination/Snapshot/Restore, auto-backport, v9.2.0, and v9.1.1 labels Jul 30, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination label Jul 30, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@nicktindall nicktindall (Contributor) left a comment

LGTM

@ywangd ywangd (Member, Author) commented Jul 31, 2025

@elasticmachine update branch

@ywangd ywangd added the auto-merge-without-approval label Jul 31, 2025
@elasticsearchmachine elasticsearchmachine merged commit f39ccb5 into elastic:main Jul 31, 2025
33 checks passed
@ywangd ywangd deleted the es-132135-fix branch July 31, 2025 05:13
@elasticsearchmachine (Collaborator)

💔 Backport failed

Branch 9.1: Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 132157`

@ywangd ywangd (Member, Author) commented Jul 31, 2025

💚 All backports created successfully

Branch 9.1: backport created successfully

Questions? Please refer to the Backport tool documentation.

ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jul 31, 2025
…2157)

(cherry picked from commit f39ccb5)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Jul 31, 2025
…132233)

(cherry picked from commit f39ccb5)

# Conflicts:
#	muted-tests.yml
afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Jul 31, 2025
…2157)

smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Jul 31, 2025
…2157)
