Add integ test for simulating node join left event when data node clu… #19907
…ster state publication lag because the cluster applier thread being busy.
Description
Adds a cluster stability test that verifies the cluster becomes stable again after a node join/left loop. The loop is caused by cluster state publication lag: a cluster state listener running a long task occupies the cluster state applier thread, so the affected data nodes fall behind on publications. This simulates the scenario where the cluster applier thread is busy with shard cleanup activity, leading to node drops because of publication lag.
Setup:
Creates a 7-node cluster (1 cluster manager + 6 data nodes)
Adds a slow cluster state listener (30s sleep) to a subset of the data nodes
Continuously moves shards between nodes to trigger cluster state changes
Verifies that the cluster stabilizes after the slow cluster state listener is removed
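The mechanism this setup exercises can be illustrated with a minimal, standalone sketch that uses only JDK concurrency primitives. It is not the test added in this PR and does not use any OpenSearch classes: `ApplierLagSketch`, `Node`, `applyWithin`, `removeSlowListener`, and `awaitIdle` are hypothetical names, and the 500 ms listener delay and 100 ms lag timeout are scaled-down assumptions standing in for the 30s sleep and the real publication lag timeout.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: a single-threaded "applier" blocked by a slow listener.
class ApplierLagSketch {

    // Stand-in for a data node that applies cluster states on one thread.
    static final class Node implements AutoCloseable {
        private final ExecutorService applier = Executors.newSingleThreadExecutor();
        private volatile long listenerDelayMillis;

        Node(long listenerDelayMillis) {
            this.listenerDelayMillis = listenerDelayMillis;
        }

        // Models removing the slow cluster state listener.
        void removeSlowListener() {
            listenerDelayMillis = 0;
        }

        // Returns true if a new state is applied within the lag timeout.
        boolean applyWithin(long lagTimeoutMillis) {
            Future<?> applied = applier.submit(() -> {
                Thread.sleep(listenerDelayMillis); // listener occupies the applier thread
                return null;
            });
            try {
                applied.get(lagTimeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // a cluster manager would treat this node as lagging
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }

        // Blocks until every queued apply task has finished.
        void awaitIdle() {
            try {
                applier.submit(() -> { }).get();
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }

        @Override
        public void close() {
            applier.shutdownNow();
        }
    }

    // Runs one apply with the slow listener and one after it is removed.
    static boolean[] simulate() {
        try (Node node = new Node(500)) {
            boolean whileSlow = node.applyWithin(100);
            node.removeSlowListener();
            node.awaitIdle(); // let the backlog drain before publishing again
            boolean afterRemoval = node.applyWithin(100);
            return new boolean[] { whileSlow, afterRemoval };
        }
    }

    public static void main(String[] args) {
        boolean[] result = simulate();
        System.out.println("applied in time with slow listener: " + result[0]);
        System.out.println("applied in time after removal: " + result[1]);
    }
}
```

In this sketch the first apply should miss the lag timeout because the listener holds the only applier thread, and the second should succeed once the listener is gone and the queued work has drained. The real test observes the same effect indirectly, through nodes leaving and rejoining the cluster.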
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.