@SwethaGuptha
Contributor

…ster state publication lag because the cluster applier thread being busy.

Description

Adds a cluster stability test that verifies the cluster becomes stable again after a node join/leave loop. The loop is caused by cluster state publication lag, which occurs when the cluster state applier thread is occupied by a cluster state listener running a long task. This simulates the scenario where the cluster applier thread is busy with shard cleanup, causing nodes to be dropped because of publication lag.

Setup:

  • Creates 7-node cluster (1 cluster manager + 6 data nodes)

  • Adds slow cluster state listener to subset of data nodes (30s sleep)

  • Continuously moves shards between nodes to trigger cluster state changes

  • Verifies that the cluster stabilizes after the cluster state listener is removed.
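
For reviewers who want to see the mechanism in isolation, here is a minimal plain-Java sketch of the setup above. It is not the test in this PR and does not use any OpenSearch API: `ApplierLagSketch`, `Applier`, `appliedWithin`, and `run` are hypothetical names, the applier thread is modelled as a single-threaded executor, and the 30s sleep is scaled down to 300 ms with an assumed 50 ms lag timeout.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.LongConsumer;

final class ApplierLagSketch {

    // Hypothetical stand-in for a node's single cluster applier thread.
    static final class Applier {
        private final ExecutorService thread = Executors.newSingleThreadExecutor();
        private final List<LongConsumer> listeners = new CopyOnWriteArrayList<>();

        void addListener(LongConsumer listener) { listeners.add(listener); }

        void removeListener(LongConsumer listener) { listeners.remove(listener); }

        // Applies a state version by running every listener on the one applier thread.
        Future<?> apply(long version) {
            return thread.submit(() -> listeners.forEach(l -> l.accept(version)));
        }

        void shutdown() { thread.shutdownNow(); }
    }

    // Returns true if the state was applied before the (assumed) lag timeout expired.
    static boolean appliedWithin(Applier applier, long version, long timeoutMillis) {
        Future<?> pending = applier.apply(version);
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the node would be considered lagging
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns {appliedWhileListenerIsSlow, appliedAfterListenerRemoved}.
    static boolean[] run() {
        Applier applier = new Applier();
        // Slow listener: blocks the applier thread, like the 30s sleep in the setup.
        LongConsumer slowListener = version -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            applier.addListener(slowListener);
            boolean whileSlow = appliedWithin(applier, 1, 50);
            applier.removeListener(slowListener);
            // The queued update runs once the in-flight slow task has finished.
            boolean afterRemoval = appliedWithin(applier, 2, 5_000);
            return new boolean[] { whileSlow, afterRemoval };
        } finally {
            applier.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("applied in time while listener is slow: " + result[0]);
        System.out.println("applied in time after listener removed: " + result[1]);
    }
}
```

Because all listeners share one thread, a single slow listener delays every later state update on that node. Once the listener is removed and the in-flight call returns, updates are applied promptly again, which is the recovery the last bullet checks for.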

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ster state publication lag because the cluster applier thread being busy.

Signed-off-by: Swetha Guptha <[email protected]>
@SwethaGuptha SwethaGuptha requested a review from a team as a code owner November 6, 2025 09:35
@github-actions
Contributor

github-actions bot commented Nov 6, 2025

❌ Gradle check result for 9d64a60: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
