DiannaHohensee
Contributor

@DiannaHohensee DiannaHohensee commented Oct 21, 2025

The write load monitor will no longer attempt to address
hot-spots with a reroute request if there are no nodes
below the queue latency threshold to receive relocated
shards.

Excludes search role nodes when considering hot-spotting
nodes and relocation target nodes.
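
A minimal sketch of the check described above, assuming a simplified node model rather than the real monitor code. `NodeStats`, `shouldSendReroute`, and the string role name `"search"` are hypothetical stand-ins, and treating the threshold as inclusive is also an assumption.

```java
import java.util.List;
import java.util.Set;

public class WriteLoadHotSpotSketch {
    /** Hypothetical per-node view: id, role names, and the longest observed write queue latency. */
    public record NodeStats(String nodeId, Set<String> roles, long maxQueueLatencyMillis) {}

    /**
     * Returns true only when a reroute could help: at least one non-search node is
     * hot-spotting and at least one non-search node is below the latency threshold.
     */
    public static boolean shouldSendReroute(List<NodeStats> nodes, long queueLatencyThresholdMillis) {
        int hotSpotting = 0;
        int belowThreshold = 0;
        for (NodeStats node : nodes) {
            // Search role nodes are neither hot-spot candidates nor relocation targets.
            if (node.roles().contains("search")) {
                continue;
            }
            // Assumption: a latency equal to the threshold counts as hot-spotting.
            if (node.maxQueueLatencyMillis() >= queueLatencyThresholdMillis) {
                hotSpotting++;
            } else {
                belowThreshold++;
            }
        }
        return hotSpotting > 0 && belowThreshold > 0;
    }

    public static void main(String[] args) {
        List<NodeStats> oneIndexOneSearch = List.of(
            new NodeStats("index-0", Set.of("index"), 6000),
            new NodeStats("search-0", Set.of("search"), 0)
        );
        // The only node below the threshold is a search node, so no reroute is sent.
        System.out.println(shouldSendReroute(oneIndexOneSearch, 5000)); // prints "false"
    }
}
```

In this sketch, a cluster with one hot-spotting index node and one idle search node produces no reroute request, because the search node is not a valid target for relocated shards.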

Closes ES-13237

@DiannaHohensee DiannaHohensee self-assigned this Oct 21, 2025
@DiannaHohensee DiannaHohensee added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0 labels Oct 21, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@DiannaHohensee
Contributor Author

I'm starting to wonder whether we'd want to check for index vs search nodes in this. We could have a cluster with 1 index node and 1 search node, and still try to rebalance the hot-spot.

I was just discussing DiscoveryNodeRole with Zhubo for the IndexBalanceAllocationDecider, and there are SEARCH_ROLE and INDEX_ROLE roles defined in stateful. So I believe it is doable.

@DiannaHohensee
Contributor Author

I've got the index vs search roles handled now. Ready for another look.

* state contains #(numberOfIndexNodes) nodes with {@link DiscoveryNodeRole#INDEX_ROLE}, assigning the primary shards to those nodes,
* and #(numberOfSearchNodes) nodes with {@link DiscoveryNodeRole#SEARCH_ROLE}.
*/
public static ClusterState buildServerlessRoleNodes(
Contributor Author

@zhubotang-wq this new utility method might be useful in your work, too, if you didn't already build something.

public static final Map<String, ThreadPoolUsageStats> ZERO_USAGE_THREAD_POOL_USAGE_MAP = Map.of(
ThreadPool.Names.WRITE,
new NodeUsageStatsForThreadPools.ThreadPoolUsageStats(5, 0, 0)
);
Contributor

This looks like test code in production classes?

Contributor Author

Moved it to the test file 👍

int numberOfSearchNodes,
int numberOfHotSpottingNodes
) {
final long queueLatencyThresholdMillis = randomLongBetween(1000, 5000);
Contributor

@nicktindall nicktindall Oct 22, 2025

Nit: should we assert that numberOfHotSpottingNodes <= numberOfIndexNodes?

Contributor Author

Done 👍
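
A rough sketch of the precondition suggested in this thread, assuming a hypothetical test helper that hands out queue latencies to index nodes. `latenciesForIndexNodes` and its layout are illustrative only and are not the helper in this PR; an explicit exception is used here instead of an `assert` so the check runs without `-ea`.

```java
import java.util.ArrayList;
import java.util.List;

public class HotSpotArgsSketch {
    /**
     * Hypothetical helper: returns one queue latency per index node, with the first
     * numberOfHotSpottingNodes entries above the threshold and the rest at zero.
     */
    public static List<Long> latenciesForIndexNodes(
        int numberOfIndexNodes,
        int numberOfHotSpottingNodes,
        long queueLatencyThresholdMillis
    ) {
        // Only index nodes can hot-spot on write load, so the hot-spotting count
        // cannot exceed the number of index nodes.
        if (numberOfHotSpottingNodes > numberOfIndexNodes) {
            throw new IllegalArgumentException(
                "numberOfHotSpottingNodes [" + numberOfHotSpottingNodes
                    + "] must be <= numberOfIndexNodes [" + numberOfIndexNodes + "]"
            );
        }
        List<Long> latencies = new ArrayList<>();
        for (int i = 0; i < numberOfIndexNodes; i++) {
            latencies.add(i < numberOfHotSpottingNodes ? queueLatencyThresholdMillis + 1 : 0L);
        }
        return latencies;
    }
}
```

Failing fast on an impossible argument combination keeps a mistyped test setup from silently building a cluster state with fewer hot-spotting nodes than the test expects.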

return state.build();
}

public record IndexState(IndexMetadata indexMetadata, IndexRoutingTable.Builder indexRoutingTableBuilder) {}
Contributor

@nicktindall nicktindall Oct 22, 2025

Nit: could this record be private?

Contributor Author

Done 👍

Contributor

@nicktindall nicktindall left a comment

This LGTM with a few nits

@DiannaHohensee DiannaHohensee added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 22, 2025
3 participants