-
Notifications
You must be signed in to change notification settings - Fork 25.6k
WriteLoadConstraintMonitor will not call reroute if no nodes are below threshold #136925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
…w threshold Closes ES-13237
cd4b73a
to
3270aa5
Compare
I'm starting to wonder whether we'd want to check for index vs search nodes in this.. We could have a cluster with 1 index node and 1 search node, and try to rebalance the hot-spot. I was just discussing DiscoveryNodeRole with Zhubo for the IndexBalanceAllocationDecider, and there are SEARCH_ROLE and INDEX_ROLE roles defined in stateful. So I believe it is doable. |
...r/src/main/java/org/elasticsearch/cluster/routing/allocation/WriteLoadConstraintMonitor.java
Show resolved
Hide resolved
I've got the index vs search roles handled now. Ready for another look. |
* state contains #(numberOfIndexNodes) nodes with {@link DiscoveryNodeRole#INDEX_ROLE}, assigning the primary shards to those nodes, | ||
* and #(numberOfSearchNodes) nodes with {@link DiscoveryNodeRole#SEARCH_ROLE}. | ||
*/ | ||
public static ClusterState buildServerlessRoleNodes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhubotang-wq this new utility method might be useful in your work, too, if you didn't already build something.
public static final Map<String, ThreadPoolUsageStats> ZERO_USAGE_THREAD_POOL_USAGE_MAP = Map.of( | ||
ThreadPool.Names.WRITE, | ||
new NodeUsageStatsForThreadPools.ThreadPoolUsageStats(5, 0, 0) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like test code in production classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the to the test file 👍
int numberOfSearchNodes, | ||
int numberOfHotSpottingNodes | ||
) { | ||
final long queueLatencyThresholdMillis = randomLongBetween(1000, 5000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should we assert that numberOfHotSpottingNodes <= numberOfIndexNodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
return state.build(); | ||
} | ||
|
||
public record IndexState(IndexMetadata indexMetadata, IndexRoutingTable.Builder indexRoutingTableBuilder) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could this record be private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM with a few nits
The write load monitor will no longer attempt to address
hot-spots with a reroute request if there are no nodes
below the queue latency threshold to receive relocated
shards.
Excludes search role nodes when considering hot-spotting
nodes and relocation target nodes.
Closes ES-13237