Collect node thread pool usage for shard balancing #131480
base: main
Adds a new transport action to collect usage stats from the data nodes. ClusterInfoService uses the action to pull thread pool usage information from the data nodes to the master node periodically. Also removes NodeUsageStatsForThreadPoolsCollector as an interface/plugin and replaces it with a single class implementation. Closes ES-12316
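The pull model described above can be sketched in plain Java. This is a minimal, hedged illustration: `ThreadPoolUsage`, `UsageStatsCollector`, `DataNode` and `refresh` are hypothetical names, not the PR's actual types, and the transport-layer fan-out is replaced by direct method calls. The master periodically asks each data node for its thread pool usage through a single concrete collector class and keeps the latest snapshot per node ID.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class UsagePollingSketch {

    // Hypothetical per-node payload: pool utilization in [0, 1] and the max queue latency seen.
    public record ThreadPoolUsage(String threadPoolName, double utilization, long maxQueueLatencyMillis) {}

    // Hypothetical concrete collector class that runs on each data node (no plugin interface).
    public static final class UsageStatsCollector {
        private final double utilization;
        private final long maxQueueLatencyMillis;

        public UsageStatsCollector(double utilization, long maxQueueLatencyMillis) {
            this.utilization = utilization;
            this.maxQueueLatencyMillis = maxQueueLatencyMillis;
        }

        public ThreadPoolUsage collect() {
            return new ThreadPoolUsage("write", utilization, maxQueueLatencyMillis);
        }
    }

    // Stand-in for a data node reachable from the master.
    public record DataNode(String nodeId, UsageStatsCollector collector) {}

    // Stand-in for one master-side refresh: one request per data node, results keyed by node ID.
    // In Elasticsearch each call would be a transport request; here it is a direct method call.
    public static Map<String, ThreadPoolUsage> refresh(List<DataNode> dataNodes) {
        Map<String, ThreadPoolUsage> usageByNode = new HashMap<>();
        for (DataNode node : dataNodes) {
            usageByNode.put(node.nodeId(), node.collector().collect());
        }
        return usageByNode;
    }

    public static void main(String[] args) throws InterruptedException {
        List<DataNode> nodes = List.of(
            new DataNode("node-1", new UsageStatsCollector(0.75, 12)),
            new DataNode("node-2", new UsageStatsCollector(0.20, 0)));

        // Poll periodically, as ClusterInfoService does on the master.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> System.out.println(refresh(nodes)), 0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(250);
        scheduler.shutdown();
    }
}
```

Keeping the collector as one concrete class, rather than a plugin interface, means the master-side polling code only has a single implementation to reason about.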
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
```java
/**
 * Defines the request/response types for {@link TransportNodeUsageStatsForThreadPoolsAction}.
 */
public class NodeUsageStatsForThreadPoolsAction {
    // ... (remainder of the class is not shown in the review excerpt)
}
```
There is a TransportNodesStatsAction which can produce thread-pool usage. Do we need a separate action for this?
The calls that collect the thread pool stats are destructive. For example, reading the max queue latency returns the maximum seen since the last call and then resets it to zero. Pool utilization is also destructive: collecting it resets an execution time tracker. So we can't expose the new stats through the TransportNodesStatsAction API, where arbitrary callers would clear the state we need for allocation.
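To make the destructive-read problem concrete, here is a hedged sketch of a simplified tracker. `DestructiveUsageTracker` and its methods are hypothetical and only mimic the two behaviours described above: the max queue latency is returned and reset to zero, and utilization is computed from execution time accumulated since the previous poll, after which the accumulator is reset. An unrelated caller that polls first consumes the sample the allocator needed.

```java
import java.util.concurrent.atomic.AtomicLong;

public class DestructiveUsageTracker {
    private final int threads; // pool size, used to normalize utilization
    private final AtomicLong maxQueueLatencyMillis = new AtomicLong();
    private final AtomicLong executionNanosSinceLastPoll = new AtomicLong();
    private long lastPollNanos;

    public DestructiveUsageTracker(int threads, long startNanos) {
        this.threads = threads;
        this.lastPollNanos = startNanos;
    }

    // Called when a task leaves the queue: remember the largest queue wait seen so far.
    public void onTaskDequeued(long queueLatencyMillis) {
        maxQueueLatencyMillis.accumulateAndGet(queueLatencyMillis, Math::max);
    }

    // Called when a task finishes: add its run time to the accumulator.
    public void onTaskCompleted(long executionNanos) {
        executionNanosSinceLastPoll.addAndGet(executionNanos);
    }

    // Destructive read: returns the max seen since the last call and resets it to zero.
    public long pollMaxQueueLatencyMillis() {
        return maxQueueLatencyMillis.getAndSet(0);
    }

    // Destructive read: busy time / (elapsed time * threads) since the last poll, then reset.
    public synchronized double pollUtilization(long nowNanos) {
        long elapsed = nowNanos - lastPollNanos;
        long busy = executionNanosSinceLastPoll.getAndSet(0);
        lastPollNanos = nowNanos;
        return elapsed <= 0 ? 0.0 : (double) busy / ((double) elapsed * threads);
    }

    public static void main(String[] args) {
        DestructiveUsageTracker tracker = new DestructiveUsageTracker(2, 0L);
        tracker.onTaskDequeued(40);
        tracker.onTaskCompleted(1_000_000_000L); // 1 s of busy time across a 2-thread pool

        // An unrelated caller (e.g. a general stats request) reads first...
        long stolenLatency = tracker.pollMaxQueueLatencyMillis();
        double stolenUtilization = tracker.pollUtilization(1_000_000_000L);
        System.out.println(stolenLatency + " " + stolenUtilization); // prints "40 0.5"

        // ...so the allocator's later poll only sees the reset values.
        System.out.println(tracker.pollMaxQueueLatencyMillis() + " "
            + tracker.pollUtilization(2_000_000_000L)); // prints "0 0.0"
    }
}
```

This is why a dedicated action with a single consumer on the master is safer than sharing these reads with a general-purpose stats endpoint.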