Implement dynamic capacity for kubernetes task runner #18591
Conversation
@Override
public boolean isSidecarSupport()
{
  return staticConfig.isSidecarSupport();
}
Check notice: Code scanning / CodeQL: Deprecated method or constructor invocation (Note) on KubernetesTaskRunnerStaticConfig.isSidecarSupport
LGTM, can remove the deprecated defaultIfNull calls.
@GabrielCWT, thanks for putting this together. I might be mistaken, but I am not entirely sure if the task capacity of a cluster should be a dynamic config. Changing the task capacity should not be a very frequent requirement for any cluster. Could you please elaborate on why you feel the current setup is not adequate?
Restarting the Overlord can be time-consuming and potentially risky, as we could face issues when trying to redeploy the Overlord instance. Updating the task capacity dynamically allows admins to adjust the cluster safely, reducing operational downtime and complexity. While changes to task capacity are infrequent, I feel that providing a safer runtime option would help to minimize disruptions.
I think it's true that changing the task slots for the MiddleManager requires a restart, as we might need to redeploy the MiddleManager onto servers with bigger resources. But for K8s-based task scheduling, the resources are allocated on the K8s side, outside of Druid, so restarting the Overlord does not make sense; we should have the ability to reload the capacity dynamically. As @GabrielCWT has stated above, restarting the Overlord is a heavy and risky operation in production.
Thanks for the responses, @FrankChen021, @GabrielCWT. Even though with the K8s task runner the task pods are not technically a part of the Druid cluster, the Overlord still has to manage those tasks. The KubernetesTaskRunner itself keeps separate threads for each running task to track their status (this PR also updates that thread count, if I am not mistaken). So I would imagine that a major change in task capacity would also require some kind of scaling of the Overlord itself. Also, why is the Overlord restart "risky" or even "slow"? Doesn't a version upgrade require an Overlord restart too?
In a K8s deployment, increasing the task capacity may not always require increasing the Overlord resources. For example, we generally set the CPU limit to a higher value while keeping the CPU request relatively low in the initial deployment, so when capacity is increased there is no need to increase the CPU resources. I say risky because the Overlord needs to restore all tasks; previously we had some problems (maybe a bug) where, after switching leaders, the Overlord failed to elect a new leader. We try our best not to restart the Coordinator/Overlord in production.
Yes, there might be some bugs around that. Also, the K8s task runner makes certain list-pod calls, which are pretty heavy.
Oh, how frequently do you upgrade your cluster? I agree that the K8s task runner is buggy and we should improve upon it. Instead, we should try to fix the actual problems in the task runner which make the Overlord leader switch erroneous. What are your thoughts, @FrankChen021?
We don't upgrade clusters very frequently, maybe once a year or even less often. We do adjust the capacity (upsize or downsize) regularly based on load/requirements.
The main idea of dynamic configuration is not to circumvent problems at the restarting phase; it's about reducing operational complexity and saving time. Even if restarting the Overlord is smooth, I don't think changing such a configuration should require a restart from the users'/operators' point of view. For static configurations, operators have to change configuration files, sync the files to Kubernetes, and restart components; it's a heavy workflow.
Thanks for the clarification, @FrankChen021! I am just a little apprehensive since the K8s task runner is already pretty buggy. I haven't gone through the whole PR yet. Will try to do a thorough review today.
Hi @kfaraz, are you reviewing this PR? I hope we can merge it into Druid 35.
}
}

private void syncCapacityWithDynamicConfig()
This code is not maintainable.
Can we do this in a separate thread instead of calling sync everywhere?
Is currentCapacity thread safe?
It is possible to make currentCapacity thread safe.
Regarding "Can we do this in a separate thread instead of calling sync everywhere?": this was an alternative solution, though I am not sure if we are thinking of the same implementation. My solution was to have a thread which periodically checks whether there have been any changes to the capacity and updates currentCapacity accordingly. However, I wasn't sure if it would be a waste of resources, as there would need to be a trade-off between responsiveness (how quickly the changes become visible) and resources (since we need to wake the thread up every X seconds).
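For illustration, a minimal sketch of the polling alternative described above, assuming a ScheduledExecutorService and an AtomicInteger-backed currentCapacity (the class, field, and supplier names are hypothetical, not the PR's actual code):

```java
// Hypothetical sketch of the polling alternative discussed above; names are illustrative only.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

class CapacityPoller
{
  // AtomicInteger keeps reads and writes of the capacity thread safe.
  private final AtomicInteger currentCapacity;
  private final IntSupplier dynamicCapacitySupplier;
  private final ScheduledExecutorService poller = Executors.newSingleThreadScheduledExecutor();

  CapacityPoller(int initialCapacity, IntSupplier dynamicCapacitySupplier)
  {
    this.currentCapacity = new AtomicInteger(initialCapacity);
    this.dynamicCapacitySupplier = dynamicCapacitySupplier;
  }

  // Wakes up every pollSeconds to pick up dynamic config changes; this is the
  // responsiveness-vs-resources trade-off mentioned in the discussion.
  void start(long pollSeconds)
  {
    poller.scheduleWithFixedDelay(
        () -> currentCapacity.set(dynamicCapacitySupplier.getAsInt()),
        pollSeconds,
        pollSeconds,
        TimeUnit.SECONDS
    );
  }

  int getCapacity()
  {
    return currentCapacity.get();
  }

  void stop()
  {
    poller.shutdownNow();
  }
}
```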
Using a separate thread to check is also a bad idea; it does not solve the real problem here and only increases the complexity.
The real problem is that the config manager does not provide a notification mechanism when it detects configuration changes.
If we look at the config manager implementation, it provides a method swapIfNew to check and set new values. This is a place where we can add notification.
I think we can add a new overloaded watch method which accepts a Runnable as a callback. This callback is kept in the internal ConfigHolder.
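A minimal sketch of what such a notification hook could look like, assuming a simplified holder class (this is illustrative only and not Druid's actual ConfigManager or ConfigHolder implementation):

```java
// Illustrative-only sketch of a config holder that notifies listeners on change;
// not the real org.apache.druid.common.config.ConfigManager code.
import java.util.List;
import java.util.Objects;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

class SimpleConfigHolder<T>
{
  private final AtomicReference<T> current = new AtomicReference<>();
  private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

  // Analogue of a watch(Runnable) overload: the callback is kept in the holder.
  void watch(Runnable onChange)
  {
    listeners.add(onChange);
  }

  // Analogue of swapIfNew: listeners are notified only when the value actually changes.
  boolean swapIfNew(T newValue)
  {
    T old = current.getAndSet(newValue);
    if (Objects.equals(old, newValue)) {
      return false;
    }
    listeners.forEach(Runnable::run);
    return true;
  }

  T get()
  {
    return current.get();
  }
}
```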
I've added observer support for ConfigHolders
Yes, @FrankChen021, I will go through the changes either today or tomorrow.
Description
The aim of the PR is to enable changing capacity for the KubernetesTaskRunner. This would be done through the existing POST API /druid/indexer/v1/k8s/taskrunner/executionconfig.
K8s configuration changes
In order to do this, I have added a new interface KubernetesTaskRunnerConfig and renamed the existing config to KubernetesTaskRunnerStaticConfig. The interface will be implemented by the existing static config and by a new KubernetesTaskRunnerEffectiveConfig, which will be a wrapper class to encapsulate both the dynamic and static configs. The effective config will fall back to the static config's capacity if the dynamic config has not been set.
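As a rough illustration of the fallback described above (the class shape and getter name are assumptions for this sketch, not the PR's actual code):

```java
// Hypothetical sketch of the capacity fallback; names are assumptions.
class EffectiveConfigSketch
{
  private final Integer dynamicCapacity;  // may be null if never set through the API
  private final int staticCapacity;       // value from the static config

  EffectiveConfigSketch(Integer dynamicCapacity, int staticCapacity)
  {
    this.dynamicCapacity = dynamicCapacity;
    this.staticCapacity = staticCapacity;
  }

  // The effective capacity prefers the dynamic value and falls back to the static one.
  int getCapacity()
  {
    return dynamicCapacity != null ? dynamicCapacity : staticCapacity;
  }
}
```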
Changes to /druid/indexer/v1/k8s/taskrunner/executionconfig behaviour
The API will now take a new capacity field. On top of this, the fields will now be optional. If any field is null or not passed, we will use the existing dynamic config values.
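For illustration, one way the updated endpoint might be called, sending only the new capacity field (the Overlord address and the payload shape beyond capacity are assumptions; omitted fields would keep their existing dynamic values per the description):

```java
// Hypothetical example of posting only the new capacity field to the dynamic-config API.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class UpdateCapacityExample
{
  public static void main(String[] args) throws Exception
  {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
        // "localhost:8090" is a placeholder for the Overlord address.
        .uri(URI.create("http://localhost:8090/druid/indexer/v1/k8s/taskrunner/executionconfig"))
        .header("Content-Type", "application/json")
        // Only capacity is sent; other fields are omitted and keep their current values.
        .POST(HttpRequest.BodyPublishers.ofString("{\"capacity\": 20}"))
        .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode());
  }
}
```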
Release note
New capacity field for the /druid/indexer/v1/k8s/taskrunner/executionconfig POST API. It will change the capacity for the KubernetesTaskRunner.
Challenges
In order to update the capacity for the task runner, I am calling a new function syncCapacityWithDynamicConfig before every task is run, in order to update the thread pool to the newest config. The issue with this is that any changes by the user will not be immediately reflected on the web console's homepage under the "Tasks" widget. The "task slots" would only be updated after a new task has been run.
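For context, a minimal sketch of what resizing the task thread pool to a new capacity could look like (a simplified stand-in, not the PR's actual KubernetesTaskRunner code; the executor setup and method signature are assumptions):

```java
// Simplified stand-in for resizing the task-tracking pool when the dynamic capacity changes.
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CapacitySyncSketch
{
  private final ThreadPoolExecutor exec =
      new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

  // Called before each task run in this sketch, mirroring the approach described above.
  void syncCapacityWithDynamicConfig(int newCapacity)
  {
    if (exec.getMaximumPoolSize() != newCapacity) {
      // Grow max before core when increasing, and shrink core before max when decreasing,
      // so the invariant core <= max always holds.
      if (newCapacity > exec.getMaximumPoolSize()) {
        exec.setMaximumPoolSize(newCapacity);
        exec.setCorePoolSize(newCapacity);
      } else {
        exec.setCorePoolSize(newCapacity);
        exec.setMaximumPoolSize(newCapacity);
      }
    }
  }
}
```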
I could not find a way to add a callback to the updating of dynamic configurations and felt that having a check every few seconds to see if the dynamic configuration had been updated was unnecessarily complex. I am open to suggestions if there are better ways to update the task runner.