-
Notifications
You must be signed in to change notification settings - Fork 625
[Feature] Support Volcano Network Topology Aware Scheduling for kuberay #4105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
mode, modeOk := app.ObjectMeta.Labels[NetworkTopologyModeLabelKey] | ||
highestTier, tierOk := app.ObjectMeta.Labels[NetworkTopologyHighestTierAllowedLabelKey] | ||
if modeOk && tierOk { | ||
highestTierInt, err := strconv.Atoi(highestTier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we handle this error so that the user will get a better understanding of what happened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should. Changed my PR. Thanks for reviewing.
|
||
mode, modeOk := app.ObjectMeta.Labels[NetworkTopologyModeLabelKey] | ||
highestTier, tierOk := app.ObjectMeta.Labels[NetworkTopologyHighestTierAllowedLabelKey] | ||
if modeOk && tierOk { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Network Topology Aware Scheduling Policy, the highestTierAllowed
is not required if mode
is soft. If the highestTierAllowed
is not set with soft mode, the NetworkTopologySpec
would not be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a good catch.
Changed my PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should synchronize the NetworkTopology
field for PodGroup when updating an existing RayCluster labels. WDYT?
Thanks. @win5923 Good point.
|
We can do this in syncPodGroup, but I think you’re right. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution, the changes look good to me.
But i think we should wait until #3972
is merged, as it includes the interface changes that might impact this implementation.
Thanks @win5923 Do you know when is the next release of the kuberay operator. Can my change be included? |
Nov. 1, 2025 (Branch Cut: Oct. 10). Yes, I believe this PR can be included in v1.5, as the changes are relatively minor and should be safe to merge before the cut-off. |
Sorry for the late reply. I'll review this PR asap. |
Why are these changes needed?
Support for Volcano Network Topology Aware Scheduling
Close issue. #3641
Test
Build a image and use this image for kuberay operator
Apply a raycluster with labels
The volcano podgroup has the field injected
Some other combinations of labels and result.
—
volcano.sh/network-topology-mode: hard
volcano.sh/network-topology-highest-tier-allowed: "2"
—
volcano.sh/network-topology-mode: hard
——
volcano.sh/network-topology-mode: soft
——
no label => no network topology
spec:
minMember: 20
minResources:
cpu: "160"
memory: 160Gi
nvidia.com/h100: "160"
priorityClassName: ml-tier1
queue: queue2
—-
"PodGroup.Error":"failed to convert volcano.sh/network-topology-highest-tier-allowed label to int: strconv.Atoi: parsing "abc": invalid syntax for podgroup ray-kwok-raycluster-h100-q2-soft-topology2-pg in namespace kuberay
Related issue number
Closes #3641
Checks