-
Notifications
You must be signed in to change notification settings - Fork 146
feat: pod overlay for kubernetes scheduler (#1067,#1068) #1148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Please take a look @kiukchung and @d4l3k and let me know if this something that looks good. I'll then include more testing evidence if that's the case. |
5d53679
to
1eaaa28
Compare
1eaaa28
to
ce37a51
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1148 +/- ##
==========================================
- Coverage 91.65% 91.59% -0.07%
==========================================
Files 83 83
Lines 6483 6615 +132
==========================================
+ Hits 5942 6059 +117
- Misses 541 556 +15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ce37a51
to
661bfa3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--
thanks for the update! LGTM. One small comment:
|
That's a good question, @kiukchung ! Technically we override keys for dicts so by default it would perhaps be least surprising if we also override the list but I think we can add a support for append also.
role.metadata = {
"kubernetes": lambda pod_dict: {
**pod_dict,
"spec": {
**pod_dict["spec"],
"tolerations": pod_dict["spec"].get("tolerations", []) + [{"key": "new"}]
}
}
} and we can load yaml or whatever using code: import yaml
import fsspec
with fsspec.open("file:///path/to/overlay.yaml", "r") as f:
overlay_dict = yaml.safe_load(f)
role.metadata={
"kubernetes": lambda pod: {
**pod,
"spec": {
**pod["spec"],
**overlay_dict.get("spec", {}),
"tolerations": pod["spec"].get("tolerations", []) + overlay_dict.get("spec", {}).get("tolerations", [])
}
}
}
{
"kubernetes": [
{
"op": "add",
"path": "/spec/nodeSelector/gpu",
"value": "true"
},
{
"op": "add",
"path": "/spec/tolerations/-",
"value": {
"key": "nvidia.com/gpu",
"operator": "Exists"
}
}
]
} I don't want to invent anything new here @kiukchung , it should be ideally some standard tools only, so perhaps a lambda would be ideal, especially since TorchX is investing in programmatic approaches. |
661bfa3
to
483f7af
Compare
Adding support for custom kubernetes pod overlay.
This fixes #1067 and #1068.
This way we don't need to add a runopts for each spec parameter.
Test plan:
[x] added unit tests