
feat(katib-sdk): support node_selector in Katib Python SDK #2606

Open
Priyanshu-u07 wants to merge 2 commits into kubeflow:master from Priyanshu-u07:katib-sdk-node-selector



@Priyanshu-u07 Priyanshu-u07 commented Jan 27, 2026

Description

This PR adds support for configuring Kubernetes nodeSelector in the Katib Python SDK.

Currently, users cannot control node placement for Katib trial pods when using the Python SDK, even though Kubernetes supports nodeSelector at the Pod level. This is especially limiting for workloads that require specific nodes (e.g. GPU, high-memory, or custom-labeled nodes).

The change introduces an optional node_selector parameter to KatibClient.tune(), which injects the provided nodeSelector into the TrialTemplate PodSpec. The scheduling is then handled natively by Kubernetes without requiring any backend changes.
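As a rough illustration of the injection described above, the sketch below sets `nodeSelector` on the Pod spec of a `batch/v1` Job manifest expressed as a plain dictionary. The helper name `inject_node_selector` and the sample manifest are hypothetical and are not taken from this PR's diff; the actual SDK code may operate on Kubernetes client model objects instead.

```python
import copy
from typing import Dict, Optional


def inject_node_selector(job: dict, node_selector: Optional[Dict[str, str]]) -> dict:
    """Return a copy of a batch/v1 Job manifest with nodeSelector set on its Pod spec.

    Hypothetical helper for illustration only; not the SDK's implementation.
    """
    patched = copy.deepcopy(job)  # leave the caller's manifest untouched
    if node_selector:
        # For a Job, the Pod spec lives at spec.template.spec.
        pod_spec = patched["spec"]["template"]["spec"]
        pod_spec["nodeSelector"] = dict(node_selector)
    return patched


# Sample trial manifest (illustrative values).
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{"name": "training-container", "image": "python:3.11"}],
            }
        }
    },
}

patched = inject_node_selector(job, {"gpu": "nvidia-a100"})
print(patched["spec"]["template"]["spec"]["nodeSelector"])
```

Because the selector ends up in the ordinary Pod spec, the Kubernetes scheduler applies it like any other `nodeSelector`, which is why no Katib backend change is needed.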

This improves flexibility and brings the Python SDK closer to feature parity with Kubernetes-native workflows.

Changes Included

   sdk/python/v1beta1/kubeflow/katib/api/katib_client.py:
         Added a node_selector parameter to the tune method to allow users to specify node or hardware constraints.
         Added validation to ensure node_selector is a dictionary of string key-value pairs.
         Updated Trial template generation to inject nodeSelector into the Pod spec for both Job and PyTorchJob trials.
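
The validation rule listed above (a dictionary of string key-value pairs) could look roughly like the following. The function name `validate_node_selector` and the choice of `ValueError` are assumptions made for this sketch; the exception type and messages in the PR may differ.

```python
from typing import Any


def validate_node_selector(node_selector: Any) -> None:
    """Raise ValueError unless node_selector is None or a dict of str -> str.

    Hypothetical helper; the exception type is an assumption.
    """
    if node_selector is None:
        return  # the parameter is optional
    if not isinstance(node_selector, dict):
        raise ValueError(
            f"node_selector must be a dict, got {type(node_selector).__name__}"
        )
    for key, value in node_selector.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise ValueError(
                f"node_selector keys and values must be strings, got {key!r}: {value!r}"
            )


validate_node_selector({"gpu": "nvidia-a100"})  # accepted

try:
    validate_node_selector({"gpu": 1})  # non-string value is rejected
except ValueError as err:
    print(err)
```

Rejecting non-string values early matters because Kubernetes label values are strings, so a value such as `1` or `True` would otherwise fail later, at manifest submission time.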

Tests Included

   sdk/python/v1beta1/kubeflow/katib/api/katib_client_test.py:
         Validation Unit Tests: Added cases to test_tune_data to verify that node_selector only accepts a dict of strings.
         Functional Tests for Jobs: Added verification in test_tune to ensure node selectors are correctly injected into the Pod specification for standard Jobs.
         Functional Tests for Distributed Training: Added verification in test_tune to ensure node selectors are correctly applied to all replicas (Master and Workers) in PyTorchJob templates.
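
A minimal sketch of the kind of check the distributed-training test describes, using a PyTorchJob manifest written as a plain dictionary. The helper `inject_into_pytorchjob` and the sample manifest are hypothetical; the real tests in `katib_client_test.py` presumably assert against the objects that `tune()` generates.

```python
import copy
from typing import Dict


def inject_into_pytorchjob(pytorchjob: dict, node_selector: Dict[str, str]) -> dict:
    """Return a copy of a PyTorchJob manifest with nodeSelector on every replica.

    Hypothetical helper for illustration only.
    """
    patched = copy.deepcopy(pytorchjob)
    # Each entry under spec.pytorchReplicaSpecs (e.g. Master, Worker) has its own Pod template.
    for replica_spec in patched["spec"]["pytorchReplicaSpecs"].values():
        replica_spec["template"]["spec"]["nodeSelector"] = dict(node_selector)
    return patched


def pod_template() -> dict:
    # Illustrative Pod template shared by both replica types.
    return {"spec": {"containers": [{"name": "pytorch", "image": "python:3.11"}]}}


pytorchjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": {"replicas": 1, "template": pod_template()},
            "Worker": {"replicas": 2, "template": pod_template()},
        }
    },
}

selector = {"gpu": "nvidia-a100"}
patched_job = inject_into_pytorchjob(pytorchjob, selector)

# Every replica type must carry the same selector.
for name, replica_spec in patched_job["spec"]["pytorchReplicaSpecs"].items():
    assert replica_spec["template"]["spec"]["nodeSelector"] == selector, name
```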

Manual Verification

Installed Katib in a local Minikube cluster.

Labeled the node with a custom label (e.g. gpu=nvidia-a100).

Created a Katib experiment using KatibClient.tune() with the node_selector parameter.

Verified that Trial Pods were scheduled only on nodes matching the specified node selector using:
kubectl get pods -n kubeflow -o

Confirmed that the nodeSelector field was correctly injected into the generated Trial PodSpec.
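
For reference, the matching rule that the manual check relies on can be stated in a few lines: a Pod with a `nodeSelector` is only eligible for nodes whose labels contain every listed key with exactly the listed value. The function `node_matches` and the sample labels below are illustrative and are not part of this PR.

```python
from typing import Dict


def node_matches(node_labels: Dict[str, str], node_selector: Dict[str, str]) -> bool:
    """Return True if the node's labels satisfy every key/value pair in the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Illustrative labels, e.g. after labeling a Minikube node with gpu=nvidia-a100.
labeled_node = {"kubernetes.io/hostname": "minikube", "gpu": "nvidia-a100"}
unlabeled_node = {"kubernetes.io/hostname": "other-node"}

print(node_matches(labeled_node, {"gpu": "nvidia-a100"}))
print(node_matches(unlabeled_node, {"gpu": "nvidia-a100"}))
```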

Fixes #2603

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: Priyanshu-u07 <connect.priyanshu8271@gmail.com>
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@github-actions

🎉 Welcome to the Kubeflow Katib repo! 🎉

Thanks for opening your first PR! We're excited to have you onboard 🚀

Next steps:

Feel free to ask questions in the comments. Thanks again for contributing! 🙏

Signed-off-by: Priyanshu-u07 <connect.priyanshu8271@gmail.com>
@Priyanshu-u07
Author

@andreyvelich could you please take a look at this PR?



Development

Successfully merging this pull request may close these issues.

Add node selector configuration for Python SDK (katib.tune)
