
Conversation

@shebistar

What does this PR do?

Add a new overlay for the GPU as a Service Lab, using RHOAI 2.22 and OCP 4.19
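
Roughly speaking, the new overlay composes an existing RHOAI base with a time-slicing component. As a sketch only (the paths and component names below are illustrative, not the actual files in this PR):

```yaml
# kustomization.yaml for the new overlay -- a sketch; paths are hypothetical
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../rhoai-stable-2.22-aws-gpu        # hypothetical base overlay with the GPU machineset job

components:
  - ../../components/gpu-time-slicing   # hypothetical component enabling GPU time-slicing
```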

Test Plan

Deploy a new demo cluster --> execute the bootstrap script --> select the `rhoai-stable-2.22-aws-gpu-time-sliced` bootstrap.

@shebistar requested a review from a team as a code owner on September 19, 2025 at 12:57
@@ -0,0 +1,11 @@
apiVersion: kustomize.config.k8s.io/v1alpha1

This may be a duplicate file?

@@ -0,0 +1,40 @@
---
apiVersion: batch/v1
kind: Job

Is this different from the aws-gpu-machineset component?

It looks like it is deploying the same GPU instance type, and from my quick check it doesn't seem to deploy anything different.

If it is different I would prefer to move this portion into its own component.

The main reason is that the aws-gpu job is really a workaround to get GPUs into the demo environment, not something we would generally use in a customer environment. We want to be able to easily remove the job that creates the GPU machinesets while still being able to use time-slicing.
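
As a sketch of what a standalone time-slicing component could look like (the names, paths, and replica count here are assumptions, not taken from this PR):

```yaml
# kustomization.yaml -- hypothetical standalone time-slicing component
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

resources:
  - time-slicing-configmap.yaml

patches:
  - target:
      kind: ClusterPolicy
      name: gpu-cluster-policy          # assumes the GPU operator's default ClusterPolicy name
    patch: |-
      - op: add
        path: /spec/devicePlugin/config
        value:
          name: time-slicing-config
          default: time-sliced
---
# time-slicing-configmap.yaml -- the NVIDIA device-plugin sharing config
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: nvidia-gpu-operator
data:
  time-sliced: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4                 # each physical GPU is advertised as 4 schedulable GPUs
```

That way the aws-gpu machineset job and the time-slicing config could be added or removed independently.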


@strangiato left a comment


Everything looks pretty good.

I made a few minor nitpicks and would appreciate your thoughts on them, to help make sure we keep our options flexible in the future.

@@ -0,0 +1,25 @@
# components-distributed-compute

Don't forget to update the README.

components:
  codeflare:
    managementState: Managed
  kueue:

I'm wondering if it would make more sense to call this something besides `kueue-operator`, since this is kind of the opposite.

I wouldn't mind updating the current component-distributed-compute to disable the kueue operator and include the Red Hat build of Kueue in the AI Accelerator by default, since that will be the preferred method moving forward.

I don't think we have any backwards-compatibility concerns, since Kueue/distributed compute was not really used in any of the existing examples.
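
Concretely, I'd picture the default DSC ending up something like this (a sketch; the exact managementState for pairing with the Red Hat build of Kueue may differ by RHOAI version):

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Managed
    kueue:
      # Sketch: disable the embedded Kueue component so the separately
      # installed Red Hat build of Kueue can own the workload queues instead.
      managementState: Removed
```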

Thoughts?

