@strangiato
Member

What does this PR do?

Provides example overlays that can be used with IBM Cloud

Checklist

  • You have completed the described test plan and included screenshots in the PR or comments showing the results
  • Relevant documentation has been updated
  • You have squashed commits (including commits to test from your branch) to be logical and reduce unnecessary commits

Test Plan

[Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.]

@strangiato
Member Author

@fgharo I started a quick PR here to capture some of the efforts you guys made to get the AI Accelerator working with IBM Cloud.

Would love for you to follow up here and bring over some of the other changes.

I see in your fork you had some additional items, such as specific operator-group names needed for things like the GPU operator. We may consider just updating those names in the base folders, since the name doesn't really matter in other clusters.

I don't think we need to have a new IBM Cloud cluster example available at this point, but getting some of the base config stuff captured here would be a great start for the next engagement we have to handle on IBM Cloud.

@strangiato
Member Author

@fgharo
Contributor

fgharo commented Jul 29, 2025

We could potentially test with this demo environment:

https://catalog.demo.redhat.com/catalog?item=babylon-catalog-prod/ibm.ocp4-demo-rhoic.prod&utm_source=webapp&utm_medium=share-link

Trevor, I tried testing this on the demo environment you suggested. However, the cluster it comes with runs version 4.15, and the IBM Cloud openshift-ai add-on requires at least OpenShift 4.16, as shown by the error I got when I attempted the installation. Please see the CLI output below.

$ ibmcloud oc cluster addon enable openshift-ai \
    --cluster rhpds \
    --param oaiInstallPlanApproval=Automatic \
    --param oaiCodeflare=Managed \
    --param oaiKserve=Managed \
    --param nvidiaCudaTest=true \
    --param pipelineEnabled=true \
    --param nvidiaEnablecatd=true \
    --param nfdEnabled=true
Enabling add-on openshift-ai for cluster rhpds...
...
FAILED
The 'openshift-ai' add-on is not supported on clusters that run version 4.15.51. The supported version range is >=4.16.0 <4.18.0. Create or update your cluster to a supported version for the add-on and try again. To review version migration actions, see 'http://ibm.biz/iks-versions' for Kubernetes or 'https://ibm.biz/roks-versions' for OpenShift. (E3419)

To see the add-on versions that are supported for your cluster version, run 'ibmcloud ks addon-versions'.

Incident ID: 2e1245cc-925f-4c4e-9fec-329ebaded231
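
For the next attempt, a quick pre-flight check along these lines might save a failed enable call. This is only a sketch: the function names are hypothetical, the range is hard-coded from the error message above (>=4.16.0 <4.18.0), and it assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: is a cluster version inside the range
# reported by the add-on error above (>=4.16.0 <4.18.0)?

# Succeeds when $1 <= $2 in version order (assumes `sort -V` is available).
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Succeeds when $2 <= $1 < $3, i.e. min is inclusive and max is exclusive.
version_in_range() {
  version_le "$2" "$1" && ! version_le "$3" "$1"
}

# Prints "supported" or "unsupported" for the given cluster version.
check_addon_version() {
  if version_in_range "$1" "4.16.0" "4.18.0"; then
    echo "supported"
  else
    echo "unsupported"
  fi
}

check_addon_version "4.15.51"   # the demo cluster's version; prints "unsupported"
```

On a live cluster the version could probably be read with something like `oc get clusterversion version -o jsonpath='{.status.desired.version}'` and passed to `check_addon_version`. I have not run that against the demo environment, so treat it as an assumption.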

I tried upgrading the version, but my user does not have the access needed to upgrade the OpenShift cluster. Please see the following screenshot.
[Screenshot: unable-to-update-ocp-cluster]

Nevertheless, I went ahead and validated that the rest of the operators could still be deployed on the cluster. Please see the following screenshot.
[Screenshot: argoappshealthy1]

Finally, I know you wanted this done completely in YAML; however, the IBM folks said that is not possible, and I have to use either the UI or the ibmcloud oc cluster addon enable openshift-ai... command above. Please see the following screenshot.

[Screenshot: unable-to-install-ibmcloud-rhoai-via-yaml]

I added some of this information to the bootstrap/overlays/rhoai-eus-2.16-ibmcloud-gpu/README.md file.

@vladi-rh

@fgharo just to get a clear idea: is the Slack comment from IBM a final blocker that renders this effort unmergeable, or are we waiting for an improvement on the IBM Cloud side?
