This guide demonstrates how to get up and running with llm-d on RHOAI, based on https://access.redhat.com/articles/7131048.

Prerequisites

- OpenShift 4.18+
- role: cluster-admin

Red Hat Demo Platform Options (Tested)

NOTE: The node sizes below are the recommended minimum to select for provisioning
- AWS with OpenShift Open Environment
  - 1 x Control Plane - m6a.2xlarge
  - 0 x Workers - m6a.4xlarge
- Red Hat OpenShift Container Platform Cluster (AWS)
  - 1 x Control Plane
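
To confirm the provisioned nodes match the recommended sizes, you can list their cloud instance types; the label used below is the standard Kubernetes instance-type node label:

```sh
# Show each node's cloud instance type (e.g. m6a.2xlarge)
oc get nodes -L node.kubernetes.io/instance-type
```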
Install the demo:

```sh
until oc apply -k demo/llm-d; do : ; done
```
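
The `until ... do : ; done` loop simply re-runs `oc apply` until it succeeds, which covers the case where a CRD created earlier in the kustomization is not yet registered when a dependent resource is applied. Once it completes, a quick sanity check is to list the created llm-d inference resources; the resource name below is an assumption based on the `LLMInferenceService` kind referenced later in this guide:

```sh
# Hypothetical sanity check: list LLMInferenceService resources created by the demo
# (requires the KServe LLMInferenceService CRD to be installed on the cluster)
oc get llminferenceservices -A
```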
See `gitops/ocp-4.18` for installation of the llm-d dependencies:

```sh
until oc apply -k gitops/ocp-4.18; do : ; done
```
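
After the dependencies reconcile, one way to confirm that the operators installed by the kustomization completed successfully is to check their ClusterServiceVersion phase; exactly which operators appear depends on what `gitops/ocp-4.18` installs on your cluster:

```sh
# List installed operators and their install phase (expect "Succeeded")
oc get csv -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase
```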

Notes

- Deploying a model by using the Distributed Inference Server with llm-d
- LLM-D: GPU-Accelerated Cache-Aware LLM Inference
- Demystifying Inferencing at Scale with LLM-D on Red Hat OpenShift on IBM Cloud
- RHOAI Release Notes - 2.24
- Verify this works in a bare metal install (Istio)
- Fix `oc apply -f gitops/instance/llm-d/deployment.yaml` failing with `no matches for kind "LLMInferenceService" in version "serving.kserve.io/v1alpha1"`
  - https://github.com/kserve/kserve/blob/master/config/crd/kustomization.yaml
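
A sketch of a possible workaround, assuming the root cause is simply that the KServe CRDs (including `LLMInferenceService`) are not yet installed on the cluster; the CRD name is derived from the error message, and the upstream install command is an assumption - pin `?ref=` to a KServe release you have tested, or clone the repo and apply `config/crd` locally if your `oc` does not support remote kustomize targets:

```sh
# Check whether the LLMInferenceService CRD is registered on the cluster
oc get crd llminferenceservices.serving.kserve.io

# If missing, one option is to apply the upstream KServe CRDs (server-side apply,
# since the CRDs are too large for client-side apply annotations)
oc apply --server-side -k "https://github.com/kserve/kserve/config/crd?ref=master"
```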