This guide demonstrates how to get up and running with llm-d on RHOAI, based on https://access.redhat.com/articles/7131048.
- Red Hat OpenShift AI 2.24
- OpenShift 4.18 - see ocp-4-18-setup for manual installation of `llm-d` dependencies
- OpenShift 4.19 - the dependencies needed for `llm-d` are shipped in OCP 4.19
RHOAI 2.x leverages Knative Serving by default. The following configurations disable Knative.
- Set `serviceMesh.managementState` to `Removed`, as shown in the following example (this requires an admin role):

```yaml
serviceMesh:
  ...
  managementState: Removed
```
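If you prefer the CLI to the UI, the setting above lives on the `DSCInitialization` resource; a minimal sketch of the relevant portion is shown below. The object name `default-dsci` is an assumption (check with `oc get dscinitialization`):

```yaml
# Sketch of the relevant portion of a DSCInitialization resource.
# The name default-dsci is an assumption; verify it on your cluster.
apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization
metadata:
  name: default-dsci
spec:
  serviceMesh:
    managementState: Removed
```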
- You can also make this change through the RHOAI UI.
- Create a data science cluster (`DSC`) with the following information set in `kserve` and `serving`:

```yaml
kserve:
  defaultDeploymentMode: RawDeployment
  managementState: Managed
  ...
serving:
  ...
  managementState: Removed
  ...
```
- You can create the `DSC` through the RHOAI UI using the `dsc.yaml` provided in this repo.
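For reference, a minimal `DataScienceCluster` manifest with these settings might look like the sketch below. The exact nesting (in current schemas the Knative `serving` block sits under `kserve`) and the object name are assumptions; the `dsc.yaml` in this repo is authoritative:

```yaml
# Sketch only - compare against the dsc.yaml shipped in this repo.
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc   # assumed name
spec:
  components:
    kserve:
      defaultDeploymentMode: RawDeployment
      managementState: Managed
      serving:                      # Knative Serving, disabled for llm-d
        managementState: Removed
```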
`llm-d` leverages the Gateway API Inference Extension.
As described in Getting Started with Gateway API for the Ingress Operator, we can deploy a `GatewayClass` and a `Gateway` named `openshift-ai-inference` in the `openshift-ingress` namespace:

```shell
oc apply -f gateway.yaml
```
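A `gateway.yaml` along these lines would produce the resources shown in the next step; the listener details are assumptions based on the Gateway API docs, and the `gateway.yaml` in this repo is authoritative:

```yaml
# Sketch of a GatewayClass and Gateway; listener settings are assumptions.
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: istio
spec:
  controllerName: istio.io/gateway-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: openshift-ai-inference
  namespace: openshift-ingress
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: All
```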
We can see the Gateway is deployed:

```shell
oc get gateways -n openshift-ingress
>> NAME                     CLASS   ADDRESS                                                             PROGRAMMED   AGE
>> openshift-ai-inference   istio   openshift-ai-inference-istio.openshift-ingress.svc.cluster.local   True         9d
```
With the gateway deployed, we can now deploy an `LLMInferenceService` using KServe, which creates an inference pool of vLLM servers and an endpoint picker (EPP) for smart scheduling across the vLLM servers.
The `deployment.yaml` contains a sample manifest for deploying:

```shell
oc create ns llm-test
oc apply -f deployment.yaml -n llm-test
```
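The shape of such a manifest is sketched below; the field names follow the KServe `LLMInferenceService` API, but the specific values (model URI, replica count, router settings) are assumptions, and the `deployment.yaml` in this repo is authoritative:

```yaml
# Sketch of an LLMInferenceService; values are assumptions.
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: qwen
spec:
  model:
    uri: hf://Qwen/Qwen3-0.6B
    name: Qwen/Qwen3-0.6B
  replicas: 2          # matches the two vLLM pods shown below
  router:
    scheduler: {}      # the EPP / router-scheduler
    route: {}
    gateway: {}
```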
- We can see the `llminferenceservice` is deployed ...

```shell
oc get llminferenceservice -n llm-test
>> NAME   URL   READY   REASON   AGE
>> qwen         True             9m44s
```
- ... and that the `router-scheduler` and `vllm` pods are ready to go:

```shell
oc get pods -n llm-test
>> NAME                                            READY   STATUS    RESTARTS   AGE
>> qwen-kserve-c59dbf75-5ztf2                      1/1     Running   0          9m15s
>> qwen-kserve-c59dbf75-dlfj6                      1/1     Running   0          9m15s
>> qwen-kserve-router-scheduler-67dbbfb947-hn7ln   1/1     Running   0          9m15s
```
- We can query the model at the gateway's address:

```shell
curl -X POST http://openshift-ai-inference-istio.openshift-ingress.svc.cluster.local/llm-test/qwen/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "prompt": "Explain the difference between supervised and unsupervised learning in machine learning. Include examples of algorithms used in each type.",
    "max_tokens": 200,
    "temperature": 0.7,
    "top_p": 0.9
  }'
```
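Since vLLM exposes an OpenAI-compatible API, the same request can also be issued from Python. A minimal sketch using only the standard library, where the gateway address and the `/llm-test/qwen` path prefix come from the deployment above (it only resolves from inside the cluster):

```python
import json
from urllib import request

# In-cluster gateway address from the example above.
GATEWAY = "http://openshift-ai-inference-istio.openshift-ingress.svc.cluster.local"

def completion_payload(model, prompt, max_tokens=200, temperature=0.7, top_p=0.9):
    """Build an OpenAI-style /v1/completions request body."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }

def query_completions(namespace, name, payload, gateway=GATEWAY):
    """POST the payload to <gateway>/<namespace>/<name>/v1/completions."""
    url = f"{gateway}/{namespace}/{name}/v1/completions"
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:  # requires in-cluster network access
        return json.load(resp)

# Example (requires cluster access):
# result = query_completions("llm-test", "qwen",
#                            completion_payload("Qwen/Qwen3-0.6B", "Hello"))
```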
- When finished, clean up by deleting the `LLMInferenceService`:

```shell
oc delete llminferenceservice qwen -n llm-test
```