Commit f9d1346

add init models fall back
Signed-off-by: JaredforReal <[email protected]>
1 parent a245d5c commit f9d1346

9 files changed: +171 -203 lines changed

deploy/kubernetes/deployment.yaml renamed to deploy/kubernetes/base/deployment.yaml

Lines changed: 5 additions & 6 deletions
@@ -77,7 +77,6 @@ spec:
         env:
         - name: HF_HUB_CACHE
           value: /tmp/hf_cache
-        # Reduced resource requirements for init container
         resources:
           requests:
             memory: "512Mi"
@@ -91,6 +90,7 @@ spec:
       containers:
       - name: semantic-router
         image: ghcr.io/vllm-project/semantic-router/extproc:latest
+        imagePullPolicy: IfNotPresent
         args: ["--secure=true"]
         securityContext:
           runAsNonRoot: false
@@ -128,14 +128,13 @@ spec:
           periodSeconds: 30
           timeoutSeconds: 10
           failureThreshold: 3
-        # Significantly reduced resource requirements for kind cluster
         resources:
           requests:
-            memory: "3Gi" # Reduced from 8Gi
-            cpu: "1" # Reduced from 2
+            memory: "3Gi"
+            cpu: "1"
           limits:
-            memory: "6Gi" # Reduced from 12Gi
-            cpu: "2" # Reduced from 4
+            memory: "6Gi"
+            cpu: "2"
       volumes:
       - name: config-volume
         configMap:

deploy/kubernetes/base/kustomization.yaml

Lines changed: 1 addition & 4 deletions
@@ -5,6 +5,7 @@ resources:
 - ./namespace.yaml
 - ./pvc.yaml
 - ./service.yaml
+- ./deployment.yaml

 configMapGenerator:
 - name: semantic-router-config
@@ -13,7 +14,3 @@ configMapGenerator:
   - ./tools_db.json

 namespace: vllm-semantic-router-system
-
-images:
-- name: ghcr.io/vllm-project/semantic-router/extproc
-  newTag: latest
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+apiVersion: v1
+kind: PersistentVolume
+metadata:
+  name: semantic-router-models-pv
+  labels:
+    app: semantic-router
+spec:
+  capacity:
+    storage: 50Gi
+  accessModes:
+    - ReadWriteOnce
+  storageClassName: standard
+  persistentVolumeReclaimPolicy: Retain
+  hostPath:
+    path: /tmp/hostpath-provisioner/models
+    type: DirectoryOrCreate

deploy/kubernetes/deployment.katan.yaml

Lines changed: 0 additions & 181 deletions
This file was deleted.

deploy/kubernetes/overlays/core/kustomization.yaml

Lines changed: 0 additions & 1 deletion
@@ -3,4 +3,3 @@ kind: Kustomization

 resources:
 - ../../base
-- ../../deployment.yaml

deploy/kubernetes/overlays/llm-katan/kustomization.yaml

Lines changed: 6 additions & 1 deletion
@@ -3,4 +3,9 @@ kind: Kustomization

 resources:
 - ../../base
-- ../../deployment.katan.yaml
+
+patches:
+- target:
+    kind: Deployment
+    name: semantic-router
+  path: patch-llm-katan.yaml
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: semantic-router
+spec:
+  template:
+    spec:
+      containers:
+        - name: semantic-router
+          imagePullPolicy: IfNotPresent
+        - name: llm-katan
+          image: ghcr.io/vllm-project/semantic-router/llm-katan:latest
+          imagePullPolicy: IfNotPresent
+          args:
+            - llm-katan
+            - --model
+            - /app/models/Qwen/Qwen3-0.6B
+            - --served-model-name
+            - qwen3
+            - --host
+            - 0.0.0.0
+            - --port
+            - "8002"
+          ports:
+            - containerPort: 8002
+              name: katan
+              protocol: TCP
+          volumeMounts:
+            - name: models-volume
+              mountPath: /app/models

website/docs/installation/kubernetes.md

Lines changed: 34 additions & 2 deletions
@@ -42,6 +42,38 @@ Configure the semantic router by editing `deploy/kubernetes/config.yaml`. This f
 - llm-katan: semantic-router + an llm-katan sidecar listening on 8002 and serving model name `qwen3`
   - Path: `deploy/kubernetes/overlays/llm-katan`

+### Repository layout (deploy/kubernetes/)
+
+```
+deploy/kubernetes/
+  base/
+    kustomization.yaml     # base kustomize: namespace, PVC, service, deployment
+    namespace.yaml         # Namespace for all resources
+    pvc.yaml               # PVC for models (storageClass and size adjustable)
+    service.yaml           # Service exposing gRPC/metrics/HTTP ports
+    deployment.yaml        # Semantic Router Deployment (init downloads by default)
+    config.yaml            # Router config (mounted via ConfigMap)
+    tools_db.json          # Tools DB (mounted via ConfigMap)
+    pv.example.yaml        # OPTIONAL: hostPath PV example for local models
+  overlays/
+    core/
+      kustomization.yaml   # Uses only base
+    llm-katan/
+      kustomization.yaml   # Patches base to add llm-katan sidecar
+      patch-llm-katan.yaml # Strategic-merge patch injecting sidecar
+  kustomization.yaml       # Root points to overlays/core by default
+  README.md                # Additional notes
+  namespace.yaml, pvc.yaml, service.yaml (top-level shortcuts kept for backward compat)
+```
+
+Notes:
+
+- The base deployment includes an initContainer that downloads required models on first run.
+- If your cluster has limited egress, prefer mounting local models via a PV/PVC and skip downloads:
+  - Copy `base/pv.example.yaml` to `base/pv.yaml`, apply it, and ensure `base/pvc.yaml` is bound to that PV.
+  - Mount point remains `/app/models` in the pod.
+  - See “Network Tips” for details on hostPath PV, image mirrors, and preloading images.
+
 Important notes before you apply manifests:

 - `vllm_endpoints.address` must be an IP address (not hostname) reachable from inside the cluster. If your LLM backends run as K8s Services, use the ClusterIP (for example `10.96.0.10`) and set `port` accordingly. Do not include protocol or path.
@@ -69,9 +101,9 @@ To run with the llm-katan overlay instead:

 ```bash
 kubectl apply -k deploy/kubernetes/overlays/llm-katan
-````
+```

-````
+Note: The llm-katan overlay no longer references parent files directly. It uses a local patch (`deploy/kubernetes/overlays/llm-katan/patch-llm-katan.yaml`) to inject the sidecar, avoiding kustomize parent-directory restrictions.

 ## Step 3: Install Envoy Gateway
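
Review note: the docs added in this commit ask that `base/pvc.yaml` be bound to the example hostPath PV, but the claim itself is not part of the diff. A minimal sketch of a claim that would bind to that PV follows. The claim name `semantic-router-models` and the 10Gi request are assumptions for illustration and may differ from the real `base/pvc.yaml`; the PV name, storage class, access mode, and namespace are taken from the manifests in this commit.

```yaml
# Hypothetical PVC sketch; the actual base/pvc.yaml is not shown in this commit.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: semantic-router-models            # assumed claim name
  namespace: vllm-semantic-router-system  # namespace set in base/kustomization.yaml
spec:
  accessModes:
    - ReadWriteOnce                       # must be offered by the PV
  storageClassName: standard              # must match the PV's storageClassName
  volumeName: semantic-router-models-pv   # pins the claim to the example PV
  resources:
    requests:
      storage: 10Gi                       # assumed size; must not exceed the PV's 50Gi
```

Setting `volumeName` asks Kubernetes to bind the claim to that specific PV rather than letting a dynamic provisioner satisfy it, which is the behavior the "skip downloads" note appears to rely on.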
