What happened?
With the changes in PR #391, we found that the metadata (annotations/labels) from engine/router on pods is no longer passed down to the pod spec reconcile logic, causing wrong determinations wherever that logic relies on the values of those annotations/labels.
The wrong determinations found so far in the pod spec reconcile logic (caused by #391):
- wrong env variable added;
- wrong volume mount added;
- wrong volume added.
What did you expect to happen?
It is okay to make changes that avoid adding unnecessary annotations/labels to components that don't need them. The expectation is that wherever the logic relies on that metadata (engine/router annotations/labels on pods), the correct metadata must still be passed down so the existing reconcile logic does not break.
The expectations for such changes:
- Mention explicitly in the PR description which components you are changing to carry different metadata;
- Keep passing the original metadata wherever there is logic that relies on its value.
If your goal conflicts with the current reconcile logic, open an issue first.
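The second expectation can be sketched as a simple metadata merge when building the pod template. This is a minimal illustration, not OME's actual code; the function and variable names are hypothetical:

```go
package main

import "fmt"

// mergeMetadata sketches the fix this issue asks for: engine/router
// annotations (and labels) must be carried onto the pod template so
// that the downstream pod spec reconcile logic can still read them.
// Entries already present on the destination are not overwritten.
func mergeMetadata(dst, src map[string]string) map[string]string {
	if dst == nil {
		dst = map[string]string{}
	}
	for k, v := range src {
		if _, exists := dst[k]; !exists {
			dst[k] = v
		}
	}
	return dst
}

func main() {
	// Hypothetical engine-level annotations from the runtime spec.
	engineAnnotations := map[string]string{"ome.io/inject-model-init": "true"}
	podAnnotations := mergeMetadata(nil, engineAnnotations)
	fmt.Println(podAnnotations["ome.io/inject-model-init"]) // prints "true"
}
```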
How can we reproduce it (as minimally and precisely as possible)?
Take a runtime spec with the annotation ome.io/inject-model-init: 'true' under engineConfig, which we use to inject the model init for model enigma usage. With the changes from #391, this annotation from engine is not passed down to the pod spec reconcile logic, so an unwanted env variable (MODEL_PATH) and an unwanted volume mount are added to our serving container.
Example runtime with ome.io/inject-model-init: 'true' annotation:
```yaml
apiVersion: ome.io/v1beta1
kind: ClusterServingRuntime
metadata:
  annotations:
    meta.helm.sh/release-name: ome-model-serving
    meta.helm.sh/release-namespace: ome
  labels:
    app.kubernetes.io/managed-by: Helm
  name: sample-runtime
spec:
  engineConfig:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: node.kubernetes.io/instance-type
                  operator: In
                  values:
                    - BM.GPU.H100.8
    annotations:
      ome.io/inject-model-init: 'true'
    labels:
      logging-forward: enabled
    runner:
      image: sample-image:sample-tag
      ports:
        - containerPort: 8080
          name: http1
          protocol: TCP
      resources:
        limits:
          nvidia.com/gpu: '2'
        requests:
          cpu: 16
          memory: 120Gi
          nvidia.com/gpu: '2'
      volumeMounts:
        - mountPath: /dev/shm
          name: shared-memory
        - mountPath: /opt/ml/model
          name: model-empty-dir
    tolerations:
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists
    volumes:
      - emptyDir:
          medium: Memory
          sizeLimit: 1Gi
        name: shared-memory
      - emptyDir:
          medium: Memory
        name: model-empty-dir
  modelSizeRange:
    max: 120B
    min: 100B
  supportedModelFormats:
    - autoSelect: true
      modelArchitecture: CohereForCausalLM
      modelFormat:
        name: tensorrtllm
        operator: Equal
        version: v0.8.0
        weight: 1
      modelFramework:
        name: tensorrtllm
        operator: Equal
        version: v0.8.0
        weight: 1
      version: '1.0'
```
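The failure mode can be illustrated with a sketch of the reconcile-side decision that broke. This is not OME's actual code; the function name is hypothetical, but the logic matches the bug description: injection of the default MODEL_PATH env variable and model volume mount into the serving container is keyed off the engine annotation, so when the annotation is not propagated, the reconciler falls back to the wrong default:

```go
package main

import "fmt"

// shouldAddDefaultModelMount sketches the decision that went wrong.
// When model init is injected (annotation set to "true"), the init
// container provides the model, so the default MODEL_PATH env variable
// and volume mount must be skipped. If the annotation is dropped on
// the way to the reconcile logic (the bug from #391), this returns
// true and the unwanted env variable and mount are added.
func shouldAddDefaultModelMount(podAnnotations map[string]string) bool {
	return podAnnotations["ome.io/inject-model-init"] != "true"
}

func main() {
	// Annotation correctly propagated from engineConfig to the pod.
	propagated := map[string]string{"ome.io/inject-model-init": "true"}
	fmt.Println(shouldAddDefaultModelMount(propagated)) // false: default mount correctly skipped
	// Annotation dropped (the regression): reconciler sees no metadata.
	fmt.Println(shouldAddDefaultModelMount(nil)) // true: unwanted MODEL_PATH env/mount added
}
```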
Anything else we need to know?
It is related to the changes in PR #391. That PR has been reverted; please follow this issue for the corresponding changes moving forward.
Environment
- OME version:
- Kubernetes version (use `kubectl version`):
- Cloud provider or hardware configuration:
- OS (e.g., from `/etc/os-release`):
- Runtime (SGLang, vLLM, etc.) and version:
- Model being served (if applicable):
- Install method (Helm, kubectl, etc.):