
Conversation

yossiovadia
Contributor

This converts the bash script setup/steps/09_deploy_via_modelservice.sh to Python as setup/steps/09_deploy_via_modelservice.py.

Key changes:

  • Full Python conversion of model deployment logic
  • Improved error handling and logging
  • Support for REPLACE_ENV variable processing
  • Consistent integration with existing Python conversion framework

Fixes #269

@yossiovadia
Contributor Author

NOTE - I was only able to dry-run test it (I have some proxy problems connecting to a real env right now).

Comment on lines 117 to 148
decode:
  replicaCount: {{ decode_replicas }}
  kserve:
    storageUri: "{{ storage_uri }}"
  {{ decode_extra_pod_config }}
  {{ decode_command }}
  {{ decode_extra_container_config }}

prefill:
  replicaCount: {{ prefill_replicas }}
  kserve:
    storageUri: "{{ storage_uri }}"
  {{ prefill_extra_pod_config }}
  {{ prefill_command }}
  {{ prefill_extra_container_config }}

{% if mount_model_volume %}
cache:
  storageClass: {{ storage_class }}
  size: {{ cache_size }}
{% endif %}

{% if gateway_enabled %}
gateway:
  domain: {{ gateway_domain }}
{% endif %}

{% if route_enabled %}
route:
  enabled: true
  domain: {{ route_domain }}
{% endif %}
Contributor

This is not a direct translation. Why the move to kserve as part of this PR?

Contributor Author

There was a fundamental error in my conversion approach. Instead of doing a true line-by-line translation of the bash script's YAML template (lines 60-239), I mistakenly referenced a different template structure that uses kserve configuration blocks. This was likely due to working with multiple modelservice configurations and conflating different deployment patterns.

Contributor

No problem. We probably want a kserve deployment pattern in addition to standalone and modelservice.

- Fixed dependency issue by adding Jinja2 to install_deps.sh
- Completely rewrote template structure in 09_deploy_via_modelservice.py to match original bash script exactly
- Removed incorrect kserve structure that was not in original bash script
- Template now properly reflects the actual bash script logic and structure

Co-Authored-By: Claude <[email protected]>
@yossiovadia yossiovadia requested a review from kalantar September 8, 2025 15:36
Comment on lines 545 to 546
if ev.get("control_environment_type_modelservice_active", "0") != "1":
deploy_methods = ev.get("deploy_methods", "")
Contributor

Suggested change
if ev.get("control_environment_type_modelservice_active", "0") != "1":
deploy_methods = ev.get("deploy_methods", "")
if not ev["control_environment_type_modelservice_active"]:
deploy_methods = ev.get("deploy_methods", "unknown")

@kalantar
Contributor

kalantar commented Sep 8, 2025

When I execute, I get this error on the execution of helmfile:

STDERR:
  Error: failed to parse /var/folders/s9/ssw_1_396xdbftq9ldvwb9bw0000gn/T/helmfile4109704876/kalantar-test2-meta-lla-4833382e-instruct-ms-values-86c54476fc: error converting YAML to JSON: yaml: line 66: did not find expected key

I inspected the values file (00/ms-values.yaml) and see args is:

            - "[--enforce-eager____--block-size____REPLACE_ENV_LLMDBENCH_VLLM_COMMON_BLOCK_SIZE____--kv-transfer-config____'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'____--tensor-parallel-size____REPLACE_ENV_LLMDBENCH_VLLM_MODELSERVICE_PREFILL_ACCELERATOR_NR____--disable-log-requests____--disable-uvicorn-access-log____--max-model-len____REPLACE_ENV_LLMDBENCH_VLLM_COMMON_MAX_MODEL_LEN]"

I'm guessing add_arguments() is not handling this case the way the sh version does.
Note that there is an add_arguments() in functions.py; perhaps we should be reusing it.
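
(For context: the REPLACE_ENV_<VAR> tokens above are expected to be expanded to the value of the environment variable <VAR>, and "____" acts as the argument separator, before the values file is written. The real logic lives in render_string()/add_command_line_options() in setup/functions.py; the following is only a minimal illustrative sketch of that substitution, with the example value assumed.)

import os

def render_args(raw: str) -> list[str]:
    # Illustrative sketch only: expand REPLACE_ENV_<VAR> tokens to the value of
    # env var <VAR> and split the "____"-delimited string into vllm arguments.
    args = []
    for token in raw.strip().strip("[]").split("____"):
        if not token:
            continue
        if token.startswith("REPLACE_ENV_"):
            token = os.environ.get(token[len("REPLACE_ENV_"):], "")
        args.append(token)
    return args

# Example (value assumed for illustration):
os.environ["LLMDBENCH_VLLM_COMMON_BLOCK_SIZE"] = "64"
print(render_args("[--enforce-eager____--block-size____REPLACE_ENV_LLMDBENCH_VLLM_COMMON_BLOCK_SIZE]"))
# -> ['--enforce-eager', '--block-size', '64']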

Additional edit:
00/ms-values.yaml has incorrect indentation for annotations. This is probably the reason for the error message.

  annotations:
            deployed-by: kalantar
      modelservice: llm-d-benchmark

@kalantar
Contributor

kalantar commented Sep 8, 2025

Due to introduction of (incomplete) schema validation in modelservice, should add --skip-schema-validation to the call to helmfile. See llm-d-incubation/llm-d-modelservice#113.

While trying to fix this, I found several fields that were blank in the values file. Perhaps, in addition to a dry run, inspect the resulting values file and/or run helm template using it.
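
(One way to act on this suggestion, as an illustrative sketch only - the release name, chart, and values path below are placeholders, not the PR's actual code:)

import subprocess

# Render the chart with the generated values file so YAML/templating errors
# surface even during a dry run; add --skip-schema-validation on helm versions
# that support it.
result = subprocess.run(
    ["helm", "template", "ms-llmdbench", "llm-d-modelservice/llm-d-modelservice",
     "--values", "00/ms-values.yaml"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    raise RuntimeError(f"helm template failed:\n{result.stderr}")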

- Fix YAML indentation in add_annotations() (4 spaces vs 6 spaces)
- Use existing add_command_line_options() from functions.py instead of custom implementation
- Fix argument formatting to use bash-style multi-line strings with continuations
- Fix affinity configuration to get fresh values after check_affinity() call
- Fix boolean environment variable check for modelservice activation
- Add --skip-schema-validation flag to helmfile command
- Set LLMDBENCH_CURRENT_STEP=09 for functions.py compatibility

All REPLACE_ENV patterns now process correctly and helm template validation passes.

Co-Authored-By: Claude <[email protected]>
@yossiovadia
Contributor Author

Thanks for the valuable review; this is a challenging one.

  1. Import fix: Added add_command_line_options to imports from functions.py
  2. Annotations indentation fix: Changed from 6-space to 4-space indentation
  3. Removed custom function: Replaced custom add_command_line_options() with existing functions.py implementation
  4. Affinity fix: Get fresh affinity values from environment after check_affinity() call
  5. Environment variable check fix: Changed from string comparison to boolean check
  6. Arguments format fix: Changed template from args: to args: | for multi-line string format
  7. Current step setting: Set LLMDBENCH_CURRENT_STEP=09 for functions.py compatibility
  8. Helmfile flag: Added --skip-schema-validation flag to helmfile command

@yossiovadia yossiovadia requested a review from kalantar September 9, 2025 16:34
@kalantar
Contributor

I did:

LLMDBENCH_CONTROL_STEP_09_IMPLEMENTATION=py LLMDBENCH_VLLM_COMMON_NAMESPACE=kalantar-test setup/standup.sh -v -c inference-scheduling.sh -s 7,8,9 -n

(extra steps to ensure creation of all helm files)
Then copied the resulting ms-values.yaml:

cp ~/data/inference-scheduling/setup/helm/llmdbench/00/ms-values.yaml ms-values.yaml.py

I then repeated:

LLMDBENCH_CONTROL_STEP_09_IMPLEMENTATION=sh LLMDBENCH_VLLM_COMMON_NAMESPACE=kalantar-test setup/standup.sh -v -c inference-scheduling.sh -s 7,8,9 -n
cp ~/data/inference-scheduling/setup/helm/llmdbench/00/ms-values.yaml ms-values.yaml.sh

Finally, I did diff:

% diff ms-values.yaml.py ms-values.yaml.sh
66,68c66,71
<           deployed-by: kalantar
<     modelservice: llm-d-benchmark
<
---
>       deployed-by: kalantar
>       modelservice: llm-d-benchmark
>   podAnnotations:
>       deployed-by: kalantar
>       modelservice: llm-d-benchmark
>   #no____config
75,82c78,89
<     args: |
<       --enforce-eager \
<         --block-size 64 \
<         --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' \
<         --tensor-parallel-size 4 \
<         --disable-log-requests \
<         --disable-uvicorn-access-log \
<         --max-model-len 16000
---
>     args:
>       - "--enforce-eager"
>       - "--block-size"
>       - "64"
>       - "--kv-transfer-config"
>       - '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
>       - "--tensor-parallel-size"
>       - "4"
>       - "--disable-log-requests"
>       - "--disable-uvicorn-access-log"
>       - "--max-model-len"
>       - "16000"
88c95,106
<
---
>       - name: UCX_TLS
>         value: "cuda_ipc,cuda_copy,tcp"
>       - name: VLLM_NIXL_SIDE_CHANNEL_PORT
>         value: "5557"
>       - name: VLLM_NIXL_SIDE_CHANNEL_HOST
>         valueFrom:
>           fieldRef:
>             fieldPath: status.podIP
>       - name: VLLM_LOGGING_LEVEL
>         value: DEBUG
>       - name: VLLM_ALLOW_LONG_MAX_MODEL_LEN
>         value: "1"
94c112
<                 auto: "auto"
---
>         nvidia.com/gpu: "4"
100c118
<                 auto: "auto"
---
>         nvidia.com/gpu: "4"
123,124c141,148
<     volumeMounts:
<   volumes:
---
>       ports:
>         - containerPort: 5557
>           protocol: TCP
>         - containerPort: 8200
>           name: metrics
>           protocol: TCP
>     volumeMounts: []
>   volumes: []
137,139c161,166
<           deployed-by: kalantar
<     modelservice: llm-d-benchmark
<
---
>       deployed-by: kalantar
>       modelservice: llm-d-benchmark
>   podAnnotations:
>       deployed-by: kalantar
>       modelservice: llm-d-benchmark
>   #no____config
146,149c173,178
<     args: |
<       --disable-log-requests \
<         --max-model-len 16000 \
<         --tensor-parallel-size 1
---
>     args:
>       - "--disable-log-requests"
>       - "--max-model-len"
>       - "16000"
>       - "--tensor-parallel-size"
>       - "1"
157c186,197
<
---
>       - name: UCX_TLS
>         value: "cuda_ipc,cuda_copy,tcp"
>       - name: VLLM_NIXL_SIDE_CHANNEL_PORT
>         value: "5557"
>       - name: VLLM_NIXL_SIDE_CHANNEL_HOST
>         valueFrom:
>           fieldRef:
>             fieldPath: status.podIP
>       - name: VLLM_LOGGING_LEVEL
>         value: DEBUG
>       - name: VLLM_ALLOW_LONG_MAX_MODEL_LEN
>         value: "1"
163c203
<                 auto: "auto"
---
>         nvidia.com/gpu: "1"
169c209
<                 auto: "auto"
---
>         nvidia.com/gpu: "1"
192,193c232,239
<     volumeMounts:
<   volumes:
---
>       ports:
>         - containerPort: 5557
>           protocol: TCP
>         - containerPort: 8200
>           name: metrics
>           protocol: TCP
>     volumeMounts: []
>   volumes: []

- Merge dependency lists from HEAD and upstream/main
- Keep both pykube-ng (from upstream) and Jinja2 (from HEAD)
- Resolves conflict between competing dependency lists
Address reviewer feedback:
- Add missing podAnnotations sections for decode and prefill
- Fix GPU resource calculation using get_accelerator_nr function
- Add missing container port configurations (5557, 8200)
- Fix arguments format to use proper multi-line strings
- Import missing functions from functions.py
- Use proper accelerator count calculation instead of 'auto'
Major improvements to match bash script exactly:
- Replace custom add_command_line_options with functions.py version
- Replace custom add_additional_env_to_yaml with functions.py version
- Replace custom add_config with functions.py version
- Replace custom add_annotations with functions.py version
- Fix args format: change from 'args: |' to proper 'args:' list format
- Remove redundant custom function implementations
- Use consistent function behavior across bash and Python

This should resolve the reviewer's concerns about missing environment
variables, incorrect YAML formatting, and inconsistent behavior.
"""
if not resource_name or not resource_value:
return ""
return f" {resource_name}: \"{resource_value}\""
Contributor

Suggested change
return f" {resource_name}: \"{resource_value}\""
return f"{resource_name}: \"{resource_value}\""

prefill_cpu_nr = ev.get("vllm_modelservice_prefill_cpu_nr", "")

# Resource configuration
accelerator_resource = ev.get("vllm_common_accelerator_resource", "")
Contributor

The env var gets reset in check_affinity(); the value in ev does not.

Suggested change
accelerator_resource = ev.get("vllm_common_accelerator_resource", "")
accelerator_resource = os.getenv("LLMDBENCH_VLLM_COMMON_ACCELERATOR_RESOURCE")
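
(In other words, read the variable from the process environment only after check_affinity() has run, instead of from the ev snapshot taken earlier. A minimal sketch, assuming check_affinity() is imported from setup/functions.py:)

import os

check_affinity()  # may rewrite LLMDBENCH_VLLM_COMMON_ACCELERATOR_RESOURCE in os.environ
accelerator_resource = os.getenv("LLMDBENCH_VLLM_COMMON_ACCELERATOR_RESOURCE", "")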

Comment on lines +209 to +218
# Extra configurations
decode_extra_pod_config = ev.get("vllm_modelservice_decode_extra_pod_config", "")
decode_extra_container_config = ev.get("vllm_modelservice_decode_extra_container_config", "")
decode_extra_volume_mounts = ev.get("vllm_modelservice_decode_extra_volume_mounts", "")
decode_extra_volumes = ev.get("vllm_modelservice_decode_extra_volumes", "")

prefill_extra_pod_config = ev.get("vllm_modelservice_prefill_extra_pod_config", "")
prefill_extra_container_config = ev.get("vllm_modelservice_prefill_extra_container_config", "")
prefill_extra_volume_mounts = ev.get("vllm_modelservice_prefill_extra_volume_mounts", "")
prefill_extra_volumes = ev.get("vllm_modelservice_prefill_extra_volumes", "")
Contributor

I'm sure there is a more elegant way to do this. However, this works. The problem is that the env variables are defaulted to "" in env.sh.

Suggested change
# Extra configurations
decode_extra_pod_config = ev.get("vllm_modelservice_decode_extra_pod_config", "")
decode_extra_container_config = ev.get("vllm_modelservice_decode_extra_container_config", "")
decode_extra_volume_mounts = ev.get("vllm_modelservice_decode_extra_volume_mounts", "")
decode_extra_volumes = ev.get("vllm_modelservice_decode_extra_volumes", "")
prefill_extra_pod_config = ev.get("vllm_modelservice_prefill_extra_pod_config", "")
prefill_extra_container_config = ev.get("vllm_modelservice_prefill_extra_container_config", "")
prefill_extra_volume_mounts = ev.get("vllm_modelservice_prefill_extra_volume_mounts", "")
prefill_extra_volumes = ev.get("vllm_modelservice_prefill_extra_volumes", "")
# Extra configurations
decode_extra_pod_config = ev.get("vllm_modelservice_decode_extra_pod_config", "#no____config")
if decode_extra_pod_config == "":
    decode_extra_pod_config = "#no____config"
decode_extra_container_config = ev.get("vllm_modelservice_decode_extra_container_config", "#no____config")
if decode_extra_container_config == "":
    decode_extra_container_config = "#no____config"
decode_extra_volume_mounts = ev.get("vllm_modelservice_decode_extra_volume_mounts", "[]")
if decode_extra_volume_mounts == "":
    decode_extra_volume_mounts = "[]"
decode_extra_volumes = ev.get("vllm_modelservice_decode_extra_volumes", "[]")
if decode_extra_volumes == "":
    decode_extra_volumes = "[]"
prefill_extra_pod_config = ev.get("vllm_modelservice_prefill_extra_pod_config", "#no____config")
if prefill_extra_pod_config == "":
    prefill_extra_pod_config = "#no____config"
prefill_extra_container_config = ev.get("vllm_modelservice_prefill_extra_container_config", "#no____config")
if prefill_extra_container_config == "":
    prefill_extra_container_config = "#no____config"
prefill_extra_volume_mounts = ev.get("vllm_modelservice_prefill_extra_volume_mounts", "[]")
if prefill_extra_volume_mounts == "":
    prefill_extra_volume_mounts = "[]"
prefill_extra_volumes = ev.get("vllm_modelservice_prefill_extra_volumes", "[]")
if prefill_extra_volumes == "":
    prefill_extra_volumes = "[]"

Comment on lines 583 to 584
# Set up configuration preparation
add_config_prep()
Contributor

Given the proposed changes earlier (comment: # Extra configurations), this is no longer needed. Nor is the method.

Comment on lines 39 to 43
def add_pod_annotations(annotation_var: str) -> str:
    """
    Generate podAnnotations YAML section.
    """
    return functions_add_annotations(annotation_var)
Contributor

Is this necessary?

@kalantar
Contributor

kalantar commented Sep 10, 2025

In addition to the changes here, I believe the following changes are needed in the current version of functions.py:

% git diff setup/functions.py
diff --git a/setup/functions.py b/setup/functions.py
index 66e1964..b976789 100644
--- a/setup/functions.py
+++ b/setup/functions.py
@@ -864,7 +864,7 @@ def add_annotations(varname: str) -> str:
             key, value = entry.split(":", 1)
             annotation_lines.append(f"{indent}{key.strip()}: {value.strip()}")

-    return "\n".join(annotation_lines)
+    return "\n".join(annotation_lines).lstrip()


 def render_string(input_string):
@@ -929,8 +929,8 @@ def add_command_line_options(args_string):
     """
     current_step = os.environ.get("LLMDBENCH_CURRENT_STEP", "")

-    # Process REPLACE_ENV variables first
     if args_string:
+        # Process REPLACE_ENV variables first
         processed_args = render_string(args_string)

         # Handle formatting based on step and content
@@ -961,7 +961,7 @@ def add_command_line_options(args_string):
                 processed_args = processed_args.replace(";", ";\n      ")
                 processed_args = processed_args.replace(" --", " \\\n        --")

-            return f"      {processed_args}"
+            return f"  {processed_args}"
         else:
             # Default case
             processed_args = processed_args.replace("____", " ")
@@ -974,12 +974,12 @@ def add_command_line_options(args_string):
             return ""


-def add_additional_env_to_yaml(env_vars_string):
+def add_additional_env_to_yaml(env_vars_string_or_file):
     """
     Generate additional environment variables YAML.
     Equivalent to the bash add_additional_env_to_yaml function.
     """
-    if not env_vars_string:
+    if not env_vars_string_or_file:
         return ""

     # Determine indentation based on environment type
@@ -996,9 +996,16 @@ def add_additional_env_to_yaml(env_vars_string):
         name_indent = "        "  # default 8 spaces
         value_indent = "          "  # default 10 spaces

+    try:
+        with open(env_vars_string_or_file, 'r') as f:
+            contents = f.read()
+            return '\n'.join(f"{name_indent}{line}" for line in render_string(contents).splitlines()).lstrip()
+    except FileNotFoundError:
+        pass
+
     # Parse environment variables (comma-separated list)
     env_lines = []
-    for envvar in env_vars_string.split(","):
+    for envvar in env_vars_string_or_file.split(","):
         envvar = envvar.strip()
         if envvar:
             # Remove LLMDBENCH_VLLM_STANDALONE_ prefix if present
@@ -1025,12 +1032,14 @@ def add_config(obj_or_filename, num_spaces=0, label=""):
         try:
             with open(obj_or_filename, 'r') as f:
                 contents = f.read()
+                indented_contents = '\n'.join(f"{spaces}{line}" for line in contents.splitlines())
         except FileNotFoundError:
+            indented_contents = contents
             pass

-    indented_contents = '\n'.join(f"{spaces}{line}" for line in contents.splitlines())
-    if indented_contents.strip() != "{}" :
-        indented_contents = f"  {label}\n{indented_contents}"
+    if indented_contents.strip() != "" :
+        if label != "" :
+            indented_contents = f"{label}:\n{indented_contents}"
     else :
         indented_contents = ""
     return indented_contents

Comment on lines 307 to 309
ports:
  - containerPort: {decode_inference_port}
  - containerPort: 5557
Contributor

This is insufficient. The metrics collection requires a name. There is an env variable that should be looked at. An example is in scenarios/examples/inference-scheduling.sh. I think it should be possible to reuse add_config().

export LLMDBENCH_VLLM_MODELSERVICE_EXTRA_CONTAINER_CONFIG=$(mktemp)
cat << EOF > ${LLMDBENCH_VLLM_MODELSERVICE_EXTRA_CONTAINER_CONFIG}
ports:
  - containerPort: 5557
    protocol: TCP
  - containerPort: 8200
    name: metrics
    protocol: TCP
EOF

Note that in this example, the 8200 should probably be replaced by REPLACE_ENV_VLLM_MODELSERVICE_DECODE_INFERENCE_PORT

@kalantar
Contributor

kalantar commented Sep 11, 2025

This is a related PR #335 that also proposes changes to add_additional_env_to_yaml()

Changes made:
- Remove unnecessary wrapper functions (add_config_prep, add_pod_annotations)
- Fix accelerator_resource to use os.environ.get() instead of ev.get()
  as suggested by reviewer - check_affinity() resets env vars
- Replace wrapper calls with direct functions.py calls
- Clean up redundant function definitions

Still pending reviewer guidance on:
- Container port configuration with metrics names
- Volume mounts/volumes default behavior
- Functions.py dependencies if any
Issue: add_config() function only treated '{}' as empty but not '[]'
This caused volume mounts to show malformed YAML instead of being omitted

Fix: Extend the empty check to include both '{}' and '[]'
Now both empty objects and arrays are properly omitted from YAML output

This resolves the volume mount configuration issue mentioned by reviewer.
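
(The shape of that check, as a minimal sketch - is_empty_config is an illustrative name, not the actual function:)

def is_empty_config(rendered: str) -> bool:
    # Both an empty mapping ("{}") and an empty sequence ("[]") mean "nothing to add";
    # previously only "{}" was treated as empty.
    return rendered.strip() in ("", "{}", "[]")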
@yossiovadia
Contributor Author

yossiovadia commented Sep 11, 2025

Hi @kalantar (Michael),

I've addressed some of your feedback items and would like your guidance on the remaining implementation.

These are the completed items:

- Removed unnecessary wrapper functions (add_config_prep, add_pod_annotations)
- Fixed accelerator_resource to use os.environ.get() instead of ev.get() as you suggested
- Replaced custom implementations with functions.py versions
- Fixed YAML arguments format to match bash output
- Fixed functions.py add_config() to handle empty arrays "[]" like empty objects "{}"

However, I need your guidance on:

1. Container Port Configuration with Metrics Names
You mentioned "ports need a name for metrics collection". Which approach do you prefer? some sugestions -

Option A: Add name fields directly
ports:
  - containerPort: 5557
    name: metrics
  - containerPort: 8200
    name: inference

Option B: Use add_config() function for ports configuration
ports:
{functions_add_config(some_port_env_var, 4)}

Option C: research existing port naming patterns in the codebase(?)


2. Volume Mounts/Volumes Default Behavior
Currently showing volumeMounts: [] instead of proper configuration. What's the expected behavior when there are no volume mounts?

Option A: Change defaults from "[]" to "#no____config"
Option B: Modify how add_config() handles empty arrays
Option C: Remove defaults entirely and let add_config() handle it

Should empty volume configs result in:

  • No line at all (empty)
  • Empty array volumeMounts: []
  • Some other format?

Please let me know your preferences - and thanks for your patience with this conversion!

@kalantar
Contributor

  1. Container Port Configuration with Metrics Names

I am inclined to suggest option B since this aligns well with the approach taken for other things.

  2. Volume Mounts/Volumes Default Behavior

Ideally when there are no volumes and volume mounts, this field should be skipped. I think I had problems doing this in the sh version. Should be easier in py.
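
(A possible shape for that skip-when-empty behaviour in Python - illustrative sketch only; the PR later adds a conditional_volume_config helper, which need not look exactly like this:)

def conditional_volume_config(value: str, label: str, indent: int = 4) -> str:
    # Emit "<label>:" plus indented content only when there is a real
    # configuration; otherwise return "" so the field is omitted entirely.
    if not value or value.strip() in ("{}", "[]", "#no____config"):
        return ""
    spaces = " " * indent
    body = "\n".join(f"{spaces}{line}" for line in value.splitlines())
    return f"{label}:\n{body}"

# e.g. conditional_volume_config(decode_extra_volume_mounts, "volumeMounts")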

yossiovadia added a commit to yossiovadia/llm-d-benchmark that referenced this pull request Sep 11, 2025
- Remove extra spacing in filter_empty_resource function formatting
- Implement option B for container ports by removing hardcoded ports and using add_config approach
- Add conditional_volume_config function to skip empty volume/volumeMount fields entirely
- Implement add_config_prep function to set proper defaults for empty configurations
- Remove generate_ms_rules_yaml function and inline the logic for cleaner code
- Remove unused imports (jinja2, yaml, render_string) to reduce complexity

Co-Authored-By: Claude <[email protected]>
@kalantar
Contributor

I tried this again. It is looking good. I do see that decode.extraConfig (and prefill.extraConfig) has no value (#no____config). Probably the default here should be {} and, like volumes and volumeMounts, if there is no config the whole element should be removed.

  extraConfig:
#no____config

@kalantar
Contributor

Hmm. I tried again, and now I'm not seeing that. Instead I'm seeing extraConfig entries without a ":" and no value.

    extraConfig

  containers:

The formatting is still off for several things. I noted yesterday that many things seemed indented more than they should be. I think the indentation added at the start is being added to the indent already in the YAML string, so we may need to do lstrip() before sending results (or after getting them). For example, annotations and podAnnotations have this issue.

yossiovadia added a commit to yossiovadia/llm-d-benchmark that referenced this pull request Sep 12, 2025
- Replace "#no____config" defaults with "{}" for extraConfig sections
- Add conditional_extra_config function to skip empty extraConfig entirely
- Add lstrip() calls throughout template to fix YAML indentation
- Ensure proper YAML syntax and formatting for all config sections

Addresses reviewer Michael's feedback on template structure and formatting.
@yossiovadia
Contributor Author

@kalantar I've fixed the extraConfig YAML formatting issue you identified. The problem was missing colons and improper structure.

Before (broken YAML):

extraConfig

  containers:
    - name: vllm

After (fixed YAML):

extraConfig:
  containers:
    - name: vllm

What was fixed:

  • Added missing colon (:) after extraConfig labels
  • Fixed indentation structure to prevent double-indentation
  • Enhanced empty config checking to skip malformed sections entirely

The fix is in commit f7b5b4f. The conditional_extra_config function now properly formats the YAML structure and handles empty configurations correctly.

@Vezio
Collaborator

Vezio commented Sep 19, 2025

=> Fri Sep 19 11:11:30 EDT 2025 - ./setup/standup.sh - === Running step: 09_deploy_via_modelservice.py ===
2025-09-19 11:11:32,269 - INFO - ℹ️ Environment variable LLMDBENCH_VLLM_COMMON_AFFINITY automatically set to "nvidia.com/gpu.product:NVIDIA-H100-80GB-HBM3"
2025-09-19 11:11:33,004 - INFO - 🚀 Installing helm chart "ms-llmdbench" via helmfile...
2025-09-19 11:11:35,765 - INFO -
ERROR while executing command "helmfile --namespace vezio-llmd-bench --kubeconfig /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/auto-standupXXX.eYXrBBH5rJ/environment/context.ctx --selector name=ibm-gran-b1f57d6a-instruct-ms apply -f /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/auto-standupXXX.eYXrBBH5rJ/setup/helm/llmdbench/helmfile-00.yaml --skip-diff-on-install --skip-schema-validation"
2025-09-19 11:11:35,766 - INFO - Comparing release=ibm-gran-b1f57d6a-instruct-ms, chart=llm-d-modelservice/llm-d-modelservice, namespace=vezio-llmd-bench

2025-09-19 11:11:35,766 - INFO - Adding repo llm-d-modelservice https://llm-d-incubation.github.io/llm-d-modelservice/
"llm-d-modelservice" has been added to your repositories

Adding repo llm-d-infra https://llm-d-incubation.github.io/llm-d-infra/
"llm-d-infra" has been added to your repositories

Listing releases matching ^ibm-gran-b1f57d6a-instruct-ms$
ibm-gran-b1f57d6a-instruct-ms	vezio-llmd-bench	1       	2025-09-19 10:46:52.091323 -0400 EDT	deployedllm-d-modelservice-v0.2.9	v0.2.0

in /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/auto-standupXXX.eYXrBBH5rJ/setup/helm/llmdbench/helmfile-00.yaml: command "/Users/vezio/homebrew/bin/helm" exited with non-zero status:

PATH:
  /Users/vezio/homebrew/bin/helm

ARGS:
  0: helm (4 bytes)
  1: --kubeconfig (12 bytes)
  2: /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/auto-standupXXX.eYXrBBH5rJ/environment/context.ctx (99 bytes)
  3: diff (4 bytes)
  4: upgrade (7 bytes)
  5: --allow-unreleased (18 bytes)
  6: ibm-gran-b1f57d6a-instruct-ms (29 bytes)
  7: llm-d-modelservice/llm-d-modelservice (37 bytes)
  8: --version (9 bytes)
  9: v0.2.9 (6 bytes)
  10: --skip-schema-validation (24 bytes)
  11: --namespace (11 bytes)
  12: vezio-llmd-bench (16 bytes)
  13: --values (8 bytes)
  14: /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile3094013161/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-7bd67d7474 (132 bytes)
  15: --reset-values (14 bytes)
  16: --detailed-exitcode (19 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: Failed to render chart: exit status 1: Error: YAML parse error on llm-d-modelservice/templates/decode-deployment.yaml: error converting YAML to JSON: yaml: line 98: could not find expected ':'
  Use --debug flag to render out invalid YAML
  Error: plugin "diff" exited with error

COMBINED OUTPUT:
  Error: Failed to render chart: exit status 1: Error: YAML parse error on llm-d-modelservice/templates/decode-deployment.yaml: error converting YAML to JSON: yaml: line 98: could not find expected ':'
  Use --debug flag to render out invalid YAML
  Error: plugin "diff" exited with error

2025-09-19 11:11:35,766 - INFO - ❌ Failed to deploy helm chart for model ibm-granite/granite-3.3-2b-instruct

@yossiovadia - I'm still getting a yaml parse issue.

@Vezio
Collaborator

Vezio commented Sep 19, 2025

Zeroing in on the yaml issue:

    75	          command: ["vllm", "serve"]
    76	          args:
    77	            - /model-cache/models/ibm-granite/granite-3.3-2b-instruct
    78	            - --port
    79	            - "8000"
    80	            - --served-model-name
    81	            - "ibm-granite/granite-3.3-2b-instruct"
    82
    83	            --disable-log-requests \ --max-model-len 16384 \ --tensor-parallel-size 1
    84	          env:
    85	          - name: VLLM_IS_PREFILL

83 --disable-log-requests \ --max-model-len 16384 \ --tensor-parallel-size 1

Seems to be a potential culprit?

@yossiovadia
Contributor Author

I'll (about time) set up a real env to be able to test it (vs. dry run); will update shortly. Thanks for the review.

@yossiovadia
Contributor Author

OK, my apologies for the long back and forth; the main reason is that I had to dry run. I finally got a real cluster and managed to fix & validate all the steps (00->09, inclusive) and all works well now.

Collaborator

@Vezio Vezio left a comment

Still appears to break when specifying a scenario via -c:

./setup/standup.sh -p vezio-llmd-bench -m ibm-granite/granite-3.3-2b-instruct -c inference-scheduling.sh

==> Mon Sep 22 10:06:18 EDT 2025 - ./setup/standup.sh - === Running step: 09_deploy_via_modelservice.py ===
2025-09-22 10:06:20,670 - INFO - ℹ️ Environment variable LLMDBENCH_VLLM_COMMON_AFFINITY automatically set to "nvidia.com/gpu.product:NVIDIA-H100-80GB-HBM3"
2025-09-22 10:06:21,509 - INFO - 🚀 Installing helm chart "ms-llmdbench" via helmfile...
2025-09-22 10:06:23,770 - INFO -
ERROR while executing command "helmfile --namespace vezio-llmd-bench --kubeconfig /Users/vezio/data/inference-scheduling/environment/context.ctx --selector name=ibm-gran-b1f57d6a-instruct-ms apply -f /Users/vezio/data/inference-scheduling/setup/helm/llmdbench/helmfile-00.yaml --skip-diff-on-install --skip-schema-validation"
2025-09-22 10:06:23,771 - INFO - Comparing release=ibm-gran-b1f57d6a-instruct-ms, chart=llm-d-modelservice/llm-d-modelservice, namespace=vezio-llmd-bench

2025-09-22 10:06:23,771 - INFO - Adding repo llm-d-modelservice https://llm-d-incubation.github.io/llm-d-modelservice/
"llm-d-modelservice" has been added to your repositories

Adding repo llm-d-infra https://llm-d-incubation.github.io/llm-d-infra/
"llm-d-infra" has been added to your repositories

Listing releases matching ^ibm-gran-b1f57d6a-instruct-ms$
ibm-gran-b1f57d6a-instruct-ms	vezio-llmd-bench	2       	2025-09-19 16:12:38.073087 -0400 EDT	deployedllm-d-modelservice-v0.2.9	v0.2.0

in /Users/vezio/data/inference-scheduling/setup/helm/llmdbench/helmfile-00.yaml: command "/Users/vezio/homebrew/bin/helm" exited with non-zero status:

PATH:
  /Users/vezio/homebrew/bin/helm

ARGS:
  0: helm (4 bytes)
  1: --kubeconfig (12 bytes)
  2: /Users/vezio/data/inference-scheduling/environment/context.ctx (62 bytes)
  3: diff (4 bytes)
  4: upgrade (7 bytes)
  5: --allow-unreleased (18 bytes)
  6: ibm-gran-b1f57d6a-instruct-ms (29 bytes)
  7: llm-d-modelservice/llm-d-modelservice (37 bytes)
  8: --version (9 bytes)
  9: v0.2.9 (6 bytes)
  10: --skip-schema-validation (24 bytes)
  11: --namespace (11 bytes)
  12: vezio-llmd-bench (16 bytes)
  13: --values (8 bytes)
  14: /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile1913400668/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-85686dd485 (132 bytes)
  15: --reset-values (14 bytes)
  16: --detailed-exitcode (19 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: Failed to render chart: exit status 1: Error: failed to parse /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile1913400668/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-85686dd485: error converting YAML to JSON: yaml: line 82: did not find expected '-' indicator
  Error: plugin "diff" exited with error

COMBINED OUTPUT:
  Error: Failed to render chart: exit status 1: Error: failed to parse /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile1913400668/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-85686dd485: error converting YAML to JSON: yaml: line 82: did not find expected '-' indicator
  Error: plugin "diff" exited with error

2025-09-22 10:06:23,772 - INFO - ❌ Failed to deploy helm chart for model ibm-granite/granite-3.3-2b-instruct

@Vezio
Collaborator

Vezio commented Sep 22, 2025

When no scenario (-c) is specified - it runs to completion with no errors.

@kalantar
Contributor

The problem appears to be in the generated setup/helm/llmdbench/00/ms-values.yaml. The args look like:

 78     args:
 79       - "--enforce-eager"
 80       - "--block-size"
 81       - "64"
 82       - "--kv-transfer-config"
 83       - "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'"
 84       - "--disable-log-requests"
 85       - "--disable-uvicorn-access-log"
 86       - "--max-model-len"
 87       - "16000--tensor-parallel-size"
 88       - "4"

I'm not sure about line 82 as reported. My IDE complains about line 83. I think line 87 is also suspect.
When run with LLMDBENCH_CONTROL_STEP_09_IMPLEMENTATION=sh, this part of the generated file looks like:

 84     args:
 85       - "--enforce-eager"
 86       - "--block-size"
 87       - "64"
 88       - "--kv-transfer-config"
 89       - '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
 90       - "--disable-log-requests"
 91       - "--disable-uvicorn-access-log"
 92       - "--max-model-len"
 93       - "16000--tensor-parallel-size"
 94       - "4

Lines 83/89 have different quoting.

TBH, I am suspicious of lines 87/93 and the quoting on 94.
The line numbers don't match because of a difference in the generation of the HTTPRoute; in the py version there is no timeouts section.

@kalantar
Contributor

The problem lines 87 / 93-94 are caused by scenarios/example/inference-scheduling.sh. See #364.

@kalantar
Contributor

The problem lines 87 / 93-94 are caused by scenarios/example/inference-scheduling.sh. See #364.

This fix also seems to fix the problem with quoting that caused the helm errors.

@yossiovadia yossiovadia requested a review from Vezio September 22, 2025 15:45
@kalantar
Contributor

kalantar commented Sep 22, 2025

I take that back, I had LLMDBENCH_CONTROL_STEP_09_IMPLEMENTATION=sh. When I set it to py I still get an error caused by the quoting of this line: - "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'"

The behavior is different from sh.

@Vezio
Collaborator

Vezio commented Sep 22, 2025

Same error continues despite @kalantar's related fix.

@kalantar
Contributor

I commented above

The line numbers don't match because of a difference in the generation of the HTTPRoute; in the py version there is no timeouts section.

The changes to the sh version of step 09 in this PR: #357 should be incorporated in the python version to address this issue.

@kalantar
Contributor

@yossiovadia has the comment #323 (comment) been addressed? I think this is the last outstanding issue I have been waiting for before reviewing again.

yossiovadia added a commit to yossiovadia/llm-d-benchmark that referenced this pull request Sep 25, 2025
Resolve the quoting issue identified in comment llm-d#323 where Python generated:
- "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'" (broken)

Now correctly generates like bash implementation:
- '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' (working)

- Detect arguments that already have single quotes (JSON strings)
- Use them as-is without additional double quote wrapping
- Only wrap regular arguments in double quotes

Addresses the last outstanding reviewer concern before final approval.
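
(The fix described above boils down to a quoting rule along these lines - an illustrative sketch, not the exact code:)

def quote_arg(arg: str) -> str:
    # Arguments already wrapped in single quotes (e.g. the JSON payload passed
    # to --kv-transfer-config) are emitted as-is; everything else is wrapped in
    # double quotes for the YAML args list.
    if len(arg) >= 2 and arg.startswith("'") and arg.endswith("'"):
        return arg
    return f'"{arg}"'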
yossiovadia and others added 8 commits September 26, 2025 08:58
- Remove extra spacing in filter_empty_resource function formatting
- Implement option B for container ports by removing hardcoded ports and using add_config approach
- Add conditional_volume_config function to skip empty volume/volumeMount fields entirely
- Implement add_config_prep function to set proper defaults for empty configurations
- Remove generate_ms_rules_yaml function and inline the logic for cleaner code
- Remove unused imports (jinja2, yaml, render_string) to reduce complexity

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Replace "#no____config" defaults with "{}" for extraConfig sections
- Add conditional_extra_config function to skip empty extraConfig entirely
- Add lstrip() calls throughout template to fix YAML indentation
- Ensure proper YAML syntax and formatting for all config sections

Addresses reviewer Michael's feedback on template structure and formatting.

Signed-off-by: Yossi Ovadia <[email protected]>
…double indentation

- Fixed conditional_extra_config function to properly add colon after extraConfig label
- Added proper empty config checking before processing to avoid unnecessary sections
- Fixed indentation structure to prevent double-indentation issues
- Resolves issue where extraConfig sections appeared without colons (extraConfig\n  containers:)
- Now correctly generates: extraConfig:\n    content...

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Fix YAML args formatting: Convert from malformed backslash format to proper YAML list with quoted strings
- Resolve Helm deployment error: 'cannot unmarshal number into Go struct field Container.args of type string'
- Add accelerator resource auto-detection: Convert 'auto' to 'nvidia.com/gpu'
- Improve resource section generation: Clean resource limits/requests without empty values
- Fix CPU/memory fallback logic: Use common values when specific ones are empty
- Remove duplicate check_storage_class function causing pykube object_factory errors
- Add OpenShift L40S GPU scenario for testing modelservice deployment

Tested on real OpenShift cluster with NVIDIA L40S GPUs. Full pipeline (steps 0-9) now completes successfully with Python step 09 implementation.

Signed-off-by: Yossi Ovadia <[email protected]>
Resolve issue where custom scenarios with complex arguments (like JSON strings)
caused 'did not find expected '-' indicator' YAML parsing errors.

- Change args parsing to split on '____' delimiters instead of whitespace
- Add proper quote escaping for arguments containing JSON or special characters
- Preserve complex arguments like --kv-transfer-config with embedded JSON

Fixes reviewer issue: Custom scenarios (-c) now work correctly.

Signed-off-by: Yossi Ovadia <[email protected]>
Address reviewer feedback on malformed args output:
- Fix concatenated arguments like '16000--tensor-parallel-size'
- Resolve unclosed quote issues like '"4'
- Clean up trailing backslashes and quotes from bash line continuations
- Prevent empty arguments from being added to YAML

Resolves specific issues identified in generated ms-values.yaml lines 87 and 94.

Signed-off-by: Yossi Ovadia <[email protected]>
Incorporate changes from PR llm-d#357 that add infinite timeouts to HTTPRoute configurations:
- Add timeouts section with backendRequest: 0s and request: 0s to both HTTPRoute rules
- Prevents request timeouts during long-running model inference operations
- Matches the bash implementation timeout behavior

Resolves reviewer feedback about missing timeout sections in Python version.

Signed-off-by: Yossi Ovadia <[email protected]>
Resolve the quoting issue identified in comment llm-d#323 where Python generated:
- "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'" (broken)

Now correctly generates like bash implementation:
- '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' (working)

- Detect arguments that already have single quotes (JSON strings)
- Use them as-is without additional double quote wrapping
- Only wrap regular arguments in double quotes

Addresses the last outstanding reviewer concern before final approval.

Signed-off-by: Yossi Ovadia <[email protected]>
@yossiovadia yossiovadia force-pushed the convert-step-09-to-python branch from 771b345 to 1ec90d6 Compare September 26, 2025 15:59
@Vezio
Collaborator

Vezio commented Sep 26, 2025

==> Fri Sep 26 14:08:16 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - ✅ Done rendering "random_concurrent" workload profile templates to "/Users/vezio/data/pd-disaggregation/workload/profiles/"
/Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh: line 407: /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/presets/gaie/plugins-v2.yaml: No such file or directory
...
...
...
==> Fri Sep 26 14:18:34 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - ✅ Benchmark execution for model "meta-llama/Llama-3.1-70B-Instruct" completed
==> Fri Sep 26 14:18:34 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - 🗑️ Deleting pod "llmdbench-vllm-benchmark-launcher" for model "meta-llama/Llama-3.1-70B-Instruct" ...
==> Fri Sep 26 14:18:35 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - ✅ Pod "llmdbench-vllm-benchmark-launcher" for model "meta-llama/Llama-3.1-70B-Instruct" deleted
==> Fri Sep 26 14:18:35 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - 🏗️ Collecting results for model "meta-llama/Llama-3.1-70B-Instruct" (meta-llama/Llama-3.1-70B-Instruct) to "/Users/vezio/data/pd-disaggregation/results/vllm-benchmark_1758909201-none_llm-d-70b-instruct"...
ERROR while executing command "oc --kubeconfig /Users/vezio/data/pd-disaggregation/environment/context.ctx --namespace llmdbench cp --retries=5 access-to-harness-data-workload-pvc:/requests/vllm-benchmark_1758909201-none_llm-d-70b-instruct /Users/vezio/data/pd-disaggregation/results/vllm-benchmark_1758909201-none_llm-d-70b-instruct"

Error from server (BadRequest): pod access-to-harness-data-workload-pvc does not have a host assigned
==> Fri Sep 26 14:18:36 EDT 2025 - ./setup/e2e.sh - ℹ️  Unconditionally moving "/Users/vezio/data/pd-disaggregation" to "/Users/vezio/data/pd-disaggregation.none"...
mv: rename /Users/vezio/data/pd-disaggregation to /Users/vezio/data/pd-disaggregation.none/pd-disaggregation: Directory not empty

So I'm not sure why it's not using the correct namespace. It appears to be using the default llmdbench namespace, not the custom one I provided - all other commands were executed accordingly... but it seems to be failing because of that.

This could be unrelated - but I'm not sure why I would see these failures now.

After speaking w/ @maugustosilva, I believe this is my own user error based on a recently merged PR with some NS changes. Rerunning now!

@Vezio
Collaborator

Vezio commented Sep 26, 2025

This looks pretty good based on:

./setup/e2e.sh -p vezio-test -c pd-disaggregation.sh -t modelservice --deep

NAME                                                  READY   STATUS      RESTARTS   AGE
download-model-9fthc                                  0/1     Completed   0          10m
infra-llmdbench-inference-gateway-6d97dcbfb9-zn8b2    1/1     Running     0          5m14s
meta-lla-8f96c2da-instruct-decode-5944cdc5b7-5n2pq    2/2     Running     0          4m50s
meta-lla-8f96c2da-instruct-gaie-epp-d4d866c65-x5t6b   1/1     Running     0          5m5s
meta-lla-8f96c2da-instruct-prefill-78bbc9ccd-xrpdw    1/1     Running     0          4m51s
==> Fri Sep 26 15:20:46 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - === Running step: 09_deploy_via_modelservice.sh ===

==> Fri Sep 26 15:20:50 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ℹ️ Environment variable LLMDBENCH_VLLM_COMMON_AFFINITY automatically set to "nvidia.com/gpu.product:NVIDIA-H100-80GB-HBM3"
==> Fri Sep 26 15:20:52 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 Installing helm chart "ms-llmdbench" via helmfile...
==> Fri Sep 26 15:21:02 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ vezio-test-meta-lla-8f96c2da-instruct-ms helm chart deployed successfully
==> Fri Sep 26 15:21:02 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ waiting for (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct to be created...
==> Fri Sep 26 15:21:03 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct created
==> Fri Sep 26 15:21:03 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ waiting for (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct to be created...
==> Fri Sep 26 15:21:03 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct created
==> Fri Sep 26 15:21:04 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ Waiting for (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct to be in "Running" state (timeout=900000s)...
==> Fri Sep 26 15:27:25 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct running
==> Fri Sep 26 15:27:25 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ Waiting for (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct to be in "Running" state (timeout=900000s)...
==> Fri Sep 26 15:27:26 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct running
==> Fri Sep 26 15:27:26 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ Waiting for (decode) pods serving meta-llama/Llama-3.1-70B-Instruct to be Ready (timeout=900000s)...



==> Fri Sep 26 15:31:23 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct ready
==> Fri Sep 26 15:31:24 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ Waiting for (prefill) pods serving meta-llama/Llama-3.1-70B-Instruct to be Ready (timeout=900000s)...
==> Fri Sep 26 15:31:25 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct ready
==> Fri Sep 26 15:31:27 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 📜 Exposing pods serving model meta-llama/Llama-3.1-70B-Instruct as service...
==> Fri Sep 26 15:31:27 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ Service for pods service model meta-llama/Llama-3.1-70B-Instruct created
==> Fri Sep 26 15:31:27 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ Model "meta-llama/Llama-3.1-70B-Instruct" and associated service deployed.
==> Fri Sep 26 15:31:27 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ modelservice completed model deployment

I've seen other issues but they look unrelated to this PR.

@maugustosilva maugustosilva merged commit f505618 into llm-d:main Sep 29, 2025
5 checks passed