
Conversation

yossiovadia
Contributor

This converts the bash script setup/steps/09_deploy_via_modelservice.sh to Python as setup/steps/09_deploy_via_modelservice.py.

Key changes:

  • Full Python conversion of model deployment logic
  • Improved error handling and logging
  • Support for REPLACE_ENV variable processing
  • Consistent integration with existing Python conversion framework

Fixes #269

@yossiovadia
Contributor Author

NOTE - I was only able to dry-run test it (I have some proxy problems connecting to a real env right now).

Comment on lines 117 to 148
decode:
  replicaCount: {{ decode_replicas }}
  kserve:
    storageUri: "{{ storage_uri }}"
  {{ decode_extra_pod_config }}
  {{ decode_command }}
  {{ decode_extra_container_config }}

prefill:
  replicaCount: {{ prefill_replicas }}
  kserve:
    storageUri: "{{ storage_uri }}"
  {{ prefill_extra_pod_config }}
  {{ prefill_command }}
  {{ prefill_extra_container_config }}

{% if mount_model_volume %}
cache:
  storageClass: {{ storage_class }}
  size: {{ cache_size }}
{% endif %}

{% if gateway_enabled %}
gateway:
  domain: {{ gateway_domain }}
{% endif %}

{% if route_enabled %}
route:
  enabled: true
  domain: {{ route_domain }}
{% endif %}
Contributor

This is not a direct translation. Why the move to kserve as part of this PR?

Contributor Author

There was a fundamental error in my conversion approach. Instead of doing a true line-by-line translation of the bash script's YAML template (lines 60-239), I mistakenly referenced a different template structure that uses kserve configuration blocks. This was likely due to working with multiple modelservice configurations and conflating different deployment patterns.

Contributor

No problem. We probably want a kserve deployment pattern in addition to standalone and modelservice.

- Fixed dependency issue by adding Jinja2 to install_deps.sh
- Completely rewrote template structure in 09_deploy_via_modelservice.py to match original bash script exactly
- Removed incorrect kserve structure that was not in original bash script
- Template now properly reflects the actual bash script logic and structure

Co-Authored-By: Claude <[email protected]>
@yossiovadia yossiovadia requested a review from kalantar September 8, 2025 15:36
Comment on lines 545 to 546
if ev.get("control_environment_type_modelservice_active", "0") != "1":
deploy_methods = ev.get("deploy_methods", "")
Contributor

Suggested change
if ev.get("control_environment_type_modelservice_active", "0") != "1":
deploy_methods = ev.get("deploy_methods", "")
if not ev["control_environment_type_modelservice_active"]:
deploy_methods = ev.get("deploy_methods", "unknown")

@kalantar
Contributor

kalantar commented Sep 8, 2025

When I execute, I get this error on the execution of helmfile:

STDERR:
  Error: failed to parse /var/folders/s9/ssw_1_396xdbftq9ldvwb9bw0000gn/T/helmfile4109704876/kalantar-test2-meta-lla-4833382e-instruct-ms-values-86c54476fc: error converting YAML to JSON: yaml: line 66: did not find expected key

I inspected the values file (00/ms-values.yaml) and see args is:

            - "[--enforce-eager____--block-size____REPLACE_ENV_LLMDBENCH_VLLM_COMMON_BLOCK_SIZE____--kv-transfer-config____'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'____--tensor-parallel-size____REPLACE_ENV_LLMDBENCH_VLLM_MODELSERVICE_PREFILL_ACCELERATOR_NR____--disable-log-requests____--disable-uvicorn-access-log____--max-model-len____REPLACE_ENV_LLMDBENCH_VLLM_COMMON_MAX_MODEL_LEN]"

I'm guessing add_arguments() is not handling this case the way the sh version does.
Note that there is an add_arguments() in functions.py; perhaps we should be reusing it.
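
(For context: the REPLACE_ENV_<VAR> tokens above are expected to be expanded to the value of the environment variable <VAR>, and "____" acts as the argument separator, before the values file is written. The real logic lives in render_string()/add_command_line_options() in setup/functions.py; the following is only a minimal illustrative sketch of that substitution, with the example value assumed.)

import os

def render_args(raw: str) -> list[str]:
    # Illustrative sketch only: expand REPLACE_ENV_<VAR> tokens to the value of
    # env var <VAR> and split the "____"-delimited string into vllm arguments.
    args = []
    for token in raw.strip().strip("[]").split("____"):
        if not token:
            continue
        if token.startswith("REPLACE_ENV_"):
            token = os.environ.get(token[len("REPLACE_ENV_"):], "")
        args.append(token)
    return args

# Example (value assumed for illustration):
os.environ["LLMDBENCH_VLLM_COMMON_BLOCK_SIZE"] = "64"
print(render_args("[--enforce-eager____--block-size____REPLACE_ENV_LLMDBENCH_VLLM_COMMON_BLOCK_SIZE]"))
# -> ['--enforce-eager', '--block-size', '64']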

Additional edit:
00/ms-values.yaml has incorrect indentation for annotations. This is probably the reason for the error message.

  annotations:
            deployed-by: kalantar
      modelservice: llm-d-benchmark

@kalantar
Contributor

kalantar commented Sep 8, 2025

Due to introduction of (incomplete) schema validation in modelservice, should add --skip-schema-validation to the call to helmfile. See llm-d-incubation/llm-d-modelservice#113.

While trying to fix this, I found several fields that were blank in the values file. Perhaps, in addition to a dry run, inspect the resulting values file and/or run helm template using it.
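
(One way to act on this suggestion, as an illustrative sketch only - the release name, chart, and values path below are placeholders, not the PR's actual code:)

import subprocess

# Render the chart with the generated values file so YAML/templating errors
# surface even during a dry run; add --skip-schema-validation on helm versions
# that support it.
result = subprocess.run(
    ["helm", "template", "ms-llmdbench", "llm-d-modelservice/llm-d-modelservice",
     "--values", "00/ms-values.yaml"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    raise RuntimeError(f"helm template failed:\n{result.stderr}")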

- Fix YAML indentation in add_annotations() (4 spaces vs 6 spaces)
- Use existing add_command_line_options() from functions.py instead of custom implementation
- Fix argument formatting to use bash-style multi-line strings with continuations
- Fix affinity configuration to get fresh values after check_affinity() call
- Fix boolean environment variable check for modelservice activation
- Add --skip-schema-validation flag to helmfile command
- Set LLMDBENCH_CURRENT_STEP=09 for functions.py compatibility

All REPLACE_ENV patterns now process correctly and helm template validation passes.

Co-Authored-By: Claude <[email protected]>
@yossiovadia
Contributor Author

Thanks for the valuable review; this is a challenging one.

  1. Import fix: Added add_command_line_options to imports from functions.py
  2. Annotations indentation fix: Changed from 6-space to 4-space indentation
  3. Removed custom function: Replaced custom add_command_line_options() with existing functions.py implementation
  4. Affinity fix: Get fresh affinity values from environment after check_affinity() call
  5. Environment variable check fix: Changed from string comparison to boolean check
  6. Arguments format fix: Changed template from args: to args: | for multi-line string format
  7. Current step setting: Set LLMDBENCH_CURRENT_STEP=09 for functions.py compatibility
  8. Helmfile flag: Added --skip-schema-validation flag to helmfile command

@yossiovadia yossiovadia requested a review from kalantar September 9, 2025 16:34
@kalantar
Contributor

I did:

LLMDBENCH_CONTROL_STEP_09_IMPLEMENTATION=py LLMDBENCH_VLLM_COMMON_NAMESPACE=kalantar-test setup/standup.sh -v -c inference-scheduling.sh -s 7,8,9 -n

(extra steps to ensure creation of all helm files)
Then copied the resulting ms-values.yaml:

cp ~/data/inference-scheduling/setup/helm/llmdbench/00/ms-values.yaml ms-values.yaml.py

I then repeated:

LLMDBENCH_CONTROL_STEP_09_IMPLEMENTATION=sh LLMDBENCH_VLLM_COMMON_NAMESPACE=kalantar-test setup/standup.sh -v -c inference-scheduling.sh -s 7,8,9 -n
cp ~/data/inference-scheduling/setup/helm/llmdbench/00/ms-values.yaml ms-values.yaml.sh

Finally, I did diff:

% diff ms-values.yaml.py ms-values.yaml.sh
66,68c66,71
<           deployed-by: kalantar
<     modelservice: llm-d-benchmark
<
---
>       deployed-by: kalantar
>       modelservice: llm-d-benchmark
>   podAnnotations:
>       deployed-by: kalantar
>       modelservice: llm-d-benchmark
>   #no____config
75,82c78,89
<     args: |
<       --enforce-eager \
<         --block-size 64 \
<         --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' \
<         --tensor-parallel-size 4 \
<         --disable-log-requests \
<         --disable-uvicorn-access-log \
<         --max-model-len 16000
---
>     args:
>       - "--enforce-eager"
>       - "--block-size"
>       - "64"
>       - "--kv-transfer-config"
>       - '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
>       - "--tensor-parallel-size"
>       - "4"
>       - "--disable-log-requests"
>       - "--disable-uvicorn-access-log"
>       - "--max-model-len"
>       - "16000"
88c95,106
<
---
>       - name: UCX_TLS
>         value: "cuda_ipc,cuda_copy,tcp"
>       - name: VLLM_NIXL_SIDE_CHANNEL_PORT
>         value: "5557"
>       - name: VLLM_NIXL_SIDE_CHANNEL_HOST
>         valueFrom:
>           fieldRef:
>             fieldPath: status.podIP
>       - name: VLLM_LOGGING_LEVEL
>         value: DEBUG
>       - name: VLLM_ALLOW_LONG_MAX_MODEL_LEN
>         value: "1"
94c112
<                 auto: "auto"
---
>         nvidia.com/gpu: "4"
100c118
<                 auto: "auto"
---
>         nvidia.com/gpu: "4"
123,124c141,148
<     volumeMounts:
<   volumes:
---
>       ports:
>         - containerPort: 5557
>           protocol: TCP
>         - containerPort: 8200
>           name: metrics
>           protocol: TCP
>     volumeMounts: []
>   volumes: []
137,139c161,166
<           deployed-by: kalantar
<     modelservice: llm-d-benchmark
<
---
>       deployed-by: kalantar
>       modelservice: llm-d-benchmark
>   podAnnotations:
>       deployed-by: kalantar
>       modelservice: llm-d-benchmark
>   #no____config
146,149c173,178
<     args: |
<       --disable-log-requests \
<         --max-model-len 16000 \
<         --tensor-parallel-size 1
---
>     args:
>       - "--disable-log-requests"
>       - "--max-model-len"
>       - "16000"
>       - "--tensor-parallel-size"
>       - "1"
157c186,197
<
---
>       - name: UCX_TLS
>         value: "cuda_ipc,cuda_copy,tcp"
>       - name: VLLM_NIXL_SIDE_CHANNEL_PORT
>         value: "5557"
>       - name: VLLM_NIXL_SIDE_CHANNEL_HOST
>         valueFrom:
>           fieldRef:
>             fieldPath: status.podIP
>       - name: VLLM_LOGGING_LEVEL
>         value: DEBUG
>       - name: VLLM_ALLOW_LONG_MAX_MODEL_LEN
>         value: "1"
163c203
<                 auto: "auto"
---
>         nvidia.com/gpu: "1"
169c209
<                 auto: "auto"
---
>         nvidia.com/gpu: "1"
192,193c232,239
<     volumeMounts:
<   volumes:
---
>       ports:
>         - containerPort: 5557
>           protocol: TCP
>         - containerPort: 8200
>           name: metrics
>           protocol: TCP
>     volumeMounts: []
>   volumes: []

- Merge dependency lists from HEAD and upstream/main
- Keep both pykube-ng (from upstream) and Jinja2 (from HEAD)
- Resolves conflict between competing dependency lists
Address reviewer feedback:
- Add missing podAnnotations sections for decode and prefill
- Fix GPU resource calculation using get_accelerator_nr function
- Add missing container port configurations (5557, 8200)
- Fix arguments format to use proper multi-line strings
- Import missing functions from functions.py
- Use proper accelerator count calculation instead of 'auto'
Major improvements to match bash script exactly:
- Replace custom add_command_line_options with functions.py version
- Replace custom add_additional_env_to_yaml with functions.py version
- Replace custom add_config with functions.py version
- Replace custom add_annotations with functions.py version
- Fix args format: change from 'args: |' to proper 'args:' list format
- Remove redundant custom function implementations
- Use consistent function behavior across bash and Python

This should resolve the reviewer's concerns about missing environment
variables, incorrect YAML formatting, and inconsistent behavior.
"""
if not resource_name or not resource_value:
return ""
return f" {resource_name}: \"{resource_value}\""
Contributor

Suggested change
return f" {resource_name}: \"{resource_value}\""
return f"{resource_name}: \"{resource_value}\""

prefill_cpu_nr = ev.get("vllm_modelservice_prefill_cpu_nr", "")

# Resource configuration
accelerator_resource = ev.get("vllm_common_accelerator_resource", "")
Contributor

The env var gets reset in check_affinity(); the value in ev does not.

Suggested change
accelerator_resource = ev.get("vllm_common_accelerator_resource", "")
accelerator_resource = os.getenv("LLMDBENCH_VLLM_COMMON_ACCELERATOR_RESOURCE")
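
(In other words, read the variable from the process environment only after check_affinity() has run, instead of from the ev snapshot taken earlier. A minimal sketch, assuming check_affinity() is imported from setup/functions.py:)

import os

check_affinity()  # may rewrite LLMDBENCH_VLLM_COMMON_ACCELERATOR_RESOURCE in os.environ
accelerator_resource = os.getenv("LLMDBENCH_VLLM_COMMON_ACCELERATOR_RESOURCE", "")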

Comment on lines +209 to +218
# Extra configurations
decode_extra_pod_config = ev.get("vllm_modelservice_decode_extra_pod_config", "")
decode_extra_container_config = ev.get("vllm_modelservice_decode_extra_container_config", "")
decode_extra_volume_mounts = ev.get("vllm_modelservice_decode_extra_volume_mounts", "")
decode_extra_volumes = ev.get("vllm_modelservice_decode_extra_volumes", "")

prefill_extra_pod_config = ev.get("vllm_modelservice_prefill_extra_pod_config", "")
prefill_extra_container_config = ev.get("vllm_modelservice_prefill_extra_container_config", "")
prefill_extra_volume_mounts = ev.get("vllm_modelservice_prefill_extra_volume_mounts", "")
prefill_extra_volumes = ev.get("vllm_modelservice_prefill_extra_volumes", "")
Contributor

I'm sure there is a more elegant way to do this. However, this works. The problem is that the env variables are defaulted to "" in env.sh.

Suggested change
# Extra configurations
decode_extra_pod_config = ev.get("vllm_modelservice_decode_extra_pod_config", "")
decode_extra_container_config = ev.get("vllm_modelservice_decode_extra_container_config", "")
decode_extra_volume_mounts = ev.get("vllm_modelservice_decode_extra_volume_mounts", "")
decode_extra_volumes = ev.get("vllm_modelservice_decode_extra_volumes", "")
prefill_extra_pod_config = ev.get("vllm_modelservice_prefill_extra_pod_config", "")
prefill_extra_container_config = ev.get("vllm_modelservice_prefill_extra_container_config", "")
prefill_extra_volume_mounts = ev.get("vllm_modelservice_prefill_extra_volume_mounts", "")
prefill_extra_volumes = ev.get("vllm_modelservice_prefill_extra_volumes", "")
# Extra configurations
decode_extra_pod_config = ev.get("vllm_modelservice_decode_extra_pod_config", "#no____config")
if decode_extra_pod_config == "":
    decode_extra_pod_config = "#no____config"
decode_extra_container_config = ev.get("vllm_modelservice_decode_extra_container_config", "#no____config")
if decode_extra_container_config == "":
    decode_extra_container_config = "#no____config"
decode_extra_volume_mounts = ev.get("vllm_modelservice_decode_extra_volume_mounts", "[]")
if decode_extra_volume_mounts == "":
    decode_extra_volume_mounts = "[]"
decode_extra_volumes = ev.get("vllm_modelservice_decode_extra_volumes", "[]")
if decode_extra_volumes == "":
    decode_extra_volumes = "[]"
prefill_extra_pod_config = ev.get("vllm_modelservice_prefill_extra_pod_config", "#no____config")
if prefill_extra_pod_config == "":
    prefill_extra_pod_config = "#no____config"
prefill_extra_container_config = ev.get("vllm_modelservice_prefill_extra_container_config", "#no____config")
if prefill_extra_container_config == "":
    prefill_extra_container_config = "#no____config"
prefill_extra_volume_mounts = ev.get("vllm_modelservice_prefill_extra_volume_mounts", "[]")
if prefill_extra_volume_mounts == "":
    prefill_extra_volume_mounts = "[]"
prefill_extra_volumes = ev.get("vllm_modelservice_prefill_extra_volumes", "[]")
if prefill_extra_volumes == "":
    prefill_extra_volumes = "[]"

Comment on lines 583 to 584
# Set up configuration preparation
add_config_prep()
Contributor

Given the proposed changes earlier (comment: # Extra configurations), this is no longer needed. Nor is the method.

Comment on lines 39 to 43
def add_pod_annotations(annotation_var: str) -> str:
    """
    Generate podAnnotations YAML section.
    """
    return functions_add_annotations(annotation_var)
Contributor

Is this necessary?

@kalantar
Contributor

kalantar commented Sep 10, 2025

In addition to the changes here, I believe the following changes are needed in the current version of functions.py:

% git diff setup/functions.py
diff --git a/setup/functions.py b/setup/functions.py
index 66e1964..b976789 100644
--- a/setup/functions.py
+++ b/setup/functions.py
@@ -864,7 +864,7 @@ def add_annotations(varname: str) -> str:
             key, value = entry.split(":", 1)
             annotation_lines.append(f"{indent}{key.strip()}: {value.strip()}")

-    return "\n".join(annotation_lines)
+    return "\n".join(annotation_lines).lstrip()


 def render_string(input_string):
@@ -929,8 +929,8 @@ def add_command_line_options(args_string):
     """
     current_step = os.environ.get("LLMDBENCH_CURRENT_STEP", "")

-    # Process REPLACE_ENV variables first
     if args_string:
+        # Process REPLACE_ENV variables first
         processed_args = render_string(args_string)

         # Handle formatting based on step and content
@@ -961,7 +961,7 @@ def add_command_line_options(args_string):
                 processed_args = processed_args.replace(";", ";\n      ")
                 processed_args = processed_args.replace(" --", " \\\n        --")

-            return f"      {processed_args}"
+            return f"  {processed_args}"
         else:
             # Default case
             processed_args = processed_args.replace("____", " ")
@@ -974,12 +974,12 @@ def add_command_line_options(args_string):
             return ""


-def add_additional_env_to_yaml(env_vars_string):
+def add_additional_env_to_yaml(env_vars_string_or_file):
     """
     Generate additional environment variables YAML.
     Equivalent to the bash add_additional_env_to_yaml function.
     """
-    if not env_vars_string:
+    if not env_vars_string_or_file:
         return ""

     # Determine indentation based on environment type
@@ -996,9 +996,16 @@ def add_additional_env_to_yaml(env_vars_string):
         name_indent = "        "  # default 8 spaces
         value_indent = "          "  # default 10 spaces

+    try:
+        with open(env_vars_string_or_file, 'r') as f:
+            contents = f.read()
+            return '\n'.join(f"{name_indent}{line}" for line in render_string(contents).splitlines()).lstrip()
+    except FileNotFoundError:
+        pass
+
     # Parse environment variables (comma-separated list)
     env_lines = []
-    for envvar in env_vars_string.split(","):
+    for envvar in env_vars_string_or_file.split(","):
         envvar = envvar.strip()
         if envvar:
             # Remove LLMDBENCH_VLLM_STANDALONE_ prefix if present
@@ -1025,12 +1032,14 @@ def add_config(obj_or_filename, num_spaces=0, label=""):
         try:
             with open(obj_or_filename, 'r') as f:
                 contents = f.read()
+                indented_contents = '\n'.join(f"{spaces}{line}" for line in contents.splitlines())
         except FileNotFoundError:
+            indented_contents = contents
             pass

-    indented_contents = '\n'.join(f"{spaces}{line}" for line in contents.splitlines())
-    if indented_contents.strip() != "{}" :
-        indented_contents = f"  {label}\n{indented_contents}"
+    if indented_contents.strip() != "" :
+        if label != "" :
+            indented_contents = f"{label}:\n{indented_contents}"
     else :
         indented_contents = ""
     return indented_contents

Comment on lines 307 to 309
ports:
  - containerPort: {decode_inference_port}
  - containerPort: 5557
Contributor

This is insufficient. The metrics collection requires a name. There is an env variable that should be looked at. An example is in scenarios/examples/inference-scheduling.sh. I think it should be possible to reuse add_config().

export LLMDBENCH_VLLM_MODELSERVICE_EXTRA_CONTAINER_CONFIG=$(mktemp)
cat << EOF > ${LLMDBENCH_VLLM_MODELSERVICE_EXTRA_CONTAINER_CONFIG}
ports:
  - containerPort: 5557
    protocol: TCP
  - containerPort: 8200
    name: metrics
    protocol: TCP
EOF

Note that in this example, the 8200 should probably be replaced by REPLACE_ENV_VLLM_MODELSERVICE_DECODE_INFERENCE_PORT

@kalantar
Contributor

kalantar commented Sep 11, 2025

This is a related PR #335 that also proposes changes to add_additional_env_to_yaml()

Changes made:
- Remove unnecessary wrapper functions (add_config_prep, add_pod_annotations)
- Fix accelerator_resource to use os.environ.get() instead of ev.get()
  as suggested by reviewer - check_affinity() resets env vars
- Replace wrapper calls with direct functions.py calls
- Clean up redundant function definitions

Still pending reviewer guidance on:
- Container port configuration with metrics names
- Volume mounts/volumes default behavior
- Functions.py dependencies if any
Issue: add_config() function only treated '{}' as empty but not '[]'
This caused volume mounts to show malformed YAML instead of being omitted

Fix: Extend the empty check to include both '{}' and '[]'
Now both empty objects and arrays are properly omitted from YAML output

This resolves the volume mount configuration issue mentioned by reviewer.
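
(The shape of that check, as a minimal sketch - is_empty_config is an illustrative name, not the actual function:)

def is_empty_config(rendered: str) -> bool:
    # Both an empty mapping ("{}") and an empty sequence ("[]") mean "nothing to add";
    # previously only "{}" was treated as empty.
    return rendered.strip() in ("", "{}", "[]")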
@yossiovadia
Contributor Author

yossiovadia commented Sep 11, 2025

Hi @kalantar (Michael),

I've addressed some of your feedback items and would like your guidance on the remaining implementation.

These are the completed items:

- Removed unnecessary wrapper functions (add_config_prep, add_pod_annotations)
- Fixed accelerator_resource to use os.environ.get() instead of ev.get() as you suggested
- Replaced custom implementations with functions.py versions
- Fixed YAML arguments format to match bash output
- Fixed functions.py add_config() to handle empty arrays "[]" like empty objects "{}"

However, I need your guidance on:

1. Container Port Configuration with Metrics Names
You mentioned "ports need a name for metrics collection". Which approach do you prefer? some sugestions -

Option A: Add name fields directly
ports:
  - containerPort: 5557
    name: metrics
  - containerPort: 8200
    name: inference

Option B: Use add_config() function for ports configuration
ports:
{functions_add_config(some_port_env_var, 4)}

Option C: research existing port naming patterns in the codebase(?)


2. Volume Mounts/Volumes Default Behavior
Currently showing volumeMounts: [] instead of proper configuration. What's the expected behavior when there are no volume mounts?

Option A: Change defaults from "[]" to "#no____config"
Option B: Modify how add_config() handles empty arrays
Option C: Remove defaults entirely and let add_config() handle it

Should empty volume configs result in:

  • No line at all (empty)
  • Empty array volumeMounts: []
  • Some other format?

Please let me know your preferences - and thanks for your patience with this conversion!

@kalantar
Contributor

  1. Container Port Configuration with Metrics Names

I am inclined to suggest option B since this aligns well with the approach taken for other things.

  2. Volume Mounts/Volumes Default Behavior

Ideally when there are no volumes and volume mounts, this field should be skipped. I think I had problems doing this in the sh version. Should be easier in py.
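
(A possible shape for that skip-when-empty behaviour in Python - illustrative sketch only; the PR later adds a conditional_volume_config helper, which need not look exactly like this:)

def conditional_volume_config(value: str, label: str, indent: int = 4) -> str:
    # Emit "<label>:" plus indented content only when there is a real
    # configuration; otherwise return "" so the field is omitted entirely.
    if not value or value.strip() in ("{}", "[]", "#no____config"):
        return ""
    spaces = " " * indent
    body = "\n".join(f"{spaces}{line}" for line in value.splitlines())
    return f"{label}:\n{body}"

# e.g. conditional_volume_config(decode_extra_volume_mounts, "volumeMounts")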

yossiovadia added a commit to yossiovadia/llm-d-benchmark that referenced this pull request Sep 11, 2025
- Remove extra spacing in filter_empty_resource function formatting
- Implement option B for container ports by removing hardcoded ports and using add_config approach
- Add conditional_volume_config function to skip empty volume/volumeMount fields entirely
- Implement add_config_prep function to set proper defaults for empty configurations
- Remove generate_ms_rules_yaml function and inline the logic for cleaner code
- Remove unused imports (jinja2, yaml, render_string) to reduce complexity

Co-Authored-By: Claude <[email protected]>
@kalantar
Contributor

I tried this again. It is looking good. I do see that decode.extraConfig (and prefill.extraConfig) has no value (#no____config). Probably the default here should be {} and, like volumes and volumeMounts, if there is no config the whole element should be removed.

  extraConfig:
#no____config

@kalantar
Contributor

Hmm. I tried again, and now I'm not seeing that. Instead I'm seeing extraConfig entries without a ":" and no value.

    extraConfig

  containers:

The formatting is still off for several things. I noted yesterday that many things seemed indented more than they should be. I think the indentation added at the start is being added to the indent already in the YAML string, so we may need to do lstrip() before sending results (or after getting them). For example, annotations and podAnnotations have this issue.

yossiovadia added a commit to yossiovadia/llm-d-benchmark that referenced this pull request Sep 12, 2025
- Replace "#no____config" defaults with "{}" for extraConfig sections
- Add conditional_extra_config function to skip empty extraConfig entirely
- Add lstrip() calls throughout template to fix YAML indentation
- Ensure proper YAML syntax and formatting for all config sections

Addresses reviewer Michael's feedback on template structure and formatting.
@yossiovadia
Contributor Author

@kalantar I've fixed the extraConfig YAML formatting issue you identified. The problem was missing colons and improper structure.

Before (broken YAML):

extraConfig

  containers:
    - name: vllm

After (fixed YAML):

extraConfig:
  containers:
    - name: vllm

What was fixed:

  • Added missing colon (:) after extraConfig labels
  • Fixed indentation structure to prevent double-indentation
  • Enhanced empty config checking to skip malformed sections entirely

The fix is in commit f7b5b4f. The conditional_extra_config function now properly formats the YAML structure and handles empty configurations correctly.

@Vezio
Collaborator

Vezio commented Sep 19, 2025

=> Fri Sep 19 11:11:30 EDT 2025 - ./setup/standup.sh - === Running step: 09_deploy_via_modelservice.py ===
2025-09-19 11:11:32,269 - INFO - ℹ️ Environment variable LLMDBENCH_VLLM_COMMON_AFFINITY automatically set to "nvidia.com/gpu.product:NVIDIA-H100-80GB-HBM3"
2025-09-19 11:11:33,004 - INFO - 🚀 Installing helm chart "ms-llmdbench" via helmfile...
2025-09-19 11:11:35,765 - INFO -
ERROR while executing command "helmfile --namespace vezio-llmd-bench --kubeconfig /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/auto-standupXXX.eYXrBBH5rJ/environment/context.ctx --selector name=ibm-gran-b1f57d6a-instruct-ms apply -f /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/auto-standupXXX.eYXrBBH5rJ/setup/helm/llmdbench/helmfile-00.yaml --skip-diff-on-install --skip-schema-validation"
2025-09-19 11:11:35,766 - INFO - Comparing release=ibm-gran-b1f57d6a-instruct-ms, chart=llm-d-modelservice/llm-d-modelservice, namespace=vezio-llmd-bench

2025-09-19 11:11:35,766 - INFO - Adding repo llm-d-modelservice https://llm-d-incubation.github.io/llm-d-modelservice/
"llm-d-modelservice" has been added to your repositories

Adding repo llm-d-infra https://llm-d-incubation.github.io/llm-d-infra/
"llm-d-infra" has been added to your repositories

Listing releases matching ^ibm-gran-b1f57d6a-instruct-ms$
ibm-gran-b1f57d6a-instruct-ms	vezio-llmd-bench	1       	2025-09-19 10:46:52.091323 -0400 EDT	deployedllm-d-modelservice-v0.2.9	v0.2.0

in /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/auto-standupXXX.eYXrBBH5rJ/setup/helm/llmdbench/helmfile-00.yaml: command "/Users/vezio/homebrew/bin/helm" exited with non-zero status:

PATH:
  /Users/vezio/homebrew/bin/helm

ARGS:
  0: helm (4 bytes)
  1: --kubeconfig (12 bytes)
  2: /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/auto-standupXXX.eYXrBBH5rJ/environment/context.ctx (99 bytes)
  3: diff (4 bytes)
  4: upgrade (7 bytes)
  5: --allow-unreleased (18 bytes)
  6: ibm-gran-b1f57d6a-instruct-ms (29 bytes)
  7: llm-d-modelservice/llm-d-modelservice (37 bytes)
  8: --version (9 bytes)
  9: v0.2.9 (6 bytes)
  10: --skip-schema-validation (24 bytes)
  11: --namespace (11 bytes)
  12: vezio-llmd-bench (16 bytes)
  13: --values (8 bytes)
  14: /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile3094013161/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-7bd67d7474 (132 bytes)
  15: --reset-values (14 bytes)
  16: --detailed-exitcode (19 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: Failed to render chart: exit status 1: Error: YAML parse error on llm-d-modelservice/templates/decode-deployment.yaml: error converting YAML to JSON: yaml: line 98: could not find expected ':'
  Use --debug flag to render out invalid YAML
  Error: plugin "diff" exited with error

COMBINED OUTPUT:
  Error: Failed to render chart: exit status 1: Error: YAML parse error on llm-d-modelservice/templates/decode-deployment.yaml: error converting YAML to JSON: yaml: line 98: could not find expected ':'
  Use --debug flag to render out invalid YAML
  Error: plugin "diff" exited with error

2025-09-19 11:11:35,766 - INFO - ❌ Failed to deploy helm chart for model ibm-granite/granite-3.3-2b-instruct

@yossiovadia - I'm still getting a yaml parse issue.

@Vezio
Collaborator

Vezio commented Sep 19, 2025

Zeroing in on the yaml issue:

    75	          command: ["vllm", "serve"]
    76	          args:
    77	            - /model-cache/models/ibm-granite/granite-3.3-2b-instruct
    78	            - --port
    79	            - "8000"
    80	            - --served-model-name
    81	            - "ibm-granite/granite-3.3-2b-instruct"
    82
    83	            --disable-log-requests \ --max-model-len 16384 \ --tensor-parallel-size 1
    84	          env:
    85	          - name: VLLM_IS_PREFILL

83 --disable-log-requests \ --max-model-len 16384 \ --tensor-parallel-size 1

Seems to be a potential culprit?

@yossiovadia
Contributor Author

I'll (about time) set up a real env to be able to test it (vs. dry run); will update shortly. Thanks for the review.

@yossiovadia
Contributor Author

OK, my apologies for the long back and forth; the main reason is that I had to dry run. I finally got a real cluster and managed to fix & validate all the steps (00->09, inclusive) and all works well now.

Collaborator

@Vezio Vezio left a comment

Still appears to break when specifying a scenario via -c:

./setup/standup.sh -p vezio-llmd-bench -m ibm-granite/granite-3.3-2b-instruct -c inference-scheduling.sh

==> Mon Sep 22 10:06:18 EDT 2025 - ./setup/standup.sh - === Running step: 09_deploy_via_modelservice.py ===
2025-09-22 10:06:20,670 - INFO - ℹ️ Environment variable LLMDBENCH_VLLM_COMMON_AFFINITY automatically set to "nvidia.com/gpu.product:NVIDIA-H100-80GB-HBM3"
2025-09-22 10:06:21,509 - INFO - 🚀 Installing helm chart "ms-llmdbench" via helmfile...
2025-09-22 10:06:23,770 - INFO -
ERROR while executing command "helmfile --namespace vezio-llmd-bench --kubeconfig /Users/vezio/data/inference-scheduling/environment/context.ctx --selector name=ibm-gran-b1f57d6a-instruct-ms apply -f /Users/vezio/data/inference-scheduling/setup/helm/llmdbench/helmfile-00.yaml --skip-diff-on-install --skip-schema-validation"
2025-09-22 10:06:23,771 - INFO - Comparing release=ibm-gran-b1f57d6a-instruct-ms, chart=llm-d-modelservice/llm-d-modelservice, namespace=vezio-llmd-bench

2025-09-22 10:06:23,771 - INFO - Adding repo llm-d-modelservice https://llm-d-incubation.github.io/llm-d-modelservice/
"llm-d-modelservice" has been added to your repositories

Adding repo llm-d-infra https://llm-d-incubation.github.io/llm-d-infra/
"llm-d-infra" has been added to your repositories

Listing releases matching ^ibm-gran-b1f57d6a-instruct-ms$
ibm-gran-b1f57d6a-instruct-ms	vezio-llmd-bench	2       	2025-09-19 16:12:38.073087 -0400 EDT	deployedllm-d-modelservice-v0.2.9	v0.2.0

in /Users/vezio/data/inference-scheduling/setup/helm/llmdbench/helmfile-00.yaml: command "/Users/vezio/homebrew/bin/helm" exited with non-zero status:

PATH:
  /Users/vezio/homebrew/bin/helm

ARGS:
  0: helm (4 bytes)
  1: --kubeconfig (12 bytes)
  2: /Users/vezio/data/inference-scheduling/environment/context.ctx (62 bytes)
  3: diff (4 bytes)
  4: upgrade (7 bytes)
  5: --allow-unreleased (18 bytes)
  6: ibm-gran-b1f57d6a-instruct-ms (29 bytes)
  7: llm-d-modelservice/llm-d-modelservice (37 bytes)
  8: --version (9 bytes)
  9: v0.2.9 (6 bytes)
  10: --skip-schema-validation (24 bytes)
  11: --namespace (11 bytes)
  12: vezio-llmd-bench (16 bytes)
  13: --values (8 bytes)
  14: /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile1913400668/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-85686dd485 (132 bytes)
  15: --reset-values (14 bytes)
  16: --detailed-exitcode (19 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: Failed to render chart: exit status 1: Error: failed to parse /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile1913400668/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-85686dd485: error converting YAML to JSON: yaml: line 82: did not find expected '-' indicator
  Error: plugin "diff" exited with error

COMBINED OUTPUT:
  Error: Failed to render chart: exit status 1: Error: failed to parse /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile1913400668/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-85686dd485: error converting YAML to JSON: yaml: line 82: did not find expected '-' indicator
  Error: plugin "diff" exited with error

2025-09-22 10:06:23,772 - INFO - ❌ Failed to deploy helm chart for model ibm-granite/granite-3.3-2b-instruct

@Vezio
Collaborator

Vezio commented Sep 22, 2025

When no scenario (-c) is specified - it runs to completion with no errors.

@kalantar
Contributor

The problem appears to be in the generated setup/helm/llmdbench/00/ms-values.yaml. The args look like:

 78     args:
 79       - "--enforce-eager"
 80       - "--block-size"
 81       - "64"
 82       - "--kv-transfer-config"
 83       - "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'"
 84       - "--disable-log-requests"
 85       - "--disable-uvicorn-access-log"
 86       - "--max-model-len"
 87       - "16000--tensor-parallel-size"
 88       - "4"

I'm not sure about line 82 as reported. My IDE complains about line 83. I think line 87 is also suspect.
When run with LLMDBENCH_CONTROL_STEP_09_IMPLEMENTATION=sh, this part of the generated file looks like:

 84     args:
 85       - "--enforce-eager"
 86       - "--block-size"
 87       - "64"
 88       - "--kv-transfer-config"
 89       - '{"kv_connector":"NixlConnector","kv_role":"kv_both"}'
 90       - "--disable-log-requests"
 91       - "--disable-uvicorn-access-log"
 92       - "--max-model-len"
 93       - "16000--tensor-parallel-size"
 94       - "4

Lines 83/89 have different quoting.

TBH, I am suspicious of lines 87/93 and the quoting on 94.
The line numbers don't match because of a difference in the generation of the HTTPRoute; in the py version there is no timeouts section.

@kalantar
Contributor

The problem lines 87 / 93-94 are caused by scenarios/example/inference-scheduling.sh. See #364.

@kalantar
Contributor

The problem lines 87 / 93-94 are caused by scenarios/example/inference-scheduling.sh. See #364.

This fix also seems to fix the problem with quoting that caused the helm errors.

@yossiovadia yossiovadia requested a review from Vezio September 22, 2025 15:45
@kalantar
Contributor

kalantar commented Sep 22, 2025

I take that back, I had LLMDBENCH_CONTROL_STEP_09_IMPLEMENTATION=sh. When I set it to py I still get an error caused by the quoting of this line: - "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'"

The behavior is different from sh.

@Vezio
Collaborator

Vezio commented Sep 22, 2025

Same error continues despite @kalantar's related fix.

@kalantar
Contributor

I commented above

The line numbers don't match because of a difference in the generation of the HTTPRoute; in the py version there is no timeouts section.

The changes to the sh version of step 09 in this PR: #357 should be incorporated in the python version to address this issue.

@kalantar
Contributor

@yossiovadia has the comment #323 (comment) been addressed? I think this is the last outstanding issue I have been waiting for before reviewing again.

yossiovadia added a commit to yossiovadia/llm-d-benchmark that referenced this pull request Sep 25, 2025
Resolve the quoting issue identified in comment llm-d#323 where Python generated:
- "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'" (broken)

Now correctly generates like bash implementation:
- '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' (working)

- Detect arguments that already have single quotes (JSON strings)
- Use them as-is without additional double quote wrapping
- Only wrap regular arguments in double quotes

Addresses the last outstanding reviewer concern before final approval.
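
(The fix described above boils down to a quoting rule along these lines - an illustrative sketch, not the exact code:)

def quote_arg(arg: str) -> str:
    # Arguments already wrapped in single quotes (e.g. the JSON payload passed
    # to --kv-transfer-config) are emitted as-is; everything else is wrapped in
    # double quotes for the YAML args list.
    if len(arg) >= 2 and arg.startswith("'") and arg.endswith("'"):
        return arg
    return f'"{arg}"'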
yossiovadia and others added 8 commits September 26, 2025 08:58
- Remove extra spacing in filter_empty_resource function formatting
- Implement option B for container ports by removing hardcoded ports and using add_config approach
- Add conditional_volume_config function to skip empty volume/volumeMount fields entirely
- Implement add_config_prep function to set proper defaults for empty configurations
- Remove generate_ms_rules_yaml function and inline the logic for cleaner code
- Remove unused imports (jinja2, yaml, render_string) to reduce complexity

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Replace "#no____config" defaults with "{}" for extraConfig sections
- Add conditional_extra_config function to skip empty extraConfig entirely
- Add lstrip() calls throughout template to fix YAML indentation
- Ensure proper YAML syntax and formatting for all config sections

Addresses reviewer Michael's feedback on template structure and formatting.

Signed-off-by: Yossi Ovadia <[email protected]>
…double indentation

- Fixed conditional_extra_config function to properly add colon after extraConfig label
- Added proper empty config checking before processing to avoid unnecessary sections
- Fixed indentation structure to prevent double-indentation issues
- Resolves issue where extraConfig sections appeared without colons (extraConfig\n  containers:)
- Now correctly generates: extraConfig:\n    content...

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Fix YAML args formatting: Convert from malformed backslash format to proper YAML list with quoted strings
- Resolve Helm deployment error: 'cannot unmarshal number into Go struct field Container.args of type string'
- Add accelerator resource auto-detection: Convert 'auto' to 'nvidia.com/gpu'
- Improve resource section generation: Clean resource limits/requests without empty values
- Fix CPU/memory fallback logic: Use common values when specific ones are empty
- Remove duplicate check_storage_class function causing pykube object_factory errors
- Add OpenShift L40S GPU scenario for testing modelservice deployment

Tested on real OpenShift cluster with NVIDIA L40S GPUs. Full pipeline (steps 0-9) now completes successfully with Python step 09 implementation.

Signed-off-by: Yossi Ovadia <[email protected]>
Resolve issue where custom scenarios with complex arguments (like JSON strings)
caused 'did not find expected '-' indicator' YAML parsing errors.

- Change args parsing to split on '____' delimiters instead of whitespace
- Add proper quote escaping for arguments containing JSON or special characters
- Preserve complex arguments like --kv-transfer-config with embedded JSON

Fixes reviewer issue: Custom scenarios (-c) now work correctly.

Signed-off-by: Yossi Ovadia <[email protected]>
Address reviewer feedback on malformed args output:
- Fix concatenated arguments like '16000--tensor-parallel-size'
- Resolve unclosed quote issues like '"4'
- Clean up trailing backslashes and quotes from bash line continuations
- Prevent empty arguments from being added to YAML

Resolves specific issues identified in generated ms-values.yaml lines 87 and 94.

Signed-off-by: Yossi Ovadia <[email protected]>
Incorporate changes from PR llm-d#357 that add infinite timeouts to HTTPRoute configurations:
- Add timeouts section with backendRequest: 0s and request: 0s to both HTTPRoute rules
- Prevents request timeouts during long-running model inference operations
- Matches the bash implementation timeout behavior

Resolves reviewer feedback about missing timeout sections in Python version.

Signed-off-by: Yossi Ovadia <[email protected]>
Resolve the quoting issue identified in comment llm-d#323 where Python generated:
- "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'" (broken)

Now correctly generates like bash implementation:
- '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' (working)

- Detect arguments that already have single quotes (JSON strings)
- Use them as-is without additional double quote wrapping
- Only wrap regular arguments in double quotes

Addresses the last outstanding reviewer concern before final approval.

Signed-off-by: Yossi Ovadia <[email protected]>
@yossiovadia yossiovadia force-pushed the convert-step-09-to-python branch from 771b345 to 1ec90d6 Compare September 26, 2025 15:59
@Vezio
Collaborator

Vezio commented Sep 26, 2025

==> Fri Sep 26 14:08:16 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - ✅ Done rendering "random_concurrent" workload profile templates to "/Users/vezio/data/pd-disaggregation/workload/profiles/"
/Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh: line 407: /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/presets/gaie/plugins-v2.yaml: No such file or directory
...
...
...
==> Fri Sep 26 14:18:34 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - ✅ Benchmark execution for model "meta-llama/Llama-3.1-70B-Instruct" completed
==> Fri Sep 26 14:18:34 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - 🗑️ Deleting pod "llmdbench-vllm-benchmark-launcher" for model "meta-llama/Llama-3.1-70B-Instruct" ...
==> Fri Sep 26 14:18:35 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - ✅ Pod "llmdbench-vllm-benchmark-launcher" for model "meta-llama/Llama-3.1-70B-Instruct" deleted
==> Fri Sep 26 14:18:35 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/run.sh - 🏗️ Collecting results for model "meta-llama/Llama-3.1-70B-Instruct" (meta-llama/Llama-3.1-70B-Instruct) to "/Users/vezio/data/pd-disaggregation/results/vllm-benchmark_1758909201-none_llm-d-70b-instruct"...
ERROR while executing command "oc --kubeconfig /Users/vezio/data/pd-disaggregation/environment/context.ctx --namespace llmdbench cp --retries=5 access-to-harness-data-workload-pvc:/requests/vllm-benchmark_1758909201-none_llm-d-70b-instruct /Users/vezio/data/pd-disaggregation/results/vllm-benchmark_1758909201-none_llm-d-70b-instruct"

Error from server (BadRequest): pod access-to-harness-data-workload-pvc does not have a host assigned
==> Fri Sep 26 14:18:36 EDT 2025 - ./setup/e2e.sh - ℹ️  Unconditionally moving "/Users/vezio/data/pd-disaggregation" to "/Users/vezio/data/pd-disaggregation.none"...
mv: rename /Users/vezio/data/pd-disaggregation to /Users/vezio/data/pd-disaggregation.none/pd-disaggregation: Directory not empty

So I'm not sure why it's not using the correct namespace. It appears to be using the default llmdbench namespace, not the custom one I provided - all other commands were executed accordingly... but it seems to be failing because of that.

This could be unrelated - but I'm not sure why I would see these failures now.

After speaking w/ @maugustosilva, I believe this is my own user error based on a recently merged PR with some NS changes. Rerunning now!

@Vezio
Collaborator

Vezio commented Sep 26, 2025

This looks pretty good based on:

./setup/e2e.sh -p vezio-test -c pd-disaggregation.sh -t modelservice --deep

NAME                                                  READY   STATUS      RESTARTS   AGE
download-model-9fthc                                  0/1     Completed   0          10m
infra-llmdbench-inference-gateway-6d97dcbfb9-zn8b2    1/1     Running     0          5m14s
meta-lla-8f96c2da-instruct-decode-5944cdc5b7-5n2pq    2/2     Running     0          4m50s
meta-lla-8f96c2da-instruct-gaie-epp-d4d866c65-x5t6b   1/1     Running     0          5m5s
meta-lla-8f96c2da-instruct-prefill-78bbc9ccd-xrpdw    1/1     Running     0          4m51s
==> Fri Sep 26 15:20:46 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - === Running step: 09_deploy_via_modelservice.sh ===

==> Fri Sep 26 15:20:50 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ℹ️ Environment variable LLMDBENCH_VLLM_COMMON_AFFINITY automatically set to "nvidia.com/gpu.product:NVIDIA-H100-80GB-HBM3"
==> Fri Sep 26 15:20:52 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 Installing helm chart "ms-llmdbench" via helmfile...
==> Fri Sep 26 15:21:02 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ vezio-test-meta-lla-8f96c2da-instruct-ms helm chart deployed successfully
==> Fri Sep 26 15:21:02 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ waiting for (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct to be created...
==> Fri Sep 26 15:21:03 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct created
==> Fri Sep 26 15:21:03 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ waiting for (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct to be created...
==> Fri Sep 26 15:21:03 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct created
==> Fri Sep 26 15:21:04 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ Waiting for (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct to be in "Running" state (timeout=900000s)...
==> Fri Sep 26 15:27:25 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct running
==> Fri Sep 26 15:27:25 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ Waiting for (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct to be in "Running" state (timeout=900000s)...
==> Fri Sep 26 15:27:26 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct running
==> Fri Sep 26 15:27:26 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ Waiting for (decode) pods serving meta-llama/Llama-3.1-70B-Instruct to be Ready (timeout=900000s)...



==> Fri Sep 26 15:31:23 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 (decode) pods serving model meta-llama/Llama-3.1-70B-Instruct ready
==> Fri Sep 26 15:31:24 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ⏳ Waiting for (prefill) pods serving meta-llama/Llama-3.1-70B-Instruct to be Ready (timeout=900000s)...
==> Fri Sep 26 15:31:25 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 🚀 (prefill) pods serving model meta-llama/Llama-3.1-70B-Instruct ready
==> Fri Sep 26 15:31:27 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - 📜 Exposing pods serving model meta-llama/Llama-3.1-70B-Instruct as service...
==> Fri Sep 26 15:31:27 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ Service for pods service model meta-llama/Llama-3.1-70B-Instruct created
==> Fri Sep 26 15:31:27 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ Model "meta-llama/Llama-3.1-70B-Instruct" and associated service deployed.
==> Fri Sep 26 15:31:27 EDT 2025 - /Users/vezio/IBM/llmd/test/test-llm-d-benchmark/llm-d-benchmark/setup/standup.sh - ✅ modelservice completed model deployment

I've seen other issues but they look unrelated to this PR.

@maugustosilva maugustosilva merged commit f505618 into llm-d:main Sep 29, 2025
5 checks passed