Convert step 09 to Python - Deploy via modelservice #323
Conversation
This converts the bash script setup/steps/09_deploy_via_modelservice.sh to Python as setup/steps/09_deploy_via_modelservice.py.

Key changes:
- Full Python conversion of the model deployment logic
- Improved error handling and logging
- Support for REPLACE_ENV variable processing
- Consistent integration with the existing Python conversion framework

Fixes llm-d#269

Co-Authored-By: Claude <[email protected]>
NOTE: I was only able to dry-run test it (I have some proxy problems connecting to a real environment right now).
```yaml
decode:
  replicaCount: {{ decode_replicas }}
  kserve:
    storageUri: "{{ storage_uri }}"
  {{ decode_extra_pod_config }}
  {{ decode_command }}
  {{ decode_extra_container_config }}

prefill:
  replicaCount: {{ prefill_replicas }}
  kserve:
    storageUri: "{{ storage_uri }}"
  {{ prefill_extra_pod_config }}
  {{ prefill_command }}
  {{ prefill_extra_container_config }}

{% if mount_model_volume %}
cache:
  storageClass: {{ storage_class }}
  size: {{ cache_size }}
{% endif %}

{% if gateway_enabled %}
gateway:
  domain: {{ gateway_domain }}
{% endif %}

{% if route_enabled %}
route:
  enabled: true
  domain: {{ route_domain }}
{% endif %}
```
This is not a direct translation. Why the move to kserve as part of this PR?
There was a fundamental error in my conversion approach. Instead of doing a true line-by-line translation of the bash script's YAML template (lines 60-239), I mistakenly referenced a different template structure that uses kserve configuration blocks. This was likely due to working with multiple modelservice configurations and conflating different deployment patterns.
No problem. We probably want a kserve deployment pattern in addition to standalone and modelservice.
- Fixed dependency issue by adding Jinja2 to install_deps.sh
- Completely rewrote template structure in 09_deploy_via_modelservice.py to match the original bash script exactly
- Removed incorrect kserve structure that was not in the original bash script
- Template now properly reflects the actual bash script logic and structure

Co-Authored-By: Claude <[email protected]>
```python
if ev.get("control_environment_type_modelservice_active", "0") != "1":
    deploy_methods = ev.get("deploy_methods", "")
```
Suggested change:

```python
if not ev["control_environment_type_modelservice_active"]:
    deploy_methods = ev.get("deploy_methods", "unknown")
```
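One caveat on the two variants: environment-variable values arrive as strings, so plain truthiness is a trap. A tiny sketch of the difference (illustrative values only):

```python
# Values read from the environment are strings; "0" is a non-empty
# string and therefore truthy, so a bare `not ev[...]` only works if
# the value has been converted to a real boolean beforehand.
ev = {"control_environment_type_modelservice_active": "0"}

as_truthiness = bool(ev["control_environment_type_modelservice_active"])  # True!
as_comparison = ev["control_environment_type_modelservice_active"] != "1"  # True
print(as_truthiness, as_comparison)
```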
When I execute, I get this error on the execution of
I inspected the values file (
I'm guessing Additional edit:
Due to the introduction of (incomplete) schema validation in modelservice, we should add
While trying to fix this, I found several fields that were blank in the values file. Perhaps, in addition to the dry run, inspect the resulting values file and/or run
- Fix YAML indentation in add_annotations() (4 spaces vs 6 spaces)
- Use existing add_command_line_options() from functions.py instead of a custom implementation
- Fix argument formatting to use bash-style multi-line strings with continuations
- Fix affinity configuration to get fresh values after the check_affinity() call
- Fix boolean environment variable check for modelservice activation
- Add --skip-schema-validation flag to helmfile command
- Set LLMDBENCH_CURRENT_STEP=09 for functions.py compatibility

All REPLACE_ENV patterns now process correctly and helm template validation passes.

Co-Authored-By: Claude <[email protected]>
Thanks for the valuable review. This is a challenging one.
I did:
(extra steps to ensure creation of all helm files)
I then repeated:
Finally, I did diff:
- Merge dependency lists from HEAD and upstream/main
- Keep both pykube-ng (from upstream) and Jinja2 (from HEAD)
- Resolves conflict between competing dependency lists
Address reviewer feedback:
- Add missing podAnnotations sections for decode and prefill
- Fix GPU resource calculation using get_accelerator_nr function
- Add missing container port configurations (5557, 8200)
- Fix arguments format to use proper multi-line strings
- Import missing functions from functions.py
- Use proper accelerator count calculation instead of 'auto'
Major improvements to match the bash script exactly:
- Replace custom add_command_line_options with functions.py version
- Replace custom add_additional_env_to_yaml with functions.py version
- Replace custom add_config with functions.py version
- Replace custom add_annotations with functions.py version
- Fix args format: change from 'args: |' to proper 'args:' list format
- Remove redundant custom function implementations
- Use consistent function behavior across bash and Python

This should resolve the reviewer's concerns about missing environment variables, incorrect YAML formatting, and inconsistent behavior.
""" | ||
if not resource_name or not resource_value: | ||
return "" | ||
return f" {resource_name}: \"{resource_value}\"" |
return f" {resource_name}: \"{resource_value}\"" | |
return f"{resource_name}: \"{resource_value}\"" |
```python
    prefill_cpu_nr = ev.get("vllm_modelservice_prefill_cpu_nr", "")

    # Resource configuration
    accelerator_resource = ev.get("vllm_common_accelerator_resource", "")
```
The env var gets reset in check_affinity(); the value in ev does not.

Suggested change:

```python
accelerator_resource = os.getenv("LLMDBENCH_VLLM_COMMON_ACCELERATOR_RESOURCE")
```
```python
    # Extra configurations
    decode_extra_pod_config = ev.get("vllm_modelservice_decode_extra_pod_config", "")
    decode_extra_container_config = ev.get("vllm_modelservice_decode_extra_container_config", "")
    decode_extra_volume_mounts = ev.get("vllm_modelservice_decode_extra_volume_mounts", "")
    decode_extra_volumes = ev.get("vllm_modelservice_decode_extra_volumes", "")

    prefill_extra_pod_config = ev.get("vllm_modelservice_prefill_extra_pod_config", "")
    prefill_extra_container_config = ev.get("vllm_modelservice_prefill_extra_container_config", "")
    prefill_extra_volume_mounts = ev.get("vllm_modelservice_prefill_extra_volume_mounts", "")
    prefill_extra_volumes = ev.get("vllm_modelservice_prefill_extra_volumes", "")
```
I'm sure there is a more elegant way to do this; however, this works. The problem is that the env variables are defaulted to "" in env.sh.
Suggested change:

```python
# Extra configurations
decode_extra_pod_config = ev.get("vllm_modelservice_decode_extra_pod_config", "#no____config")
if decode_extra_pod_config == "":
    decode_extra_pod_config = "#no____config"
decode_extra_container_config = ev.get("vllm_modelservice_decode_extra_container_config", "#no____config")
if decode_extra_container_config == "":
    decode_extra_container_config = "#no____config"
decode_extra_volume_mounts = ev.get("vllm_modelservice_decode_extra_volume_mounts", "[]")
if decode_extra_volume_mounts == "":
    decode_extra_volume_mounts = "[]"
decode_extra_volumes = ev.get("vllm_modelservice_decode_extra_volumes", "[]")
if decode_extra_volumes == "":
    decode_extra_volumes = "[]"
prefill_extra_pod_config = ev.get("vllm_modelservice_prefill_extra_pod_config", "#no____config")
if prefill_extra_pod_config == "":
    prefill_extra_pod_config = "#no____config"
prefill_extra_container_config = ev.get("vllm_modelservice_prefill_extra_container_config", "#no____config")
if prefill_extra_container_config == "":
    prefill_extra_container_config = "#no____config"
prefill_extra_volume_mounts = ev.get("vllm_modelservice_prefill_extra_volume_mounts", "[]")
if prefill_extra_volume_mounts == "":
    prefill_extra_volume_mounts = "[]"
prefill_extra_volumes = ev.get("vllm_modelservice_prefill_extra_volumes", "[]")
if prefill_extra_volumes == "":
    prefill_extra_volumes = "[]"
```
```python
    # Set up configuration preparation
    add_config_prep()
```
Given the proposed changes earlier (comment: # Extra configurations), this is no longer needed. Nor is the method.
```python
def add_pod_annotations(annotation_var: str) -> str:
    """
    Generate podAnnotations YAML section.
    """
    return functions_add_annotations(annotation_var)
```
Is this necessary?
In addition to the changes here, I believe the following changes are needed in the current version of
```yaml
ports:
- containerPort: {decode_inference_port}
- containerPort: 5557
```
This is insufficient. The metrics collection requires a name. There is an env variable that should be looked at. An example is in scenarios/examples/inference-scheduling.sh. I think it should be possible to reuse add_config().
```shell
export LLMDBENCH_VLLM_MODELSERVICE_EXTRA_CONTAINER_CONFIG=$(mktemp)
cat << EOF > ${LLMDBENCH_VLLM_MODELSERVICE_EXTRA_CONTAINER_CONFIG}
ports:
  - containerPort: 5557
    protocol: TCP
  - containerPort: 8200
    name: metrics
    protocol: TCP
EOF
```
Note that in this example, the 8200 should probably be replaced by REPLACE_ENV_VLLM_MODELSERVICE_DECODE_INFERENCE_PORT.
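One possible shape for the generated ports section, with the metrics port named as suggested. This is a sketch only: `ports_yaml` is a hypothetical helper, and in practice the inference port would come from the REPLACE_ENV variable rather than a literal:

```python
def ports_yaml(inference_port: int) -> str:
    # The port named "metrics" is what metrics collection looks for;
    # 5557 is the side-channel port from the example above.
    return (
        "ports:\n"
        "  - containerPort: 5557\n"
        "    protocol: TCP\n"
        f"  - containerPort: {inference_port}\n"
        "    name: metrics\n"
        "    protocol: TCP\n"
    )

print(ports_yaml(8200))
```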
This is a related PR, #335, that also proposes changes to
Changes made:
- Remove unnecessary wrapper functions (add_config_prep, add_pod_annotations)
- Fix accelerator_resource to use os.environ.get() instead of ev.get() as suggested by reviewer; check_affinity() resets env vars
- Replace wrapper calls with direct functions.py calls
- Clean up redundant function definitions

Still pending reviewer guidance on:
- Container port configuration with metrics names
- Volume mounts/volumes default behavior
- functions.py dependencies, if any
Issue: the add_config() function only treated '{}' as empty, but not '[]'. This caused volume mounts to show malformed YAML instead of being omitted.

Fix: extend the empty check to include both '{}' and '[]'. Now both empty objects and empty arrays are properly omitted from YAML output.

This resolves the volume mount configuration issue mentioned by the reviewer.
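The extended empty check described in this commit can be sketched as follows (illustrative only; the actual change lives inside add_config() in functions.py):

```python
def is_empty_config(value: str) -> bool:
    # An empty YAML mapping ("{}"), an empty YAML sequence ("[]"),
    # and a blank string all mean "nothing to merge".
    return value.strip() in ("", "{}", "[]")

print(is_empty_config("[]"), is_empty_config("{}"))
print(is_empty_config("volumeMounts:\n- name: cache"))
```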
Hi @kalantar (Michael), I've addressed some of your feedback items and would like your guidance on the remaining implementation.

Completed items:
- Removed unnecessary wrapper functions (add_config_prep, add_pod_annotations)

However, I need your guidance on:
1. Container port configuration with metrics names
   - Option A: Add name fields directly
   - Option B: Use the add_config() function for ports configuration
   - Option C: Research existing port naming patterns in the codebase(?)
2. Volume mounts/volumes default behavior
   - Option A: Change defaults from "[]" to "#no____config"
   - Should empty volume configs result in:

Please let me know your preferences, and thanks for your patience with this conversion!
I am inclined to suggest option B since this aligns well with the approach taken for other things.
Ideally, when there are no volumes and volume mounts, this field should be skipped. I think I had problems doing this in the sh version; it should be easier in Python.
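Skipping the field entirely when there is no content is indeed straightforward in Python. A sketch (`volume_section` is a hypothetical name):

```python
def volume_section(volumes: str) -> str:
    # Emit the field only when there is real content; otherwise return
    # nothing, so the key is absent from the rendered values file.
    if volumes.strip() in ("", "[]"):
        return ""
    return f"volumes:\n{volumes}"

print(repr(volume_section("[]")))
print(volume_section("- name: cache\n  emptyDir: {}"))
```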
- Remove extra spacing in filter_empty_resource function formatting
- Implement option B for container ports by removing hardcoded ports and using the add_config approach
- Add conditional_volume_config function to skip empty volume/volumeMount fields entirely
- Implement add_config_prep function to set proper defaults for empty configurations
- Remove generate_ms_rules_yaml function and inline the logic for cleaner code
- Remove unused imports (jinja2, yaml, render_string) to reduce complexity

Co-Authored-By: Claude <[email protected]>
I tried this again. It is looking good. I do see that
Hmm. Tried again, and now I'm not seeing that. Seeing these extraConfig without

The formatting is still off for several things. I noted yesterday that many things seemed indented more than they should be. I think the indentation added at the start is being added to the indent already in the YAML string, so you may need to do lstrip() before sending results (or after getting them). For example, annotations and podAnnotations have this issue.
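For multi-line fragments, lstrip() only fixes the first line; textwrap can normalize the whole block before re-indenting it to the target depth. A sketch of the idea (hypothetical fragment values):

```python
import textwrap

# A fragment that arrives with its own leading indentation.
fragment = """
    annotations:
      example.org/owner: bench
"""

# Strip the fragment's own indentation first, then apply exactly the
# indentation the surrounding YAML context requires.
clean = textwrap.dedent(fragment).strip("\n")
indented = textwrap.indent(clean, "  ")
print(indented)
```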
- Replace "#no____config" defaults with "{}" for extraConfig sections - Add conditional_extra_config function to skip empty extraConfig entirely - Add lstrip() calls throughout template to fix YAML indentation - Ensure proper YAML syntax and formatting for all config sections Addresses reviewer Michael's feedback on template structure and formatting.
@kalantar I've fixed the extraConfig YAML formatting issue you identified. The problem was missing colons and improper structure.

Before (broken YAML):

```yaml
extraConfig
  containers:
  - name: vllm
```

After (fixed YAML):

```yaml
extraConfig:
  containers:
  - name: vllm
```

What was fixed: the fix is in commit f7b5b4f.
@yossiovadia - I'm still getting a
Zeroing in on the

Seems to be a potential culprit?
I'll (about time) set up a real env to be able to test it (vs dry run); will update shortly. Thanks for the review.
Ok, my apologies for the long back and forth; the main reason is that I had to dry run. I finally got a real cluster and managed to fix and validate all the steps (00 through 09, inclusive), and all works well now.
Still appears to break when specifying a scenario via -c:

```shell
./setup/standup.sh -p vezio-llmd-bench -m ibm-granite/granite-3.3-2b-instruct -c inference-scheduling.sh
```
```
==> Mon Sep 22 10:06:18 EDT 2025 - ./setup/standup.sh - === Running step: 09_deploy_via_modelservice.py ===
2025-09-22 10:06:20,670 - INFO - ℹ️ Environment variable LLMDBENCH_VLLM_COMMON_AFFINITY automatically set to "nvidia.com/gpu.product:NVIDIA-H100-80GB-HBM3"
2025-09-22 10:06:21,509 - INFO - 🚀 Installing helm chart "ms-llmdbench" via helmfile...
2025-09-22 10:06:23,770 - INFO -
ERROR while executing command "helmfile --namespace vezio-llmd-bench --kubeconfig /Users/vezio/data/inference-scheduling/environment/context.ctx --selector name=ibm-gran-b1f57d6a-instruct-ms apply -f /Users/vezio/data/inference-scheduling/setup/helm/llmdbench/helmfile-00.yaml --skip-diff-on-install --skip-schema-validation"
2025-09-22 10:06:23,771 - INFO - Comparing release=ibm-gran-b1f57d6a-instruct-ms, chart=llm-d-modelservice/llm-d-modelservice, namespace=vezio-llmd-bench
2025-09-22 10:06:23,771 - INFO - Adding repo llm-d-modelservice https://llm-d-incubation.github.io/llm-d-modelservice/
"llm-d-modelservice" has been added to your repositories
Adding repo llm-d-infra https://llm-d-incubation.github.io/llm-d-infra/
"llm-d-infra" has been added to your repositories
Listing releases matching ^ibm-gran-b1f57d6a-instruct-ms$
ibm-gran-b1f57d6a-instruct-ms  vezio-llmd-bench  2  2025-09-19 16:12:38.073087 -0400 EDT  deployed  llm-d-modelservice-v0.2.9  v0.2.0
in /Users/vezio/data/inference-scheduling/setup/helm/llmdbench/helmfile-00.yaml: command "/Users/vezio/homebrew/bin/helm" exited with non-zero status:

PATH:
  /Users/vezio/homebrew/bin/helm

ARGS:
  0: helm (4 bytes)
  1: --kubeconfig (12 bytes)
  2: /Users/vezio/data/inference-scheduling/environment/context.ctx (62 bytes)
  3: diff (4 bytes)
  4: upgrade (7 bytes)
  5: --allow-unreleased (18 bytes)
  6: ibm-gran-b1f57d6a-instruct-ms (29 bytes)
  7: llm-d-modelservice/llm-d-modelservice (37 bytes)
  8: --version (9 bytes)
  9: v0.2.9 (6 bytes)
  10: --skip-schema-validation (24 bytes)
  11: --namespace (11 bytes)
  12: vezio-llmd-bench (16 bytes)
  13: --values (8 bytes)
  14: /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile1913400668/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-85686dd485 (132 bytes)
  15: --reset-values (14 bytes)
  16: --detailed-exitcode (19 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

STDERR:
  Error: Failed to render chart: exit status 1: Error: failed to parse /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile1913400668/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-85686dd485: error converting YAML to JSON: yaml: line 82: did not find expected '-' indicator
  Error: plugin "diff" exited with error

COMBINED OUTPUT:
  Error: Failed to render chart: exit status 1: Error: failed to parse /var/folders/pj/7679q1l50672dgqrzp26fxdh0000gn/T/helmfile1913400668/vezio-llmd-bench-ibm-gran-b1f57d6a-instruct-ms-values-85686dd485: error converting YAML to JSON: yaml: line 82: did not find expected '-' indicator
  Error: plugin "diff" exited with error

2025-09-22 10:06:23,772 - INFO - ❌ Failed to deploy helm chart for model ibm-granite/granite-3.3-2b-instruct
```
When no
The problem appears to be in the generated

I'm not sure about line 82 as reported. My IDE complains about line 83. I think line 87 is also suspect.

Line 83/89 has different quoting. TBH, I am suspicious of line 87/93 and the quoting on 94.
The problem lines 87 / 93-94 are caused by the
This fix also seems to fix the problem with quoting that caused the helm errors.
I take that back, I had

The behavior is different from
Same error continues despite @kalantar's related fix.
@yossiovadia has the comment #323 (comment) been addressed? I think this is the last outstanding issue I have been waiting for before reviewing again.
Resolve the quoting issue identified in comment llm-d#323 where Python generated:
- "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'" (broken)

Now correctly generates like the bash implementation:
- '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' (working)

- Detect arguments that already have single quotes (JSON strings)
- Use them as-is without additional double quote wrapping
- Only wrap regular arguments in double quotes

Addresses the last outstanding reviewer concern before final approval.
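The pass-through logic this commit describes can be sketched as follows (illustrative only; the real implementation is in 09_deploy_via_modelservice.py):

```python
import json

def quote_arg(arg: str) -> str:
    # Arguments already wrapped in single quotes (typically embedded
    # JSON payloads such as --kv-transfer-config values) pass through
    # unchanged; wrapping them again would produce the broken
    # "'{...}'" form that failed YAML parsing.
    if arg.startswith("'") and arg.endswith("'"):
        return arg
    return f'"{arg}"'

payload = "'" + json.dumps(
    {"kv_connector": "NixlConnector", "kv_role": "kv_both"},
    separators=(",", ":")) + "'"
print(quote_arg(payload))
print(quote_arg("--tensor-parallel-size=4"))
```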
- Remove extra spacing in filter_empty_resource function formatting
- Implement option B for container ports by removing hardcoded ports and using the add_config approach
- Add conditional_volume_config function to skip empty volume/volumeMount fields entirely
- Implement add_config_prep function to set proper defaults for empty configurations
- Remove generate_ms_rules_yaml function and inline the logic for cleaner code
- Remove unused imports (jinja2, yaml, render_string) to reduce complexity

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

- Replace "#no____config" defaults with "{}" for extraConfig sections
- Add conditional_extra_config function to skip empty extraConfig entirely
- Add lstrip() calls throughout template to fix YAML indentation
- Ensure proper YAML syntax and formatting for all config sections

Addresses reviewer Michael's feedback on template structure and formatting.

Signed-off-by: Yossi Ovadia <[email protected]>

…double indentation
- Fixed conditional_extra_config function to properly add colon after extraConfig label
- Added proper empty config checking before processing to avoid unnecessary sections
- Fixed indentation structure to prevent double-indentation issues
- Resolves issue where extraConfig sections appeared without colons (extraConfig\n containers:)
- Now correctly generates: extraConfig:\n content...

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

- Fix YAML args formatting: convert from malformed backslash format to proper YAML list with quoted strings
- Resolve Helm deployment error: 'cannot unmarshal number into Go struct field Container.args of type string'
- Add accelerator resource auto-detection: convert 'auto' to 'nvidia.com/gpu'
- Improve resource section generation: clean resource limits/requests without empty values
- Fix CPU/memory fallback logic: use common values when specific ones are empty
- Remove duplicate check_storage_class function causing pykube object_factory errors
- Add OpenShift L40S GPU scenario for testing modelservice deployment

Tested on a real OpenShift cluster with NVIDIA L40S GPUs. The full pipeline (steps 0-9) now completes successfully with the Python step 09 implementation.

Signed-off-by: Yossi Ovadia <[email protected]>

Resolve issue where custom scenarios with complex arguments (like JSON strings) caused "did not find expected '-' indicator" YAML parsing errors.
- Change args parsing to split on '____' delimiters instead of whitespace
- Add proper quote escaping for arguments containing JSON or special characters
- Preserve complex arguments like --kv-transfer-config with embedded JSON

Fixes reviewer issue: custom scenarios (-c) now work correctly.

Signed-off-by: Yossi Ovadia <[email protected]>
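Splitting on the '____' delimiter rather than whitespace can be sketched as follows (illustrative argument values):

```python
# A whitespace split would break JSON-bearing or multi-word arguments;
# the '____' delimiter keeps each argument intact.
args_blob = "--max-model-len=16000____--tensor-parallel-size____4"
args = [a for a in args_blob.split("____") if a]
print(args)
```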
Address reviewer feedback on malformed args output:
- Fix concatenated arguments like '16000--tensor-parallel-size'
- Resolve unclosed quote issues like '"4'
- Clean up trailing backslashes and quotes from bash line continuations
- Prevent empty arguments from being added to YAML

Resolves specific issues identified in generated ms-values.yaml lines 87 and 94.

Signed-off-by: Yossi Ovadia <[email protected]>

Incorporate changes from PR llm-d#357 that add infinite timeouts to HTTPRoute configurations:
- Add timeouts section with backendRequest: 0s and request: 0s to both HTTPRoute rules
- Prevents request timeouts during long-running model inference operations
- Matches the bash implementation timeout behavior

Resolves reviewer feedback about missing timeout sections in the Python version.

Signed-off-by: Yossi Ovadia <[email protected]>

Resolve the quoting issue identified in comment llm-d#323 where Python generated:
- "'{"kv_connector":"NixlConnector","kv_role":"kv_both"}'" (broken)

Now correctly generates like the bash implementation:
- '{"kv_connector":"NixlConnector","kv_role":"kv_both"}' (working)

- Detect arguments that already have single quotes (JSON strings)
- Use them as-is without additional double quote wrapping
- Only wrap regular arguments in double quotes

Addresses the last outstanding reviewer concern before final approval.

Signed-off-by: Yossi Ovadia <[email protected]>
771b345 to 1ec90d6
So I'm not sure why it's not using the correct namespace? It appears to be using the default. This could be unrelated, but I'm not sure why I would see these failures now. After speaking w/ @maugustosilva I believe this is my own user error, based on a recent PR that was merged with some NS changes. Rerunning now!
This looks pretty good based on:
I've seen other issues, but they look unrelated to this PR.
This converts the bash script setup/steps/09_deploy_via_modelservice.sh to Python as setup/steps/09_deploy_via_modelservice.py.
Key changes:
Fixes #269