Conversation
Add support for configuring container images via environment variables:
- VLLM_CONTAINER_IMAGE: vLLM server image
- GUIDELLM_CONTAINER_IMAGE: GuideLLM benchmark tool image
- VLLM_BENCH_CONTAINER_IMAGE: vLLM bench tool image

All variables include sensible defaults matching the current configuration, allowing users to easily override images without editing config files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
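The fallback behavior described above can be sketched in shell. This is an illustrative model of the "env var or default" resolution, not the project's exact code; the override image name used below is hypothetical, and the real defaults live in the group_vars files:

```shell
# Sketch: each image variable falls back to a default when the corresponding
# environment variable is unset, mirroring the Jinja2
# "lookup('env', ...) | default(..., true)" pattern used in group_vars.
resolve_image() {
  # $1: value of the override environment variable, $2: fallback default
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# Unset override -> the default is used
resolve_image "$GUIDELLM_CONTAINER_IMAGE" "ghcr.io/vllm-project/guidellm:latest"

# After e.g.: export GUIDELLM_CONTAINER_IMAGE=quay.io/example/guidellm:dev
# (hypothetical image), the override wins:
resolve_image "quay.io/example/guidellm:dev" "ghcr.io/vllm-project/guidellm:latest"
```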
Add container image display to GuideLLM configuration output to match the vLLM server display, making it easier to verify which image is being used during test execution.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Replace hardcoded workload type validation with a dynamic check against test_configs.keys(), matching the approach used in llm-benchmark-auto.yml. This allows users to add custom workloads to test-workloads.yml and automatically use them in concurrent load testing without modifying the playbook validation logic.

Changes:
- base_workload validation now checks test_configs.keys()
- variable workload check is now dynamic instead of a hardcoded list
- Updated documentation to reflect workload flexibility

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
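The dynamic check described above can be sketched as an Ansible assert task (a minimal sketch based on the assert structure quoted later in this review; the task name is illustrative):

```yaml
# Sketch: validate base_workload against whatever keys test-workloads.yml
# defines under test_configs, instead of a hardcoded list, so new
# workloads are accepted without editing the playbook.
- name: Validate base_workload parameter
  ansible.builtin.assert:
    that:
      - base_workload in test_configs.keys()
    fail_msg: |
      Unsupported base_workload: {{ base_workload }}
      Supported workloads: {{ test_configs.keys() | list | join(', ') }}
```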
📝 Walkthrough

This PR introduces environment variable overrides for container images (VLLM_CONTAINER_IMAGE, GUIDELLM_CONTAINER_IMAGE, VLLM_BENCH_CONTAINER_IMAGE), adds container image display to the GuideLLM configuration output, and replaces hardcoded workload validation with a dynamic check against test_configs.keys().

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml (1)
18-19: Consider using a versioned tag instead of `latest` for reproducibility.

Using the `latest` tag can lead to non-reproducible benchmark results if the image is updated between runs. The role's internal default at line 46 uses `v0.5.3`, creating an inconsistency.

Suggested change for consistency:

```diff
  # Using GuideLLM official container image
  # Can be overridden with environment variable: export GUIDELLM_CONTAINER_IMAGE=...
- container_image: "{{ lookup('env', 'GUIDELLM_CONTAINER_IMAGE') | default('ghcr.io/vllm-project/guidellm:latest', true) }}"
+ container_image: "{{ lookup('env', 'GUIDELLM_CONTAINER_IMAGE') | default('ghcr.io/vllm-project/guidellm:v0.5.3', true) }}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml` around lines 18 - 19, The container_image variable currently defaults to the unpinned 'ghcr.io/vllm-project/guidellm:latest', which harms reproducibility; update the default in the container_image definition (while keeping the GUIDELLM_CONTAINER_IMAGE env lookup override) to use the versioned tag used by the role (e.g., 'ghcr.io/vllm-project/guidellm:v0.5.3') so container_image and the role default are consistent and benchmark runs are reproducible.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`:
- Around line 94-97: The validation currently allows any key from test_configs
(via base_workload in test_configs.keys()) which lets invalid types through;
change it to only accept true base workloads (e.g., restrict to ['chat','code'])
by replacing the generic membership test with an explicit allowed list or by
deriving allowed_base_workloads = ['chat','code'] and checking base_workload
against that; also update the fail_msg to list those allowed base workloads and
ensure the later variable-workload logic that references 'chat_var' and
'code_var' remains consistent with this restriction.
---
Nitpick comments:
In
`@automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml`:
- Around line 18-19: The container_image variable currently defaults to the
unpinned 'ghcr.io/vllm-project/guidellm:latest', which harms reproducibility;
update the default in the container_image definition (while keeping the
GUIDELLM_CONTAINER_IMAGE env lookup override) to use the versioned tag used by
the role (e.g., 'ghcr.io/vllm-project/guidellm:v0.5.3') so container_image and
the role default are consistent and benchmark runs are reproducible.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 684522b9-f263-44b4-b2e8-f6a2147363b8
📒 Files selected for processing (6)
- automation/test-execution/ansible/ansible.md
- automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml
- automation/test-execution/ansible/inventory/group_vars/all/infrastructure.yml
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
- docs/getting-started.md
```yaml
      - base_workload in test_configs.keys()
    fail_msg: |
      Unsupported base_workload: {{ base_workload }}
      Supported workloads: {{ test_configs.keys() | list | join(', ') }}
```
Validation may accept inappropriate workloads for this playbook.
The dynamic validation accepts all test_configs keys, including `embedding`, `baseline`, `production`, `chat_var`, and `code_var`. However, this playbook is designed for 3-phase concurrent load testing where:

- `embedding` uses a different backend and won't work with GuideLLM
- `chat_var`/`code_var` are variable workloads, not base workloads
- `baseline`/`production` appear to be caching mode configs, not workload types
Line 101 still hardcodes the variable workload check to `['chat_var', 'code_var']`, suggesting only chat and code are truly supported as base workloads.
Consider restricting to appropriate base workloads:

```diff
 - name: Validate base_workload parameter
   ansible.builtin.assert:
     that:
-      - base_workload in test_configs.keys()
+      - base_workload in ['chat', 'rag', 'code', 'summarization', 'reasoning']
     fail_msg: |
       Unsupported base_workload: {{ base_workload }}
-      Supported workloads: {{ test_configs.keys() | list | join(', ') }}
+      Supported base workloads for concurrent load testing: chat, rag, code, summarization, reasoning
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
      - base_workload in ['chat', 'rag', 'code', 'summarization', 'reasoning']
    fail_msg: |
      Unsupported base_workload: {{ base_workload }}
      Supported base workloads for concurrent load testing: chat, rag, code, summarization, reasoning
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml` around
lines 94 - 97, The validation currently allows any key from test_configs (via
base_workload in test_configs.keys()) which lets invalid types through; change
it to only accept true base workloads (e.g., restrict to ['chat','code']) by
replacing the generic membership test with an explicit allowed list or by
deriving allowed_base_workloads = ['chat','code'] and checking base_workload
against that; also update the fail_msg to list those allowed base workloads and
ensure the later variable-workload logic that references 'chat_var' and
'code_var' remains consistent with this restriction.
I reviewed this PR in two steps.

1. `-e $IMAGE` support

Using this syntax I had success:

```shell
for VLLM_IMAGE in "${image_array[@]}"; do
  -e "VLLM_CONTAINER_IMAGE={'image': '${VLLM_IMAGE}'}"
```

2. New workload_type support

I added a new workload_type to automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml. THEN use of this syntax resulted in failure:

```shell
llm-benchmark-concurrent-load.yml \
  -e "base_workload=chat_lite" \
```

```
TASK [vllm_server : Validate workload type] ************************************
fatal: [vllm-server]: FAILED! => {
  "assertion": "workload_type in ['summarization', 'chat', 'code', 'rag', 'embedding', 'chat_var', 'code_var']",
  "changed": false,
  "evaluated_to": false,
  "msg": "Invalid workload_type 'chat_lite'. Must be one of: summarization, chat, code, rag, embedding, chat_var, code_var"
}
```

NOTE that I made no edits to vllm-cpu-perf-eval/automation/test-execution/ansible/llm-benchmark-concurrent-load.yml:94
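The failing assertion above comes from a hardcoded list in the vllm_server role. One way this could be addressed (an assumption for illustration, not a change made in this PR) is to apply the same dynamic pattern the playbook uses for base_workload:

```yaml
# Hypothetical sketch: mirror the dynamic test_configs.keys() check in the
# vllm_server role's validation, so new entries under test_configs (such as
# chat_lite) are accepted without editing the role.
- name: Validate workload type
  ansible.builtin.assert:
    that:
      - workload_type in test_configs.keys()
    fail_msg: >-
      Invalid workload_type '{{ workload_type }}'.
      Must be one of: {{ test_configs.keys() | list | join(', ') }}
```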
Hi John, I think I saw something similar when I just added the workload to the end of the file. But it needs to be in the test_configs section. Was your new definition in that section? |
Yes, I added it in test_configs using this syntax:

```shell
$ vi automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml
```

```yaml
test_configs:
  # NEW Chat-lite workload
  chat_lite:
    workload_type: "chat_lite"
    isl: 256                # Standard input: user query
    osl: 128                # Standard output: assistant response
    variability: false
    backend: "openai-chat"
    vllm_args:
      - "--dtype=auto"                # Fallback dtype - overridden by model-specific dtype
      - "--no-enable-prefix-caching"  # Baseline mode: no prefix caching
      - "--max-model-len=2048"        # Limit model context to workload needs (2x total tokens for headroom)
    kv_cache_space: "40GiB"           # Fallback value - should be overridden by model-specific kv_cache_sizes
```
Eventually I hijacked `chat_var`, reduced the ISL and OSL values, and ran a test using that workload_type:

```yaml
  # NEW Chat-lite workload
  chat_var:
    workload_type: "chat_var"
    isl: 256                # Standard input: user query
    osl: 128                # Standard output: assistant response
    variability: false
    backend: "openai-chat"
    vllm_args:
      - "--dtype=auto"                # Fallback dtype - overridden by model-specific dtype
      - "--no-enable-prefix-caching"  # Baseline mode: no prefix caching
      - "--max-model-len=2048"        # Limit model context to workload needs (2x total tokens for headroom)
    kv_cache_space: "40GiB"           # Fallback value - should be overridden by model-specific kv_cache_sizes
```
- John
Summary by CodeRabbit

New Features
- Container images configurable via environment variables (VLLM_CONTAINER_IMAGE, GUIDELLM_CONTAINER_IMAGE, VLLM_BENCH_CONTAINER_IMAGE).
- Custom workloads added to test-workloads.yml are automatically usable in concurrent load testing.

Documentation
- Updated to reflect workload flexibility.

Chores
- GuideLLM configuration output now displays the container image in use.