Conversation
Add support for configuring container images via environment variables:
- VLLM_CONTAINER_IMAGE: vLLM server image
- GUIDELLM_CONTAINER_IMAGE: GuideLLM benchmark tool image
- VLLM_BENCH_CONTAINER_IMAGE: vLLM bench tool image

All variables include sensible defaults matching the current configuration, allowing users to easily override images without editing config files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
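The fallback behavior described above can be sketched in shell. This is an illustrative model of the "env var or default" resolution, not the project's exact code; the override image name used below is hypothetical, and the real defaults live in the group_vars files:

```shell
# Sketch: each image variable falls back to a default when the corresponding
# environment variable is unset, mirroring the Jinja2
# "lookup('env', ...) | default(..., true)" pattern used in group_vars.
resolve_image() {
  # $1: value of the override environment variable, $2: fallback default
  if [ -n "$1" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# Unset override -> the default is used
resolve_image "$GUIDELLM_CONTAINER_IMAGE" "ghcr.io/vllm-project/guidellm:latest"

# After e.g.: export GUIDELLM_CONTAINER_IMAGE=quay.io/example/guidellm:dev
# (hypothetical image), the override wins:
resolve_image "quay.io/example/guidellm:dev" "ghcr.io/vllm-project/guidellm:latest"
```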
Add container image display to GuideLLM configuration output to match the vLLM server display, making it easier to verify which image is being used during test execution.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Replace hardcoded workload type validation with a dynamic check against test_configs.keys(), matching the approach used in llm-benchmark-auto.yml. This allows users to add custom workloads to test-workloads.yml and automatically use them in concurrent load testing without modifying the playbook validation logic.

Changes:
- base_workload validation now checks test_configs.keys()
- variable workload check is now dynamic instead of a hardcoded list
- Updated documentation to reflect workload flexibility

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
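The dynamic check described above can be sketched as an Ansible assert task (a minimal sketch based on the assert structure quoted later in this review; the task name is illustrative):

```yaml
# Sketch: validate base_workload against whatever keys test-workloads.yml
# defines under test_configs, instead of a hardcoded list, so new
# workloads are accepted without editing the playbook.
- name: Validate base_workload parameter
  ansible.builtin.assert:
    that:
      - base_workload in test_configs.keys()
    fail_msg: |
      Unsupported base_workload: {{ base_workload }}
      Supported workloads: {{ test_configs.keys() | list | join(', ') }}
```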
📝 Walkthrough

This PR introduces environment variable overrides for container images (VLLM_CONTAINER_IMAGE, GUIDELLM_CONTAINER_IMAGE, VLLM_BENCH_CONTAINER_IMAGE), adds container image display to the GuideLLM configuration output, and replaces hardcoded workload validation with a dynamic check against test_configs.keys().

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)
automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml (1)
18-19: Consider using a versioned tag instead of `latest` for reproducibility.

Using the `latest` tag can lead to non-reproducible benchmark results if the image is updated between runs. The role's internal default at line 46 uses `v0.5.3`, creating an inconsistency.

Suggested change for consistency:

```diff
  # Using GuideLLM official container image
  # Can be overridden with environment variable: export GUIDELLM_CONTAINER_IMAGE=...
- container_image: "{{ lookup('env', 'GUIDELLM_CONTAINER_IMAGE') | default('ghcr.io/vllm-project/guidellm:latest', true) }}"
+ container_image: "{{ lookup('env', 'GUIDELLM_CONTAINER_IMAGE') | default('ghcr.io/vllm-project/guidellm:v0.5.3', true) }}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml` around lines 18 - 19, The container_image variable currently defaults to the unpinned 'ghcr.io/vllm-project/guidellm:latest', which harms reproducibility; update the default in the container_image definition (while keeping the GUIDELLM_CONTAINER_IMAGE env lookup override) to use the versioned tag used by the role (e.g., 'ghcr.io/vllm-project/guidellm:v0.5.3') so container_image and the role default are consistent and benchmark runs are reproducible.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`:
- Around line 94-97: The validation currently allows any key from test_configs
(via base_workload in test_configs.keys()) which lets invalid types through;
change it to only accept true base workloads (e.g., restrict to ['chat','code'])
by replacing the generic membership test with an explicit allowed list or by
deriving allowed_base_workloads = ['chat','code'] and checking base_workload
against that; also update the fail_msg to list those allowed base workloads and
ensure the later variable-workload logic that references 'chat_var' and
'code_var' remains consistent with this restriction.
---
Nitpick comments:
In
`@automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml`:
- Around line 18-19: The container_image variable currently defaults to the
unpinned 'ghcr.io/vllm-project/guidellm:latest', which harms reproducibility;
update the default in the container_image definition (while keeping the
GUIDELLM_CONTAINER_IMAGE env lookup override) to use the versioned tag used by
the role (e.g., 'ghcr.io/vllm-project/guidellm:v0.5.3') so container_image and
the role default are consistent and benchmark runs are reproducible.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 684522b9-f263-44b4-b2e8-f6a2147363b8
📒 Files selected for processing (6)
- automation/test-execution/ansible/ansible.md
- automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml
- automation/test-execution/ansible/inventory/group_vars/all/infrastructure.yml
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
- docs/getting-started.md
```yaml
      - base_workload in test_configs.keys()
    fail_msg: |
      Unsupported base_workload: {{ base_workload }}
      Supported workloads: {{ test_configs.keys() | list | join(', ') }}
```
Validation may accept inappropriate workloads for this playbook.
The dynamic validation accepts all test_configs keys, including `embedding`, `baseline`, `production`, `chat_var`, and `code_var`. However, this playbook is designed for 3-phase concurrent load testing where:

- `embedding` uses a different backend and won't work with GuideLLM
- `chat_var`/`code_var` are variable workloads, not base workloads
- `baseline`/`production` appear to be caching mode configs, not workload types
Line 101 still hardcodes the variable workload check to `['chat_var', 'code_var']`, suggesting only chat and code are truly supported as base workloads.
Consider restricting to appropriate base workloads:

```diff
 - name: Validate base_workload parameter
   ansible.builtin.assert:
     that:
-      - base_workload in test_configs.keys()
+      - base_workload in ['chat', 'rag', 'code', 'summarization', 'reasoning']
     fail_msg: |
       Unsupported base_workload: {{ base_workload }}
-      Supported workloads: {{ test_configs.keys() | list | join(', ') }}
+      Supported base workloads for concurrent load testing: chat, rag, code, summarization, reasoning
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
      - base_workload in ['chat', 'rag', 'code', 'summarization', 'reasoning']
    fail_msg: |
      Unsupported base_workload: {{ base_workload }}
      Supported base workloads for concurrent load testing: chat, rag, code, summarization, reasoning
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml` around
lines 94 - 97, The validation currently allows any key from test_configs (via
base_workload in test_configs.keys()) which lets invalid types through; change
it to only accept true base workloads (e.g., restrict to ['chat','code']) by
replacing the generic membership test with an explicit allowed list or by
deriving allowed_base_workloads = ['chat','code'] and checking base_workload
against that; also update the fail_msg to list those allowed base workloads and
ensure the later variable-workload logic that references 'chat_var' and
'code_var' remains consistent with this restriction.
I reviewed this PR in two steps.

1. `-e $IMAGE` support

Using this syntax I had success:

```shell
for VLLM_IMAGE in "${image_array[@]}"; do
  -e "VLLM_CONTAINER_IMAGE={'image': '${VLLM_IMAGE}'}"
```

2. New workload_type support

I added a new workload_type to automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml. THEN use of this syntax resulted in failure:

```shell
llm-benchmark-concurrent-load.yml \
  -e "base_workload=chat_lite" \
```

```
TASK [vllm_server : Validate workload type] ************************************
fatal: [vllm-server]: FAILED! => {
  "assertion": "workload_type in ['summarization', 'chat', 'code', 'rag', 'embedding', 'chat_var', 'code_var']",
  "changed": false,
  "evaluated_to": false,
  "msg": "Invalid workload_type 'chat_lite'. Must be one of: summarization, chat, code, rag, embedding, chat_var, code_var"
}
```

NOTE that I made no edits to vllm-cpu-perf-eval/automation/test-execution/ansible/llm-benchmark-concurrent-load.yml:94
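The failing assertion above comes from a hardcoded list in the vllm_server role. One way this could be addressed (an assumption for illustration, not a change made in this PR) is to apply the same dynamic pattern the playbook uses for base_workload:

```yaml
# Hypothetical sketch: mirror the dynamic test_configs.keys() check in the
# vllm_server role's validation, so new entries under test_configs (such as
# chat_lite) are accepted without editing the role.
- name: Validate workload type
  ansible.builtin.assert:
    that:
      - workload_type in test_configs.keys()
    fail_msg: >-
      Invalid workload_type '{{ workload_type }}'.
      Must be one of: {{ test_configs.keys() | list | join(', ') }}
```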
Hi John, I think I saw something similar when I just added the workload to the end of the file. But it needs to be in the test_configs section. Was your new definition in that section? |
Yes, I added it in test_configs using this syntax:

```shell
$ vi automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml
```

```yaml
test_configs:
  # NEW Chat-lite workload
  chat_lite:
    workload_type: "chat_lite"
    isl: 256                # Standard input: user query
    osl: 128                # Standard output: assistant response
    variability: false
    backend: "openai-chat"
    vllm_args:
      - "--dtype=auto"                # Fallback dtype - overridden by model-specific dtype
      - "--no-enable-prefix-caching"  # Baseline mode: no prefix caching
      - "--max-model-len=2048"        # Limit model context to workload needs (2x total tokens for headroom)
    kv_cache_space: "40GiB"           # Fallback value - should be overridden by model-specific kv_cache_sizes
```
Eventually I hijacked `chat_var`, reduced the ISL and OSL values, and ran a test using that workload_type:

```yaml
  # NEW Chat-lite workload
  chat_var:
    workload_type: "chat_var"
    isl: 256                # Standard input: user query
    osl: 128                # Standard output: assistant response
    variability: false
    backend: "openai-chat"
    vllm_args:
      - "--dtype=auto"                # Fallback dtype - overridden by model-specific dtype
      - "--no-enable-prefix-caching"  # Baseline mode: no prefix caching
      - "--max-model-len=2048"        # Limit model context to workload needs (2x total tokens for headroom)
    kv_cache_space: "40GiB"           # Fallback value - should be overridden by model-specific kv_cache_sizes
```
- John
Summary by CodeRabbit

New Features
- Container images configurable via environment variables (VLLM_CONTAINER_IMAGE, GUIDELLM_CONTAINER_IMAGE, VLLM_BENCH_CONTAINER_IMAGE).
- Custom workloads added to test-workloads.yml are automatically usable in concurrent load testing.

Documentation
- Updated to reflect workload flexibility.

Chores
- GuideLLM configuration output now displays the container image in use.