
Container image env var #94

Open
maryamtahhan wants to merge 3 commits into redhat-et:main from maryamtahhan:container-image-env-var

Conversation


@maryamtahhan maryamtahhan commented Apr 3, 2026

Summary by CodeRabbit

  • New Features

    • Container images now configurable via environment variables, enabling flexible deployment customization without modifying core configurations.
  • Documentation

    • Updated setup guide with new environment variable options for custom container image selection during testing.
  • Chores

    • Enhanced test infrastructure with improved configuration flexibility and dynamic workload validation support.

maryamtahhan and others added 3 commits April 3, 2026 16:43
Add support for configuring container images via environment variables:
- VLLM_CONTAINER_IMAGE: vLLM server image
- GUIDELLM_CONTAINER_IMAGE: GuideLLM benchmark tool image
- VLLM_BENCH_CONTAINER_IMAGE: vLLM bench tool image

All variables include sensible defaults matching current configuration,
allowing users to easily override images without editing config files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add container image display to GuideLLM configuration output to match
the vLLM server display, making it easier to verify which image is
being used during test execution.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Replace hardcoded workload type validation with dynamic check against
test_configs.keys(), matching the approach used in llm-benchmark-auto.yml.

This allows users to add custom workloads to test-workloads.yml and
automatically use them in concurrent load testing without modifying
the playbook validation logic.

Changes:
- base_workload validation now checks test_configs.keys()
- variable workload check now dynamic instead of hardcoded list
- Updated documentation to reflect workload flexibility

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
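A minimal sketch of the override pattern described in the first commit. The GuideLLM line mirrors the default discussed later in review; the vLLM image name here is a placeholder, not the repository's actual value:

```yaml
# group_vars sketch: the environment variable wins when set; otherwise the
# previous hardcoded value is used. The second argument 'true' to default()
# makes an empty string also fall back to the default.
vllm_server:
  # placeholder image name, for illustration only
  container_image: "{{ lookup('env', 'VLLM_CONTAINER_IMAGE') | default('quay.io/example/vllm:latest', true) }}"

guidellm:
  container_image: "{{ lookup('env', 'GUIDELLM_CONTAINER_IMAGE') | default('ghcr.io/vllm-project/guidellm:latest', true) }}"
```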
@maryamtahhan maryamtahhan requested a review from jharriga April 3, 2026 16:00

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough

Walkthrough

This PR introduces environment variable overrides for container images (VLLM_CONTAINER_IMAGE and GUIDELLM_CONTAINER_IMAGE) across Ansible configuration and documentation, enabling tests to specify custom container images while maintaining backward compatibility through default fallbacks. The workload validation in the benchmark playbook is also made dynamic instead of hardcoded.

Changes

Cohort / File(s) Summary
Documentation
automation/test-execution/ansible/ansible.md, docs/getting-started.md
Added environment variable exports for container image overrides (VLLM_CONTAINER_IMAGE, GUIDELLM_CONTAINER_IMAGE) with documented defaults.
Ansible Infrastructure Configuration
automation/test-execution/ansible/inventory/group_vars/all/infrastructure.yml, automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml
Replaced hardcoded container image values with Jinja2 environment variable lookups that default to previous hardcoded values when variables are unset.
Ansible Playbook Logic
automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
Changed workload validation from a fixed static list (chat, rag, code, summarization) to dynamic validation against keys in test_configs, with improved error messaging that reports supported workloads.
Ansible Task Output
automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
Updated debug output to display the GuideLLM container image in container mode or N/A in host mode.
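The dynamic validation change can be sketched as follows, reconstructed from the diff quoted in the review comments (task name follows the playbook's convention):

```yaml
# Before, a hardcoded list of workload names; after, any key defined
# under test_configs passes validation.
- name: Validate base_workload parameter
  ansible.builtin.assert:
    that:
      - base_workload in test_configs.keys()
    fail_msg: |
      Unsupported base_workload: {{ base_workload }}
      Supported workloads: {{ test_configs.keys() | list | join(', ') }}
```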

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Container image env var' accurately describes the main change: making container images configurable via environment variables across multiple files.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml (1)

18-19: Consider using a versioned tag instead of latest for reproducibility.

Using latest tag can lead to non-reproducible benchmark results if the image is updated between runs. The role's internal default at line 46 uses v0.5.3, creating an inconsistency.

Suggested change for consistency
    # Using GuideLLM official container image
    # Can be overridden with environment variable: export GUIDELLM_CONTAINER_IMAGE=...
-   container_image: "{{ lookup('env', 'GUIDELLM_CONTAINER_IMAGE') | default('ghcr.io/vllm-project/guidellm:latest', true) }}"
+   container_image: "{{ lookup('env', 'GUIDELLM_CONTAINER_IMAGE') | default('ghcr.io/vllm-project/guidellm:v0.5.3', true) }}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml`
around lines 18 - 19, The container_image variable currently defaults to the
unpinned 'ghcr.io/vllm-project/guidellm:latest', which harms reproducibility;
update the default in the container_image definition (while keeping the
GUIDELLM_CONTAINER_IMAGE env lookup override) to use the versioned tag used by
the role (e.g., 'ghcr.io/vllm-project/guidellm:v0.5.3') so container_image and
the role default are consistent and benchmark runs are reproducible.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 684522b9-f263-44b4-b2e8-f6a2147363b8

📥 Commits

Reviewing files that changed from the base of the PR and between c991528 and a995583.

📒 Files selected for processing (6)
  • automation/test-execution/ansible/ansible.md
  • automation/test-execution/ansible/inventory/group_vars/all/benchmark-tools.yml
  • automation/test-execution/ansible/inventory/group_vars/all/infrastructure.yml
  • automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
  • automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
  • docs/getting-started.md

Comment on lines +94 to +97

        - base_workload in test_configs.keys()
      fail_msg: |
        Unsupported base_workload: {{ base_workload }}
        Supported workloads: {{ test_configs.keys() | list | join(', ') }}

⚠️ Potential issue | 🟡 Minor

Validation may accept inappropriate workloads for this playbook.

The dynamic validation accepts all test_configs keys including embedding, baseline, production, chat_var, and code_var. However, this playbook is designed for 3-phase concurrent load testing where:

  • embedding uses a different backend and won't work with GuideLLM
  • chat_var/code_var are variable workloads, not base workloads
  • baseline/production appear to be caching mode configs, not workload types

Line 101 still hardcodes the variable workload check to ['chat_var', 'code_var'], suggesting only chat and code are truly supported as base workloads.

Consider restricting to appropriate base workloads
    - name: Validate base_workload parameter
      ansible.builtin.assert:
        that:
-         - base_workload in test_configs.keys()
+         - base_workload in ['chat', 'rag', 'code', 'summarization', 'reasoning']
        fail_msg: |
          Unsupported base_workload: {{ base_workload }}
-         Supported workloads: {{ test_configs.keys() | list | join(', ') }}
+         Supported base workloads for concurrent load testing: chat, rag, code, summarization, reasoning
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml` around
lines 94 - 97, The validation currently allows any key from test_configs (via
base_workload in test_configs.keys()) which lets invalid types through; change
it to only accept true base workloads (e.g., restrict to ['chat','code']) by
replacing the generic membership test with an explicit allowed list or by
deriving allowed_base_workloads = ['chat','code'] and checking base_workload
against that; also update the fail_msg to list those allowed base workloads and
ensure the later variable-workload logic that references 'chat_var' and
'code_var' remains consistent with this restriction.


jharriga commented Apr 3, 2026

I reviewed this PR in two steps

  1. "-e $IMAGE" support
    Using this syntax I had success
    for VLLM_IMAGE in "${image_array[@]}"; do
    -e "VLLM_CONTAINER_IMAGE={'image': '${VLLM_IMAGE}'}"

  2. New workload_type support
    I added a new workload_type to: automation/test-execution/ansible/inventory/group_vars/all/test-workloads.yml

THEN use of this syntax resulted in failure:

    llm-benchmark-concurrent-load.yml \
      -e "base_workload=chat_lite" \

    TASK [vllm_server : Validate workload type] ************************************
    fatal: [vllm-server]: FAILED! => {
        "assertion": "workload_type in ['summarization', 'chat', 'code', 'rag', 'embedding', 'chat_var', 'code_var']",
        "changed": false,
        "evaluated_to": false,
        "msg": "Invalid workload_type 'chat_lite'. Must be one of: summarization, chat, code, rag, embedding, chat_var, code_var"
    }

NOTE that I made no edits to vllm-cpu-perf-eval/automation/test-execution/ansible/llm-benchmark-concurrent-load.yml:94

@maryamtahhan
Collaborator Author


Hi John, I think I saw something similar when I just added the new workload to the end of the file. It needs to be inside the test_configs section. Was your new definition in that section?
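For illustration, the custom workload would need to sit under the test_configs map rather than at the end of the file. This sketch uses hypothetical keys:

```yaml
# inventory/group_vars/all/test-workloads.yml (sketch; field names hypothetical)
test_configs:
  chat:
    # existing workload definition...
  chat_lite:
    # new custom workload: defined as a key of test_configs so the
    # 'base_workload in test_configs.keys()' validation can find it
```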

@jharriga

jharriga commented Apr 6, 2026 via email
