📝 Walkthrough

Adds external vLLM endpoint support: environment-driven endpoint configuration, playbook mode detection and optional endpoint model auto-detection via /v1/models, conditional /metrics probing for server metrics, dashboard filtering for deployment mode, and updated docs/tests for external-endpoint workflows.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Playbook as Ansible Playbook
    participant ExtEndpoint as External Endpoint
    participant ModelsAPI as /v1/models
    participant BenchTask as Benchmark Task
    User->>Playbook: Run with VLLM_ENDPOINT_MODE=external\nVLLM_ENDPOINT_URL=<endpoint>
    Playbook->>Playbook: Detect vllm_mode = 'external'
    Playbook->>Playbook: Validate endpoint URL & workload_type
    alt test_model not provided
        Playbook->>ExtEndpoint: GET /v1/models
        ExtEndpoint->>ModelsAPI: Query available models
        ModelsAPI-->>Playbook: Return model list
        Playbook->>Playbook: Auto-detect endpoint_model -> actual_model
    else test_model provided
        Playbook->>Playbook: Validate test_model matches endpoint model
    end
    Playbook->>ExtEndpoint: Probe /metrics (optional)
    Playbook->>BenchTask: Execute benchmark using actual_model
    BenchTask->>ExtEndpoint: Send load (no vLLM container management)
    ExtEndpoint-->>BenchTask: Return responses
    BenchTask-->>Playbook: Collect client metrics
    Playbook->>Playbook: Store results with vllm_mode=external
```
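The model-resolution step in the diagram can be sketched as a small helper. This is illustrative only (the playbook implements the same decision with Jinja2 expressions, not Python), and the function name `resolve_model` is invented here:

```python
def resolve_model(test_model, endpoint_models):
    """Pick the benchmark model: auto-detect from the endpoint unless pinned.

    A test_model of None or the '__AUTO_DETECT__' sentinel means "use
    whatever the endpoint is serving"; a pinned model must match it.
    """
    if test_model in (None, "__AUTO_DETECT__"):
        if not endpoint_models:
            raise ValueError("external endpoint returned no models")
        return endpoint_models[0]
    if test_model not in endpoint_models:
        raise ValueError(
            f"model mismatch: specified {test_model!r}, "
            f"endpoint serving {endpoint_models}"
        )
    return test_model
```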
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
automation/test-execution/ansible/filter_plugins/cpu_utils.py (1)
599-606: ⚠️ Potential issue | 🟠 Major

Potential silent under-allocation when `physical_cpus_list` is shorter than `cores_per_node`.

If `physical_cpus_list` contains fewer CPUs than `cores_per_node`, the slice on line 602 silently returns a shorter list. The downstream validation in `start-llm.yml` (lines 75-127 in the context snippet) only checks format and range, not that the allocated CPU count matches `requested_cores`. This could result in containers running with fewer cores than requested without any error. Consider adding a length check after slicing:
🛡️ Proposed fix to validate allocated CPU count
```diff
  # Take first N physical cores
  allocated_cpus = [int(cpu.strip()) for cpu in physical_cpus_list[:cores_per_node]]
+
+ if len(allocated_cpus) < cores_per_node:
+     raise AnsibleFilterError(
+         f"Node {node['id']} has only {len(allocated_cpus)} CPUs in list, "
+         f"but {cores_per_node} cores requested per node"
+     )

  # Convert to range format
  cpu_range = cpu_list_to_range(allocated_cpus)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/filter_plugins/cpu_utils.py` around lines 599 - 606, The slice of physical_cpus_list into allocated_cpus can silently produce fewer CPUs than requested (cores_per_node); after computing allocated_cpus (and before converting with cpu_list_to_range) add a validation that len(allocated_cpus) == cores_per_node and if not, raise a clear exception or return an error (including the values of physical_cpus_list and cores_per_node) so callers (e.g., start-llm.yml) cannot proceed with under-allocation; reference allocated_cpus, physical_cpus_list, cores_per_node, and cpu_list_to_range when adding this check and error handling.
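A minimal Python sketch of the guarded allocation described above (simplified from the filter plugin; `ValueError` stands in for `AnsibleFilterError`, and `node_id` is passed directly rather than via a node dict):

```python
def allocate_cpus(node_id, physical_cpus_list, cores_per_node):
    """Take the first N physical cores, failing loudly on under-allocation."""
    allocated = [int(str(cpu).strip()) for cpu in physical_cpus_list[:cores_per_node]]
    if len(allocated) < cores_per_node:
        # Without this check, the slice above silently returns fewer cores
        # than requested and the container starts under-provisioned.
        raise ValueError(
            f"Node {node_id} has only {len(allocated)} CPUs in list, "
            f"but {cores_per_node} cores requested per node"
        )
    return allocated
```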
🧹 Nitpick comments (2)
automation/test-execution/ansible/tests/unit/test_cpu_utils.py (1)
21-32: Mock `pytest.raises` doesn't verify exception types when pytest is unavailable.

The fallback `pytest.raises` context manager doesn't actually catch or verify exceptions; it just returns `False` from `__exit__`. This means exception-testing methods won't run in the non-pytest fallback path (which is fine since `__main__` only runs smoke tests), but if someone tries to run the full test class without pytest, tests will fail rather than skip. This is acceptable for the current design since the non-pytest path explicitly runs only a subset of smoke tests.
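A corrected fallback could look like the sketch below (the real file's class layout may differ; this version also fails when no exception is raised, which goes slightly beyond the review's minimal fix):

```python
class raises:
    """Minimal stand-in for pytest.raises when pytest is unavailable."""

    def __init__(self, exc):
        self.exc = exc  # expected exception type

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, tb):
        if exc_type is None:
            raise AssertionError(f"expected {self.exc.__name__} was not raised")
        # Returning True suppresses a matching exception; a non-matching
        # exception propagates so the failure stays visible.
        return issubclass(exc_type, self.exc)
```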
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py` around lines 21 - 32, The fallback pytest.raises context manager (class pytest -> class raises) doesn't verify or suppress exceptions because __exit__ always returns False; update raises.__exit__ to inspect the exception tuple passed in, check whether the exception type matches self.exc, and return True to suppress the exception when it matches (otherwise return False so it propagates). Ensure __init__ still stores the expected exception (self.exc) and __enter__ returns self; reference the raises class and its __exit__/__init__/__enter__ methods when making the change.

automation/test-execution/ansible/llm-benchmark-auto.yml (1)
155-157: Redundant `vllm_mode` assignment.

`vllm_mode` is already set at lines 24-26. This second assignment at lines 155-157 is identical and redundant.

♻️ Proposed fix to remove redundant assignment
```diff
- - name: Set vLLM mode
-   ansible.builtin.set_fact:
-     vllm_mode: "{{ vllm_endpoint.mode | default('managed') }}"
-
  - name: Set fallback core configuration for external mode
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/llm-benchmark-auto.yml` around lines 155 - 157, Remove the redundant ansible task "Set vLLM mode" that sets the vllm_mode fact (vllm_mode: "{{ vllm_endpoint.mode | default('managed') }}") because the same fact is already defined earlier; locate the duplicate task by the task name "Set vLLM mode" or the variable vllm_mode and delete the second occurrence (lines around the repeated set_fact) so only the initial assignment remains, ensuring no other tasks rely specifically on the later duplicate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-auto.yml`:
- Around line 105-140: The external endpoint URL (vllm_endpoint.external.url)
must be validated before attempting to call the "Get model from external
endpoint" task; add an early guard (e.g., an ansible.builtin.assert or
ansible.builtin.fail task run on localhost when vllm_mode == 'external') that
checks vllm_endpoint.external.url is defined and not null/empty and returns a
clear error message if invalid, then proceed with the existing model-detection
block that sets endpoint_model and optionally sets/validates test_model.
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`:
- Around line 63-64: The validation allows base_workload value 'short_codegen'
but no corresponding workload is defined, causing lookup failures; either add a
'short_codegen' entry to the test workload definitions or remove it from the
allowed list. Update test-workloads.yml (the test_configs mapping) to include a
'short_codegen' workload key with the same required fields/structure as other
workloads (e.g., name, params, steps) or delete 'short_codegen' from the
base_workload validation in llm-benchmark-concurrent-load.yml (and mirror the
change in start-llm.yml) so the validator and test_configs remain consistent.
Ensure the key is exactly 'short_codegen' to match the validation.
In `@automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml`:
- Around line 55-59: The assertion allowing 'short_codegen' in the workload_type
validation is inconsistent with the available test_configs and causes a KeyError
when test_configs[workload_type] is accessed; either remove 'short_codegen' from
the validation list in the Validate workload type assertion (update the list in
the ansible task that references workload_type) or add a corresponding
'short_codegen' entry to test-workloads.yml under test_configs so that
test_configs contains a key for workload_type; update the same symbols
workload_type and test_configs accordingly to keep them in sync.
In `@collections/requirements.yml`:
- Around line 6-10: Replace the permissive version ranges for the Ansible
collections with exact pinned versions to ensure reproducible AWX/CI runs:
update the collections entry for containers.podman to pin version "1.9.4"
(instead of ">=1.9.0,<2.0.0") and update ansible.posix to pin version "1.5.4"
(instead of ">=1.4.0,<1.6.0"); ensure the values for the name keys
containers.podman and ansible.posix are changed accordingly so future runs use
the exact, stable releases.
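The pinned file would look roughly like this. The version numbers are taken from the review comment itself; confirm those releases actually exist on Ansible Galaxy before committing:

```yaml
# collections/requirements.yml (sketch with exact pins)
collections:
  - name: containers.podman
    version: "1.9.4"
  - name: ansible.posix
    version: "1.5.4"
```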
---
Outside diff comments:
In `@automation/test-execution/ansible/filter_plugins/cpu_utils.py`:
- Around line 599-606: The slice of physical_cpus_list into allocated_cpus can
silently produce fewer CPUs than requested (cores_per_node); after computing
allocated_cpus (and before converting with cpu_list_to_range) add a validation
that len(allocated_cpus) == cores_per_node and if not, raise a clear exception
or return an error (including the values of physical_cpus_list and
cores_per_node) so callers (e.g., start-llm.yml) cannot proceed with
under-allocation; reference allocated_cpus, physical_cpus_list, cores_per_node,
and cpu_list_to_range when adding this check and error handling.
---
Nitpick comments:
In `@automation/test-execution/ansible/llm-benchmark-auto.yml`:
- Around line 155-157: Remove the redundant ansible task "Set vLLM mode" that
sets the vllm_mode fact (vllm_mode: "{{ vllm_endpoint.mode | default('managed')
}}") because the same fact is already defined earlier; locate the duplicate task
by the task name "Set vLLM mode" or the variable vllm_mode and delete the second
occurrence (lines around the repeated set_fact) so only the initial assignment
remains, ensuring no other tasks rely specifically on the later duplicate.
In `@automation/test-execution/ansible/tests/unit/test_cpu_utils.py`:
- Around line 21-32: The fallback pytest.raises context manager (class pytest ->
class raises) doesn't verify or suppress exceptions because __exit__ always
returns False; update raises.__exit__ to inspect the exception tuple passed in,
check whether the exception type matches self.exc, and return True to suppress
the exception when it matches (otherwise return False so it propagates). Ensure
__init__ still stores the expected exception (self.exc) and __enter__ returns
self; reference the raises class and its __exit__/__init__/__enter__ methods
when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c0be321b-7c7d-4135-ac0b-fad97257545a
📒 Files selected for processing (17)
- automation/test-execution/ansible/ansible.md
- automation/test-execution/ansible/filter_plugins/cpu_utils.py
- automation/test-execution/ansible/inventory/README.md
- automation/test-execution/ansible/inventory/group_vars/all/endpoints.yml
- automation/test-execution/ansible/inventory/group_vars/all/infrastructure.yml
- automation/test-execution/ansible/llm-benchmark-auto.yml
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- automation/test-execution/ansible/llm-benchmark.yml
- automation/test-execution/ansible/llm-core-sweep-auto.yml
- automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
- automation/test-execution/ansible/roles/common/tasks/allocate-cores-from-count.yml
- automation/test-execution/ansible/roles/common/tasks/detect-numa-topology.yml
- automation/test-execution/ansible/roles/vllm_server/tasks/start-embedding.yml
- automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml
- automation/test-execution/ansible/tests/unit/test_cpu_utils.py
- collections/requirements.yml
- tests/concurrent-load/concurrent-load.md
```yaml
- name: Handle external endpoint model detection/validation
  when: vllm_mode == 'external'
  block:
    - name: Get model from external endpoint
      ansible.builtin.uri:
        url: "{{ vllm_endpoint.external.url }}/v1/models"
        method: GET
        status_code: 200
      register: external_models_response
      delegate_to: localhost

    - name: Extract model name from endpoint
      ansible.builtin.set_fact:
        endpoint_model: "{{ external_models_response.json.data[0].id }}"

    - name: Use endpoint model if test_model not provided or set to auto-detect
      ansible.builtin.set_fact:
        test_model: "{{ endpoint_model }}"
      when: test_model is not defined or test_model == '__AUTO_DETECT__'

    - name: Validate test_model matches endpoint if provided
      ansible.builtin.assert:
        that:
          - test_model == endpoint_model
        fail_msg: |
          Model mismatch detected!
          Specified: {{ test_model }}
          Endpoint serving: {{ endpoint_model }}
          Either omit test_model to auto-detect, or ensure it matches the running model.
      when: test_model is defined and test_model != '__AUTO_DETECT__' and test_model != endpoint_model

    - name: Display model source
      ansible.builtin.debug:
        msg: "Detected model from endpoint: {{ test_model }}"
```
External endpoint URL should be validated before use.
The code queries the external endpoint at line 110 using vllm_endpoint.external.url before the URL is validated. The validation happens later at lines 253-259 on the load_generator host.
Per the context snippet from endpoints.yml, the URL defaults to Python None when not configured. This means if a user sets mode=external but forgets to set VLLM_ENDPOINT_URL, line 110 will attempt to connect to None/v1/models, resulting in a confusing connection error rather than a clear validation message.
Consider adding URL validation before the model detection block:
🛡️ Proposed fix to add early URL validation
```diff
  - name: Handle external endpoint model detection/validation
    when: vllm_mode == 'external'
    block:
+     - name: Validate external endpoint URL is configured
+       ansible.builtin.assert:
+         that:
+           - vllm_endpoint.external.url is defined
+           - vllm_endpoint.external.url is not none
+           - vllm_endpoint.external.url | length > 0
+         fail_msg: |
+           External endpoint URL is required when mode=external.
+           Set via: export VLLM_ENDPOINT_URL=http://your-vllm-instance:8000
+
      - name: Get model from external endpoint
        ansible.builtin.uri:
          url: "{{ vllm_endpoint.external.url }}/v1/models"
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- name: Handle external endpoint model detection/validation
  when: vllm_mode == 'external'
  block:
    - name: Validate external endpoint URL is configured
      ansible.builtin.assert:
        that:
          - vllm_endpoint.external.url is defined
          - vllm_endpoint.external.url is not none
          - vllm_endpoint.external.url | length > 0
        fail_msg: |
          External endpoint URL is required when mode=external.
          Set via: export VLLM_ENDPOINT_URL=http://your-vllm-instance:8000

    - name: Get model from external endpoint
      ansible.builtin.uri:
        url: "{{ vllm_endpoint.external.url }}/v1/models"
        method: GET
        status_code: 200
      register: external_models_response
      delegate_to: localhost

    - name: Extract model name from endpoint
      ansible.builtin.set_fact:
        endpoint_model: "{{ external_models_response.json.data[0].id }}"

    - name: Use endpoint model if test_model not provided or set to auto-detect
      ansible.builtin.set_fact:
        test_model: "{{ endpoint_model }}"
      when: test_model is not defined or test_model == '__AUTO_DETECT__'

    - name: Validate test_model matches endpoint if provided
      ansible.builtin.assert:
        that:
          - test_model == endpoint_model
        fail_msg: |
          Model mismatch detected!
          Specified: {{ test_model }}
          Endpoint serving: {{ endpoint_model }}
          Either omit test_model to auto-detect, or ensure it matches the running model.
      when: test_model is defined and test_model != '__AUTO_DETECT__' and test_model != endpoint_model

    - name: Display model source
      ansible.builtin.debug:
        msg: "Detected model from endpoint: {{ test_model }}"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/llm-benchmark-auto.yml` around lines 105 -
140, The external endpoint URL (vllm_endpoint.external.url) must be validated
before attempting to call the "Get model from external endpoint" task; add an
early guard (e.g., an ansible.builtin.assert or ansible.builtin.fail task run on
localhost when vllm_mode == 'external') that checks vllm_endpoint.external.url
is defined and not null/empty and returns a clear error message if invalid, then
proceed with the existing model-detection block that sets endpoint_model and
optionally sets/validates test_model.
automation/test-execution/ansible/llm-benchmark-concurrent-load.yml (outdated, resolved)

automation/test-execution/ansible/roles/vllm_server/tasks/start-llm.yml (outdated, resolved)
Changes based on PR redhat-et#85 feedback:

1. llm-benchmark-auto.yml:
   - Add validation for vllm_endpoint.external.url before API calls
   - Remove duplicate 'Set vLLM mode' task
   - Pass test_model explicitly to child plays to ensure proper propagation

2. llm-benchmark-concurrent-load.yml:
   - Remove 'short_codegen' from allowed workload types (not defined in test_configs)

Note: Other review comments (cpu_utils.py, test_cpu_utils.py, start-llm.yml, collections/requirements.yml) apply to the ansible-playbook-enhancements branch and will be addressed separately.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Force-pushed from d83af3d to faf80d7
rebasing now
- Add environment variable support for external endpoints:
  - VLLM_ENDPOINT_MODE: set to 'external' to use existing endpoint
  - VLLM_ENDPOINT_URL: URL of external vLLM instance
- Auto-detect model name from /v1/models endpoint when test_model not provided
- Skip NUMA topology detection and vLLM container management in external mode
- Update documentation with external endpoint usage examples
- Make test_model optional in concurrent load tests (auto-detected in external mode)

This enables testing against cloud/K8s/production vLLM deployments without managing containers, simplifying test orchestration and better reflecting production deployment patterns.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
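In practice these variables are exported before invoking the playbook. The endpoint URL below is a placeholder, and the playbook invocation is shown but not executed:

```shell
# Point the harness at an existing vLLM deployment (placeholder URL)
export VLLM_ENDPOINT_MODE=external
export VLLM_ENDPOINT_URL="http://vllm.example.com:8000"

# The benchmark would then be launched against that endpoint, e.g.:
echo "ansible-playbook llm-benchmark-auto.yml"
```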
Add pre_tasks with meta: end_play to NUMA detection, core allocation, and vLLM startup plays to prevent Ansible from trying to connect to DUT when using external endpoint mode.

This fixes the issue where Ansible would fail trying to gather facts from DUT even when all tasks were properly skipped.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Add the following fields to test-metadata.json:

- vllm_mode: 'managed' or 'external'
- vllm_endpoint_url: External endpoint URL (or 'n/a' if managed)
- model_source: 'auto-detected' or 'specified'

This allows analysis tools to identify which tests were run against external endpoints and whether the model was auto-detected from the endpoint or explicitly specified.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
In external mode, while DUT connection is not needed (vLLM runs externally), the load generator connection is still required to run GuideLLM benchmarks.

Updates:
- Add load generator env vars to external endpoint examples
- Clarify that only load generator access is needed (not DUT)
- Use generic hostnames/IPs in documentation examples

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Enable API key authentication via environment variables for secured external vLLM endpoints:

- VLLM_API_KEY_ENABLED: set to 'true' to enable API key auth
- VLLM_API_KEY: the API key value

This makes it easier to test against secured production vLLM deployments without editing configuration files. The existing API key infrastructure already supported multiple sources (env, file, vault, prompt), but the enabled flag was hardcoded to false.

Updates documentation to show API key usage in all external endpoint examples.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Issues fixed:

1. Collect Logs and Cleanup plays were still trying to connect to DUT in external mode: added gather_facts: false and meta: end_play guards
2. Auto-detected model name was not propagating to results paths: added cacheable: true to set_fact so the detected test_model persists across plays and shows up in results paths instead of '__AUTO_DETECT__'

These fixes ensure that in external mode:
- No DUT connection attempts occur after benchmarking completes
- Results are stored with the actual detected model name, not the sentinel value

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Changes based on PR redhat-et#85 feedback:

1. llm-benchmark-auto.yml:
   - Add validation for vllm_endpoint.external.url before API calls
   - Remove duplicate 'Set vLLM mode' task
   - Pass test_model explicitly to child plays to ensure proper propagation

2. llm-benchmark-concurrent-load.yml:
   - Remove 'short_codegen' from allowed workload types (not defined in test_configs)

Note: Other review comments (cpu_utils.py, test_cpu_utils.py, start-llm.yml, collections/requirements.yml) apply to the ansible-playbook-enhancements branch and will be addressed separately.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
When test_model is passed as '__AUTO_DETECT__' via import_playbook, it has extra vars precedence which cannot be overridden. Introduce actual_model variable to properly resolve the model name from external endpoints and use it in all result paths and metadata files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
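A sketch of the resolution task this commit describes. The task name and exact expression are assumed here, not copied from the playbook; the key idea is that a new fact sidesteps extra-vars precedence on test_model:

```yaml
# Sketch: resolve the effective model once, without touching test_model
# (which may carry un-overridable extra-vars precedence).
- name: Resolve actual model from sentinel or endpoint
  ansible.builtin.set_fact:
    actual_model: >-
      {{ endpoint_model
         if (test_model is not defined or test_model == '__AUTO_DETECT__')
         else test_model }}
    cacheable: true
```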
Force-pushed from bee4a5b to ab07a91
Actionable comments posted: 4
🧹 Nitpick comments (2)
automation/test-execution/ansible/llm-benchmark-concurrent-load.yml (1)
59-64: Duplicate `base_workload` validation blocks.

The same assertion for `base_workload` appears twice (lines 59-64 and 99-104). This is redundant and should be consolidated.

♻️ Remove the duplicate validation
```diff
  - name: Validate base_workload parameter
    ansible.builtin.assert:
      that:
        - base_workload is defined
        - base_workload in ['chat', 'rag', 'code', 'summarization']
      fail_msg: "base_workload must be one of: chat, rag, code, summarization"

  - name: Validate required parameters
    ansible.builtin.assert:
      that: ...

- - name: Validate base_workload parameter
-   ansible.builtin.assert:
-     that:
-       - base_workload is defined
-       - base_workload in ['chat', 'rag', 'code', 'summarization']
-     fail_msg: "base_workload must be one of: chat, rag, code, summarization"
```

Also applies to: 99-104
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml` around lines 59 - 64, Remove the duplicate "Validate base_workload parameter" task so the assertion appears only once; locate the two tasks named "Validate base_workload parameter" (one at lines around the first occurrence and the other near the second) and delete the redundant block, keeping a single assertion that checks base_workload is defined and in ['chat','rag','code','summarization'] and preserves the fail_msg text; ensure no other tasks rely on the deleted duplicate.

automation/test-execution/ansible/llm-benchmark-auto.yml (1)
431-431: Inconsistent variable usage: `test_model` vs `actual_model` in metrics paths.

Lines 431 and 458 use `test_model` for the results path while other result-related paths (lines 511, 517, 525-526, etc.) use `actual_model`. While `test_model` gets set in external auto-detect mode, using `actual_model` consistently would be clearer and more maintainable.

♻️ Use actual_model consistently

```diff
  - name: Start metrics collector in background
    ansible.builtin.include_role:
      name: vllm_metrics_collector
    vars:
      vllm_url: "{{ hostvars['localhost']['vllm_metrics_url'] }}"
-     results_path: "{{ playbook_dir }}/../../../results/llm/{{ test_model | replace('/', '__') }}/{{ workload_type }}-{{ hostvars['localhost']['test_run_id'] }}/{{ core_configuration.name }}"
+     results_path: "{{ playbook_dir }}/../../../results/llm/{{ hostvars['localhost']['actual_model'] | replace('/', '__') }}/{{ workload_type }}-{{ hostvars['localhost']['test_run_id'] }}/{{ core_configuration.name }}"
```

Apply a similar change to line 458.
Also applies to: 458-458
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/ansible/llm-benchmark-auto.yml` at line 431, The results_path currently uses the variable test_model which is inconsistent with other metrics paths that use actual_model; update the results_path entries (the YAML key results_path where it references {{ test_model | replace('/', '__') }}) to use {{ actual_model | replace('/', '__') }} instead, and make the same replacement for the second occurrence noted (the other results_path at the same block), ensuring all result-related paths uniformly use actual_model.
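The `replace('/', '__')` filter in those paths only makes the model ID filesystem-safe. In plain Python terms (model name, run ID, and configuration name below are made up for illustration):

```python
model = "meta-llama/Llama-3.1-8B-Instruct"   # hypothetical model ID
workload_type = "chat"
test_run_id = "20240101-120000"
core_configuration = "cores-8"

# Mirrors the Jinja2 expression: {{ actual_model | replace('/', '__') }}
results_path = (
    f"results/llm/{model.replace('/', '__')}/"
    f"{workload_type}-{test_run_id}/{core_configuration}"
)
print(results_path)
```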
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/ansible/llm-benchmark-auto.yml`:
- Around line 129-131: The task that sets endpoint_model from
external_models_response.json.data[0].id can crash if data is empty; add a guard
before or instead of the set_fact to validate external_models_response.json.data
has length > 0 and emit a clear error. Use ansible.builtin.assert or
ansible.builtin.fail to check "external_models_response.json.data | length > 0"
(referencing the task name "Extract model name from endpoint" and the variable
external_models_response.json.data) and only then set endpoint_model from
data[0].id; if the check fails, fail with a descriptive message like "No models
returned from external endpoint".
In `@automation/test-execution/dashboard-examples/vllm_dashboard/Home.py`:
- Around line 156-157: The note string in Home.py that currently states
"External endpoint runs show client metrics only (no server-side metrics)" is
inaccurate; update that docstring/text block (the triple-quoted note near the
end of the Home.py dashboard text) to a conditional statement that clarifies
external runs normally show only client metrics but will include server-side
metrics if the endpoint exposes a reachable /metrics path (i.e., change the
absolute assertion to a conditional phrasing mentioning the /metrics exception).
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py`:
- Around line 98-100: The loop currently unconditionally skips runs where
metadata.get('vllm_mode') == 'external', which hides valid runs that expose
/metrics (vllm-metrics.json); change the checks (the metadata.get('vllm_mode')
== 'external' conditions around the Server Metrics page) to only skip when the
run is external AND there are no server metrics present—i.e., check for the
existence of the vllm-metrics.json (or equivalent server metrics payload) for
that run before continuing; apply the same change to the other occurrences
referenced (the checks around lines with metadata.get('vllm_mode') at the two
other blocks) so external runs with metrics are included.
In `@tests/concurrent-load/concurrent-load.md`:
- Around line 388-455: The Phase 3 run steps in the "Using External vLLM
Endpoint" section contradict the document's earlier "Phase 3 not supported"
statement; update the section that shows Phase 3 commands (the block that
restarts vLLM with caching and runs Phase 3) to align with the doc: either
remove that Phase 3 example entirely or replace it with an explicit warning that
Phase 3 is unsupported in this mode and show the safe alternative (e.g.,
instruct users to skip_phase_3=true or only run Phases 1&2), and clarify the
meaning of VLLM_ENDPOINT_MODE and VLLM_API_KEY variables so the external-mode
workflow cannot be misinterpreted as supporting Phase 3.
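The conditional skip requested for the Server Metrics page above could be factored as a small predicate. The function name is invented here; the `vllm_mode` key and `vllm-metrics.json` filename come from the comment:

```python
import os


def should_skip_run(metadata: dict, run_dir: str) -> bool:
    """Hide a run only if it is external AND captured no server metrics."""
    if metadata.get("vllm_mode") != "external":
        return False  # managed runs always collect server metrics
    # External runs still qualify when the endpoint exposed /metrics
    # and a vllm-metrics.json payload was saved for the run.
    return not os.path.isfile(os.path.join(run_dir, "vllm-metrics.json"))
```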
---
Nitpick comments:
In `@automation/test-execution/ansible/llm-benchmark-auto.yml`:
- Line 431: The results_path currently uses the variable test_model which is
inconsistent with other metrics paths that use actual_model; update the
results_path entries (the YAML key results_path where it references {{
test_model | replace('/', '__') }}) to use {{ actual_model | replace('/', '__')
}} instead, and make the same replacement for the second occurrence noted (the
other results_path at the same block), ensuring all result-related paths
uniformly use actual_model.
In `@automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`:
- Around line 59-64: Remove the duplicate "Validate base_workload parameter"
task so the assertion appears only once; locate the two tasks named "Validate
base_workload parameter" (one at lines around the first occurrence and the other
near the second) and delete the redundant block, keeping a single assertion that
checks base_workload is defined and in ['chat','rag','code','summarization'] and
preserves the fail_msg text; ensure no other tasks rely on the deleted
duplicate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a5845cb1-f3ea-4f12-9f57-3b05babb46e4
📒 Files selected for processing (12)
- automation/test-execution/ansible/inventory/README.md
- automation/test-execution/ansible/inventory/group_vars/all/endpoints.yml
- automation/test-execution/ansible/llm-benchmark-auto.yml
- automation/test-execution/ansible/llm-benchmark-concurrent-load.yml
- automation/test-execution/ansible/roles/benchmark_guidellm/tasks/main.yml
- automation/test-execution/dashboard-examples/vllm_dashboard/Home.py
- automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py
- automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py
- docs/dashboards-quickstart.md
- docs/getting-started.md
- docs/metrics-collection.md
- tests/concurrent-load/concurrent-load.md
```yaml
- name: Extract model name from endpoint
  ansible.builtin.set_fact:
    endpoint_model: "{{ external_models_response.json.data[0].id }}"
```
No validation that external endpoint returns at least one model.
If the endpoint returns an empty data array, accessing data[0].id will fail with an unclear index error. Add a guard to provide a clearer error message.
🛡️ Proposed fix to validate models array

```diff
+- name: Validate endpoint returned at least one model
+  ansible.builtin.assert:
+    that:
+      - external_models_response.json.data | length > 0
+    fail_msg: |
+      External endpoint returned no models.
+      URL: {{ vllm_endpoint.external.url }}/v1/models
+      Ensure the vLLM server has a model loaded.
+
 - name: Extract model name from endpoint
   ansible.builtin.set_fact:
     endpoint_model: "{{ external_models_response.json.data[0].id }}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/ansible/llm-benchmark-auto.yml` around lines 129 -
131, The task that sets endpoint_model from
external_models_response.json.data[0].id can crash if data is empty; add a guard
before or instead of the set_fact to validate external_models_response.json.data
has length > 0 and emit a clear error. Use ansible.builtin.assert or
ansible.builtin.fail to check "external_models_response.json.data | length > 0"
(referencing the task name "Extract model name from endpoint" and the variable
external_models_response.json.data) and only then set endpoint_model from
data[0].id; if the check fails, fail with a descriptive message like "No models
returned from external endpoint".
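As a sanity check outside Ansible, the same guard can be expressed as a small Python helper. The function name `pick_endpoint_model` and its error text below are illustrative, not taken from the playbook; only the `/v1/models` payload shape (`data[0].id`) comes from the review:

```python
def pick_endpoint_model(models_response: dict) -> str:
    """Return the first model id from a /v1/models payload, failing loudly if empty."""
    data = models_response.get("data") or []
    if not data:
        # Mirrors the proposed assert: an empty list means no model is loaded.
        raise RuntimeError(
            "External endpoint returned no models; ensure the vLLM server has a model loaded."
        )
    return data[0]["id"]
```

The happy path stays a one-liner, while the empty case produces a clear message instead of an opaque `IndexError` on `data[0]`.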
```python
    **Note**: External endpoint runs show client metrics only (no server-side metrics).
    """)
```
External-mode note is inaccurate for endpoints that expose /metrics.
The current text says external runs have no server-side metrics, but external mode can include server metrics when /metrics is reachable. Please update this to a conditional statement.
Suggested wording

```diff
- **Note**: External endpoint runs show client metrics only (no server-side metrics).
+ **Note**: External endpoint runs always show client metrics. Server-side metrics are
+ available when the external endpoint exposes `/metrics`.
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    **Note**: External endpoint runs show client metrics only (no server-side metrics).
-    """)
+    **Note**: External endpoint runs always show client metrics. Server-side metrics are
+    available when the external endpoint exposes `/metrics`.
+    """)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/dashboard-examples/vllm_dashboard/Home.py` around
lines 156 - 157, The note string in Home.py that currently states "External
endpoint runs show client metrics only (no server-side metrics)" is inaccurate;
update that docstring/text block (the triple-quoted note near the end of the
Home.py dashboard text) to a conditional statement that clarifies external runs
normally show only client metrics but will include server-side metrics if the
endpoint exposes a reachable /metrics path (i.e., change the absolute assertion
to a conditional phrasing mentioning the /metrics exception).
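The conditional behavior the note should describe follows the probe logic stated in this PR's commit message (HTTP 200 → collect server metrics, 404/403 → gracefully skip). A hedged sketch of that decision, with the `"warn"` branch for other statuses being an assumption of mine rather than documented behavior:

```python
def classify_metrics_probe(status_code: int) -> str:
    """Decide what to do after probing an external endpoint's /metrics path."""
    if status_code == 200:
        return "collect"  # endpoint exposes /metrics: server metrics available
    if status_code in (403, 404):
        return "skip"     # gracefully skip: client metrics only
    return "warn"         # unexpected status: surface it to the operator
```

This is why the dashboard note needs conditional phrasing: the same external run may or may not have server metrics depending on the probe result.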
automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py (outdated comment, resolved)
#### Using External vLLM Endpoint

To test against an existing vLLM instance:

**1. Start your vLLM instance:**

```bash
# For Phases 1 & 2 (baseline/realistic - no caching)
vllm serve meta-llama/Llama-3.2-1B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype bfloat16 \
  --max-model-len 8192 \
  --no-enable-prefix-caching

# For Phase 3 (production - with caching)
vllm serve meta-llama/Llama-3.2-1B-Instruct \
  --host 0.0.0.0 \
  --port 8000 \
  --dtype bfloat16 \
  --max-model-len 8192 \
  --enable-prefix-caching
```

**2. Configure load generator and external endpoint:**

```bash
# Load generator connection (required - GuideLLM runs here)
export LOADGEN_HOSTNAME=192.168.1.200  # or your load generator hostname/IP
export ANSIBLE_SSH_USER=ec2-user
export ANSIBLE_SSH_KEY=~/.ssh/my-key.pem

# External vLLM endpoint (replaces DUT configuration)
export VLLM_ENDPOINT_MODE=external
export VLLM_ENDPOINT_URL=http://your-vllm-host:8000

# Optional: API key for secured vLLM endpoints
export VLLM_API_KEY_ENABLED=true
export VLLM_API_KEY=your-api-key

# Optional: HuggingFace token (if needed by GuideLLM)
export HF_TOKEN=hf_xxxxx
```

**Note:** In external mode, you only need to configure the load generator.
DUT connection is not required since vLLM is already running externally.

**3. Run tests (model name auto-detected from endpoint):**

```bash
cd ../../automation/test-execution/ansible

# Run Phases 1 & 2 (against baseline vLLM instance)
# Note: test_model is optional - will be queried from endpoint
ansible-playbook -i inventory/hosts.yml \
  llm-benchmark-concurrent-load.yml \
  -e "base_workload=chat" \
  -e "requested_cores=16" \
  -e "skip_phase_3=true"

# Restart vLLM with caching enabled, then run Phase 3
ansible-playbook -i inventory/hosts.yml \
  llm-benchmark-concurrent-load.yml \
  -e "base_workload=chat" \
  -e "requested_cores=16" \
  -e "skip_phase_1=true" \
  -e "skip_phase_2=true"
```
Phase 3 external instructions conflict with this file’s “Phase 3 not supported” status.
This section provides runnable Phase 3 steps, but the same document declares Phase 3 unsupported (Line 123). Please align these instructions to avoid sending users into a known unsupported path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/concurrent-load/concurrent-load.md` around lines 388 - 455, The Phase 3
run steps in the "Using External vLLM Endpoint" section contradict the
document's earlier "Phase 3 not supported" statement; update the section that
shows Phase 3 commands (the block that restarts vLLM with caching and runs Phase
3) to align with the doc: either remove that Phase 3 example entirely or replace
it with an explicit warning that Phase 3 is unsupported in this mode and show
the safe alternative (e.g., instruct users to skip_phase_3=true or only run
Phases 1&2), and clarify the meaning of VLLM_ENDPOINT_MODE and VLLM_API_KEY
variables so the external-mode workflow cannot be misinterpreted as supporting
Phase 3.
…nal endpoints

Enhances external endpoint testing with intelligent metrics collection and comprehensive dashboard support.

**Metrics Collection:**
- Auto-detect if external endpoint exposes /metrics endpoint
- Enable server metrics collection when available (HTTP 200)
- Gracefully skip when not available (HTTP 404/403)
- Display clear status messages during test execution

**Dashboard Updates:**
- Add vLLM Mode filter to Client Metrics (managed/external)
- Skip external runs in Server Metrics (no data available)
- Display helpful info messages for external endpoint runs
- Update Home page with external mode examples

**Documentation:**
- Add external endpoint testing to getting-started.md
- Document metrics availability in dashboards-quickstart.md
- Add comprehensive external metrics guide in metrics-collection.md
- Include troubleshooting and best practices

This allows users to benchmark production/cloud/K8s vLLM deployments while automatically collecting server metrics when endpoints expose them.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Force-pushed from ab07a91 to f0e2dbc
Actionable comments posted: 1
♻️ Duplicate comments (1)
automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py (1)
98-100: ⚠️ Potential issue | 🟠 Major — Don't discard external runs that already have a metrics file.

This loader is already walking `vllm-metrics.json`, so the `vllm_mode == 'external'` guard hides valid external runs that expose `/metrics`. The empty-state note and skipped-run counter below then reinforce the same false assumption, and the bare `except: pass` can silently undercount malformed metadata.

Possible fix

```diff
-        # Skip external endpoint runs (no server metrics available)
-        if metadata.get('vllm_mode') == 'external':
-            continue
-
         data['platform'] = metadata.get('platform', 'unknown')
         data['model'] = metadata.get('model', 'unknown')
         data['workload'] = metadata.get('workload', 'unknown')
@@
-    st.info("**Note**: External endpoint runs do not have server metrics (only client-side metrics available)")
+    st.info(
+        "No server metrics were found. External runs only appear here when "
+        "`vllm-metrics.json` was collected."
+    )
@@
-# Check if any external runs were skipped
+# Check if any external runs are missing a server-metrics file
 results_path = Path(results_dir)
-external_count = 0
+external_without_metrics_count = 0
 if results_path.exists():
     for metadata_file in results_path.rglob("test-metadata.json"):
         try:
             with open(metadata_file) as f:
                 metadata = json.load(f)
             if metadata.get('vllm_mode') == 'external':
-                external_count += 1
-        except:
-            pass
+                if not (metadata_file.parent / "vllm-metrics.json").exists():
+                    external_without_metrics_count += 1
+        except Exception as e:
+            st.sidebar.warning(f"Failed to read {metadata_file.name}: {e}")
@@
-if external_count > 0:
-    st.info(f"ℹ️ Skipped {external_count} external endpoint runs (no server metrics available). View them in the Client Metrics dashboard.")
+if external_without_metrics_count > 0:
+    st.info(
+        f"Skipped {external_without_metrics_count} external endpoint runs "
+        "without server metrics. View them in the Client Metrics dashboard."
+    )
```

Also applies to: 128-129, 133-147
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_`🖥️_Server_Metrics.py around lines 98 - 100, The code currently unconditionally skips runs where metadata.get('vllm_mode') == 'external', which hides valid external runs that include a vllm-metrics.json; remove or change this guard so external runs are only skipped if no metrics file is present (i.e., check for the existence/parseability of the vllm-metrics.json before continuing), and replace the bare except: pass around the metadata/metrics parsing with specific exception handling (e.g., FileNotFoundError, JSONDecodeError, KeyError) so malformed metadata/metrics are counted/logged correctly; ensure any skipped-run counter and the empty-state note are updated only when no metrics file is found rather than for all external runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py`:
- Around line 109-111: The efficiency calculation currently divides
throughput_mean by cores and can produce inf/NaN when core_count is missing;
update the computation that sets df["efficiency"] (and any place computing
per-core metrics) to coerce cores to numeric (e.g., pd.to_numeric(df["cores"],
errors="coerce")) and compute efficiency only when cores > 0 (e.g., use a
conditional like np.where(cores > 0, df["throughput_mean"] / cores, np.nan)).
After that, gate all per-core views and cards (the efficiency plot, comparison
card and data table) by checking filtered_df["efficiency"].notna().any() or hide
them when test_mode == "external" so external runs with no core_count do not
render per-core metrics.
---
Duplicate comments:
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py`:
- Around line 98-100: The code currently unconditionally skips runs where
metadata.get('vllm_mode') == 'external', which hides valid external runs that
include a vllm-metrics.json; remove or change this guard so external runs are
only skipped if no metrics file is present (i.e., check for the
existence/parseability of the vllm-metrics.json before continuing), and replace
the bare except: pass around the metadata/metrics parsing with specific
exception handling (e.g., FileNotFoundError, JSONDecodeError, KeyError) so
malformed metadata/metrics are counted/logged correctly; ensure any skipped-run
counter and the empty-state note are updated only when no metrics file is found
rather than for all external runs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a5c68ac8-2265-4b2a-996a-2e93d5edd990
📒 Files selected for processing (7)
- automation/test-execution/ansible/llm-benchmark-auto.yml
- automation/test-execution/dashboard-examples/vllm_dashboard/Home.py
- automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py
- automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py
- docs/dashboards-quickstart.md
- docs/getting-started.md
- docs/metrics-collection.md
✅ Files skipped from review due to trivial changes (1)
- docs/dashboards-quickstart.md
🚧 Files skipped from review as they are similar to previous changes (4)
- docs/getting-started.md
- automation/test-execution/dashboard-examples/vllm_dashboard/Home.py
- docs/metrics-collection.md
- automation/test-execution/ansible/llm-benchmark-auto.yml
automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py (comment resolved)
…dology

Adopt sophisticated performance comparison methodology from the GPU-focused performance-dashboard, adapted for CPU-specific analysis:

- Add geometric mean aggregation as alternative to peak-only comparison
- Track at which request rate/concurrency peak performance occurs
- Implement separate views for managed (DUT) vs external endpoint tests
- Remove efficiency metric (tokens/sec/core) for external endpoints where core count is unknown
- Add flexible comparison modes: Peak (best value) vs Geometric Mean (average across all load points)
- Enhance summary tables to show load point where optimal performance occurs

This provides more statistically rigorous performance comparisons while adapting the methodology for CPU inference workloads and multi-deployment scenarios (managed container vs cloud/K8s endpoints).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Actionable comments posted: 3
♻️ Duplicate comments (1)
automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py (1)
160-162: ⚠️ Potential issue | 🟠 Major — `efficiency` is still computed without guarding missing/zero cores.

This still produces `inf`/`NaN` for rows lacking a valid `core_count` (notably external/legacy metadata), which can leak into downstream table/export logic even if charts are mode-gated.

Suggested fix

```diff
-    # Calculate efficiency (throughput per core)
-    df['efficiency'] = df['throughput_mean'] / df['cores']
+    # Calculate efficiency (throughput per core) only when cores are valid
+    cores = pd.to_numeric(df['cores'], errors='coerce')
+    df['efficiency'] = np.where(cores > 0, df['throughput_mean'] / cores, np.nan)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_`📊_Client_Metrics.py around lines 160 - 162, Compute efficiency only when core counts are present and non-zero: coerce df['cores'] to numeric, replace or mask zeros/invalid entries (e.g., set zeros or NaNs to np.nan), then set df['efficiency'] = df['throughput_mean'] / df['cores'] only for valid cores (or use df['efficiency'] = df['throughput_mean'].where(df['cores']>0) / df['cores']). Ensure df['efficiency'] remains NaN for rows with missing/zero/invalid core counts so no inf/NaN leak into downstream exports.
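The guarded computation described above can be demonstrated end to end; the sample data here is invented for illustration and the masking is done with `Series.where` rather than `np.where` so the zero-core row never hits the division at all:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "throughput_mean": [120.0, 80.0, 60.0],
    "cores": [16, 0, None],  # external/legacy rows may carry 0 or missing cores
})

# Coerce to numeric, then mask non-positive core counts to NaN before dividing.
cores = pd.to_numeric(df["cores"], errors="coerce")
cores = cores.where(cores > 0)
df["efficiency"] = df["throughput_mean"] / cores
```

Rows without a valid core count end up `NaN` instead of `inf`, so downstream tables and exports can cleanly filter with `df["efficiency"].notna()`.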
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py`:
- Around line 579-581: The selectbox options can be empty when all
vllm_endpoint_url values are 'n/a', causing errors; update the logic around
baseline_endpoints / baseline_endpoint /baseline_df (and the analogous
compare_endpoints/compare_endpoint/compare_df at lines ~621-623) to handle empty
option lists by replacing an empty filtered list with a safe default (e.g.,
['n/a']) and ensuring the selectbox has a valid default selection before
filtering the DataFrame; in short, when computing baseline_endpoints (and
compare_endpoints), if the list is empty set a fallback option and then build
baseline_df (and compare_df) using that safe selected value so the page doesn't
error when no valid URLs exist.
- Around line 338-347: The current exact set intersection of a_loads and b_loads
is brittle for floating-point request_rate; replace the strict equality check
with tolerant matching (e.g., use numpy.isclose or rounding/quantization) to
consider values equal within a small atol/rtol so small numeric drift doesn't
produce an empty intersection; implement this by computing matches between
unique values in data_a and data_b (or by rounding x_axis_column to a fixed
number of decimals or using pandas.merge_asof after sorting) and then build
common matched values (or a mapping) to filter data_a and data_b (replace the
a_loads/b_loads intersection and the a_common/b_common filtering logic).
- Around line 745-755: The displayed compare_val must be computed from the same
common-subset used to compute pct_diff, not from full compare_data; update the
logic around pct_diff/compare_val to use the common subset (e.g., use
compare_common or the common-load result returned by compare_two_datasets) and
compute compare_val = (max or min for aggregation=="peak", or geometric_mean) on
compare_common[config["column"]]. If compare_two_datasets currently doesn't
return the common subset or a basis value, modify it to return that value (or
the subset) and read that returned value here instead of recomputing from
compare_data.
---
Duplicate comments:
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py`:
- Around line 160-162: Compute efficiency only when core counts are present and
non-zero: coerce df['cores'] to numeric, replace or mask zeros/invalid entries
(e.g., set zeros or NaNs to np.nan), then set df['efficiency'] =
df['throughput_mean'] / df['cores'] only for valid cores (or use
df['efficiency'] = df['throughput_mean'].where(df['cores']>0) / df['cores']).
Ensure df['efficiency'] remains NaN for rows with missing/zero/invalid core
counts so no inf/NaN leak into downstream exports.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ec671a4e-9038-4c84-9e7d-7f05a4802641
📒 Files selected for processing (1)
automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py
```python
a_loads = set(data_a[x_axis_column].dropna().unique())
b_loads = set(data_b[x_axis_column].dropna().unique())
common = a_loads.intersection(b_loads)

if not common:
    return None, None, None, None, None

# Filter to common load points
a_common = data_a[data_a[x_axis_column].isin(common)]
b_common = data_b[data_b[x_axis_column].isin(common)]
```
Common-load matching is brittle for request_rate comparisons.
Using exact set intersection on raw float load values can drop all overlap due to tiny numeric drift between runs, causing false “no comparable data”.
Suggested fix

```diff
-    a_loads = set(data_a[x_axis_column].dropna().unique())
-    b_loads = set(data_b[x_axis_column].dropna().unique())
+    if x_axis_column == "request_rate":
+        a_keys = np.round(pd.to_numeric(data_a[x_axis_column], errors="coerce"), 2)
+        b_keys = np.round(pd.to_numeric(data_b[x_axis_column], errors="coerce"), 2)
+    else:
+        a_keys = pd.to_numeric(data_a[x_axis_column], errors="coerce")
+        b_keys = pd.to_numeric(data_b[x_axis_column], errors="coerce")
+
+    a_loads = set(a_keys.dropna().unique())
+    b_loads = set(b_keys.dropna().unique())
     common = a_loads.intersection(b_loads)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py`
around lines 338 - 347, The current exact set intersection of a_loads and
b_loads is brittle for floating-point request_rate; replace the strict equality
check with tolerant matching (e.g., use numpy.isclose or rounding/quantization)
to consider values equal within a small atol/rtol so small numeric drift doesn't
produce an empty intersection; implement this by computing matches between
unique values in data_a and data_b (or by rounding x_axis_column to a fixed
number of decimals or using pandas.merge_asof after sorting) and then build
common matched values (or a mapping) to filter data_a and data_b (replace the
a_loads/b_loads intersection and the a_common/b_common filtering logic).
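The tolerant-matching idea in the prompt above can be sketched with `numpy.isclose`; the function name and default tolerances here are illustrative, not taken from the dashboard:

```python
import numpy as np

def common_load_points(a_values, b_values, rtol=1e-5, atol=1e-3):
    """Return values from a_values that have a counterpart in b_values within tolerance."""
    a_unique = np.unique(np.asarray(a_values, dtype=float))
    b_unique = np.unique(np.asarray(b_values, dtype=float))
    return [a for a in a_unique if np.isclose(a, b_unique, rtol=rtol, atol=atol).any()]
```

With this, `common_load_points([1.0, 2.0, 5.0001], [1.0, 5.0, 10.0])` keeps both 1.0 and 5.0001 even though 5.0001 is not exactly equal to 5.0, which is precisely the drift case that empties a strict set intersection.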
```python
baseline_endpoints = [e for e in df['vllm_endpoint_url'].unique() if e != 'n/a']
baseline_endpoint = st.selectbox("Endpoint URL", baseline_endpoints, key="baseline_endpoint")
baseline_df = df[df['vllm_endpoint_url'] == baseline_endpoint]
```
External compare selectors can fail when no valid endpoint URLs exist.
If external rows only have 'n/a' endpoint metadata, both st.selectbox("Endpoint URL", ...) calls receive empty options and the page can error.
Suggested fix

```diff
     else:
         # External mode: filter by endpoint
         baseline_endpoints = [e for e in df['vllm_endpoint_url'].unique() if e != 'n/a']
+        if not baseline_endpoints:
+            st.warning("No valid endpoint URLs found for external runs.")
+            return
         baseline_endpoint = st.selectbox("Endpoint URL", baseline_endpoints, key="baseline_endpoint")
         baseline_df = df[df['vllm_endpoint_url'] == baseline_endpoint]
 ...
     else:
         # External mode: filter by endpoint
         compare_endpoints = [e for e in df['vllm_endpoint_url'].unique() if e != 'n/a']
+        if not compare_endpoints:
+            st.warning("No valid endpoint URLs found for external runs.")
+            return
         compare_endpoint = st.selectbox("Endpoint URL", compare_endpoints, key="compare_endpoint")
         compare_df = df[df['vllm_endpoint_url'] == compare_endpoint]
```

Also applies to: 621-623
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py`
around lines 579 - 581, The selectbox options can be empty when all
vllm_endpoint_url values are 'n/a', causing errors; update the logic around
baseline_endpoints / baseline_endpoint /baseline_df (and the analogous
compare_endpoints/compare_endpoint/compare_df at lines ~621-623) to handle empty
option lists by replacing an empty filtered list with a safe default (e.g.,
['n/a']) and ensuring the selectbox has a valid default selection before
filtering the DataFrame; in short, when computing baseline_endpoints (and
compare_endpoints), if the list is empty set a fallback option and then build
baseline_df (and compare_df) using that safe selected value so the page doesn't
error when no valid URLs exist.
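The fallback-option variant suggested in the prompt boils down to never handing the widget an empty list. A minimal sketch, with the helper name being hypothetical:

```python
def endpoint_options(urls):
    """Filter out 'n/a' placeholders but fall back to ['n/a'] rather than an empty list."""
    options = sorted({u for u in urls if u != "n/a"})
    return options or ["n/a"]
```

Either remedy works; the warn-and-return version surfaces the missing metadata to the user, while this fallback keeps the page rendering with a visibly degenerate choice.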
```python
if pct_diff is not None:
    # Get the actual values for display
    if aggregation == "peak":
        if config["higher_is_better"]:
            compare_val = compare_data[config["column"]].max()
        else:
            compare_val = compare_data[config["column"]].min()
    else:
        compare_val = geometric_mean(
            compare_data[config["column"]].dropna().tolist()
        )
```
Comparison card value can disagree with its delta calculation.
pct_diff is computed from common-load subsets in compare_two_datasets, but the displayed compare_val here is recomputed from full compare_data, which can show a different value than the delta basis.
Suggested fix

```diff
-    Returns:
-        (pct_diff, a_is_better, a_peak_load, b_peak_load, is_similar)
+    Returns:
+        (pct_diff, a_is_better, a_peak_load, b_peak_load, is_similar, a_val, b_val)
 ...
-    return pct_diff, a_better, a_peak_load, b_peak_load, is_similar
+    return pct_diff, a_better, a_peak_load, b_peak_load, is_similar, a_val, b_val
```

```diff
-    pct_diff, better, a_load, b_load, similar = (
+    pct_diff, better, a_load, b_load, similar, _a_val, compare_val = (
         comparison_results[metric_name]
     )
 ...
-    if aggregation == "peak":
-        if config["higher_is_better"]:
-            compare_val = compare_data[config["column"]].max()
-        else:
-            compare_val = compare_data[config["column"]].min()
-    else:
-        compare_val = geometric_mean(
-            compare_data[config["column"]].dropna().tolist()
-        )
+    # compare_val comes from the same subset used for pct_diff
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/1_📊_Client_Metrics.py`
around lines 745 - 755, The displayed compare_val must be computed from the same
common-subset used to compute pct_diff, not from full compare_data; update the
logic around pct_diff/compare_val to use the common subset (e.g., use
compare_common or the common-load result returned by compare_two_datasets) and
compute compare_val = (max or min for aggregation=="peak", or geometric_mean) on
compare_common[config["column"]]. If compare_two_datasets currently doesn't
return the common subset or a basis value, modify it to return that value (or
the subset) and read that returned value here instead of recomputing from
compare_data.
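For reference, a `geometric_mean` of the kind the dashboard's aggregation mode relies on is typically computed via logs; this is a sketch, and the project's actual helper may differ in signature or NaN handling:

```python
import math

def geometric_mean(values):
    """Geometric mean over positive values; NaN when nothing usable remains."""
    vals = [v for v in values if v is not None and v > 0]
    if not vals:
        return float("nan")
    return math.exp(sum(math.log(v) for v in vals) / len(vals))
```

Unlike a peak, the geometric mean rewards consistency across all common load points, which is exactly why the displayed `compare_val` must be computed over the same common subset used for `pct_diff`.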
- Remove skip logic for external endpoints with server metrics
- Add radio buttons for managed/external mode selection (similar to Client Metrics)
- Support filtering by endpoint URL in external mode
- Update comparison views to handle both deployment modes
- Show appropriate metadata (endpoint vs platform/cores) based on mode

Fixes the issue where external endpoints with available /metrics endpoint were being excluded from the Server Metrics dashboard.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py (1)
93-108: ⚠️ Potential issue | 🟠 Major — Default deployment metadata before the `test-metadata.json` guard.

If `vllm-metrics.json` exists but `test-metadata.json` does not, this function still appends the record without `vllm_mode` or `vllm_endpoint_url`. The later mode filter then treats it as neither managed nor external and drops it entirely, so a metrics-only run shows up as "loaded" and then disappears from the page.

Suggested fix

```diff
     # Load test metadata
     metadata_file = metrics_file.parent / "test-metadata.json"
+    data['vllm_mode'] = 'managed'
+    data['vllm_endpoint_url'] = 'n/a'
     if metadata_file.exists():
         with open(metadata_file) as f:
             metadata = json.load(f)
         # Load metadata for both managed and external endpoints
@@
-        data['vllm_mode'] = metadata.get('vllm_mode', 'managed')
-        data['vllm_endpoint_url'] = metadata.get('vllm_endpoint_url', 'n/a')
+        data['vllm_mode'] = metadata.get('vllm_mode', data['vllm_mode'])
+        data['vllm_endpoint_url'] = metadata.get('vllm_endpoint_url', data['vllm_endpoint_url'])
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_`🖥️_Server_Metrics.py around lines 93 - 108, The bug is that when metrics_file exists but metadata_file (test-metadata.json) does not, the data dict never gets vllm_mode or vllm_endpoint_url set, causing later filters to drop the record; fix by ensuring default deployment metadata are assigned to data before or when skipping the metadata_file guard: set data['vllm_mode']='managed' and data['vllm_endpoint_url']='n/a' (and any other defaults like 'platform','model','workload','test_run_id','cores','backend','vllm_version','core_config') whenever metadata_file.exists() is False so metrics-only runs still produce a complete data record (update the block around metadata_file, metrics_file and the data dict initialization to apply these defaults).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py`:
- Around line 300-303: The endpoint-shortening logic collapses host:port and
breaks IPv6 by splitting on ':' after '//'—update the logic that computes
endpoint_short (currently set from endpoint_url variable) to preserve host:port
and correctly handle IPv6: parse endpoint_url to extract the netloc
(host[:port]) rather than naive string splits, then use that netloc for
cols[0].metric and other uses (replace the same logic at the other occurrences
noted) so labels keep distinct host:port and full IPv6 addresses.
---
Outside diff comments:
In `@automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py`:
- Around line 93-108: The bug is that when metrics_file exists but metadata_file
(test-metadata.json) does not, the data dict never gets vllm_mode or
vllm_endpoint_url set, causing later filters to drop the record; fix by ensuring
default deployment metadata are assigned to data before or when skipping the
metadata_file guard: set data['vllm_mode']='managed' and
data['vllm_endpoint_url']='n/a' (and any other defaults like
'platform','model','workload','test_run_id','cores','backend','vllm_version','core_config')
whenever metadata_file.exists() is False so metrics-only runs still produce a
complete data record (update the block around metadata_file, metrics_file and
the data dict initialization to apply these defaults).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 58be081b-4044-4b41-9f95-b84ac08382cf
📒 Files selected for processing (1)
automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_🖥️_Server_Metrics.py
```python
endpoint_url = test_data.get('vllm_endpoint_url', 'N/A')
# Shorten endpoint for display
endpoint_short = endpoint_url.split('//')[1].split(':')[0] if '//' in endpoint_url else endpoint_url
cols[0].metric("Endpoint", endpoint_short)
```
Preserve host:port when shortening external endpoint URLs.
The current `split('//')[1].split(':')[0]` logic collapses `https://host:8000` and `https://host:9000` into the same label, and it truncates IPv6 endpoints at the first colon. In compare mode, that makes different external targets indistinguishable in the header, legend, and summary table.
Suggested fix

```diff
- endpoint_short = endpoint_url.split('//')[1].split(':')[0] if '//' in endpoint_url else endpoint_url
+ endpoint_short = endpoint_url.split('//', 1)[-1].rsplit('@', 1)[-1].split('/', 1)[0]
@@
- baseline_endpoint_short = baseline_endpoint_short.split('//')[1].split(':')[0]
+ baseline_endpoint_short = baseline_endpoint_short.split('//', 1)[-1].rsplit('@', 1)[-1].split('/', 1)[0]
@@
- compare_endpoint_short = compare_endpoint_short.split('//')[1].split(':')[0]
+ compare_endpoint_short = compare_endpoint_short.split('//', 1)[-1].rsplit('@', 1)[-1].split('/', 1)[0]
@@
- endpoint_short = endpoint_url.split('//')[1].split(':')[0] if '//' in endpoint_url else endpoint_url
+ endpoint_short = endpoint_url.split('//', 1)[-1].rsplit('@', 1)[-1].split('/', 1)[0]
```

Also applies to: 977-984, 1094-1096
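The same shortening could also be done with the standard library's URL parser, which keeps host:port and IPv6 brackets intact (a sketch, not the dashboard's actual code):

```python
# Sketch: extract the netloc (host[:port]) instead of naive string splits.
from urllib.parse import urlparse

def shorten_endpoint(endpoint_url: str) -> str:
    parsed = urlparse(endpoint_url)
    # urlparse only fills netloc when a scheme ("//") is present; fall back
    # to the raw string for bare host:port values.
    netloc = parsed.netloc or endpoint_url
    # Drop userinfo but keep host:port (and IPv6 brackets) intact.
    return netloc.rsplit("@", 1)[-1]

print(shorten_endpoint("https://host:8000"))     # host:8000
print(shorten_endpoint("http://[::1]:8000/v1"))  # [::1]:8000
```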
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@automation/test-execution/dashboard-examples/vllm_dashboard/pages/2_`🖥️_Server_Metrics.py
around lines 300 - 303, The endpoint-shortening logic collapses host:port and
breaks IPv6 by splitting on ':' after '//'—update the logic that computes
endpoint_short (currently set from endpoint_url variable) to preserve host:port
and correctly handle IPv6: parse endpoint_url to extract the netloc
(host[:port]) rather than naive string splits, then use that netloc for
cols[0].metric and other uses (replace the same logic at the other occurrences
noted) so labels keep distinct host:port and full IPv6 addresses.
The benchmark_guidellm role fetches guidellm.log from the load generator to the controller's ~/benchmark-results directory. However, the synchronize task only pulls files from the load_generator's ~/benchmark-results to the final results directory, missing the log file that was fetched to the controller.

This adds:
1. A copy task to copy guidellm.log from the controller's ~/benchmark-results to the final results directory (results/llm/...) so it appears alongside the other benchmark files
2. A cleanup task to remove the entire test run directory from the controller's ~/benchmark-results after copying is complete

Fixes the issue where guidellm.log was missing from the final results directory after test execution, and prevents accumulation of old test results in ~/benchmark-results.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Python buffers stdout when redirected to a file (via `nohup > file`), causing `print()` statements to not appear in metrics-collector.log until the buffer is flushed or the script completes. Adding the `-u` flag runs Python in unbuffered mode, ensuring that log messages appear immediately in metrics-collector.log for real-time monitoring of the collection process.

Fixes the issue where metrics-collector.log appears empty even though the collection script is running.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
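The buffering that `-u` disables can also be worked around in-process; a small illustrative sketch (not the collector's actual code):

```python
# When stdout is redirected to a file, CPython block-buffers it, so print()
# output can sit in the buffer until exit; `-u` disables this. Equivalent
# in-process options:
import sys

print("log line 1")
sys.stdout.flush()               # option 1: flush explicitly after each write

print("log line 2", flush=True)  # option 2: per-call flush keyword
```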
The metrics collector was running for the full estimated duration (test_duration + 120s buffer) even when the test completed early, causing the stop task to time out waiting for vllm-metrics.json.

Changes:
1. Added signal handlers (SIGTERM/SIGINT) to the metrics collector script for graceful shutdown
2. Modified the main loop to check the should_stop flag
3. Updated the stop task to:
   - Send SIGTERM first for graceful shutdown
   - Wait up to 60s for the metrics file to be written
   - Force kill (SIGKILL) only if still running after timeout
4. Added a file existence check and better status reporting

This allows the metrics collector to be stopped cleanly when the benchmark completes, while still writing the collected samples to the metrics file. Fixes the timeout and missing vllm-metrics.json issues.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
@lmzuccarelli @PoolPooer this is how I tested: set up vLLM on a host, then on the controller node run a test, and check the dashboard after...
PR Summary: External vLLM Endpoint Support for Concurrent Load Testing
Overview
This PR adds support for running concurrent load tests against existing vLLM endpoints (cloud, Kubernetes, production deployments) instead of requiring Ansible to manage vLLM containers. This simplifies test orchestration, reduces test execution time, and better reflects production deployment patterns.
Key Features
1. Environment Variable Configuration
- `VLLM_ENDPOINT_MODE`: set to `external` to use an existing endpoint (default: `managed`)
- `VLLM_ENDPOINT_URL`: full URL to the external vLLM instance

2. Automatic Model Detection
- Queries the `/v1/models` endpoint when `test_model` is not provided
- Validates the model when `test_model` is specified

3. Smart DUT Handling
4. Enhanced Test Metadata
New fields in `test-metadata.json`:
- `vllm_mode`: "managed" or "external"
- `vllm_endpoint_url`: external endpoint URL
- `model_source`: "auto-detected" or "specified"

Usage
Quick Start
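A hedged sketch of the quick-start flow (the playbook name is taken from the files listed in this PR; the host and port are placeholders):

```shell
export VLLM_ENDPOINT_MODE=external
export VLLM_ENDPOINT_URL=http://vllm-host:8000
ansible-playbook llm-benchmark-concurrent-load.yml
```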
Alternative: Configuration File
Edit `inventory/group_vars/all/endpoints.yml`:

Benefits
Changes Summary
Modified Files
- `automation/test-execution/ansible/inventory/group_vars/all/endpoints.yml`
  - Adds `VLLM_ENDPOINT_MODE` and `VLLM_ENDPOINT_URL`
- `automation/test-execution/ansible/llm-benchmark-auto.yml`
  - Auto-detects the model via the `/v1/models` endpoint; adds `vllm_mode`, `vllm_endpoint_url`, and `model_source` to test metadata
- `automation/test-execution/ansible/llm-benchmark-concurrent-load.yml`
  - Validates the `test_model` parameter in external mode
- `automation/test-execution/ansible/inventory/README.md`
- `tests/concurrent-load/concurrent-load.md`
  - Notes on `test_model`
- `automation/test-execution/ansible/inventory/group_vars/all/infrastructure.yml`
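The auto-detection step added in llm-benchmark-auto.yml could look roughly like this (an assumed sketch, not the playbook's actual task; the response shape follows the OpenAI-compatible API that vLLM serves):

```python
# When test_model is not given, query /v1/models and take the first model id.
import json
import urllib.request

def first_model_id(payload: dict) -> str:
    models = payload.get("data", [])
    if not models:
        raise RuntimeError("no models in /v1/models response")
    return models[0]["id"]

def detect_model(endpoint_url: str, timeout: float = 10.0) -> str:
    url = endpoint_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return first_model_id(json.load(resp))
```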
Commits
- fix: add registry prefix to vLLM container image for podman
- feat: add external vLLM endpoint support with auto-detection
- fix: skip DUT connection entirely in external mode
- `meta: end_play` to cleanly skip entire plays
- feat: add vLLM deployment mode to test metadata

Testing
All changes tested with:
Validation Checklist
- Model auto-detection via `/v1/models` works

Documentation
Complete documentation added:
Backward Compatibility
✅ Fully backward compatible
- Default mode remains `managed`

Future Enhancements
Potential follow-ups (not in this PR):
Related Issues
Addresses the need to test against existing vLLM deployments without managing infrastructure through Ansible.
Summary by CodeRabbit
New Features
Documentation