MGMT-23234: [BE] Improve the UX around the amd/nvidia operators selection - add profile to selection #9993
Conversation
@shay23bra: This pull request references MGMT-23234 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "4.22.0" version, but no target version was set.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit settings.
Walkthrough: Adds AMD/NVIDIA GPU vendor toggles across API, client, server, manager, mocks, URL builders, and OpenAPI: new AmdEnabled/NvidiaEnabled parameters (default true) flow from request binding through handler branching to new Manager GPU-filtering methods that filter bundles and their operator dependencies.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Handler
    participant Manager
    participant BundleStorage
    Client->>Handler: GET /v2/operators/bundles?amd_enabled=false&nvidia_enabled=true
    Handler->>Manager: ListBundlesWithGPUFilter(filters, featureIDs, GPUFilter{AmdEnabled:false, NvidiaEnabled:true})
    Manager->>BundleStorage: Fetch bundles
    loop per bundle
        Manager->>Manager: Evaluate bundle operators
        Manager->>Manager: shouldSkipOperatorForGPUFilter(operator, GPUFilter)
        alt skip operator
            Manager->>Manager: Remove operator and dependents
        else
            Manager->>Manager: Keep operator
        end
        Manager->>Manager: Assemble filtered bundle
    end
    Manager-->>Handler: Return filtered bundles
    Handler-->>Client: 200 OK with filtered bundles
```
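Read together with the walkthrough, the core of the new filtering can be sketched as a standalone Go program. The type and function names mirror identifiers mentioned in this PR (`GPUFilter`, `shouldSkipOperatorForGPUFilter`), but the surrounding scaffolding is illustrative, not the actual `Manager` code:

```go
package main

import "fmt"

// GPUFilter mirrors the query flags; a nil *GPUFilter means "no filtering"
// (both vendors kept), matching the omitted-parameter case.
type GPUFilter struct {
	AmdEnabled    bool
	NvidiaEnabled bool
}

// Raw names as they appear in the review; the real code should prefer the
// nvidiagpu/amdgpu operator constants over string literals.
const (
	nvidiaOperator = "nvidia-gpu"
	amdOperator    = "amd-gpu"
)

// shouldSkipOperatorForGPUFilter reports whether a GPU operator must be
// dropped from a bundle for the given filter.
func shouldSkipOperatorForGPUFilter(operatorName string, filter *GPUFilter) bool {
	if filter == nil {
		return false // no filter supplied: keep everything
	}
	switch operatorName {
	case nvidiaOperator:
		return !filter.NvidiaEnabled
	case amdOperator:
		return !filter.AmdEnabled
	default:
		return false // non-GPU operators are never filtered
	}
}

// filterOperators applies the skip rule to a bundle's operator list.
func filterOperators(operators []string, filter *GPUFilter) []string {
	var kept []string
	for _, op := range operators {
		if shouldSkipOperatorForGPUFilter(op, filter) {
			continue
		}
		kept = append(kept, op)
	}
	return kept
}

func main() {
	bundle := []string{"openshift-ai", "nvidia-gpu", "amd-gpu", "serverless-operator"}
	fmt.Println(filterOperators(bundle, &GPUFilter{AmdEnabled: false, NvidiaEnabled: true}))
	// [openshift-ai nvidia-gpu serverless-operator]
}
```

A nil filter reproduces the legacy behavior, which is why the review suggests making the legacy `ListBundles`/`GetBundle` paths delegate to the filter-aware methods with nil.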
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 4
🧹 Nitpick comments (2)
swagger.yaml (1)
2785-2794: Consider extracting the GPU filters into shared Swagger parameters. These two blocks are now identical and need to stay in sync across both endpoints. Moving them to top-level reusable parameter definitions would reduce drift between the spec and the generated client/server code.
Also applies to: 2833-2842
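A rough sketch of what the extraction could look like. Note that this spec is Swagger 2.0, so shared definitions would live under top-level `parameters` with `$ref: '#/parameters/...'` rather than the OpenAPI 3 `components.parameters` path the generated prompt mentions; the definition keys below are illustrative:

```yaml
parameters:
  nvidia-enabled:
    name: nvidia_enabled
    in: query
    type: boolean
    default: true
    description: Enable NVIDIA GPU support for the OpenShift AI bundle.
  amd-enabled:
    name: amd_enabled
    in: query
    type: boolean
    default: true
    description: Enable AMD GPU support for the OpenShift AI bundle.

# Then, in each of the two endpoints:
#   parameters:
#     - $ref: '#/parameters/nvidia-enabled'
#     - $ref: '#/parameters/amd-enabled'
```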
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@swagger.yaml` around lines 2785 - 2794, Extract the duplicated query parameter blocks for nvidia_enabled and amd_enabled into reusable top-level Swagger parameter definitions (e.g., components.parameters.NvidiaEnabled and components.parameters.AmdEnabled) and replace each inline occurrence (the two identical blocks at/around the current query params) with $ref references to those component parameters; ensure the parameter names, descriptions, types (boolean) and defaults are preserved in the shared definitions and update both endpoints (the blocks at lines around 2785–2794 and 2833–2842) to reference the new shared parameters so they stay in sync.
internal/operators/manager.go (1)
743-790: Unify bundle construction behind the filter-aware path. These methods now duplicate the existing `ListBundles`/`GetBundle` loops. Please make the legacy methods delegate here with a `nil` filter so omitted params and explicit `amd_enabled=true&nvidia_enabled=true` cannot drift over time.
♻️ Proposed refactor
```diff
 func (mgr *Manager) ListBundles(filters *featuresupport.SupportLevelFilters, featureIDs []models.FeatureSupportLevelID) []*models.Bundle {
-	var ret []*models.Bundle
-
-	for _, basicBundleDetails := range operatorscommon.Bundles {
-		completeBundleDetails, err := mgr.GetBundle(basicBundleDetails.ID, featureIDs)
-		if err != nil {
-			mgr.log.Error(err)
-			continue
-		}
-
-		if mgr.isBundleSupported(completeBundleDetails, filters, featureIDs) {
-			ret = append(ret, completeBundleDetails)
-		}
-	}
-
-	return ret
+	return mgr.ListBundlesWithGPUFilter(filters, featureIDs, nil)
 }

 func (mgr *Manager) GetBundle(bundleID string, featureIDs []models.FeatureSupportLevelID) (*models.Bundle, error) {
-	bundle, ok := mgr.lookupBundle(bundleID)
-	if !ok {
-		return nil, fmt.Errorf("bundle '%s' is not supported", bundleID)
-	}
-
-	for _, operator := range mgr.olmOperators {
-		operatorBundles := operator.GetBundleLabels(featureIDs)
-		for _, operatorBundle := range operatorBundles {
-			if operatorBundle == bundleID {
-				bundle.Operators = append(bundle.Operators, operator.GetName())
-				break
-			}
-		}
-	}
-
-	return bundle, nil
+	return mgr.GetBundleWithGPUFilter(bundleID, featureIDs, nil)
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 743 - 790, Refactor the legacy ListBundles and GetBundle implementations to delegate to the GPU-aware paths by calling ListBundlesWithGPUFilter(..., nil) and GetBundleWithGPUFilter(..., nil) respectively so that bundle construction is unified and GPU filtering defaults (nil -> no filter / both vendors) cannot drift; update references to operatorscommon.Bundles usage only in ListBundlesWithGPUFilter/GetBundleWithGPUFilter, remove duplicated operator iteration logic from the legacy methods, and ensure function signatures remain compatible (use the existing Manager.ListBundles, Manager.GetBundle names to call into Manager.ListBundlesWithGPUFilter and Manager.GetBundleWithGPUFilter with a nil GPUFilter).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f7d19782-3e24-4fec-9192-4f63267b8190
📒 Files selected for processing (11)
- client/operators/v2_get_bundle_parameters.go
- client/operators/v2_list_bundles_parameters.go
- internal/operators/handler/handler_v2.go
- internal/operators/manager.go
- internal/operators/mock_operators_api.go
- restapi/embedded_spec.go
- restapi/operations/operators/v2_get_bundle_parameters.go
- restapi/operations/operators/v2_get_bundle_urlbuilder.go
- restapi/operations/operators/v2_list_bundles_parameters.go
- restapi/operations/operators/v2_list_bundles_urlbuilder.go
- swagger.yaml
```go
func (mgr *Manager) GetBundleWithGPUFilter(bundleID string, featureIDs []models.FeatureSupportLevelID, gpuFilter *GPUFilter) (*models.Bundle, error) {
	bundle, ok := mgr.lookupBundle(bundleID)
	if !ok {
		return nil, fmt.Errorf("bundle '%s' is not supported", bundleID)
	}

	// Get all operators for the bundle based on feature IDs
	for _, operator := range mgr.olmOperators {
		operatorBundles := operator.GetBundleLabels(featureIDs)
		for _, operatorBundle := range operatorBundles {
			if operatorBundle == bundleID {
				operatorName := operator.GetName()

				// Apply GPU filtering
				if mgr.shouldSkipOperatorForGPUFilter(operatorName, gpuFilter) {
					continue
				}

				bundle.Operators = append(bundle.Operators, operatorName)
				break
			}
		}
	}

	return bundle, nil
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
echo "== ResolveDependencies special-case =="
rg -n -C4 'operatorscommon\.HasOperator\(ret, openshiftai\.Operator\.Name\)|nvidiagpu\.Operator\.Name|amdgpu\.Operator\.Name' internal/operators/manager.go
echo
echo "== OpenShift AI dependency implementation =="
fd . internal/operators/openshiftai -e go -t f | while read -r f; do
  echo "--- $f ---"
  rg -n -C4 '^\s*func .*GetDependencies\s*\(|^\s*func .*GetBundleLabels\s*\(|nvidia-gpu|amd-gpu|nvidiagpu\.Operator\.Name|amdgpu\.Operator\.Name' "$f"
done
echo
echo "== GPU filter usage and dependency expansion call sites =="
rg -n -C3 --type=go '\bGPUFilter\b|\bNvidiaEnabled\b|\bAmdEnabled\b|\bResolveDependencies\s*\(' internal
```
Length of output: 630

🏁 Script executed:
```shell
# Find ResolveDependencies signature and call sites
rg -n "func.*ResolveDependencies\|ResolveDependencies\(" internal/operators/manager.go -A 2 -B 1 | head -50
```
Length of output: 52

🏁 Script executed:
```shell
# Check openshiftai GetDependencies implementation
cat internal/operators/openshiftai/operator.go | grep -n "GetDependencies\|GetBundleLabels\|func" | head -30
```
Length of output: 144

🏁 Script executed:
```shell
# Look for GPU filter usage in the codebase
rg -n "GetBundleWithGPUFilter" internal --type=go -B 2 -A 2
```
Length of output: 3737

🏁 Script executed:
```shell
# List openshiftai directory
ls -la internal/operators/openshiftai/
```
Length of output: 926

🏁 Script executed:
```shell
# Find ResolveDependencies in manager.go
grep -n "ResolveDependencies" internal/operators/manager.go
```
Length of output: 448

🏁 Script executed:
```shell
# Check handler_v2.go to see the full context of GetBundleWithGPUFilter usage
cat -n internal/operators/handler/handler_v2.go | sed -n '160,180p'
```
Length of output: 801

🏁 Script executed:
```shell
# Search for where GetBundleWithGPUFilter is actually used and what happens with the result
rg -n "GetBundleWithGPUFilter" . --type=go -A 5
```
Length of output: 4092

🏁 Script executed:
```shell
# Get ResolveDependencies implementation
cat -n internal/operators/manager.go | sed -n '493,570p'
```
Length of output: 3016

🏁 Script executed:
```shell
# Check openshift_ai_operator.go for GetDependencies implementation
cat internal/operators/openshiftai/openshift_ai_operator.go | grep -n "GetDependencies\|GetBundleLabels\|NvidiaEnabled\|AmdEnabled" -A 10 -B 2
```
Length of output: 1733

🏁 Script executed:
```shell
# Check if GPU filter is stored/threaded in cluster or elsewhere
rg -n "NvidiaEnabled\|AmdEnabled\|GPUFilter" internal/operators --type=go | grep -E "Cluster|cluster" | head -20
```
Length of output: 52

🏁 Script executed:
```shell
# Find where ResolveDependencies is called
rg -n "\.ResolveDependencies\(" . --type=go -B 3 -A 3
```
Length of output: 15112

🏁 Script executed:
```shell
# Check if GPU filter is stored in Cluster model
rg -n "NvidiaEnabled\|AmdEnabled" internal/common --type=go -B 2 -A 2
```
Length of output: 52

🏁 Script executed:
```shell
# Verify openshift_ai GetDependencies doesn't check any GPU-related fields on cluster
cat internal/operators/openshiftai/openshift_ai_operator.go | sed -n '78,111p'
```
Length of output: 821
GPU filter selection is not persisted and will be overridden by ResolveDependencies.
GetBundleWithGPUFilter correctly filters the bundle response, but the GPU filter is never passed to ResolveDependencies, which is called later in the provisioning flow (refresh_status_preprocessor.go:155, inventory.go:3396).
When ResolveDependencies calls openshift_ai.GetDependencies(), the OpenShift AI operator has no knowledge of the user's GPU filter selection. Instead, it returns GPU operators based solely on whether the cluster physically has those GPUs (ClusterHasGPU()). If a cluster has both NVIDIA and AMD GPUs, both will be added as dependencies regardless of which vendor the user selected via the GPU filter.
The workaround at lines 533–539 (marking nvidia-gpu/amd-gpu as DependencyOnly) does not prevent them from being returned. The GPU filter selection must be persisted in the Cluster model and threaded through to ResolveDependencies and openshift_ai.GetDependencies for the filtering to survive dependency expansion.
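The threading the reviewer asks for could be sketched as follows. This is a simplified, hypothetical model: the real `GetDependencies` receives a cluster object and uses `ClusterHasGPU`, and the step of persisting the filter on the Cluster model is not shown here:

```go
package main

import "fmt"

// GPUFilter mirrors the API flags; nil means no user selection was recorded.
type GPUFilter struct {
	AmdEnabled    bool
	NvidiaEnabled bool
}

// clusterHasGPU stands in for the operator's hardware check (hypothetical
// inventory map keyed by vendor instead of a real cluster object).
func clusterHasGPU(inventory map[string]bool, vendor string) bool {
	return inventory[vendor]
}

// gpuDependencies shows how openshift_ai.GetDependencies could combine the
// physical-GPU check with the persisted filter, instead of hardware alone.
func gpuDependencies(inventory map[string]bool, filter *GPUFilter) []string {
	var deps []string
	nvidiaAllowed := filter == nil || filter.NvidiaEnabled
	amdAllowed := filter == nil || filter.AmdEnabled
	if nvidiaAllowed && clusterHasGPU(inventory, "nvidia") {
		deps = append(deps, "nvidia-gpu")
	}
	if amdAllowed && clusterHasGPU(inventory, "amd") {
		deps = append(deps, "amd-gpu")
	}
	return deps
}

func main() {
	// Cluster physically has both vendors, but the user disabled AMD:
	// dependency expansion must not reintroduce amd-gpu.
	inventory := map[string]bool{"nvidia": true, "amd": true}
	fmt.Println(gpuDependencies(inventory, &GPUFilter{NvidiaEnabled: true, AmdEnabled: false}))
	// [nvidia-gpu]
}
```

With this shape, a later refresh that re-runs dependency resolution keeps honoring the selection, provided the filter itself survives on the cluster record.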
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/operators/manager.go` around lines 765 - 789, GetBundleWithGPUFilter
currently filters operators transiently but never persists the GPUFilter, so
ResolveDependencies and openshift_ai.GetDependencies lose the user's selection;
persist the GPU filter on the Cluster model (e.g., add a
Cluster.GPUFilter/SelectedGPUVendor field) when the user selects it, then update
Manager.ResolveDependencies (and any callers, e.g., refresh_status_preprocessor
and inventory flows) to accept and forward the GPUFilter; modify
openshift_ai.GetDependencies signature to accept the GPUFilter and have its
logic (ClusterHasGPU checks) respect the passed filter rather than only physical
GPU presence; locate helper shouldSkipOperatorForGPUFilter and lookupBundle to
ensure consistent operator filtering semantics across GetBundleWithGPUFilter and
dependency expansion.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: shay23bra. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from d46660d to 55abe77 (Compare)
♻️ Duplicate comments (4)
restapi/operations/operators/v2_get_bundle_parameters.go (1)
106-128: ⚠️ Potential issue | 🟡 Minor — Reject explicit empty GPU flags here too.
Line 116 and Line 180 still return early on `raw == ""`, so `?amd_enabled=`/`?nvidia_enabled=` behave like omitted params and keep the default `true`. That hides client bugs and does not match `AllowEmptyValue: false`. Because this file is generated, the durable fix belongs in the swagger source/template too.
Suggested fix
```diff
 func (o *V2GetBundleParams) bindAmdEnabled(rawData []string, hasKey bool, formats strfmt.Registry) error {
 	var raw string
 	if len(rawData) > 0 {
 		raw = rawData[len(rawData)-1]
 	}

 	// Required: false
 	// AllowEmptyValue: false
-	if raw == "" { // empty values pass all other validations
-		// Default values have been previously initialized by NewV2GetBundleParams()
+	if !hasKey {
+		// Default values have been previously initialized by NewV2GetBundleParams()
 		return nil
 	}
+	if raw == "" {
+		return errors.InvalidType("amd_enabled", "query", "bool", raw)
+	}

 	value, err := swag.ConvertBool(raw)
 	if err != nil {
 		return errors.InvalidType("amd_enabled", "query", "bool", raw)
 	}
 	o.AmdEnabled = &value

 	return nil
 }

 func (o *V2GetBundleParams) bindNvidiaEnabled(rawData []string, hasKey bool, formats strfmt.Registry) error {
 	var raw string
 	if len(rawData) > 0 {
 		raw = rawData[len(rawData)-1]
 	}

 	// Required: false
 	// AllowEmptyValue: false
-	if raw == "" { // empty values pass all other validations
-		// Default values have been previously initialized by NewV2GetBundleParams()
+	if !hasKey {
+		// Default values have been previously initialized by NewV2GetBundleParams()
 		return nil
 	}
+	if raw == "" {
+		return errors.InvalidType("nvidia_enabled", "query", "bool", raw)
+	}

 	value, err := swag.ConvertBool(raw)
 	if err != nil {
 		return errors.InvalidType("nvidia_enabled", "query", "bool", raw)
 	}
 	o.NvidiaEnabled = &value

 	return nil
 }
```
Also applies to: 170-192
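The behavioral distinction behind this fix, reduced to a standalone sketch (the generated binders use go-openapi's `errors.InvalidType` and `swag.ConvertBool`; stdlib stand-ins are used here):

```go
package main

import (
	"fmt"
	"strconv"
)

// bindBoolFlag models the generated binder: hasKey says whether the query key
// was present at all, raw is its (last) value. An omitted key keeps the
// default; an explicitly empty value is rejected, matching AllowEmptyValue: false.
func bindBoolFlag(name, raw string, hasKey bool, defaultValue bool) (bool, error) {
	if !hasKey {
		return defaultValue, nil // parameter omitted: default applies
	}
	if raw == "" {
		return false, fmt.Errorf("%s in query must be of type bool: %q", name, raw)
	}
	value, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("%s in query must be of type bool: %q", name, raw)
	}
	return value, nil
}

func main() {
	v, _ := bindBoolFlag("amd_enabled", "", false, true)
	fmt.Println(v) // true: omitted key keeps the default
	_, err := bindBoolFlag("amd_enabled", "", true, true)
	fmt.Println(err != nil) // true: explicit ?amd_enabled= is rejected
}
```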
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@restapi/operations/operators/v2_get_bundle_parameters.go` around lines 106 - 128, The bind functions currently treat an explicit empty query value (raw == "") as if the parameter was omitted, preserving defaults; update bindAmdEnabled and the corresponding bindNvidiaEnabled to reject explicit empty values because AllowEmptyValue is false: instead of returning nil when raw == "", return a validation error (e.g., errors.InvalidType("amd_enabled", "query", "bool", raw) and errors.InvalidType("nvidia_enabled", "query", "bool", raw) or an appropriate errors.Required/InvalidType) so that requests like ?amd_enabled= or ?nvidia_enabled= fail validation rather than silently keeping defaults.
restapi/operations/operators/v2_list_bundles_parameters.go (1)
136-158: ⚠️ Potential issue | 🟡 Minor — Reject explicit empty GPU flags instead of falling back to the default.
Line 146 and Line 247 still treat `?amd_enabled=`/`?nvidia_enabled=` as omission, so malformed requests silently keep the default `true`. That contradicts `AllowEmptyValue: false` and can return a broader result set than the caller asked for. Since this file is generated, please fix the swagger source/template as well.
Suggested fix
```diff
 func (o *V2ListBundlesParams) bindAmdEnabled(rawData []string, hasKey bool, formats strfmt.Registry) error {
 	var raw string
 	if len(rawData) > 0 {
 		raw = rawData[len(rawData)-1]
 	}
-	if raw == "" { // empty values pass all other validations
-		// Default values have been previously initialized by NewV2ListBundlesParams()
+	if !hasKey {
+		// Default values have been previously initialized by NewV2ListBundlesParams()
 		return nil
 	}
+	if raw == "" {
+		return errors.InvalidType("amd_enabled", "query", "bool", raw)
+	}

 	value, err := swag.ConvertBool(raw)
 	if err != nil {
 		return errors.InvalidType("amd_enabled", "query", "bool", raw)
 	}
 	o.AmdEnabled = &value

 	return nil
 }

 func (o *V2ListBundlesParams) bindNvidiaEnabled(rawData []string, hasKey bool, formats strfmt.Registry) error {
 	var raw string
 	if len(rawData) > 0 {
 		raw = rawData[len(rawData)-1]
 	}
-	if raw == "" { // empty values pass all other validations
-		// Default values have been previously initialized by NewV2ListBundlesParams()
+	if !hasKey {
+		// Default values have been previously initialized by NewV2ListBundlesParams()
 		return nil
 	}
+	if raw == "" {
+		return errors.InvalidType("nvidia_enabled", "query", "bool", raw)
+	}

 	value, err := swag.ConvertBool(raw)
 	if err != nil {
 		return errors.InvalidType("nvidia_enabled", "query", "bool", raw)
 	}
 	o.NvidiaEnabled = &value

 	return nil
 }
```
Also applies to: 237-259
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@restapi/operations/operators/v2_list_bundles_parameters.go` around lines 136 - 158, The bindAmdEnabled (and similarly bindNvidiaEnabled) currently treats an explicit empty query value (e.g. ?amd_enabled=) as omission and returns nil; instead, when hasKey is true and raw == "" you must reject the request because AllowEmptyValue is false. Modify bindAmdEnabled to check if hasKey && raw == "" and return an appropriate validation error (use the same error pattern as other validation failures, e.g. errors.InvalidType/InvalidFormat for "amd_enabled" in "query") so the empty value is not silently treated as the default; apply the same change to bindNvidiaEnabled.
restapi/embedded_spec.go (1)
5844-5855: ⚠️ Potential issue | 🟡 Minor — Document the combined behavior of `nvidia_enabled` and `amd_enabled`.
These descriptions still define each flag separately, but the API contract is still ambiguous for `nvidia_enabled=false` + `amd_enabled=false` and for requests that target non-`openshift-ai` bundles. Please clarify those cases in all four blocks so clients do not have to infer the behavior.
✍️ Suggested wording
```diff
-      "description": "Enable NVIDIA GPU support for OpenShift AI bundle. Only applies to openshift-ai bundle.",
+      "description": "Whether to include NVIDIA-specific content in the OpenShift AI bundle. Applies only to the openshift-ai bundle; for non-openshift-ai bundles this flag is ignored. Can be combined independently with amd_enabled. When both nvidia_enabled and amd_enabled are false, the bundle is returned without GPU-specific content.",
 ...
-      "description": "Enable AMD GPU support for OpenShift AI bundle. Only applies to openshift-ai bundle.",
+      "description": "Whether to include AMD-specific content in the OpenShift AI bundle. Applies only to the openshift-ai bundle; for non-openshift-ai bundles this flag is ignored. Can be combined independently with nvidia_enabled. When both nvidia_enabled and amd_enabled are false, the bundle is returned without GPU-specific content.",
```
Also applies to: 5913-5924, 17340-17351, 17409-17420
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@restapi/embedded_spec.go` around lines 5844 - 5855, The parameter docs for nvidia_enabled and amd_enabled are ambiguous about combined and non-openshift-ai behavior; update the descriptions for the nvidia_enabled and amd_enabled query parameters (the parameter entries named "nvidia_enabled" and "amd_enabled") in all four blocks so they explicitly state: (1) what happens when both flags are false (no GPU support will be enabled/requested), (2) that these flags only apply to requests targeting the openshift-ai bundle and will be ignored for other bundles (and should note whether ignored means treated as false or no-op), and (3) the default behavior when omitted; make the same exact clarification in each occurrence referenced (the blocks around the existing "nvidia_enabled" and "amd_enabled" entries at the four locations) so clients can rely on a consistent API contract.
internal/operators/manager.go (1)
765-789: ⚠️ Potential issue | 🟠 Major — GPU filtering stops at bundle discovery.
This only removes GPU operators from the returned bundle. `ResolveDependencies()`/`getDependencies()` still expand OpenShift AI dependencies from `GetDependencies(cluster)` with no GPU-filter input, so a later save/refresh can reintroduce the vendor the user explicitly disabled. The selection needs to be persisted or threaded into dependency resolution too.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 765 - 789, GetBundleWithGPUFilter only filters operators in the returned Bundle but doesn't prevent dependency resolution from reintroducing GPU operators; to fix, thread the GPUFilter into the dependency-resolution path: add a gpuFilter parameter to ResolveDependencies and its internal getDependencies/GetDependencies(cluster) call chain (and update callers), then apply mgr.shouldSkipOperatorForGPUFilter(operatorName, gpuFilter) when expanding/adding dependencies so GPU-disabled operators are never added during dependency resolution; update function signatures (ResolveDependencies, getDependencies, GetDependencies) and any call sites accordingly to preserve the user's GPU selection.
🧹 Nitpick comments (2)
internal/operators/manager.go (1)
792-815: Use the operator constants instead of raw names.
Lines 801-808 hard-code `nvidia-gpu` and `amd-gpu` even though this file already imports `nvidiagpu` and `amdgpu`. Reusing `nvidiagpu.Operator.Name`/`amdgpu.Operator.Name` keeps this filter aligned with the rest of the operator plumbing if those identifiers ever change.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 792 - 815, The function shouldSkipOperatorForGPUFilter currently compares operatorName to hard-coded strings "nvidia-gpu" and "amd-gpu"; replace those literals with the operator name constants by using nvidiagpu.Operator.Name and amdgpu.Operator.Name respectively in the switch/case comparisons inside shouldSkipOperatorForGPUFilter so the filter uses the canonical operator identifiers and stays in sync with the imported operator definitions.
internal/operators/handler/handler_v2.go (1)
127-142: Extract the GPU-filter builder/dispatch branch.
Both handlers duplicate the same nil-check and `operators.GPUFilter` construction. A shared helper would keep the default-handling semantics in one place and reduce drift between list/get behavior.
Also applies to: 157-173
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/handler/handler_v2.go` around lines 127 - 142, Duplicate GPU filter construction and nil-check should be extracted into a single helper to avoid drift: create a function (e.g., buildGPUFilter or getGPUFilter) that accepts the params (accessing params.NvidiaEnabled and params.AmdEnabled) and returns *operators.GPUFilter or nil; replace the inline construction and nil-check in both locations that call h.operatorsAPI.ListBundlesWithGPUFilter and h.operatorsAPI.ListBundles so callers simply call the helper and then choose between ListBundlesWithGPUFilter(filter) and ListBundles(filter) based on whether the helper returned nil.
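The helper this nitpick suggests could look roughly like the following, with simplified stand-ins for the generated params structs (the real handlers would read `params.NvidiaEnabled`/`params.AmdEnabled` from `V2ListBundlesParams`/`V2GetBundleParams`):

```go
package main

import "fmt"

// GPUFilter mirrors operators.GPUFilter from the PR.
type GPUFilter struct {
	AmdEnabled    bool
	NvidiaEnabled bool
}

// buildGPUFilter centralizes the nil-check both handlers currently duplicate:
// when neither flag was sent, it returns nil so callers take the legacy
// (unfiltered) path; otherwise unset flags fall back to the default of true.
func buildGPUFilter(nvidiaEnabled, amdEnabled *bool) *GPUFilter {
	if nvidiaEnabled == nil && amdEnabled == nil {
		return nil
	}
	filter := &GPUFilter{NvidiaEnabled: true, AmdEnabled: true} // defaults
	if nvidiaEnabled != nil {
		filter.NvidiaEnabled = *nvidiaEnabled
	}
	if amdEnabled != nil {
		filter.AmdEnabled = *amdEnabled
	}
	return filter
}

func main() {
	off := false
	fmt.Println(buildGPUFilter(nil, nil) == nil) // true: no flags, legacy path
	fmt.Println(*buildGPUFilter(nil, &off))      // {false true}: AMD off, NVIDIA default
}
```

Each handler then reduces to one line: call the helper, and dispatch to `ListBundlesWithGPUFilter`/`GetBundleWithGPUFilter` when the result is non-nil, or the legacy method otherwise.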
identifiers and stays in sync with the imported operator definitions.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3cf7327c-fab8-4ad0-9715-1654ad98986d
⛔ Files ignored due to path filters (2)
- vendor/github.com/openshift/assisted-service/client/operators/v2_get_bundle_parameters.go is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/openshift/assisted-service/client/operators/v2_list_bundles_parameters.go is excluded by !**/vendor/**, !vendor/**
📒 Files selected for processing (11)
- client/operators/v2_get_bundle_parameters.go
- client/operators/v2_list_bundles_parameters.go
- internal/operators/handler/handler_v2.go
- internal/operators/manager.go
- internal/operators/mock_operators_api.go
- restapi/embedded_spec.go
- restapi/operations/operators/v2_get_bundle_parameters.go
- restapi/operations/operators/v2_get_bundle_urlbuilder.go
- restapi/operations/operators/v2_list_bundles_parameters.go
- restapi/operations/operators/v2_list_bundles_urlbuilder.go
- swagger.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
- restapi/operations/operators/v2_list_bundles_urlbuilder.go
- swagger.yaml
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## master #9993 +/- ##
==========================================
- Coverage 44.08% 43.98% -0.11%
==========================================
Files 414 415 +1
Lines 72248 72438 +190
==========================================
+ Hits 31852 31862 +10
- Misses 37517 37689 +172
- Partials 2879 2887 +8
Actionable comments posted: 1
🧹 Nitpick comments (2)
internal/operators/manager.go (2)
834-844: Use constants for GPU operator names.

Same as above: replace hardcoded strings with constants.
♻️ Proposed fix
  switch operatorName {
- case "nvidia-gpu":
+ case nvidiagpu.Operator.Name:
  	// Skip nvidia-gpu if nvidia is explicitly disabled
  	if gpuFilter.NvidiaEnabled != nil && !*gpuFilter.NvidiaEnabled {
  		return true
  	}
- case "amd-gpu":
+ case amdgpu.Operator.Name:
  	// Skip amd-gpu if amd is explicitly disabled
  	if gpuFilter.AmdEnabled != nil && !*gpuFilter.AmdEnabled {
  		return true
  	}
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 834 - 844, Replace the hardcoded operator name strings in the switch (the cases matching "nvidia-gpu" and "amd-gpu") with package-level string constants (e.g., OperatorNvidiaGPU, OperatorAmdGPU) and use those constants in the switch on operatorName; update any other references to these literal names (such as the gpuFilter checks referencing gpuFilter.NvidiaEnabled and gpuFilter.AmdEnabled in the same file) to use the constants so the operator names are centralized and avoid duplication.
784-785: Use constants for GPU operator names.

Replace hardcoded strings with the existing constants for consistency and maintainability.
♻️ Proposed fix
- 	if operatorName == "nvidia-gpu" || operatorName == "amd-gpu" {
+ 	if operatorName == nvidiagpu.Operator.Name || operatorName == amdgpu.Operator.Name {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 784 - 785, Replace the hardcoded GPU operator name strings in the comparison with the project constants: instead of comparing operatorName to "nvidia-gpu" and "amd-gpu", use the existing GPU name constants (e.g., NvidiaGPUOperator and AMDGPUOperator) so filteredGPUOperators is appended using the constants; update the comparisons around operatorName and the append to use those constant symbols to ensure consistency with other code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/operators/manager.go`:
- Around line 796-820: The current removal collects deps from
filteredGPUOperators via mgr.olmOperators[...].GetDependencies and
unconditionally deletes any matching name from includedOperators; instead, for
each candidate dependency (from dependenciesToRemove) check all remaining
included operators (iterate includedOperators) and call GetDependencies on their
olmOperators entries to see if any still depend on that candidate; only remove
the dependency if no remaining included operator lists it. Update the logic
around filteredGPUOperators/gpuFilter and the dependenciesToRemove pass to
perform this per-dependency existence check before filtering includedOperators.
---
Nitpick comments:
In `@internal/operators/manager.go`:
- Around line 834-844: Replace the hardcoded operator name strings in the switch
(the cases matching "nvidia-gpu" and "amd-gpu") with package-level string
constants (e.g., OperatorNvidiaGPU, OperatorAmdGPU) and use those constants in
the switch on operatorName; update any other references to these literal names
(such as the gpuFilter checks referencing gpuFilter.NvidiaEnabled and
gpuFilter.AmdEnabled in the same file) to use the constants so the operator
names are centralized and avoid duplication.
- Around line 784-785: Replace the hardcoded GPU operator name strings in the
comparison with the project constants: instead of comparing operatorName to
"nvidia-gpu" and "amd-gpu", use the existing GPU name constants (e.g.,
NvidiaGPUOperator and AMDGPUOperator) so filteredGPUOperators is appended using
the constants; update the comparisons around operatorName and the append to use
those constant symbols to ensure consistency with other code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cacee46a-fd8c-4725-b8e5-ec9eda4782ef
📒 Files selected for processing (1)
internal/operators/manager.go
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/operators/manager.go (1)
595-617: ⚠️ Potential issue | 🟠 Major

Keep shared dependencies if a retained operator still requires them.

This pruning step removes every dependency of the filtered GPU operator, even when another operator left in filtered still depends on it. That can drop shared deps like nodefeaturediscovery/kmm from the resolved install set.

Suggested fix
  if len(filteredGPUOperators) > 0 {
  	dependenciesToRemove := make(map[string]bool)
+ 	dependenciesToKeep := make(map[string]bool)
  	for _, filteredGPU := range filteredGPUOperators {
  		if gpuOperator, exists := mgr.olmOperators[filteredGPU]; exists {
  			deps, err := gpuOperator.GetDependencies(&common.Cluster{})
  			if err == nil {
  				for _, dep := range deps {
  					dependenciesToRemove[dep] = true
  					mgr.log.Infof("Filtering out dependency %s of GPU operator %s", dep, filteredGPU)
  				}
  			}
  		}
  	}
+ 	for _, operator := range filtered {
+ 		if olmOperator, exists := mgr.olmOperators[operator.Name]; exists {
+ 			deps, err := olmOperator.GetDependencies(&common.Cluster{})
+ 			if err == nil {
+ 				for _, dep := range deps {
+ 					dependenciesToKeep[dep] = true
+ 				}
+ 			}
+ 		}
+ 	}
+ 	// Apply dependency filtering
  	finalFiltered := make([]*models.MonitoredOperator, 0, len(filtered))
  	for _, operator := range filtered {
- 		if !dependenciesToRemove[operator.Name] {
+ 		if !dependenciesToRemove[operator.Name] || dependenciesToKeep[operator.Name] {
  			finalFiltered = append(finalFiltered, operator)
  		}
  	}
  	filtered = finalFiltered
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 595 - 617, The current logic builds dependenciesToRemove from filteredGPUOperators and removes them unconditionally, which drops shared deps still needed by retained operators; change it to only remove a dependency when no retained operator in filtered depends on it by: after computing dependenciesToRemove from filteredGPUOperators, compute a requiredDeps map by iterating the remaining operators in filtered (models.MonitoredOperator) and, for each, look up mgr.olmOperators[operator.Name] and call GetDependencies to mark deps that must be kept; then filter out only those dependencies that are not present in requiredDeps before rebuilding filtered (finalFiltered). Ensure you handle GetDependencies errors the same way as existing code and reference filteredGPUOperators, mgr.olmOperators, GetDependencies, dependenciesToRemove, filtered, and models.MonitoredOperator when locating places to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/operators/manager.go`:
- Around line 542-578: The parsed GPU filter currently uses plain bools so any
missing gpu_filter keys default to false; change
GPUFilterProperties.GPUFilter.NvidiaEnabled and AmdEnabled to *bool so omissions
are distinguishable, then in applyGPUFilterFromProperties only set the resulting
GPUFilter's NvidiaEnabled/AmdEnabled pointers when the parsed *bool is non-nil
(i.e., assign &props.GPUFilter.NvidiaEnabled value only if
props.GPUFilter.NvidiaEnabled != nil), leaving gpuFilter nil or its fields nil
when omitted so downstream ResolveDependencies/ensureAllGPUOperators preserve
default-enabled behavior; update types and the conversion logic in
applyGPUFilterFromProperties accordingly (refer to GPUFilterProperties and
applyGPUFilterFromProperties).
---
Duplicate comments:
In `@internal/operators/manager.go`:
- Around line 595-617: The current logic builds dependenciesToRemove from
filteredGPUOperators and removes them unconditionally, which drops shared deps
still needed by retained operators; change it to only remove a dependency when
no retained operator in filtered depends on it by: after computing
dependenciesToRemove from filteredGPUOperators, compute a requiredDeps map by
iterating the remaining operators in filtered (models.MonitoredOperator) and,
for each, look up mgr.olmOperators[operator.Name] and call GetDependencies to
mark deps that must be kept; then filter out only those dependencies that are
not present in requiredDeps before rebuilding filtered (finalFiltered). Ensure
you handle GetDependencies errors the same way as existing code and reference
filteredGPUOperators, mgr.olmOperators, GetDependencies, dependenciesToRemove,
filtered, and models.MonitoredOperator when locating places to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: db36f77d-82f2-489f-b877-d6db294f3ce6
📒 Files selected for processing (2)
docs/user-guide/additional-operator-notes.mdinternal/operators/manager.go
|
Actionable comments posted: 1
♻️ Duplicate comments (2)
internal/operators/manager.go (2)
914-938: ⚠️ Potential issue | 🔴 Critical

Same shared dependency removal bug exists here.

This dependency removal logic has the same issue flagged above in applyGPUFilterFromProperties (lines 598-621). Dependencies are removed without checking if other included operators still need them. Apply the same fix pattern: collect dependencies of remaining included operators and only remove a dependency if no remaining operator depends on it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 914 - 938, The code collects dependencies of filtered GPU operators into dependenciesToRemove and unconditionally removes them from includedOperators, which breaks shared dependencies; update the logic in the block that references filteredGPUOperators, gpuFilter, mgr.olmOperators and GetDependencies so you first compute the set of dependencies still required by the remaining includedOperators (iterate over includedOperators after removing filteredGPUOperators, call GetDependencies on each remaining operator and aggregate neededDeps), then only remove a dependency from includedOperators if it is not present in neededDeps; replace the current dependenciesToRemove-only approach with this two-step: build neededDeps from remaining operators, then filter includedOperators by excluding ops that are both in the filtered set and not required by neededDeps.
598-621: ⚠️ Potential issue | 🔴 Critical

Dependency removal logic incorrectly removes shared dependencies between operators.

The code removes an operator from the result if its name appears in dependenciesToRemove, without checking if other included operators still depend on it.

Example scenario:
- OpenStack Controller (OSC) and Node Feature Discovery (NFD) are enabled
- User disables NVIDIA GPU operator (which depends on NFD)
- Code marks NFD as a dependency to remove (from NVIDIA)
- NFD gets filtered out, breaking OSC which still needs it

The fix should only remove a dependency if no included operator depends on it.
🐛 Proposed fix to preserve shared dependencies
  // Remove dependencies of filtered GPU operators
  if len(filteredGPUOperators) > 0 {
  	dependenciesToRemove := make(map[string]bool)
+ 	dependenciesToKeep := make(map[string]bool)
+
  	for _, filteredGPU := range filteredGPUOperators {
  		if gpuOperator, exists := mgr.olmOperators[filteredGPU]; exists {
  			deps, err := gpuOperator.GetDependencies(&common.Cluster{})
  			if err == nil {
  				for _, dep := range deps {
  					dependenciesToRemove[dep] = true
- 					mgr.log.Infof("Filtering out dependency %s of GPU operator %s", dep, filteredGPU)
  				}
  			}
  		}
  	}
+ 	// Collect dependencies of remaining included operators
+ 	for _, op := range filtered {
+ 		if operator, exists := mgr.olmOperators[op.Name]; exists {
+ 			deps, err := operator.GetDependencies(&common.Cluster{})
+ 			if err == nil {
+ 				for _, dep := range deps {
+ 					dependenciesToKeep[dep] = true
+ 				}
+ 			}
+ 		}
+ 	}
+ 	// Apply dependency filtering
  	finalFiltered := make([]*models.MonitoredOperator, 0, len(filtered))
  	for _, operator := range filtered {
- 		if !dependenciesToRemove[operator.Name] {
+ 		if !dependenciesToRemove[operator.Name] || dependenciesToKeep[operator.Name] {
  			finalFiltered = append(finalFiltered, operator)
+ 		} else {
+ 			mgr.log.Infof("Filtering out dependency %s of removed GPU operator", operator.Name)
  		}
  	}
  	filtered = finalFiltered
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 598 - 621, The dependency-removal collects dependenciesToRemove from filteredGPUOperators but then unconditionally filters out any operator whose name is in dependenciesToRemove, which incorrectly removes shared dependencies still required by other included operators; change the logic to compute dependency reference counts among the operators that will remain (use the same filtered slice of *models.MonitoredOperator), by calling GetDependencies on mgr.olmOperators for each operator in filtered to build a map[string]int of how many included operators depend on each dependency, then remove an operator only if dependenciesToRemove[name] is true AND the dependency count for that name is zero; update the filtering step that builds finalFiltered to consult that reference-count map instead of the boolean dependenciesToRemove map so shared dependencies are preserved.
🧹 Nitpick comments (2)
internal/operators/manager.go (2)
952-962: Use constants instead of string literals for operator names.

Same issue as above: use nvidiagpu.Operator.Name and amdgpu.Operator.Name constants instead of string literals for consistency and maintainability.

♻️ Proposed fix
  switch operatorName {
- case "nvidia-gpu":
+ case nvidiagpu.Operator.Name:
  	// Skip nvidia-gpu if nvidia is explicitly disabled
  	if gpuFilter.NvidiaEnabled != nil && !*gpuFilter.NvidiaEnabled {
  		return true
  	}
- case "amd-gpu":
+ case amdgpu.Operator.Name:
  	// Skip amd-gpu if amd is explicitly disabled
  	if gpuFilter.AmdEnabled != nil && !*gpuFilter.AmdEnabled {
  		return true
  	}
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 952 - 962, Replace the string literals "nvidia-gpu" and "amd-gpu" in the operatorName switch with the canonical constants exported by the GPU operator packages (use nvidiagpu.Operator.Name and amdgpu.Operator.Name) so the switch matches the single source-of-truth names; update the two case labels in the switch inside manager.go (the block that checks gpuFilter.NvidiaEnabled and gpuFilter.AmdEnabled) to use those constants instead of raw strings.
902-903: Use constants instead of string literals for operator names.

Using string literals "nvidia-gpu" and "amd-gpu" is inconsistent with other parts of the code that use nvidiagpu.Operator.Name and amdgpu.Operator.Name. If these constants change, this code would silently break.

♻️ Proposed fix
- 	if operatorName == "nvidia-gpu" || operatorName == "amd-gpu" {
+ 	if operatorName == nvidiagpu.Operator.Name || operatorName == amdgpu.Operator.Name {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/operators/manager.go` around lines 902 - 903, Replace the string literals "nvidia-gpu" and "amd-gpu" with the corresponding operator name constants to avoid fragile comparisons: update the conditional that checks operatorName (where filteredGPUOperators is appended) to compare against nvidiagpu.Operator.Name and amdgpu.Operator.Name instead of the hard-coded strings so future renames stay consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/operators/manager.go`:
- Around line 626-643: The function ensureAllGPUOperators currently
shallow-copies the operators slice into result, so setting
operator.DependencyOnly = true mutates the original MonitoredOperator pointers;
fix by deep-copying each element into new MonitoredOperator instances before
modifying them (e.g., iterate the input operators and append copies to result),
then perform the DependencyOnly updates and the checks against
nvidiagpu.Operator.Name and amdgpu.Operator.Name on these copies to avoid
mutating the caller's slice.
---
Duplicate comments:
In `@internal/operators/manager.go`:
- Around line 914-938: The code collects dependencies of filtered GPU operators
into dependenciesToRemove and unconditionally removes them from
includedOperators, which breaks shared dependencies; update the logic in the
block that references filteredGPUOperators, gpuFilter, mgr.olmOperators and
GetDependencies so you first compute the set of dependencies still required by
the remaining includedOperators (iterate over includedOperators after removing
filteredGPUOperators, call GetDependencies on each remaining operator and
aggregate neededDeps), then only remove a dependency from includedOperators if
it is not present in neededDeps; replace the current dependenciesToRemove-only
approach with this two-step: build neededDeps from remaining operators, then
filter includedOperators by excluding ops that are both in the filtered set and
not required by neededDeps.
- Around line 598-621: The dependency-removal collects dependenciesToRemove from
filteredGPUOperators but then unconditionally filters out any operator whose
name is in dependenciesToRemove, which incorrectly removes shared dependencies
still required by other included operators; change the logic to compute
dependency reference counts among the operators that will remain (use the same
filtered slice of *models.MonitoredOperator), by calling GetDependencies on
mgr.olmOperators for each operator in filtered to build a map[string]int of how
many included operators depend on each dependency, then remove an operator only
if dependenciesToRemove[name] is true AND the dependency count for that name is
zero; update the filtering step that builds finalFiltered to consult that
reference-count map instead of the boolean dependenciesToRemove map so shared
dependencies are preserved.
---
Nitpick comments:
In `@internal/operators/manager.go`:
- Around line 952-962: Replace the string literals "nvidia-gpu" and "amd-gpu" in
the operatorName switch with the canonical constants exported by the GPU
operator packages (use nvidiagpu.Operator.Name and amdgpu.Operator.Name) so the
switch matches the single source-of-truth names; update the two case labels in
the switch inside manager.go (the block that checks gpuFilter.NvidiaEnabled and
gpuFilter.AmdEnabled) to use those constants instead of raw strings.
- Around line 902-903: Replace the string literals "nvidia-gpu" and "amd-gpu"
with the corresponding operator name constants to avoid fragile comparisons:
update the conditional that checks operatorName (where filteredGPUOperators is
appended) to compare against nvidiagpu.Operator.Name and amdgpu.Operator.Name
instead of the hard-coded strings so future renames stay consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d4630a39-fc8c-4d65-9504-95b9bef3e5fe
📒 Files selected for processing (1)
internal/operators/manager.go
// ensureAllGPUOperators ensures both nvidia-gpu and amd-gpu operators are included as dependencies (backward compatibility)
func (mgr *Manager) ensureAllGPUOperators(operators []*models.MonitoredOperator) []*models.MonitoredOperator {
	result := make([]*models.MonitoredOperator, len(operators))
	copy(result, operators)

	// Track which GPU operators are already present
	hasNvidia := false
	hasAmd := false

	for _, operator := range result {
		if operator.Name == nvidiagpu.Operator.Name {
			operator.DependencyOnly = true
			hasNvidia = true
		} else if operator.Name == amdgpu.Operator.Name {
			operator.DependencyOnly = true
			hasAmd = true
		}
	}
Shallow copy causes unintended mutation of input operators.
The copy() function copies pointers, so modifying operator.DependencyOnly = true in the loop affects the original operators slice passed by the caller. This could cause unexpected side effects.
🛡️ Proposed fix to deep copy operators before modification
func (mgr *Manager) ensureAllGPUOperators(operators []*models.MonitoredOperator) []*models.MonitoredOperator {
- result := make([]*models.MonitoredOperator, len(operators))
- copy(result, operators)
+ result := make([]*models.MonitoredOperator, 0, len(operators))
+ for _, op := range operators {
+ // Create a copy to avoid modifying the original
+ opCopy := *op
+ result = append(result, &opCopy)
+ }
// Track which GPU operators are already present
hasNvidia := false
hasAmd := false
- for _, operator := range result {
+ for i, operator := range result {
if operator.Name == nvidiagpu.Operator.Name {
- operator.DependencyOnly = true
+ result[i].DependencyOnly = true
hasNvidia = true
} else if operator.Name == amdgpu.Operator.Name {
- operator.DependencyOnly = true
+ result[i].DependencyOnly = true
hasAmd = true
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/operators/manager.go` around lines 626 - 643, The function
ensureAllGPUOperators currently shallow-copies the operators slice into result,
so setting operator.DependencyOnly = true mutates the original MonitoredOperator
pointers; fix by deep-copying each element into new MonitoredOperator instances
before modifying them (e.g., iterate the input operators and append copies to
result), then perform the DependencyOnly updates and the checks against
nvidiagpu.Operator.Name and amdgpu.Operator.Name on these copies to avoid
mutating the caller's slice.
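The aliasing bug flagged above is easy to reproduce in miniature: `copy` on a slice of pointers duplicates the pointers, not the structs, so a mutation through the "copy" is visible to the caller. A local `MonitoredOperator` stand-in is used here for self-containment.

```go
package main

import "fmt"

type MonitoredOperator struct {
	Name           string
	DependencyOnly bool
}

func main() {
	// Shallow copy: both slices point at the same struct.
	orig := []*MonitoredOperator{{Name: "nvidia-gpu"}}
	shallow := make([]*MonitoredOperator, len(orig))
	copy(shallow, orig)
	shallow[0].DependencyOnly = true
	fmt.Println(orig[0].DependencyOnly) // true: the caller's operator was mutated

	// Deep copy: each element is cloned before modification.
	orig2 := []*MonitoredOperator{{Name: "amd-gpu"}}
	deep := make([]*MonitoredOperator, 0, len(orig2))
	for _, op := range orig2 {
		c := *op
		deep = append(deep, &c)
	}
	deep[0].DependencyOnly = true
	fmt.Println(orig2[0].DependencyOnly) // false: the caller's slice is untouched
}
```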
@shay23bra: The following tests failed, say
Full PR test history. Your PR dashboard. Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Add independent GPU vendor selection for OpenShift AI bundles - Users can now select NVIDIA only, AMD only, both, or neither through nvidia_enabled and amd_enabled boolean parameters.

Maintain backward compatibility - Both parameters default to true, so existing API calls continue to work unchanged and include both GPU operators as before.

Support both endpoints - Works for bundle listing (/v2/operators/bundles) and single bundle retrieval (/v2/operators/bundles/{id}) with identical parameter behavior.

Installation-time GPU filtering - Enhanced ResolveDependencies to honor GPU vendor selection from operator properties during cluster installation, ensuring users' GPU filter choices ({"gpu_filter":{"nvidia_enabled":true,"amd_enabled":false}}) are respected in both bundle discovery and actual operator deployment.

Summary by CodeRabbit
{"gpu_filter":{"nvidia_enabled":true,"amd_enabled":false}}) are respected in both bundle discovery and actual operator deploymentSummary by CodeRabbit