Adding Optimizer Integration #77

Merged
dinogun merged 16 commits into kruize:mvp_demo from shekhar316:optimizer
Apr 13, 2026
Conversation

@shekhar316
Contributor

@shekhar316 shekhar316 commented Mar 17, 2026

This PR integrates the Kruize Optimizer component into the Kruize operator, enabling advanced optimization capabilities alongside the existing Autotune functionality.

Changes Made

1. API Changes (api/v1alpha1/kruize_types.go)

  • Added Optimizer_image field to KruizeSpec to allow users to specify a custom Optimizer container image
  • Field is optional with omitempty tag for backward compatibility
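A minimal sketch of how the omitempty tag keeps the new field backward compatible. The struct below is a trimmed stand-in, not the full KruizeSpec; only the Optimizer_image field and its JSON tag come from this PR, and marshalSpec is a hypothetical helper added for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KruizeSpec is a trimmed stand-in for the spec in api/v1alpha1/kruize_types.go.
// Only the field added by this PR is shown.
type KruizeSpec struct {
	// Container image for Kruize Optimizer; omitted from JSON when empty.
	Optimizer_image string `json:"optimizer_image,omitempty"`
}

// marshalSpec is a hypothetical helper that serializes the spec to JSON.
func marshalSpec(s KruizeSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// An existing CR that never set the field serializes without it.
	fmt.Println(marshalSpec(KruizeSpec{}))
	// A CR that sets the field carries it through.
	fmt.Println(marshalSpec(KruizeSpec{Optimizer_image: "quay.io/rh-ee-shesaxen/optimizer:0.1-test"}))
}
```

Because the empty value is dropped on serialization, CRs written before this change should round-trip unchanged.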

2. Image Management (internal/constants/kruize_images.go)

  • Added support for Optimizer image configuration
  • Introduced the DEFAULT_OPTIMIZER_IMAGE environment variable for overriding the default image
  • Default image: quay.io/rh-ee-shesaxen/optimizer:0.1-test
  • Added GetDefaultOptimizerImage() function for retrieving the configured image
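The override behavior described above can be sketched roughly as follows. This is an illustration, not the code in internal/constants/kruize_images.go: resolveOptimizerImage and the constant names are hypothetical, and the environment lookup is injected so the logic can be exercised without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
)

const (
	// Environment variable name taken from the PR description.
	optimizerImageEnvVar = "DEFAULT_OPTIMIZER_IMAGE"
	// Default image taken from the PR description.
	fallbackOptimizerImage = "quay.io/rh-ee-shesaxen/optimizer:0.1-test"
)

// resolveOptimizerImage returns the environment override when it is set
// and non-empty, and the built-in default otherwise.
func resolveOptimizerImage(getenv func(string) string) string {
	if v := getenv(optimizerImageEnvVar); v != "" {
		return v
	}
	return fallbackOptimizerImage
}

func main() {
	fmt.Println(resolveOptimizerImage(os.Getenv))
}
```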

3. Resource Generation (internal/utils/kruize_generator.go)

  • Created kruizeOptimizerDeployment() to generate Optimizer deployment with:
    • Single replica configuration
    • Port 8080 exposure
    • Environment variables for Swagger, Kruize URL, scheduling intervals, webhook configuration, and datasource settings
  • Created kruizeOptimizerService() to expose Optimizer on ClusterIP service
  • Updated NewKruizeResourceGenerator() to accept and handle Optimizer image parameter
  • Integrated Optimizer resources into both OpenShift and Kubernetes deployment flows

4. Controller Updates (internal/controller/kruize_controller.go)

  • Updated pod readiness check to wait for 4 pods (added kruize-optimizer to the list)
  • Modified deployment logging to include optimizer_image parameter
  • Passed Optimizer image to resource generator
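A rough sketch of the readiness threshold, based only on the "readyPods >= 4 and totalPods >= 4" check in the reviewer's guide. The podStatus type and podsReady function are hypothetical stand-ins; the real waitForKruizePods lists pods through the Kubernetes API, and how it filters them is not shown in this PR page.

```go
package main

import "fmt"

// requiredPods is the number of pods expected after this PR:
// kruize, kruize-ui-nginx, kruize-db, and kruize-optimizer.
const requiredPods = 4

// podStatus is a simplified stand-in for the pod information the
// controller reads from the API server.
type podStatus struct {
	Name  string
	Ready bool
}

// podsReady reports whether at least requiredPods pods exist and
// at least requiredPods of them are ready.
func podsReady(pods []podStatus) bool {
	ready := 0
	for _, p := range pods {
		if p.Ready {
			ready++
		}
	}
	return len(pods) >= requiredPods && ready >= requiredPods
}

func main() {
	pods := []podStatus{
		{Name: "kruize", Ready: true},
		{Name: "kruize-ui-nginx", Ready: true},
		{Name: "kruize-db", Ready: true},
		{Name: "kruize-optimizer", Ready: false},
	}
	fmt.Println(podsReady(pods)) // optimizer not ready yet
}
```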

5. Sample Configuration (config/samples/v1alpha1_kruize.yaml)

  • Added example optimizer_image configuration for reference

Technical Details

Optimizer Configuration:

  • Image: quay.io/rh-ee-shesaxen/optimizer:0.1-test
  • Port: 8080
  • Key Features:
    • Swagger UI enabled
    • Integration with Kruize service at http://kruize:8080
    • State refresh interval: 60 minutes
    • Bulk scheduler interval: 15 minutes
    • Webhook endpoint: http://kruize-optimizer:8080/webhook
    • Default datasource: prometheus-1
    • Target label filtering with kruize/autotune: enabled

Summary by Sourcery

Integrate the Kruize Optimizer component into the operator and deployment flows alongside existing Autotune services.

New Features:

  • Add configurable Optimizer container image to the Kruize custom resource spec with optional override.
  • Introduce default Optimizer image resolution with environment variable override support.
  • Create Kubernetes/OpenShift deployment and ClusterIP service resources for the Kruize Optimizer component.

Enhancements:

  • Extend resource generator and controller wiring to propagate the Optimizer image and deploy the new component, including readiness checks for its pod.
  • Update sample Kruize CR YAML to document usage of the new Optimizer image field.

@sourcery-ai

sourcery-ai Bot commented Mar 17, 2026

Reviewer's Guide

Integrates the Kruize Optimizer as a first-class component in the operator by extending the CRD/spec, wiring image configuration and defaults, generating Deployment/Service resources for both OpenShift and Kubernetes flows, and updating the controller to deploy and wait on the new optimizer pod.

Sequence diagram for deploying Kruize with Optimizer

sequenceDiagram
    actor Admin
    participant KubeAPI as KubernetesAPI
    participant Reconciler as KruizeReconciler
    participant Generator as KruizeResourceGenerator
    participant Constants
    participant KubeCtrl as KubernetesController

    Admin->>KubeAPI: Apply Kruize CR (includes optimizer_image)
    KubeAPI-->>Reconciler: Create/Update event for Kruize

    Reconciler->>Reconciler: deployKruize()
    Reconciler->>Reconciler: normalize cluster_type

    Reconciler->>Generator: NewKruizeResourceGenerator(namespace, autotune_image, autotune_ui_image, optimizer_image, clusterType)
    alt optimizer_image empty in CR
        Generator->>Constants: GetDefaultOptimizerImage()
        Constants-->>Generator: defaultOptimizerImage
    end

    Reconciler->>Generator: NamespacedResources() / KubernetesNamespacedResources()
    Generator-->>Reconciler: Resources including kruize-optimizer Deployment and Service

    Reconciler->>KubeAPI: Create/Update resources
    KubeAPI-->>KubeCtrl: Reconcile deployments and services
    KubeCtrl-->>KubeAPI: Pods running (kruize, kruize-ui-nginx, kruize-db, kruize-optimizer)

    Reconciler->>Reconciler: waitForKruizePods()
    Reconciler->>KubeAPI: List pods in namespace
    KubeAPI-->>Reconciler: Pod statuses
    Reconciler->>Reconciler: Check readyPods >= 4 and totalPods >= 4
    Reconciler-->>Admin: Deployment complete with optimizer running

Class diagram for updated KruizeSpec and resource generation

classDiagram
    class KruizeSpec {
        +string cluster_type
        +string autotune_image
        +string autotune_ui_image
        +string optimizer_image
        +string namespace
    }

    class KruizeResourceGenerator {
        +string Namespace
        +string Autotune_image
        +string Autotune_ui_image
        +string Optimizer_image
        +string ClusterType
        +NewKruizeResourceGenerator(namespace string, autotuneImage string, autotuneUIImage string, optimizerImage string, clusterType string) KruizeResourceGenerator
        +NamespacedResources() []client_Object
        +KubernetesNamespacedResources() []client_Object
        +kruizeDeployment() appsv1_Deployment
        +kruizeService() corev1_Service
        +kruizeDeploymentKubernetes() appsv1_Deployment
        +kruizeServiceKubernetes() corev1_Service
        +kruizeOptimizerDeployment() appsv1_Deployment
        +kruizeOptimizerService() corev1_Service
    }

    class KruizeReconciler {
        +deployKruize(ctx context_Context, kruize *Kruize) error
        +deployKruizeComponents(ctx context_Context, namespace string, kruize *Kruize, clusterType string) error
        +waitForKruizePods(ctx context_Context, namespace string, timeout time_Duration) error
    }

    class Constants {
        <<utility>>
        +GetDefaultAutotuneImage() string
        +GetDefaultUIImage() string
        +GetDefaultOptimizerImage() string
    }

    KruizeReconciler --> KruizeSpec : reads
    KruizeReconciler --> KruizeResourceGenerator : constructs
    KruizeResourceGenerator --> Constants : uses

File-Level Changes

Change: Extend KruizeSpec and sample CR to accept an optional optimizer image reference so users can configure the optimizer container
Details:
  • Added Optimizer_image field to KruizeSpec with omitempty for backward compatibility and CSV metadata for UI integration
  • Updated sample v1alpha1_kruize.yaml to include an example optimizer_image value
Files:
  • api/v1alpha1/kruize_types.go
  • config/samples/v1alpha1_kruize.yaml

Change: Introduce configurable default image handling for the optimizer, parallel to existing Autotune and UI images
Details:
  • Defined DEFAULT_OPTIMIZER_IMAGE env var name and default optimizer repo/tag constants
  • Added cached defaultOptimizerImage resolution in init using env override or defaults
  • Exposed GetDefaultOptimizerImage helper mirroring other image getters
Files:
  • internal/constants/kruize_images.go

Change: Generate Kubernetes/OpenShift Deployment and Service resources for the Kruize Optimizer and plug them into resource generation flows
Details:
  • Extended KruizeResourceGenerator to carry an Optimizer_image field and accept optimizerImage in NewKruizeResourceGenerator
  • Fell back to GetDefaultOptimizerImage() when the CR does not specify an optimizer image
  • Implemented kruizeOptimizerDeployment() with single-replica Deployment, port 8080, and all required optimizer env vars
  • Implemented kruizeOptimizerService() as a ClusterIP service exposing port 8080
  • Added optimizer deployment/service to both NamespacedResources() and KubernetesNamespacedResources() outputs so Optimizer is created on all supported cluster types
Files:
  • internal/utils/kruize_generator.go

Change: Update the controller to deploy the optimizer alongside existing components and treat it as required for readiness
Details:
  • Expanded waitForKruizePods required pod list to include kruize-optimizer and bumped readiness thresholds from 3 to 4 pods
  • Included optimizer_image in deployKruize logging context for observability
  • Passed Kruize.Spec.Optimizer_image into NewKruizeResourceGenerator so the generator receives the configured optimizer image
Files:
  • internal/controller/kruize_controller.go

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've left some high-level feedback:

  • The Optimizer deployment hardcodes a lot of behavioral settings (URLs, intervals, target labels, datasource, profiles) in Env; consider wiring these through the CR or config so they can be tuned without modifying the operator.
  • Names and labels for the optimizer component ("kruize-optimizer", port 8080, etc.) are duplicated across deployment, service, and readiness logic; it may be safer to centralize these as constants to avoid subtle drift in future changes.
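One way the second suggestion could look, as a sketch only. The constant and function names are hypothetical, and the "app" label key is an assumption; the component name, port, and webhook path are the values quoted in the PR description.

```go
package main

import "fmt"

const (
	// Values quoted in the PR description; the identifiers are illustrative.
	optimizerName = "kruize-optimizer"
	optimizerPort = 8080
)

// optimizerLabels returns the label set shared by the Deployment selector,
// the pod template, and the Service selector. The "app" key is an assumption.
func optimizerLabels() map[string]string {
	return map[string]string{"app": optimizerName}
}

// optimizerWebhookURL derives the in-cluster webhook endpoint from the same
// constants, so a rename or port change only has to be made once.
func optimizerWebhookURL() string {
	return fmt.Sprintf("http://%s:%d/webhook", optimizerName, optimizerPort)
}

func main() {
	fmt.Println(optimizerLabels()["app"])
	fmt.Println(optimizerWebhookURL())
}
```

With the name and port defined once, the deployment, service, and readiness code would all reference the same identifiers instead of repeating string literals.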

Comment thread config/samples/v1alpha1_kruize.yaml Outdated
@shreyabiradar07 shreyabiradar07 moved this to Under Review in Monitoring Mar 31, 2026
@shekhar316 shekhar316 changed the base branch from main to mvp_demo March 31, 2026 06:00
@shekhar316
Contributor Author

Rebased my PR onto mvp_demo.

@shreyabiradar07
Contributor

@shekhar316 I built an operator image on top of PR 77: quay.io/shbirada/kruize-operator:pr_77

I'm seeing "Kruize service is unavailable" and "connection refused" errors in the optimizer pod deployed by the operator: optimizer_error_logs.txt

oc get pods -n openshift-tuning
NAME                                    READY   STATUS    RESTARTS   AGE
kruize-68dd786fd4-sb6sq                 1/1     Running   0          6m20s
kruize-db-deployment-6755bc44f8-t658v   1/1     Running   0          6m20s
kruize-operator-68c44bbf7-65492         1/1     Running   0          6m49s
kruize-optimizer-dd5f76f9d-46w79        1/1     Running   0          6m20s
kruize-ui-nginx-pod                     1/1     Running   0          6m20s

Comment thread internal/controller/kruize_controller.go
Comment thread internal/controller/kruize_controller.go Outdated
@shreyabiradar07
Contributor

@shekhar316 The init container approach is working and there are no network errors observed in optimizer pod:
optimizer_logs.txt

A follow-up question on the labels:
The bulk payload in the logs has the include label "kruize/autotune" : "enabled}", but the filter does not appear to be applied, since I see 211 experiments created. Or is the filtering currently done programmatically?

$ curl http://kruize-openshift-tuning.apps.cluster-f629n.f629n.sandbox1570.opentlc.com/bulk?job_id=c37a876d-d174-495c-8174-331020e4b805

{
  "summary": {
    "status": "COMPLETED",
    "total_experiments": 211,
    "processed_experiments": 211,
    "notifications": {},
    "input": {
      "filter": {
        "include": {
          "labels": {
            "kruize/autotune": "enabled}"
          }
        }
      },
      "datasource": "prometheus-1",
      "webhook": {
        "url": "http://kruize-optimizer:8080/webhook"
      },
      "metadata_profile": "cluster-metadata-local-monitoring",
      "measurement_duration": "15min"
    },
    "job_id": "c37a876d-d174-495c-8174-331020e4b805",
    "job_start_time": "2026-04-02T11:31:26.051Z",
    "job_end_time": "2026-04-02T11:33:08.799Z"
  }
}

Also, I think there is a syntax error in the label value (the closing quote and brace are swapped):

current: "kruize/autotune": "enabled}"
expected: "kruize/autotune": "enabled"}
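The thread does not show how the label value is assembled, so the following is only a sketch of one way to avoid this class of bug: build the filter as a data structure and let encoding/json handle quoting, rather than concatenating strings. The includeLabelsJSON helper is hypothetical; the "include" and "labels" keys mirror the bulk payload above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// includeLabelsJSON serializes an include-labels filter shaped like the
// "filter" object in the bulk payload. Quoting and braces are produced by
// the JSON encoder, so a misplaced quote cannot leak into a value.
func includeLabelsJSON(labels map[string]string) (string, error) {
	payload := map[string]interface{}{
		"include": map[string]interface{}{
			"labels": labels,
		},
	}
	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := includeLabelsJSON(map[string]string{"kruize/autotune": "enabled"})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```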

@shekhar316
Contributor Author

Thanks, Shreya, for the detailed review. I think this is a known issue that needs a fix on the Kruize side. @pinkygupta-hub raised a PR to fix it, kruize/autotune#1733, but the changes do not seem to be working for the labels. Follow-up discussion: https://ibm-cloud.slack.com/archives/C0949JK1ABS/p1773636401427069

cc: @dinogun , @kusumachalasani

@shreyabiradar07
Contributor

For the current PR scope, i.e. optimizer integration, the changes look good. The PR can be merged once the official Kruize Optimizer image is available.

Contributor

@shreyabiradar07 shreyabiradar07 left a comment

@shekhar316 Please run the unit tests once to verify that existing test cases are not affected:
go test ./internal/controller/... -v -ginkgo.v

@shekhar316
Contributor Author

Sure, I've updated the unit tests. Here are the results - https://pastebin.com/fGMSninf

Contributor

@shreyabiradar07 shreyabiradar07 left a comment

LGTM

@kusumachalasani kusumachalasani moved this from Under Review to Ready for merge in Monitoring Apr 8, 2026
Comment thread config/samples/v1alpha1_kruize.yaml Outdated
Comment thread internal/constants/kruize_images.go Outdated
defaultUIImageTag = "0.1.0"

// defaultOptimizerImageTag is the default tag for Kruize Optimizer image
defaultOptimizerImageTag = "0.1_mvp"
Contributor

Please update the default optimizer tag

@shreyabiradar07
Contributor

@shekhar316 please resolve the conflicts


// Container image for Kruize Optimizer
// +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Optimizer Image",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
Optimizer_image string `json:"optimizer_image,omitempty"`
Contributor

Since a new field is being added, run make manifests generate once locally to update the Kruize CRD, i.e. config/crd/bases/kruize.io_kruizes.yaml

Contributor Author

Done ✅

Contributor

@shreyabiradar07 shreyabiradar07 left a comment

LGTM!

},
Env: []corev1.EnvVar{
{Name: "KRUIZE_URL", Value: "http://kruize:8080"},
{Name: "KRUIZE_STATE_REFRESH_INTERVAL", Value: "60m"},

Consider making optimizer environment variables user-configurable through the KruizeSpec (similar to how autotune_image is configurable) rather than hardcoding them. This allows users to customize optimizer behavior without rebuilding the operator.

@kusumachalasani

@shreyabiradar07 I see both camelCase and snake_case being used in the Kruize Spec struct at api/v1alpha1/kruize_types.go.
Can we standardize on a single format, preferably camelCase, as that is the standard convention in Go?

@shreyabiradar07
Contributor

camelCase is the recommended convention in Go; I will fix the rest of the fields in a separate PR. Thanks for pointing this out!

@dinogun
Contributor

dinogun commented Apr 10, 2026

@shekhar316 Please fix the conflicts, thanks

Signed-off-by: SHEKHAR SAXENA <shekhar.saxena@ibm.com>
@shekhar316
Contributor Author

Rebased to mvp_demo and resolved conflicts.

@shekhar316
Contributor Author

A separate PR, #83, will add support for configuring optimizer-related properties through the CR manifest.

cc: @shreyabiradar07 , @kusumachalasani, @dinogun


@kusumachalasani kusumachalasani left a comment

lgtm!

Contributor

@shreyabiradar07 shreyabiradar07 left a comment

LGTM

@dinogun dinogun merged commit 8053c4f into kruize:mvp_demo Apr 13, 2026
2 of 3 checks passed
@github-project-automation github-project-automation Bot moved this from Ready for merge to Done in Monitoring Apr 13, 2026