Skip to content

Conversation

@pedjak
Copy link
Contributor

@pedjak pedjak commented Nov 13, 2025

Description

Replaces the monolithic 170-line reconcile() method with a flexible step-based architecture that executes discrete reconciliation phases in sequence. Each phase (HandleFinalizers, RetrieveRevisionStates, RetrieveRevisionMetadata, UnpackBundle, ApplyBundle) is now a standalone function that can be composed differently for Helm vs Boxcutter workflows.

The motivation for this PR came out the work in #2281 where we have realized that handling conditions with Boxcutter differently would require introducing conditional executions depending on the backend type in one part of the reconcile loop, although the rest of the reconciliation is the same. In order to keep things easy to understand and test, I have divided the reconciliation into steps so that some of them could be swapped out/replaced depending on the used backend. Moreover, we are opening a possibility to add and write simpler unit tests.

Changes:

  • Introduce ReconcileStepFunc type and ReconcileSteps executor
  • Extract reconcile logic into individual step functions in new file clusterextension_reconcile_steps.go
  • Move BoxcutterRevisionStatesGetter to boxcutter_reconcile_steps.go alongside MigrateStorage step
  • Configure step pipelines in main.go for each applier type
  • Refactor tests to use functional options pattern for reconciler setup

Reviewer Checklist

  • API Go Documentation
  • Tests: Unit Tests (and E2E Tests, if appropriate)
  • Comprehensive Commit Messages
  • Links to related GitHub Issue(s)

Copilot AI review requested due to automatic review settings November 13, 2025 15:54
@pedjak pedjak requested a review from a team as a code owner November 13, 2025 15:54
@netlify
Copy link

netlify bot commented Nov 13, 2025

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit ea86263
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/691f2eff2793130008df6ede
😎 Deploy Preview https://deploy-preview-2332--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@openshift-ci
Copy link

openshift-ci bot commented Nov 13, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign grokspawn, thetechnick for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pedjak pedjak changed the title Refactor ClusterExtension reconciler to use composable step-based pipeline 🌱 Refactor ClusterExtension reconciler to use composable step-based pipeline Nov 13, 2025
Copilot finished reviewing on behalf of pedjak November 13, 2025 15:59
@pedjak pedjak requested review from perdasilva and removed request for bentito and thetechnick November 13, 2025 16:00
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the ClusterExtension reconciler from a monolithic 170-line reconcile method into a composable step-based pipeline architecture. Each reconciliation phase (finalizer handling, revision state retrieval, metadata resolution, bundle unpacking, and bundle application) is now a standalone function that can be configured differently for Helm vs Boxcutter workflows.

  • Introduces ReconcileStepFunc type and ReconcileSteps executor for sequential step execution
  • Extracts reconciliation logic into individual step functions with context-based data passing
  • Moves Boxcutter-specific code (BoxcutterRevisionStatesGetter, MigrateStorage) into dedicated file
  • Configures applier-specific pipelines in main.go
  • Updates tests to use functional options pattern for reconciler setup

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
internal/operator-controller/controllers/clusterextension_controller.go Removes monolithic reconcile method, adds ReconcileStepFunc infrastructure and step executor
internal/operator-controller/controllers/clusterextension_reconcile_steps.go New file containing common reconciliation step functions with context-based data flow
internal/operator-controller/controllers/boxcutter_reconcile_steps.go New file containing boxcutter-specific steps (MigrateStorage) and BoxcutterRevisionStatesGetter
cmd/operator-controller/main.go Configures reconciliation step pipelines for both boxcutter and Helm appliers
internal/operator-controller/controllers/suite_test.go Adds functional options pattern for test reconciler configuration
internal/operator-controller/controllers/clusterextension_controller_test.go Refactors tests to use new functional options pattern for reconciler setup

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


for _, applyOpt := range opts {
applyOpt(ctrlBuilder)
}
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The removal of the blank line at line 312 reduces code readability. There should be a blank line between the loop and the return statement to separate the loop logic from the function return.

Consider keeping the blank line for better visual separation.

Suggested change
}
}

Copilot uses AI. Check for mistakes.
Comment on lines 42 to 151
func HandleFinalizers(f finalizer.Finalizer) ReconcileStepFunc {
return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
l := log.FromContext(ctx)

l.Info("handling finalizers")
finalizeResult, err := f.Finalize(ctx, ext)
if err != nil {
setStatusProgressing(ext, err)
return ctx, nil, err
}
if finalizeResult.Updated || finalizeResult.StatusUpdated {
// On create: make sure the finalizer is applied before we do anything
// On delete: make sure we do nothing after the finalizer is removed
return ctx, &ctrl.Result{}, nil
}

if ext.GetDeletionTimestamp() != nil {
// If we've gotten here, that means the cluster extension is being deleted, we've handled all of
// _our_ finalizers (above), but the cluster extension is still present in the cluster, likely
// because there are _other_ finalizers that other controllers need to handle, (e.g. the orphan
// deletion finalizer).
return nil, nil, nil
}
return ctx, nil, nil
}
}

func RetrieveRevisionStates(r RevisionStatesGetter) ReconcileStepFunc {
return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
l := log.FromContext(ctx)
l.Info("getting installed bundle")
revisionStates, err := r.GetRevisionStates(ctx, ext)
if err != nil {
setInstallStatus(ext, nil)
var saerr *authentication.ServiceAccountNotFoundError
if errors.As(err, &saerr) {
setInstalledStatusConditionUnknown(ext, saerr.Error())
setStatusProgressing(ext, errors.New("installation cannot proceed due to missing ServiceAccount"))
return ctx, nil, err
}
setInstalledStatusConditionUnknown(ext, err.Error())
setStatusProgressing(ext, errors.New("retrying to get installed bundle"))
return ctx, nil, err
}
return context.WithValue(ctx, revisionStatesKey{}, revisionStates), nil, nil
}
}

func RetrieveRevisionMetadata(r resolve.Resolver) ReconcileStepFunc {
return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
l := log.FromContext(ctx)
revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
var resolvedRevisionMetadata *RevisionMetadata
if len(revisionStates.RollingOut) == 0 {
l.Info("resolving bundle")
var bm *ocv1.BundleMetadata
if revisionStates.Installed != nil {
bm = &revisionStates.Installed.BundleMetadata
}
resolvedBundle, resolvedBundleVersion, resolvedDeprecation, err := r.Resolve(ctx, ext, bm)
if err != nil {
// Note: We don't distinguish between resolution-specific errors and generic errors
setStatusProgressing(ext, err)
setInstalledStatusFromRevisionStates(ext, revisionStates)
ensureAllConditionsWithReason(ext, ocv1.ReasonFailed, err.Error())
return ctx, nil, err
}

// set deprecation status after _successful_ resolution
// TODO:
// 1. It seems like deprecation status should reflect the currently installed bundle, not the resolved
// bundle. So perhaps we should set package and channel deprecations directly after resolution, but
// defer setting the bundle deprecation until we successfully install the bundle.
// 2. If resolution fails because it can't find a bundle, that doesn't mean we wouldn't be able to find
// a deprecation for the ClusterExtension's spec.packageName. Perhaps we should check for a non-nil
// resolvedDeprecation even if resolution returns an error. If present, we can still update some of
// our deprecation status.
// - Open question though: what if different catalogs have different opinions of what's deprecated.
// If we can't resolve a bundle, how do we know which catalog to trust for deprecation information?
// Perhaps if the package shows up in multiple catalogs and deprecations don't match, we can set
// the deprecation status to unknown? Or perhaps we somehow combine the deprecation information from
// all catalogs?
SetDeprecationStatus(ext, resolvedBundle.Name, resolvedDeprecation)
resolvedRevisionMetadata = &RevisionMetadata{
Package: resolvedBundle.Package,
Image: resolvedBundle.Image,
BundleMetadata: bundleutil.MetadataFor(resolvedBundle.Name, *resolvedBundleVersion),
}
} else {
resolvedRevisionMetadata = revisionStates.RollingOut[0]
}
return context.WithValue(ctx, resolvedRevisionMetadataKey{}, resolvedRevisionMetadata), nil, nil
}
}

func UnpackBundle(i imageutil.Puller, cache imageutil.Cache) ReconcileStepFunc {
return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
l := log.FromContext(ctx)
l.Info("unpacking resolved bundle")
resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
imageFS, _, _, err := i.Pull(ctx, ext.GetName(), resolvedRevisionMetadata.Image, cache)
if err != nil {
revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
// Wrap the error passed to this with the resolution information until we have successfully
// installed since we intend for the progressing condition to replace the resolved condition
// and will be removing the .status.resolution field from the ClusterExtension status API
setStatusProgressing(ext, wrapErrorWithResolutionInfo(resolvedRevisionMetadata.BundleMetadata, err))
setInstalledStatusFromRevisionStates(ext, revisionStates)
return ctx, nil, err
}
return context.WithValue(ctx, imageFSKey{}, imageFS), nil, nil
}
}

func ApplyBundle(a Applier) ReconcileStepFunc {
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exported functions HandleFinalizers, RetrieveRevisionStates, RetrieveRevisionMetadata, UnpackBundle, and ApplyBundle lack documentation. These are the core building blocks of the new reconciliation architecture and should have doc comments explaining their purpose, inputs, outputs, and any side effects.

Example for HandleFinalizers:

// HandleFinalizers returns a ReconcileStepFunc that manages finalizers for the ClusterExtension.
// It ensures finalizers are added on resource creation and properly handles cleanup on deletion.
// If finalizers are updated, reconciliation stops to allow the changes to be persisted.
func HandleFinalizers(f finalizer.Finalizer) ReconcileStepFunc {

Copilot uses AI. Check for mistakes.
return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
l := log.FromContext(ctx)
l.Info("unpacking resolved bundle")
resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UnpackBundle step performs an unsafe type assertion on line 141: ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata). If the previous RetrieveRevisionMetadata step is skipped or the context doesn't contain this key, this will panic.

Consider adding a nil check before the type assertion.

Suggested change
resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
val := ctx.Value(resolvedRevisionMetadataKey{})
resolvedRevisionMetadata, ok := val.(*RevisionMetadata)
if !ok || resolvedRevisionMetadata == nil {
return ctx, nil, errors.New("resolved revision metadata not found in context")
}

Copilot uses AI. Check for mistakes.
// - A potentially modified context for the next step
// - An optional reconciliation result that if non-nil will stop reconciliation
// - Any error that occurred during reconciliation, which will be returned to the caller
type ReconcileStepFunc func(context.Context, *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error)
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ReconcileSteps type (line 67) lacks documentation. Since this is a central type in the refactored architecture, it should have a doc comment explaining its purpose and usage.

Consider adding:

// ReconcileSteps is an ordered collection of reconciliation steps that are executed sequentially.
// Each step receives the output context from the previous step, allowing data to flow through the pipeline.
type ReconcileSteps []ReconcileStepFunc
Suggested change
type ReconcileStepFunc func(context.Context, *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error)
type ReconcileStepFunc func(context.Context, *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error)
// ReconcileSteps is an ordered collection of reconciliation steps that are executed sequentially.
// Each step receives the output context from the previous step, allowing data to flow through the pipeline.

Copilot uses AI. Check for mistakes.
func RetrieveRevisionMetadata(r resolve.Resolver) ReconcileStepFunc {
return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
l := log.FromContext(ctx)
revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RetrieveRevisionStates step performs an unsafe type assertion on line 93: ctx.Value(revisionStatesKey{}).(*RevisionStates). If the previous RetrieveRevisionStates step is skipped or the context doesn't contain this key, this will panic.

Consider adding a nil check:

revisionStatesValue := ctx.Value(revisionStatesKey{})
if revisionStatesValue == nil {
    return ctx, nil, errors.New("revision states not found in context")
}
revisionStates := revisionStatesValue.(*RevisionStates)
Suggested change
revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
revisionStatesValue := ctx.Value(revisionStatesKey{})
if revisionStatesValue == nil {
return ctx, nil, errors.New("revision states not found in context")
}
revisionStates, ok := revisionStatesValue.(*RevisionStates)
if !ok {
return ctx, nil, errors.New("revision states in context has unexpected type")
}

Copilot uses AI. Check for mistakes.
Comment on lines 159 to 161
resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
imageFS := ctx.Value(imageFSKey{}).(fs.FS)
revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ApplyBundle step performs multiple unsafe type assertions (lines 159, 160, 161). If any of the preceding steps are skipped or fail to populate the context, these will panic:

  • ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
  • ctx.Value(imageFSKey{}).(fs.FS)
  • ctx.Value(revisionStatesKey{}).(*RevisionStates)

Consider adding nil checks before all type assertions to prevent panics.

Suggested change
resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
imageFS := ctx.Value(imageFSKey{}).(fs.FS)
revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
resolvedRevisionMetadataVal := ctx.Value(resolvedRevisionMetadataKey{})
if resolvedRevisionMetadataVal == nil {
return ctx, nil, errors.New("missing resolvedRevisionMetadata in context")
}
resolvedRevisionMetadata, ok := resolvedRevisionMetadataVal.(*RevisionMetadata)
if !ok {
return ctx, nil, errors.New("invalid type for resolvedRevisionMetadata in context")
}
imageFSVal := ctx.Value(imageFSKey{})
if imageFSVal == nil {
return ctx, nil, errors.New("missing imageFS in context")
}
imageFS, ok := imageFSVal.(fs.FS)
if !ok {
return ctx, nil, errors.New("invalid type for imageFS in context")
}
revisionStatesVal := ctx.Value(revisionStatesKey{})
if revisionStatesVal == nil {
return ctx, nil, errors.New("missing revisionStates in context")
}
revisionStates, ok := revisionStatesVal.(*RevisionStates)
if !ok {
return ctx, nil, errors.New("invalid type for revisionStates in context")
}

Copilot uses AI. Check for mistakes.
Comment on lines 37 to 39

type revisionStatesKey struct{}
type resolvedRevisionMetadataKey struct{}
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing documentation for the context value key types revisionStatesKey, resolvedRevisionMetadataKey, and imageFSKey. These types are used as context keys but lack any explanation of their purpose or what values they hold.

Consider adding doc comments:

// revisionStatesKey is the context key for storing *RevisionStates
type revisionStatesKey struct{}

// resolvedRevisionMetadataKey is the context key for storing *RevisionMetadata
type resolvedRevisionMetadataKey struct{}

// imageFSKey is the context key for storing fs.FS
type imageFSKey struct{}
Suggested change
type revisionStatesKey struct{}
type resolvedRevisionMetadataKey struct{}
// revisionStatesKey is the context key for storing *RevisionStates.
type revisionStatesKey struct{}
// resolvedRevisionMetadataKey is the context key for storing *RevisionMetadata.
type resolvedRevisionMetadataKey struct{}
// imageFSKey is the context key for storing fs.FS.

Copilot uses AI. Check for mistakes.
Comment on lines 144 to 149
revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
// Wrap the error passed to this with the resolution information until we have successfully
// installed since we intend for the progressing condition to replace the resolved condition
// and will be removing the .status.resolution field from the ClusterExtension status API
setStatusProgressing(ext, wrapErrorWithResolutionInfo(resolvedRevisionMetadata.BundleMetadata, err))
setInstalledStatusFromRevisionStates(ext, revisionStates)
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UnpackBundle step retrieves revisionStates from context on line 144 only to use it in error handling (lines 149), but it's never used if unpacking succeeds. This type assertion could panic if the context doesn't contain the key.

Consider either:

  1. Adding a nil check before the type assertion
  2. Moving this retrieval inside the error handling block where it's actually needed
Suggested change
revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
// Wrap the error passed to this with the resolution information until we have successfully
// installed since we intend for the progressing condition to replace the resolved condition
// and will be removing the .status.resolution field from the ClusterExtension status API
setStatusProgressing(ext, wrapErrorWithResolutionInfo(resolvedRevisionMetadata.BundleMetadata, err))
setInstalledStatusFromRevisionStates(ext, revisionStates)
var revisionStates *RevisionStates
if v := ctx.Value(revisionStatesKey{}); v != nil {
if rs, ok := v.(*RevisionStates); ok {
revisionStates = rs
}
}
// Wrap the error passed to this with the resolution information until we have successfully
// installed since we intend for the progressing condition to replace the resolved condition
// and will be removing the .status.resolution field from the ClusterExtension status API
setStatusProgressing(ext, wrapErrorWithResolutionInfo(resolvedRevisionMetadata.BundleMetadata, err))
if revisionStates != nil {
setInstalledStatusFromRevisionStates(ext, revisionStates)
}

Copilot uses AI. Check for mistakes.
for _, opt := range opts {
opt(reconciler)
}
reconciler.ReconcileSteps = []controllers.ReconcileStepFunc{controllers.HandleFinalizers(reconciler.Finalizers), controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter)}
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newClientAndReconciler function unconditionally sets up ReconcileSteps starting with HandleFinalizers and RetrieveRevisionStates on line 94, but these steps depend on Finalizers and RevisionStatesGetter being non-nil respectively. If a test doesn't configure these via options, the steps will still be added but will fail at runtime.

Consider either:

  1. Checking if these dependencies are nil before adding their steps
  2. Ensuring all dependencies have sensible defaults

For example:

if reconciler.Finalizers != nil {
    reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.HandleFinalizers(reconciler.Finalizers))
}
if reconciler.RevisionStatesGetter != nil {
    reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter))
}
Suggested change
reconciler.ReconcileSteps = []controllers.ReconcileStepFunc{controllers.HandleFinalizers(reconciler.Finalizers), controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter)}
reconciler.ReconcileSteps = []controllers.ReconcileStepFunc{}
if reconciler.Finalizers != nil {
reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.HandleFinalizers(reconciler.Finalizers))
}
if reconciler.RevisionStatesGetter != nil {
reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter))
}

Copilot uses AI. Check for mistakes.
@codecov
Copy link

codecov bot commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 96.29630% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.42%. Comparing base (1355ff7) to head (ea86263).

Files with missing lines Patch % Lines
cmd/operator-controller/main.go 94.44% 1 Missing and 3 partials ⚠️
...ontroller/controllers/boxcutter_reconcile_steps.go 88.57% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2332      +/-   ##
==========================================
+ Coverage   74.23%   74.42%   +0.18%     
==========================================
  Files          91       93       +2     
  Lines        7239     7300      +61     
==========================================
+ Hits         5374     5433      +59     
- Misses       1433     1434       +1     
- Partials      432      433       +1     
Flag Coverage Δ
e2e 44.54% <53.70%> (+0.19%) ⬆️
experimental-e2e 48.78% <77.31%> (+0.30%) ⬆️
unit 58.45% <44.44%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Contributor

@perdasilva perdasilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think decomposing the reconciler into its main steps / functions makes sense as it makes things easier to test in isolation and fosters an additive architecture.

Although, I don't feel like a well structured monolith is necessarily a problem when the number of steps is relatively small, and the pipeline is pretty linear since that makes handling the code easier imo.

Another concern is passing values between the steps using context. I think there's a few drawbacks like bypassing compile type checks, and opaque/implicit dependencies. Would having a state object instead be a better alternative? Such an object might also provide a good data source for determining the underlying state and setting the conditions appropriately.

We should also call out that having two CE reconcilers is a temporary state. While I think this pipeline approach can make us more nimble, I think we should be reasonably confident that, given the simplicity of the pipeline, there aren't any simpler alternatives.

E.g., what if we decompose the steps into methods to help with isolation testing, adapt the component interfaces to be usable in both helm and boxcutter world (I think we might already be there to a large extent?), and, pass state object between the steps and drive status setting from what falls out of that.

Having said that, I don't think this is complexity we cannot live with. I'm just wondering if we should call out what problems it solves, and why its preferable to any other reasonable alternative. I'm also curious to see what others think.

}
}

func RetrieveRevisionMetadata(r resolve.Resolver) ReconcileStepFunc {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should name this in a way that makes it clear that this is the resolution step? E.g.

Suggested change
func RetrieveRevisionMetadata(r resolve.Resolver) ReconcileStepFunc {
func ResolveBundle(r resolve.Resolver) ReconcileStepFunc {

or something like that

@pedjak
Copy link
Contributor Author

pedjak commented Nov 17, 2025

Although, I don't feel like a well structured monolith is necessarily a problem when the number of steps is relatively small, and the pipeline is pretty linear since that makes handling the code easier imo.

The motivation for this PR came out the work in #2281 where we have realized that handling conditions with Boxcutter differently would require introducing conditional executions depending on the backend type in one part of the reconcile loop, although the rest of the reconciliation is the same. In order to keep things easy to understand and test, I have divided the reconciliation into steps so that some of them could be swapped out/replaced depending on the used backend. Moreover, we are opening a possibility to add write simpler unit tests.

Another concern is passing values between the steps using context. I think there's a few drawbacks like bypassing compile type checks, and opaque/implicit dependencies. Would having a state object instead be a better alternative? Such an object might also provide a good data source for determining the underlying state and setting the conditions appropriately.

Golang context is used for sharing request-scoped cross-cutting concerns by many frameworks. The added valueFromContext() perform necessary checks at the runtime.

E.g., what if we decompose the steps into methods to help with isolation testing, adapt the component interfaces to be usable in both helm and boxcutter world (I think we might already be there to a large extent?), and, pass state object between the steps and drive status setting from what falls out of that.

I am not sure what would be the difference to what is the proposed here? Can you provide some sketches?

Comment on lines 712 to 714
controllers.HandleFinalizers(ceReconciler.Finalizers),
controllers.RetrieveRevisionStates(ceReconciler.RevisionStatesGetter),
controllers.RetrieveRevisionMetadata(ceReconciler.Resolver),
controllers.UnpackBundle(ceReconciler.ImagePuller, ceReconciler.ImageCache),
controllers.ApplyBundle(ceReconciler.Applier),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need Finalizers, RevisionStatesGetter, Resolver, ImagePuller, etc. to still be fields on the ceReconciler? It seems like those details are no longer a direct concern of the reconciler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need Finalizers, RevisionStatesGetter, Resolver, ImagePuller, etc. to still be fields on the ceReconciler? It seems like those details are no longer a direct concern of the reconciler?

Correct, I would remove them if we agree about PR direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, ptal.

@perdasilva
Copy link
Contributor

perdasilva commented Nov 17, 2025

I am not sure what would be the difference to what is the proposed here? Can you provide some sketches?

Maybe we could use a state/result type struct to collect the the results of each step and use that instead of the context to pass results between subsequent steps, e.g.:

type Result struct {
	RevisionStates            *RevisionStates
	BundleResolutionResult    *BundleResolutionResult // contains bundle resolution state: bundle/deprecation metadata, etc.
	BundleUnpackResult.       *BundleUnpackResult // contains bundle unpack state: fs, time unpacked, still unpacking, etc.
	ApplierResult             *ApplierResult // contains applier results
}

type ReconcileStepFunc func(context.Context, *ocv1.ClusterExtension, *Result) (context.Context, *ctrl.Result, error)

Errors could either be embedded in the step results or just be raised by the step function itself through its return values like we have it now.

If the only point of contention is condition setting, we could strip the steps of setting any conditions and just have one step at the end that does it based on the results. Then have one impl for helm and one for boxcutter and just configure the reconciler accordingly.

If the folks feel a framework for a linear 5-6 step pipeline is overkill (I don't have super strong opinions here - I can live with either). We could still get the testing benefits by decomposing the reconcile function similarly to what you have done here but just call them in turn to eliminate the indirection.

Golang context is used for sharing request-scoped cross-cutting concerns by many frameworks. The added valueFromContext() perform necessary checks at the runtime.

I get that. I'm just wondering if we're better off using a state object to carry results between steps. The impression I got from searching around and talking to Gemini is that context values are typically used for more lightweight information such as request metadata (ids, users, headers, etc.) rather than intermediate computation results.

I think we should call out the motivation in the description. As it is now, we've refactored things to make it explicit that storage migration isn't part of the helm reconciler.

I hope all of this is cogent. I'm pretty beat. But we can take it up again tomorrow ^^

@pedjak
Copy link
Contributor Author

pedjak commented Nov 17, 2025

I get that. I'm just wondering if we're better off using a state object to carry results between steps. The impression I got from searching around and talking to Gemini is that context values are typically used for more lightweight information such as request metadata (ids, users, headers, etc.) rather than intermediate computation results.

Having a dedicated state instead of the golang context:

  • would not save us from runtime errors because any of fields in the state could be nil, so we still need to check for that
  • for each reconciler we would need to develop a dedicated little step framework that would pass the proper state (yes, we could apply the same approach to other controllers in this repo).

If the folks feel a framework for a linear 5-6 step pipeline is overkill (I don't have super strong opinions here - I can live with either).

I would not consider as a framework something that consists of 15 lines of code:

func (steps *ReconcileSteps) Reconcile(ctx context.Context, ext *ocv1.ClusterExtension) (ctrl.Result, error) {
	var res *ctrl.Result
	var err error
	nextStepCtx := ctx
	for _, step := range *steps {
		nextStepCtx, res, err = step(nextStepCtx, ext)
		if err != nil {
			return ctrl.Result{}, err
		}
		if res != nil {
			return *res, nil
		}
		if nextStepCtx == nil {
			return ctrl.Result{}, nil
		}
	}
	return ctrl.Result{}, nil
}

We could still get the testing benefits by decomposing the reconcile function similarly to what you have done here but just call them in turn to eliminate the indirection.

Not really, because the wiring is fixed - by the introduced indirection, we are now able to add for example migration to boxcutter, and apply bundles and reports conditions in a different way in cases of Helm and Boxcutter backends.

I think we should call out the motivation in the description. As it is now, we've refactored things to make it explicit that storage migration isn't part of the helm reconciler.

Done.

@joelanford
Copy link
Member

To help overcome the concern about the context value type safety, perhaps we could encapsulate the context value handling into a sub-internal package with 100% test coverage, where only typed accessor/getter functions are exported and available to our reconcile functions?

The only downside at that point would be that we'd need nil checks for those values just not being present in the context, and we'd have to handle those errors (or panic).

@joelanford
Copy link
Member

In general, I see this design as an improvement over what we currently have because it tidies things up a bit, although I also think @perdasilva's comments have merit.

Decomposing and separating out the set of functions and their order makes things easier to physically see. But I think it does potentially introduce surface area for unintended divergence between helm and boxcutter setups (but maybe our test suite has us covered there?).

And I do agree with @perdasilva that context values are worse, in isolation, than our current monolithic structure where the right values are always available in the right places. We could define a single interface that has methods for each of the steps with the right signatures for data exchange. That would be brittle, but maybe the brittleness would be a benefit because it would force us to confront and account for the changes needed in one reconcile flavor vs another?

@camilamacedo86
Copy link
Contributor

camilamacedo86 commented Nov 18, 2025

Thank you, @pedjak, for this thoughtful refactoring! I really appreciate the effort to improve structure and testability.

After reviewing the changes, I’m not fully sure this pattern is the best fit for our situation. I might be wrong, but since we may end up with a single runtime long-term (likely only Boxcutter), the extra flexibility could turn into unnecessary complexity. In that case, the benefits of the step framework might be short-lived, while the cost of maintaining and later unwinding it remains.

The current Helm/Boxcutter reconcile is already quite simple and explicit. This PR adds several new types, files, and ~100 lines of framework logic. That feels like more machinery than we need right now. If/when Helm gets removed, the existing code path is very easy to clean up. With the framework pattern, we’d have more scaffolding to unwind, even though the composability is no longer needed.

I also have some technical concerns around how context is used. Using context.WithValue to pass internal state between steps is less type-safe and makes dependencies harder to see in controller code. The Go team recommends using context values only for request-scoped data that crosses API boundaries (see the official Go blog), which is different from the use here.

A Simpler Alternative ( Common Approach for Reconciles )

If the main goal is improving testability and separation of concerns, we could break the reconciliation flow into smaller helper methods with explicit parameters. This keeps things type-safe, idiomatic, and easier to follow, while still giving us good structure:

func (r *ClusterExtensionReconciler) reconcile(ctx context.Context, ext *ocv1.ClusterExtension) (ctrl.Result, error) {
    if res, err := r.handleFinalizers(ctx, ext); res != nil || err != nil {
        return *res, err
    }

    if res, err := r.migrateStorageIfNeeded(ctx, ext); res != nil || err != nil {
        return *res, err
    }

    states, err := r.getRevisionStates(ctx, ext)
    if err != nil {
        return r.handleRevisionStatesError(ctx, ext, err)
    }

    resolved, err := r.resolveBundle(ctx, ext, states)
    if err != nil {
        return r.handleResolutionError(ctx, ext, states, err)
    }

    imageFS, err := r.unpackBundle(ctx, ext, resolved)
    if err != nil {
        return r.handleUnpackError(ctx, ext, states, resolved, err)
    }

    return r.applyBundle(ctx, ext, states, resolved, imageFS)
}

@camilamacedo86
Copy link
Contributor

Hi @pedjak,

Just to supplement the discussion — and please take this as a gentle suggestion:

Another important point I'd like to raise is that we should be using controller-runtime's EnvTest (https://book.kubebuilder.io/reference/envtest) to validate our reconcilers, rather than relying on unit tests. Without EnvTest, we're not really exercising the reconciler in a realistic control-plane environment, so these changes won't fully address the underlying issue. Unit tests also require mocks that EnvTest makes unnecessary.

If we follow the recommended EnvTest approach, would we really need the testing benefits this PR is trying to introduce? I'm honestly not sure we would, because unit-testing reconcilers isn't considered a good practice — EnvTest already provides the proper, realistic way to test them without the need for mocks or additional framework layers.

I would gently suggest we hold this change for now, until we have:

  • Boxcutter in a more mature state,
  • clarity on how much the reconciliation flows will actually diverge (or not),
  • clarity on whether we will support more than one runtime,
  • a better understanding of the next steps for OLM v1.

Once we have that information, it will be much easier to decide whether this framework approach truly fits our future needs.

Copilot AI review requested due to automatic review settings November 18, 2025 12:19
@pedjak pedjak force-pushed the reconciliation-as-chain-of-steps branch from 864c12f to 1cf61b6 Compare November 18, 2025 12:19
Copilot finished reviewing on behalf of pedjak November 18, 2025 12:20
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@pedjak
Copy link
Contributor Author

pedjak commented Nov 18, 2025

And I do agree with @perdasilva that context values are worse, in isolation, than our current monolithic structure where the right values are always available in the right places. We could define a single interface that has methods for each of the steps with the right signatures for data exchange. That would be brittle, but maybe the brittleness would be a benefit because it would force us to confront and account for the changes needed in one reconcile flavor vs another?

@joelanford @perdasilva I have added a dedicated reconcile state holder for exchanging data between the steps, ptal. IMO we do not need to perform nil checks, given that the potential panics are going to bubble up as the reconciliation error, given that controller-runtime RecoverPanic config param is enabled by the default.

@pedjak
Copy link
Contributor Author

pedjak commented Nov 18, 2025

The current Helm/Boxcutter reconcile is already quite simple and explicit. This PR adds several new types, files, and ~100 lines of framework logic.

@camilamacedo86

Not really, it adds just a step function definition and ~15 lines of code that iterate over the provided steps. The rest of PR is just splitting the long existing reconcile function into a number of logically separated steps, boundaries were already visible by dedicated log messages like:

  • "handling finalizers"
  • "getting installed bundle"
  • "resolving bundle"
  • etc.

I mentioned already the motivation for this move in PR description - we have currently two backends and some apply and condition logic is going to be different. Instead of trying to add more conditional logic, splitting it into a chain if steps that could replaced/extended is good way to combat this problem and obtain additional benefits long-term.

The Go team recommends using context values only for request-scoped data that crosses API boundaries (see the official Go blog), which is different from the use here.

It depends how you look at: an argument for the reconcile function is ctrl.Request, a sort of request that needs to be processed. Hence, carrying request-scoped data in the context might be perfectly fine.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 18, 2025
@perdasilva
Copy link
Contributor

I think the addition of the state object is a good compromise here and resolves the runtime type check issue, which was my biggest concern. If we can get agreement amongst the reviewers here, I'm happy to move forward with this.

…ipeline

Replaces the monolithic 170-line reconcile() method with a flexible
step-based architecture that executes discrete reconciliation phases in
sequence. Each phase (`HandleFinalizers`, `RetrieveRevisionStates`,
`RetrieveRevisionMetadata`, `UnpackBundle`, `ApplyBundle`) is now a standalone
function that can be composed differently for Helm vs Boxcutter workflows.

Changes:
  - Introduce `ReconcileStepFunc` type and `ReconcileSteps` executor
  - Extract reconcile logic into individual step functions in new file
    `clusterextension_reconcile_steps.go`
  - Move `BoxcutterRevisionStatesGetter` to `boxcutter_reconcile_steps.go`
    alongside `MigrateStorage` step
  - Configure step pipelines in `main.go` for each applier type
  - Refactor tests to use functional options pattern for reconciler setup
@pedjak pedjak force-pushed the reconciliation-as-chain-of-steps branch from 1cf61b6 to fa112fe Compare November 20, 2025 10:52
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 20, 2025
Copilot AI review requested due to automatic review settings November 20, 2025 15:08
Copilot finished reviewing on behalf of pedjak November 20, 2025 15:10
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

type ReconcileStepFunc func(context.Context, *reconcileState, *ocv1.ClusterExtension) (*ctrl.Result, error)

// ReconcileSteps is an ordered collection of reconciliation steps that are executed sequentially.
// Each step receives the output context from the previous step, allowing data to flow through the pipeline.
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation states "Each step receives the output context from the previous step, allowing data to flow through the pipeline" but the context is not modified between steps - it's the same context passed to all steps. The state flows through via the reconcileState parameter, not the context. This should be clarified to say "Each step receives the shared state from previous steps" or similar.

Suggested change
// Each step receives the output context from the previous step, allowing data to flow through the pipeline.
// Each step receives the shared state from previous steps, allowing data to flow through the pipeline.

Copilot uses AI. Check for mistakes.
// It takes a context and ClusterExtension object as input and executes each step in the ReconcileSteps slice.
// If any step returns an error, reconciliation stops and the error is returned.
// If any step returns a non-nil ctrl.Result, reconciliation stops and that result is returned.
// If any step returns a nil context, reconciliation stops successfully.
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation states "If any step returns a nil context, reconciliation stops successfully" but the implementation doesn't check for nil context. Steps never return a context - they return (*ctrl.Result, error). This documentation line appears to be incorrect or copied from a different implementation pattern.

Suggested change
// If any step returns a nil context, reconciliation stops successfully.

Copilot uses AI. Check for mistakes.
Copilot finished reviewing on behalf of pedjak November 20, 2025 20:49
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants