🌱 Refactor `ClusterExtension` reconciler to use composable step-based pipeline #2332

pedjak · 2025-11-13T15:54:31Z

Description

Replaces the monolithic 170-line reconcile() method with a flexible step-based architecture that executes discrete reconciliation phases in sequence. Each phase (HandleFinalizers, RetrieveRevisionStates, RetrieveRevisionMetadata, UnpackBundle, ApplyBundle) is now a standalone function that can be composed differently for Helm vs Boxcutter workflows.

The motivation for this PR came out the work in #2281 where we have realized that handling conditions with Boxcutter differently would require introducing conditional executions depending on the backend type in one part of the reconcile loop, although the rest of the reconciliation is the same. In order to keep things easy to understand and test, I have divided the reconciliation into steps so that some of them could be swapped out/replaced depending on the used backend. Moreover, we are opening a possibility to add and write simpler unit tests.

Changes:

Introduce ReconcileStepFunc type and ReconcileSteps executor
Extract reconcile logic into individual step functions in new file clusterextension_reconcile_steps.go
Move BoxcutterRevisionStatesGetter to boxcutter_reconcile_steps.go alongside MigrateStorage step
Configure step pipelines in main.go for each applier type
Refactor tests to use functional options pattern for reconciler setup

Reviewer Checklist

API Go Documentation
Tests: Unit Tests (and E2E Tests, if appropriate)
Comprehensive Commit Messages
Links to related GitHub Issue(s)

netlify · 2025-11-13T15:54:36Z

✅ Deploy Preview for olmv1 ready!

Name	Link
🔨 Latest commit	`ea86263`
🔍 Latest deploy log	https://app.netlify.com/projects/olmv1/deploys/691f2eff2793130008df6ede
😎 Deploy Preview	https://deploy-preview-2332--olmv1.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

openshift-ci · 2025-11-13T15:54:37Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign grokspawn, thetechnick for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

cmd/OWNERS
internal/operator-controller/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot

Pull Request Overview

This PR refactors the ClusterExtension reconciler from a monolithic 170-line reconcile method into a composable step-based pipeline architecture. Each reconciliation phase (finalizer handling, revision state retrieval, metadata resolution, bundle unpacking, and bundle application) is now a standalone function that can be configured differently for Helm vs Boxcutter workflows.

Introduces ReconcileStepFunc type and ReconcileSteps executor for sequential step execution
Extracts reconciliation logic into individual step functions with context-based data passing
Moves Boxcutter-specific code (BoxcutterRevisionStatesGetter, MigrateStorage) into dedicated file
Configures applier-specific pipelines in main.go
Updates tests to use functional options pattern for reconciler setup

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.

Show a summary per file

File	Description
internal/operator-controller/controllers/clusterextension_controller.go	Removes monolithic reconcile method, adds ReconcileStepFunc infrastructure and step executor
internal/operator-controller/controllers/clusterextension_reconcile_steps.go	New file containing common reconciliation step functions with context-based data flow
internal/operator-controller/controllers/boxcutter_reconcile_steps.go	New file containing boxcutter-specific steps (MigrateStorage) and BoxcutterRevisionStatesGetter
cmd/operator-controller/main.go	Configures reconciliation step pipelines for both boxcutter and Helm appliers
internal/operator-controller/controllers/suite_test.go	Adds functional options pattern for test reconciler configuration
internal/operator-controller/controllers/clusterextension_controller_test.go	Refactors tests to use new functional options pattern for reconciler setup

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-13T16:06:59Z

internal/operator-controller/controllers/clusterextension_controller.go


 	for _, applyOpt := range opts {
 		applyOpt(ctrlBuilder)
 	}


[nitpick] The removal of the blank line at line 312 reduces code readability. There should be a blank line between the loop and the return statement to separate the loop logic from the function return.

Consider keeping the blank line for better visual separation.

Suggested change

}

}

Copilot · 2025-11-13T16:06:59Z

internal/operator-controller/controllers/clusterextension_reconcile_steps.go

+func HandleFinalizers(f finalizer.Finalizer) ReconcileStepFunc {
+	return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
+		l := log.FromContext(ctx)
+
+		l.Info("handling finalizers")
+		finalizeResult, err := f.Finalize(ctx, ext)
+		if err != nil {
+			setStatusProgressing(ext, err)
+			return ctx, nil, err
+		}
+		if finalizeResult.Updated || finalizeResult.StatusUpdated {
+			// On create: make sure the finalizer is applied before we do anything
+			// On delete: make sure we do nothing after the finalizer is removed
+			return ctx, &ctrl.Result{}, nil
+		}
+
+		if ext.GetDeletionTimestamp() != nil {
+			// If we've gotten here, that means the cluster extension is being deleted, we've handled all of
+			// _our_ finalizers (above), but the cluster extension is still present in the cluster, likely
+			// because there are _other_ finalizers that other controllers need to handle, (e.g. the orphan
+			// deletion finalizer).
+			return nil, nil, nil
+		}
+		return ctx, nil, nil
+	}
+}
+
+func RetrieveRevisionStates(r RevisionStatesGetter) ReconcileStepFunc {
+	return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
+		l := log.FromContext(ctx)
+		l.Info("getting installed bundle")
+		revisionStates, err := r.GetRevisionStates(ctx, ext)
+		if err != nil {
+			setInstallStatus(ext, nil)
+			var saerr *authentication.ServiceAccountNotFoundError
+			if errors.As(err, &saerr) {
+				setInstalledStatusConditionUnknown(ext, saerr.Error())
+				setStatusProgressing(ext, errors.New("installation cannot proceed due to missing ServiceAccount"))
+				return ctx, nil, err
+			}
+			setInstalledStatusConditionUnknown(ext, err.Error())
+			setStatusProgressing(ext, errors.New("retrying to get installed bundle"))
+			return ctx, nil, err
+		}
+		return context.WithValue(ctx, revisionStatesKey{}, revisionStates), nil, nil
+	}
+}
+
+func RetrieveRevisionMetadata(r resolve.Resolver) ReconcileStepFunc {
+	return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
+		l := log.FromContext(ctx)
+		revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
+		var resolvedRevisionMetadata *RevisionMetadata
+		if len(revisionStates.RollingOut) == 0 {
+			l.Info("resolving bundle")
+			var bm *ocv1.BundleMetadata
+			if revisionStates.Installed != nil {
+				bm = &revisionStates.Installed.BundleMetadata
+			}
+			resolvedBundle, resolvedBundleVersion, resolvedDeprecation, err := r.Resolve(ctx, ext, bm)
+			if err != nil {
+				// Note: We don't distinguish between resolution-specific errors and generic errors
+				setStatusProgressing(ext, err)
+				setInstalledStatusFromRevisionStates(ext, revisionStates)
+				ensureAllConditionsWithReason(ext, ocv1.ReasonFailed, err.Error())
+				return ctx, nil, err
+			}
+
+			// set deprecation status after _successful_ resolution
+			// TODO:
+			//  1. It seems like deprecation status should reflect the currently installed bundle, not the resolved
+			//     bundle. So perhaps we should set package and channel deprecations directly after resolution, but
+			//     defer setting the bundle deprecation until we successfully install the bundle.
+			//  2. If resolution fails because it can't find a bundle, that doesn't mean we wouldn't be able to find
+			//     a deprecation for the ClusterExtension's spec.packageName. Perhaps we should check for a non-nil
+			//     resolvedDeprecation even if resolution returns an error. If present, we can still update some of
+			//     our deprecation status.
+			//       - Open question though: what if different catalogs have different opinions of what's deprecated.
+			//         If we can't resolve a bundle, how do we know which catalog to trust for deprecation information?
+			//         Perhaps if the package shows up in multiple catalogs and deprecations don't match, we can set
+			//         the deprecation status to unknown? Or perhaps we somehow combine the deprecation information from
+			//         all catalogs?
+			SetDeprecationStatus(ext, resolvedBundle.Name, resolvedDeprecation)
+			resolvedRevisionMetadata = &RevisionMetadata{
+				Package:        resolvedBundle.Package,
+				Image:          resolvedBundle.Image,
+				BundleMetadata: bundleutil.MetadataFor(resolvedBundle.Name, *resolvedBundleVersion),
+			}
+		} else {
+			resolvedRevisionMetadata = revisionStates.RollingOut[0]
+		}
+		return context.WithValue(ctx, resolvedRevisionMetadataKey{}, resolvedRevisionMetadata), nil, nil
+	}
+}
+
+func UnpackBundle(i imageutil.Puller, cache imageutil.Cache) ReconcileStepFunc {
+	return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
+		l := log.FromContext(ctx)
+		l.Info("unpacking resolved bundle")
+		resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
+		imageFS, _, _, err := i.Pull(ctx, ext.GetName(), resolvedRevisionMetadata.Image, cache)
+		if err != nil {
+			revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
+			// Wrap the error passed to this with the resolution information until we have successfully
+			// installed since we intend for the progressing condition to replace the resolved condition
+			// and will be removing the .status.resolution field from the ClusterExtension status API
+			setStatusProgressing(ext, wrapErrorWithResolutionInfo(resolvedRevisionMetadata.BundleMetadata, err))
+			setInstalledStatusFromRevisionStates(ext, revisionStates)
+			return ctx, nil, err
+		}
+		return context.WithValue(ctx, imageFSKey{}, imageFS), nil, nil
+	}
+}
+
+func ApplyBundle(a Applier) ReconcileStepFunc {


The exported functions HandleFinalizers, RetrieveRevisionStates, RetrieveRevisionMetadata, UnpackBundle, and ApplyBundle lack documentation. These are the core building blocks of the new reconciliation architecture and should have doc comments explaining their purpose, inputs, outputs, and any side effects.

Example for HandleFinalizers:

// HandleFinalizers returns a ReconcileStepFunc that manages finalizers for the ClusterExtension. // It ensures finalizers are added on resource creation and properly handles cleanup on deletion. // If finalizers are updated, reconciliation stops to allow the changes to be persisted. func HandleFinalizers(f finalizer.Finalizer) ReconcileStepFunc {

Copilot · 2025-11-13T16:07:00Z

internal/operator-controller/controllers/clusterextension_reconcile_steps.go

+	return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
+		l := log.FromContext(ctx)
+		l.Info("unpacking resolved bundle")
+		resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)


The UnpackBundle step performs an unsafe type assertion on line 141: ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata). If the previous RetrieveRevisionMetadata step is skipped or the context doesn't contain this key, this will panic.

Consider adding a nil check before the type assertion.

Suggested change

resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)

val := ctx.Value(resolvedRevisionMetadataKey{})

resolvedRevisionMetadata, ok := val.(*RevisionMetadata)

if !ok || resolvedRevisionMetadata == nil {

return ctx, nil, errors.New("resolved revision metadata not found in context")

}

cmd/operator-controller/main.go

Copilot · 2025-11-13T16:07:00Z

internal/operator-controller/controllers/clusterextension_controller.go

+// - A potentially modified context for the next step
+// - An optional reconciliation result that if non-nil will stop reconciliation
+// - Any error that occurred during reconciliation, which will be returned to the caller
+type ReconcileStepFunc func(context.Context, *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error)


The ReconcileSteps type (line 67) lacks documentation. Since this is a central type in the refactored architecture, it should have a doc comment explaining its purpose and usage.

Consider adding:

// ReconcileSteps is an ordered collection of reconciliation steps that are executed sequentially. // Each step receives the output context from the previous step, allowing data to flow through the pipeline. type ReconcileSteps []ReconcileStepFunc

Suggested change

type ReconcileStepFunc func(context.Context, *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error)

type ReconcileStepFunc func(context.Context, *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error)

// ReconcileSteps is an ordered collection of reconciliation steps that are executed sequentially.

// Each step receives the output context from the previous step, allowing data to flow through the pipeline.

Copilot · 2025-11-13T16:07:02Z

internal/operator-controller/controllers/clusterextension_reconcile_steps.go

+func RetrieveRevisionMetadata(r resolve.Resolver) ReconcileStepFunc {
+	return func(ctx context.Context, ext *ocv1.ClusterExtension) (context.Context, *ctrl.Result, error) {
+		l := log.FromContext(ctx)
+		revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)


The RetrieveRevisionStates step performs an unsafe type assertion on line 93: ctx.Value(revisionStatesKey{}).(*RevisionStates). If the previous RetrieveRevisionStates step is skipped or the context doesn't contain this key, this will panic.

Consider adding a nil check:

revisionStatesValue := ctx.Value(revisionStatesKey{}) if revisionStatesValue == nil { return ctx, nil, errors.New("revision states not found in context") } revisionStates := revisionStatesValue.(*RevisionStates)

Suggested change

revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)

revisionStatesValue := ctx.Value(revisionStatesKey{})

if revisionStatesValue == nil {

return ctx, nil, errors.New("revision states not found in context")

}

revisionStates, ok := revisionStatesValue.(*RevisionStates)

if !ok {

return ctx, nil, errors.New("revision states in context has unexpected type")

}

Copilot · 2025-11-13T16:07:02Z

internal/operator-controller/controllers/clusterextension_reconcile_steps.go

+		resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
+		imageFS := ctx.Value(imageFSKey{}).(fs.FS)
+		revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)


The ApplyBundle step performs multiple unsafe type assertions (lines 159, 160, 161). If any of the preceding steps are skipped or fail to populate the context, these will panic:

ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)

ctx.Value(imageFSKey{}).(fs.FS)

ctx.Value(revisionStatesKey{}).(*RevisionStates)

Consider adding nil checks before all type assertions to prevent panics.

Suggested change

resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)

imageFS := ctx.Value(imageFSKey{}).(fs.FS)

revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)

resolvedRevisionMetadataVal := ctx.Value(resolvedRevisionMetadataKey{})

if resolvedRevisionMetadataVal == nil {

return ctx, nil, errors.New("missing resolvedRevisionMetadata in context")

}

resolvedRevisionMetadata, ok := resolvedRevisionMetadataVal.(*RevisionMetadata)

if !ok {

return ctx, nil, errors.New("invalid type for resolvedRevisionMetadata in context")

}

imageFSVal := ctx.Value(imageFSKey{})

if imageFSVal == nil {

return ctx, nil, errors.New("missing imageFS in context")

}

imageFS, ok := imageFSVal.(fs.FS)

if !ok {

return ctx, nil, errors.New("invalid type for imageFS in context")

}

revisionStatesVal := ctx.Value(revisionStatesKey{})

if revisionStatesVal == nil {

return ctx, nil, errors.New("missing revisionStates in context")

}

revisionStates, ok := revisionStatesVal.(*RevisionStates)

if !ok {

return ctx, nil, errors.New("invalid type for revisionStates in context")

}

Copilot · 2025-11-13T16:07:02Z

internal/operator-controller/controllers/clusterextension_reconcile_steps.go

+
+type revisionStatesKey struct{}
+type resolvedRevisionMetadataKey struct{}


Missing documentation for the context value key types revisionStatesKey, resolvedRevisionMetadataKey, and imageFSKey. These types are used as context keys but lack any explanation of their purpose or what values they hold.

Consider adding doc comments:

// revisionStatesKey is the context key for storing *RevisionStates type revisionStatesKey struct{} // resolvedRevisionMetadataKey is the context key for storing *RevisionMetadata type resolvedRevisionMetadataKey struct{} // imageFSKey is the context key for storing fs.FS type imageFSKey struct{}

Suggested change

type revisionStatesKey struct{}

type resolvedRevisionMetadataKey struct{}

// revisionStatesKey is the context key for storing *RevisionStates.

type revisionStatesKey struct{}

// resolvedRevisionMetadataKey is the context key for storing *RevisionMetadata.

type resolvedRevisionMetadataKey struct{}

// imageFSKey is the context key for storing fs.FS.

Copilot · 2025-11-13T16:07:03Z

internal/operator-controller/controllers/clusterextension_reconcile_steps.go

+			revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
+			// Wrap the error passed to this with the resolution information until we have successfully
+			// installed since we intend for the progressing condition to replace the resolved condition
+			// and will be removing the .status.resolution field from the ClusterExtension status API
+			setStatusProgressing(ext, wrapErrorWithResolutionInfo(resolvedRevisionMetadata.BundleMetadata, err))
+			setInstalledStatusFromRevisionStates(ext, revisionStates)


The UnpackBundle step retrieves revisionStates from context on line 144 only to use it in error handling (lines 149), but it's never used if unpacking succeeds. This type assertion could panic if the context doesn't contain the key.

Consider either:

Adding a nil check before the type assertion

Moving this retrieval inside the error handling block where it's actually needed

Suggested change

revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)

// Wrap the error passed to this with the resolution information until we have successfully

// installed since we intend for the progressing condition to replace the resolved condition

// and will be removing the .status.resolution field from the ClusterExtension status API

setStatusProgressing(ext, wrapErrorWithResolutionInfo(resolvedRevisionMetadata.BundleMetadata, err))

setInstalledStatusFromRevisionStates(ext, revisionStates)

var revisionStates *RevisionStates

if v := ctx.Value(revisionStatesKey{}); v != nil {

if rs, ok := v.(*RevisionStates); ok {

revisionStates = rs

}

}

// Wrap the error passed to this with the resolution information until we have successfully

// installed since we intend for the progressing condition to replace the resolved condition

// and will be removing the .status.resolution field from the ClusterExtension status API

setStatusProgressing(ext, wrapErrorWithResolutionInfo(resolvedRevisionMetadata.BundleMetadata, err))

if revisionStates != nil {

setInstalledStatusFromRevisionStates(ext, revisionStates)

}

Copilot · 2025-11-13T16:07:03Z

internal/operator-controller/controllers/suite_test.go

+	for _, opt := range opts {
+		opt(reconciler)
+	}
+	reconciler.ReconcileSteps = []controllers.ReconcileStepFunc{controllers.HandleFinalizers(reconciler.Finalizers), controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter)}


The newClientAndReconciler function unconditionally sets up ReconcileSteps starting with HandleFinalizers and RetrieveRevisionStates on line 94, but these steps depend on Finalizers and RevisionStatesGetter being non-nil respectively. If a test doesn't configure these via options, the steps will still be added but will fail at runtime.

Consider either:

Checking if these dependencies are nil before adding their steps

Ensuring all dependencies have sensible defaults

For example:

if reconciler.Finalizers != nil { reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.HandleFinalizers(reconciler.Finalizers)) } if reconciler.RevisionStatesGetter != nil { reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter)) }

Suggested change

reconciler.ReconcileSteps = []controllers.ReconcileStepFunc{controllers.HandleFinalizers(reconciler.Finalizers), controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter)}

reconciler.ReconcileSteps = []controllers.ReconcileStepFunc{}

if reconciler.Finalizers != nil {

reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.HandleFinalizers(reconciler.Finalizers))

}

if reconciler.RevisionStatesGetter != nil {

reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter))

}

codecov · 2025-11-13T16:09:15Z

Codecov Report

❌ Patch coverage is 96.29630% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.42%. Comparing base (1355ff7) to head (ea86263).

Files with missing lines	Patch %	Lines
cmd/operator-controller/main.go	94.44%	1 Missing and 3 partials ⚠️
...ontroller/controllers/boxcutter_reconcile_steps.go	88.57%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2332      +/-   ##
==========================================
+ Coverage   74.23%   74.42%   +0.18%     
==========================================
  Files          91       93       +2     
  Lines        7239     7300      +61     
==========================================
+ Hits         5374     5433      +59     
- Misses       1433     1434       +1     
- Partials      432      433       +1

Flag	Coverage Δ
e2e	`44.54% <53.70%> (+0.19%)`	⬆️
experimental-e2e	`48.78% <77.31%> (+0.30%)`	⬆️
unit	`58.45% <44.44%> (-0.10%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

perdasilva

In general, I think decomposing the reconciler into its main steps / functions makes sense as it makes things easier to test in isolation and fosters an additive architecture.

Although, I don't feel like a well structured monolith is necessarily a problem when the number of steps is relatively small, and the pipeline is pretty linear since that makes handling the code easier imo.

Another concern is passing values between the steps using context. I think there's a few drawbacks like bypassing compile type checks, and opaque/implicit dependencies. Would having a state object instead be a better alternative? Such an object might also provide a good data source for determining the underlying state and setting the conditions appropriately.

We should also call out that having two CE reconcilers is a temporary state. While I think this pipeline approach can make us more nimble, I think we should be reasonably confident that, given the simplicity of the pipeline, there aren't any simpler alternatives.

E.g., what if we decompose the steps into methods to help with isolation testing, adapt the component interfaces to be usable in both helm and boxcutter world (I think we might already be there to a large extent?), and, pass state object between the steps and drive status setting from what falls out of that.

Having said that, I don't think this is complexity we cannot live with. I'm just wondering if we should call out what problems it solves, and why its preferable to any other reasonable alternative. I'm also curious to see what others think.

perdasilva · 2025-11-17T14:06:42Z

internal/operator-controller/controllers/clusterextension_reconcile_steps.go

+	}
+}
+
+func RetrieveRevisionMetadata(r resolve.Resolver) ReconcileStepFunc {


Maybe we should name this in a way that makes it clear that this is the resolution step? E.g.

Suggested change

func RetrieveRevisionMetadata(r resolve.Resolver) ReconcileStepFunc {

func ResolveBundle(r resolve.Resolver) ReconcileStepFunc {

or something like that

pedjak · 2025-11-17T15:27:28Z

Although, I don't feel like a well structured monolith is necessarily a problem when the number of steps is relatively small, and the pipeline is pretty linear since that makes handling the code easier imo.

The motivation for this PR came out the work in #2281 where we have realized that handling conditions with Boxcutter differently would require introducing conditional executions depending on the backend type in one part of the reconcile loop, although the rest of the reconciliation is the same. In order to keep things easy to understand and test, I have divided the reconciliation into steps so that some of them could be swapped out/replaced depending on the used backend. Moreover, we are opening a possibility to add write simpler unit tests.

Another concern is passing values between the steps using context. I think there's a few drawbacks like bypassing compile type checks, and opaque/implicit dependencies. Would having a state object instead be a better alternative? Such an object might also provide a good data source for determining the underlying state and setting the conditions appropriately.

Golang context is used for sharing request-scoped cross-cutting concerns by many frameworks. The added valueFromContext() perform necessary checks at the runtime.

E.g., what if we decompose the steps into methods to help with isolation testing, adapt the component interfaces to be usable in both helm and boxcutter world (I think we might already be there to a large extent?), and, pass state object between the steps and drive status setting from what falls out of that.

I am not sure what would be the difference to what is the proposed here? Can you provide some sketches?

joelanford · 2025-11-17T19:35:53Z

cmd/operator-controller/main.go

+		controllers.HandleFinalizers(ceReconciler.Finalizers),
+		controllers.RetrieveRevisionStates(ceReconciler.RevisionStatesGetter),
+		controllers.RetrieveRevisionMetadata(ceReconciler.Resolver),
+		controllers.UnpackBundle(ceReconciler.ImagePuller, ceReconciler.ImageCache),
+		controllers.ApplyBundle(ceReconciler.Applier),


Do we need Finalizers, RevisionStatesGetter, Resolver, ImagePuller, etc. to still be fields on the ceReconciler? It seems like those details are no longer a direct concern of the reconciler?

Do we need Finalizers, RevisionStatesGetter, Resolver, ImagePuller, etc. to still be fields on the ceReconciler? It seems like those details are no longer a direct concern of the reconciler?

Correct, I would remove them if we agree about PR direction.

Removed, ptal.

perdasilva · 2025-11-17T19:52:20Z

I am not sure what would be the difference to what is the proposed here? Can you provide some sketches?

Maybe we could use a state/result type struct to collect the the results of each step and use that instead of the context to pass results between subsequent steps, e.g.:

type Result struct {
	RevisionStates            *RevisionStates
	BundleResolutionResult    *BundleResolutionResult // contains bundle resolution state: bundle/deprecation metadata, etc.
	BundleUnpackResult.       *BundleUnpackResult // contains bundle unpack state: fs, time unpacked, still unpacking, etc.
	ApplierResult             *ApplierResult // contains applier results
}

type ReconcileStepFunc func(context.Context, *ocv1.ClusterExtension, *Result) (context.Context, *ctrl.Result, error)

Errors could either be embedded in the step results or just be raised by the step function itself through its return values like we have it now.

If the only point of contention is condition setting, we could strip the steps of setting any conditions and just have one step at the end that does it based on the results. Then have one impl for helm and one for boxcutter and just configure the reconciler accordingly.

If the folks feel a framework for a linear 5-6 step pipeline is overkill (I don't have super strong opinions here - I can live with either). We could still get the testing benefits by decomposing the reconcile function similarly to what you have done here but just call them in turn to eliminate the indirection.

Golang context is used for sharing request-scoped cross-cutting concerns by many frameworks. The added valueFromContext() perform necessary checks at the runtime.

I get that. I'm just wondering if we're better off using a state object to carry results between steps. The impression I got from searching around and talking to Gemini is that context values are typically used for more lightweight information such as request metadata (ids, users, headers, etc.) rather than intermediate computation results.

I think we should call out the motivation in the description. As it is now, we've refactored things to make it explicit that storage migration isn't part of the helm reconciler.

I hope all of this is cogent. I'm pretty beat. But we can take it up again tomorrow ^^

pedjak · 2025-11-17T20:22:28Z

I get that. I'm just wondering if we're better off using a state object to carry results between steps. The impression I got from searching around and talking to Gemini is that context values are typically used for more lightweight information such as request metadata (ids, users, headers, etc.) rather than intermediate computation results.

Having a dedicated state instead of the golang context:

would not save us from runtime errors because any of fields in the state could be nil, so we still need to check for that
for each reconciler we would need to develop a dedicated little step framework that would pass the proper state (yes, we could apply the same approach to other controllers in this repo).

If the folks feel a framework for a linear 5-6 step pipeline is overkill (I don't have super strong opinions here - I can live with either).

I would not consider as a framework something that consists of 15 lines of code:

func (steps *ReconcileSteps) Reconcile(ctx context.Context, ext *ocv1.ClusterExtension) (ctrl.Result, error) {
	var res *ctrl.Result
	var err error
	nextStepCtx := ctx
	for _, step := range *steps {
		nextStepCtx, res, err = step(nextStepCtx, ext)
		if err != nil {
			return ctrl.Result{}, err
		}
		if res != nil {
			return *res, nil
		}
		if nextStepCtx == nil {
			return ctrl.Result{}, nil
		}
	}
	return ctrl.Result{}, nil
}

We could still get the testing benefits by decomposing the reconcile function similarly to what you have done here but just call them in turn to eliminate the indirection.

Not really, because the wiring is fixed - by the introduced indirection, we are now able to add for example migration to boxcutter, and apply bundles and reports conditions in a different way in cases of Helm and Boxcutter backends.

I think we should call out the motivation in the description. As it is now, we've refactored things to make it explicit that storage migration isn't part of the helm reconciler.

Done.

joelanford · 2025-11-17T21:34:11Z

To help overcome the concern about the context value type safety, perhaps we could encapsulate the context value handling into a sub-internal package with 100% test coverage, where only typed accessor/getter functions are exported and available to our reconcile functions?

The only downside at that point would be that we'd need nil checks for those values just not being present in the context, and we'd have to handle those errors (or panic).

joelanford · 2025-11-17T21:45:04Z

In general, I see this design as an improvement over what we currently have because it tidies things up a bit, although I also think @perdasilva's comments have merit.

Decomposing and separating out the set of functions and their order makes things easier to physically see. But I think it does potentially introduce surface area for unintended divergence between helm and boxcutter setups (but maybe our test suite has us covered there?).

And I do agree with @perdasilva that context values are worse, in isolation, than our current monolithic structure where the right values are always available in the right places. We could define a single interface that has methods for each of the steps with the right signatures for data exchange. That would be brittle, but maybe the brittleness would be a benefit because it would force us to confront and account for the changes needed in one reconcile flavor vs another?

camilamacedo86 · 2025-11-18T09:54:39Z

Thank you, @pedjak, for this thoughtful refactoring! I really appreciate the effort to improve structure and testability.

After reviewing the changes, I’m not fully sure this pattern is the best fit for our situation. I might be wrong, but since we may end up with a single runtime long-term (likely only Boxcutter), the extra flexibility could turn into unnecessary complexity. In that case, the benefits of the step framework might be short-lived, while the cost of maintaining and later unwinding it remains.

The current Helm/Boxcutter reconcile is already quite simple and explicit. This PR adds several new types, files, and ~100 lines of framework logic. That feels like more machinery than we need right now. If/when Helm gets removed, the existing code path is very easy to clean up. With the framework pattern, we’d have more scaffolding to unwind, even though the composability is no longer needed.

I also have some technical concerns around how context is used. Using context.WithValue to pass internal state between steps is less type-safe and makes dependencies harder to see in controller code. The Go team recommends using context values only for request-scoped data that crosses API boundaries (see the official Go blog), which is different from the use here.

A Simpler Alternative ( Common Approach for Reconciles )

If the main goal is improving testability and separation of concerns, we could break the reconciliation flow into smaller helper methods with explicit parameters. This keeps things type-safe, idiomatic, and easier to follow, while still giving us good structure:

func (r *ClusterExtensionReconciler) reconcile(ctx context.Context, ext *ocv1.ClusterExtension) (ctrl.Result, error) {
    if res, err := r.handleFinalizers(ctx, ext); res != nil || err != nil {
        return *res, err
    }

    if res, err := r.migrateStorageIfNeeded(ctx, ext); res != nil || err != nil {
        return *res, err
    }

    states, err := r.getRevisionStates(ctx, ext)
    if err != nil {
        return r.handleRevisionStatesError(ctx, ext, err)
    }

    resolved, err := r.resolveBundle(ctx, ext, states)
    if err != nil {
        return r.handleResolutionError(ctx, ext, states, err)
    }

    imageFS, err := r.unpackBundle(ctx, ext, resolved)
    if err != nil {
        return r.handleUnpackError(ctx, ext, states, resolved, err)
    }

    return r.applyBundle(ctx, ext, states, resolved, imageFS)
}

camilamacedo86 · 2025-11-18T11:42:57Z

Hi @pedjak,

Just to supplement the discussion — and please take this as a gentle suggestion:

Another important point I'd like to raise is that we should be using controller-runtime's EnvTest (https://book.kubebuilder.io/reference/envtest) to validate our reconcilers, rather than relying on unit tests. Without EnvTest, we're not really exercising the reconciler in a realistic control-plane environment, so these changes won't fully address the underlying issue. Unit tests also require mocks that EnvTest makes unnecessary.

If we follow the recommended EnvTest approach, would we really need the testing benefits this PR is trying to introduce? I'm honestly not sure we would, because unit-testing reconcilers isn't considered a good practice — EnvTest already provides the proper, realistic way to test them without the need for mocks or additional framework layers.

I would gently suggest we hold this change for now, until we have:

Boxcutter in a more mature state,
clarity on how much the reconciliation flows will actually diverge (or not),
clarity on whether we will support more than one runtime,
a better understanding of the next steps for OLM v1.

Once we have that information, it will be much easier to decide whether this framework approach truly fits our future needs.

Copilot

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pedjak · 2025-11-18T13:24:20Z

And I do agree with @perdasilva that context values are worse, in isolation, than our current monolithic structure where the right values are always available in the right places. We could define a single interface that has methods for each of the steps with the right signatures for data exchange. That would be brittle, but maybe the brittleness would be a benefit because it would force us to confront and account for the changes needed in one reconcile flavor vs another?

@joelanford @perdasilva I have added a dedicated reconcile state holder for exchanging data between the steps, ptal. IMO we do not need to perform nil checks, given that the potential panics are going to bubble up as the reconciliation error, given that controller-runtime RecoverPanic config param is enabled by the default.

pedjak · 2025-11-18T14:23:02Z

The current Helm/Boxcutter reconcile is already quite simple and explicit. This PR adds several new types, files, and ~100 lines of framework logic.

@camilamacedo86

Not really, it adds just a step function definition and ~15 lines of code that iterate over the provided steps. The rest of PR is just splitting the long existing reconcile function into a number of logically separated steps, boundaries were already visible by dedicated log messages like:

"handling finalizers"
"getting installed bundle"
"resolving bundle"
etc.

I mentioned already the motivation for this move in PR description - we have currently two backends and some apply and condition logic is going to be different. Instead of trying to add more conditional logic, splitting it into a chain if steps that could replaced/extended is good way to combat this problem and obtain additional benefits long-term.

The Go team recommends using context values only for request-scoped data that crosses API boundaries (see the official Go blog), which is different from the use here.

It depends how you look at: an argument for the reconcile function is ctrl.Request, a sort of request that needs to be processed. Hence, carrying request-scoped data in the context might be perfectly fine.

perdasilva · 2025-11-20T08:15:31Z

I think the addition of the state object is a good compromise here and resolves the runtime type check issue, which was my biggest concern. If we can get agreement amongst the reviewers here, I'm happy to move forward with this.

…ipeline Replaces the monolithic 170-line reconcile() method with a flexible step-based architecture that executes discrete reconciliation phases in sequence. Each phase (`HandleFinalizers`, `RetrieveRevisionStates`, `RetrieveRevisionMetadata`, `UnpackBundle`, `ApplyBundle`) is now a standalone function that can be composed differently for Helm vs Boxcutter workflows. Changes: - Introduce `ReconcileStepFunc` type and `ReconcileSteps` executor - Extract reconcile logic into individual step functions in new file `clusterextension_reconcile_steps.go` - Move `BoxcutterRevisionStatesGetter` to `boxcutter_reconcile_steps.go` alongside `MigrateStorage` step - Configure step pipelines in `main.go` for each applier type - Refactor tests to use functional options pattern for reconciler setup

Copilot

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-20T15:17:15Z

internal/operator-controller/controllers/clusterextension_controller.go

+type ReconcileStepFunc func(context.Context, *reconcileState, *ocv1.ClusterExtension) (*ctrl.Result, error)
+
+// ReconcileSteps is an ordered collection of reconciliation steps that are executed sequentially.
+// Each step receives the output context from the previous step, allowing data to flow through the pipeline.


The documentation states "Each step receives the output context from the previous step, allowing data to flow through the pipeline" but the context is not modified between steps - it's the same context passed to all steps. The state flows through via the reconcileState parameter, not the context. This should be clarified to say "Each step receives the shared state from previous steps" or similar.

Suggested change

// Each step receives the output context from the previous step, allowing data to flow through the pipeline.

// Each step receives the shared state from previous steps, allowing data to flow through the pipeline.

Copilot · 2025-11-20T15:17:16Z

internal/operator-controller/controllers/clusterextension_controller.go

+// It takes a context and ClusterExtension object as input and executes each step in the ReconcileSteps slice.
+// If any step returns an error, reconciliation stops and the error is returned.
+// If any step returns a non-nil ctrl.Result, reconciliation stops and that result is returned.
+// If any step returns a nil context, reconciliation stops successfully.


The documentation states "If any step returns a nil context, reconciliation stops successfully" but the implementation doesn't check for nil context. Steps never return a context - they return (*ctrl.Result, error). This documentation line appears to be incorrect or copied from a different implementation pattern.

Suggested change

// If any step returns a nil context, reconciliation stops successfully.

Copilot

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI review requested due to automatic review settings November 13, 2025 15:54

pedjak requested a review from a team as a code owner November 13, 2025 15:54

openshift-ci bot requested review from bentito and thetechnick November 13, 2025 15:54

pedjak changed the title ~~Refactor ClusterExtension reconciler to use composable step-based pipeline~~ 🌱 Refactor ClusterExtension reconciler to use composable step-based pipeline Nov 13, 2025

Copilot started reviewing on behalf of pedjak November 13, 2025 15:56 View session

Copilot finished reviewing on behalf of pedjak November 13, 2025 15:59

pedjak requested review from perdasilva and removed request for bentito and thetechnick November 13, 2025 16:00

Copilot AI reviewed Nov 13, 2025

View reviewed changes

perdasilva reviewed Nov 17, 2025

View reviewed changes

joelanford reviewed Nov 17, 2025

View reviewed changes

Copilot AI review requested due to automatic review settings November 18, 2025 12:19

pedjak force-pushed the reconciliation-as-chain-of-steps branch from 864c12f to 1cf61b6 Compare November 18, 2025 12:19

Copilot started reviewing on behalf of pedjak November 18, 2025 12:19 View session

Copilot finished reviewing on behalf of pedjak November 18, 2025 12:20

Copilot AI reviewed Nov 18, 2025

View reviewed changes

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 18, 2025

pedjak force-pushed the reconciliation-as-chain-of-steps branch from 1cf61b6 to fa112fe Compare November 20, 2025 10:52

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 20, 2025

Removed all fields but Client from ClusterExtensionReconciler

ea86263

Copilot AI review requested due to automatic review settings November 20, 2025 15:08

Copilot started reviewing on behalf of pedjak November 20, 2025 15:09 View session

Copilot finished reviewing on behalf of pedjak November 20, 2025 15:10

Copilot AI reviewed Nov 20, 2025

View reviewed changes

pedjak requested review from Copilot, joelanford and perdasilva November 20, 2025 20:48

Copilot started reviewing on behalf of pedjak November 20, 2025 20:48 View session

Copilot finished reviewing on behalf of pedjak November 20, 2025 20:49

Copilot AI reviewed Nov 20, 2025

View reviewed changes

-		resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
+		val := ctx.Value(resolvedRevisionMetadataKey{})
+		resolvedRevisionMetadata, ok := val.(*RevisionMetadata)
+		if !ok || resolvedRevisionMetadata == nil {
+			return ctx, nil, errors.New("resolved revision metadata not found in context")
+		}

-		revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
+		revisionStatesValue := ctx.Value(revisionStatesKey{})
+		if revisionStatesValue == nil {
+			return ctx, nil, errors.New("revision states not found in context")
+		}
+		revisionStates, ok := revisionStatesValue.(*RevisionStates)
+		if !ok {
+			return ctx, nil, errors.New("revision states in context has unexpected type")
+		}

-		resolvedRevisionMetadata := ctx.Value(resolvedRevisionMetadataKey{}).(*RevisionMetadata)
-		imageFS := ctx.Value(imageFSKey{}).(fs.FS)
-		revisionStates := ctx.Value(revisionStatesKey{}).(*RevisionStates)
+		resolvedRevisionMetadataVal := ctx.Value(resolvedRevisionMetadataKey{})
+		if resolvedRevisionMetadataVal == nil {
+			return ctx, nil, errors.New("missing resolvedRevisionMetadata in context")
+		}
+		resolvedRevisionMetadata, ok := resolvedRevisionMetadataVal.(*RevisionMetadata)
+		if !ok {
+			return ctx, nil, errors.New("invalid type for resolvedRevisionMetadata in context")
+		}
+		imageFSVal := ctx.Value(imageFSKey{})
+		if imageFSVal == nil {
+			return ctx, nil, errors.New("missing imageFS in context")
+		}
+		imageFS, ok := imageFSVal.(fs.FS)
+		if !ok {
+			return ctx, nil, errors.New("invalid type for imageFS in context")
+		}
+		revisionStatesVal := ctx.Value(revisionStatesKey{})
+		if revisionStatesVal == nil {
+			return ctx, nil, errors.New("missing revisionStates in context")
+		}
+		revisionStates, ok := revisionStatesVal.(*RevisionStates)
+		if !ok {
+			return ctx, nil, errors.New("invalid type for revisionStates in context")
+		}


		type revisionStatesKey struct{}
		type resolvedRevisionMetadataKey struct{}

-	reconciler.ReconcileSteps = []controllers.ReconcileStepFunc{controllers.HandleFinalizers(reconciler.Finalizers), controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter)}
+	reconciler.ReconcileSteps = []controllers.ReconcileStepFunc{}
+	if reconciler.Finalizers != nil {
+		reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.HandleFinalizers(reconciler.Finalizers))
+	}
+	if reconciler.RevisionStatesGetter != nil {
+		reconciler.ReconcileSteps = append(reconciler.ReconcileSteps, controllers.RetrieveRevisionStates(reconciler.RevisionStatesGetter))
+	}

	func RetrieveRevisionMetadata(r resolve.Resolver) ReconcileStepFunc {
	func ResolveBundle(r resolve.Resolver) ReconcileStepFunc {

	// Each step receives the output context from the previous step, allowing data to flow through the pipeline.
	// Each step receives the shared state from previous steps, allowing data to flow through the pipeline.

🌱 Refactor ClusterExtension reconciler to use composable step-based pipeline #2332

Are you sure you want to change the base?

🌱 Refactor ClusterExtension reconciler to use composable step-based pipeline #2332

Uh oh!

Conversation

pedjak commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Reviewer Checklist

Uh oh!

netlify bot commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for olmv1 ready!

Uh oh!

openshift-ci bot commented Nov 13, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

perdasilva left a comment

Choose a reason for hiding this comment

Uh oh!

perdasilva Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

pedjak commented Nov 17, 2025

Uh oh!

joelanford Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

pedjak Nov 17, 2025

Choose a reason for hiding this comment

Uh oh!

pedjak Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

perdasilva commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pedjak commented Nov 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joelanford commented Nov 17, 2025

🌱 Refactor `ClusterExtension` reconciler to use composable step-based pipeline #2332

🌱 Refactor `ClusterExtension` reconciler to use composable step-based pipeline #2332

pedjak commented Nov 13, 2025 •

edited

Loading

netlify bot commented Nov 13, 2025 •

edited

Loading

codecov bot commented Nov 13, 2025 •

edited

Loading

perdasilva commented Nov 17, 2025 •

edited

Loading

pedjak commented Nov 17, 2025 •

edited

Loading

camilamacedo86 commented Nov 18, 2025 •

edited

Loading

pedjak commented Nov 18, 2025 •

edited

Loading