
Conversation

@machadovilaca
Member

Currently, on token refresh, we recreate the secret, but Prometheus does not fetch the new secret and fails to access the metrics endpoint, receiving a 401 from the pod.

This PR fixes the issue by also removing the ServiceMonitor, letting the next reconcile loop re-create it, which forces Prometheus, after a few minutes, to re-fetch the token from the secret.
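
For context, a minimal, self-contained sketch of the intended flow, assuming controller-runtime and the Prometheus Operator client types; the package, constants, and the refreshMetricsToken function below are illustrative, not the actual HCO code:

```go
package alerts

import (
	"context"

	monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const (
	// Illustrative names; the operator defines its own constants.
	secretName         = "metrics-bearer-token"
	serviceMonitorName = "hco-service-monitor"
)

// refreshMetricsToken recreates the token secret (existing behaviour) and, as
// added by this PR, also deletes the ServiceMonitor so that the next reconcile
// loop re-creates it and Prometheus re-reads the refreshed token.
func refreshMetricsToken(ctx context.Context, cl client.Client, namespace, newToken string) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: secretName, Namespace: namespace},
		StringData: map[string]string{"token": newToken},
	}
	if err := cl.Delete(ctx, secret); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	if err := cl.Create(ctx, secret); err != nil {
		return err
	}

	// Without this deletion Prometheus keeps using the stale token and gets
	// 401s; once the ServiceMonitor is re-created it re-fetches the secret.
	sm := &monitoringv1.ServiceMonitor{
		ObjectMeta: metav1.ObjectMeta{Name: serviceMonitorName, Namespace: namespace},
	}
	if err := cl.Delete(ctx, sm); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```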

What this PR does / why we need it:

This is a manual cherry-pick of #3764

Reviewer Checklist

Reviewers should review the PR against each of the items below. Checking an item means the PR is either "OK" or "Not Applicable" for that item. All items should be checked before the PR is merged.

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Jira Ticket:

https://issues.redhat.com/browse/OCPBUGS-59858

Release note:

On metric token refresh, also delete the ServiceMonitor

Currently, on token refresh, we recreate the secret, but Prometheus
does not fetch the new secret and fails to access the metrics endpoint,
receiving a 401 from the pod.

This PR fixes the issue by also removing the ServiceMonitor, letting the
next reconcile loop re-create it, which forces Prometheus, after a few
minutes, to re-fetch the token from the secret.

Signed-off-by: Nahshon Unna-Tsameret <[email protected]>
Co-authored-by: Nahshon Unna-Tsameret <[email protected]>
@kubevirt-bot added the release-note and dco-signoff: yes labels on Oct 1, 2025
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nunnatsa for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sonarqubecloud

sonarqubecloud bot commented Oct 1, 2025


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The ServiceMonitor absence assertion in the test uses MatchError incorrectly—switch to a direct apierrors.IsNotFound(err) check on the returned error instead of MatchError (see the sketch after this list).
  • Factory function names for secrets and ServiceMonitors mix Create/New prefixes and exported/unexported variants—consider standardizing these naming patterns for better readability.
  • The comment above UpdateExistingResource in secret.go still references the old secretReconciler name—update it to reflect SecretReconciler for clarity.
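
To make the first bullet concrete, here is a hypothetical Gomega snippet of the suggested assertion style; the fixtures cl, ctx, namespace, and serviceMonitorName are assumptions, not taken from the actual test:

```go
// Assumed imports, in addition to the Ginkgo/Gomega dot imports:
//   apierrors "k8s.io/apimachinery/pkg/api/errors"
//   monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
//   "sigs.k8s.io/controller-runtime/pkg/client"
It("should delete the ServiceMonitor on token refresh", func() {
	sm := &monitoringv1.ServiceMonitor{}
	err := cl.Get(ctx, client.ObjectKey{Namespace: namespace, Name: serviceMonitorName}, sm)
	Expect(err).To(HaveOccurred())
	// Check the NotFound condition directly instead of relying on MatchError.
	Expect(apierrors.IsNotFound(err)).To(BeTrue(), "expected the ServiceMonitor to be deleted")
})
```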
## Individual Comments

### Comment 1
<location> `controllers/alerts/serviceMonitor.go:49` </location>
<code_context>
 }

-func (r serviceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, resource client.Object, logger logr.Logger) (client.Object, bool, error) {
+func (r ServiceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, resource client.Object, logger logr.Logger) (client.Object, bool, error) {
 	found := resource.(*monitoringv1.ServiceMonitor)
+
</code_context>

<issue_to_address>
**issue (complexity):** Consider removing the Refresher indirection and helper method to simplify the update logic in ServiceMonitor reconciliation.

Here the new Refresher indirection and `deleteServiceMonitor` helper don’t really buy you much, and they obscure your real intent. You can flatten this back out by:

1. Dropping the `Refresher` field and the helper entirely.
2. Inlining the `cl.Delete` (and immediately `cl.Create` if you really need a full refresh) or even better, using a simple `Patch` to update just the changed fields.

For example, to recreate on every spec‐change:

```go
type ServiceMonitorReconciler struct {
    svc *monitoringv1.ServiceMonitor
}

func NewServiceMonitorReconciler(svc *monitoringv1.ServiceMonitor) *ServiceMonitorReconciler {
    return &ServiceMonitorReconciler{svc: svc}
}

func (r *ServiceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, existing client.Object, log logr.Logger) (client.Object, bool, error) {
    found := existing.(*monitoringv1.ServiceMonitor)

    // delete old
    if err := cl.Delete(ctx, found); err != nil {
        log.Error(err, "failed to delete ServiceMonitor")
        return nil, false, err
    }
    // recreate new
    newSM := r.svc.DeepCopy()
    if err := cl.Create(ctx, newSM); err != nil {
        log.Error(err, "failed to recreate ServiceMonitor")
        return nil, false, err
    }
    log.Info("refreshed ServiceMonitor")
    return newSM, true, nil
}
```

Or—if you only need to update fields in place—a simple patch is even clearer:

```go
func (r *ServiceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, existing client.Object, log logr.Logger) (client.Object, bool, error) {
    found := existing.(*monitoringv1.ServiceMonitor)
    original := found.DeepCopy()

    // sync just the Spec (and metadata) in place
    r.svc.Spec.DeepCopyInto(&found.Spec)
    updateCommonDetails(&r.svc.ObjectMeta, &found.ObjectMeta)

    if err := cl.Patch(ctx, found, client.MergeFrom(original)); err != nil {
        log.Error(err, "failed to patch ServiceMonitor")
        return nil, false, err
    }
    log.Info("successfully patched ServiceMonitor")
    return found, true, nil
}
```

Either approach removes the extra layer of indirection and makes your reconcile path much easier to follow.
</issue_to_address>

### Comment 2
<location> `controllers/alerts/refresher.go:7` </location>
<code_context>
+	"sync"
+)
+
+type Refresher interface {
+	setShouldRefresh()
+	refresh(f func() error) error
</code_context>

<issue_to_address>
**issue (complexity):** Consider removing the custom Refresher interface and mutex in favor of a simple boolean flag on the controller struct to track refresh state.

You don’t need a custom interface or mutex here—controller‐runtime’s reconcile loop is already single‐threaded, so you can just track “dirty” state with a bool on your Reconciler and inline the refresh. For example:

```go
// inside your controller struct
type AlertReconciler struct {
    client.Client
    Log logr.Logger

    needsRefresh bool
}

// in SetupWithManager, mark “dirty” on updates
func (r *AlertReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&alertsv1alpha1.Alert{}).
        WithEventFilter(predicate.Funcs{
            UpdateFunc: func(e event.UpdateEvent) bool {
                r.needsRefresh = true
                return true
            },
        }).
        Complete(r)
}

// in Reconcile, only run your refresh once when needed
func (r *AlertReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // ... your existing reconcile logic …

    if r.needsRefresh {
        if err := r.refreshAlerts(ctx); err != nil {
            return ctrl.Result{}, err
        }
        r.needsRefresh = false
    }

    return ctrl.Result{}, nil
}

// move your “f” body into a method
func (r *AlertReconciler) refreshAlerts(ctx context.Context) error {
    // existing f() logic
    return nil
}
```

This preserves exactly the same “only run once per change” semantics but removes the extra file, interface, and Mutex/flag indirection.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@coveralls
Collaborator

Pull Request Test Coverage Report for Build 18159384946

Details

  • 97 of 106 (91.51%) changed or added relevant lines in 4 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 72.475%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| controllers/alerts/refresher.go | 23 | 25 | 92.0% |
| controllers/alerts/serviceMonitor.go | 37 | 39 | 94.87% |
| controllers/alerts/secret.go | 33 | 38 | 86.84% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| controllers/operands/operandHandler.go | 3 | 86.14% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 17581526508 | 0.2% |
| Covered Lines | 6543 |
| Relevant Lines | 9028 |

💛 - Coveralls

@hco-bot
Collaborator

hco-bot commented Oct 1, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

In response to this:

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@machadovilaca
Member Author

/retest

@openshift-ci

openshift-ci bot commented Oct 7, 2025

@machadovilaca: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/hco-e2e-operator-sdk-gcp | 30d102f | link | true | /test hco-e2e-operator-sdk-gcp |
| ci/prow/hco-e2e-operator-sdk-sno-azure | 30d102f | link | false | /test hco-e2e-operator-sdk-sno-azure |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure | 30d102f | link | true | /test hco-e2e-upgrade-prev-operator-sdk-azure |
| ci/prow/hco-e2e-operator-sdk-azure | 30d102f | link | true | /test hco-e2e-operator-sdk-azure |
| ci/prow/hco-e2e-upgrade-operator-sdk-azure | 30d102f | link | true | /test hco-e2e-upgrade-operator-sdk-azure |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-aws | 30d102f | link | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-aws |
| ci/prow/hco-e2e-upgrade-operator-sdk-sno-aws | 30d102f | link | false | /test hco-e2e-upgrade-operator-sdk-sno-aws |
| ci/prow/hco-e2e-operator-sdk-aws | 30d102f | link | true | /test hco-e2e-operator-sdk-aws |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure | 30d102f | link | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-azure |
| ci/prow/hco-e2e-upgrade-operator-sdk-aws | 30d102f | link | true | /test hco-e2e-upgrade-operator-sdk-aws |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-aws | 30d102f | link | true | /test hco-e2e-upgrade-prev-operator-sdk-aws |
| ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure | 30d102f | link | false | /test hco-e2e-upgrade-operator-sdk-sno-azure |
| ci/prow/hco-e2e-operator-sdk-sno-aws | 30d102f | link | false | /test hco-e2e-operator-sdk-sno-aws |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hco-bot
Collaborator

hco-bot commented Oct 14, 2025

hco-e2e-operator-sdk-azure, hco-e2e-operator-sdk-aws lanes succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-gcp

In response to this:

hco-e2e-operator-sdk-azure, hco-e2e-operator-sdk-aws lanes succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sradco
Collaborator

sradco commented Oct 19, 2025

@machadovilaca should we not also backport #3756 to 1.14?


Labels

  • dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.)
  • release-note (Denotes a PR that will be considered when it comes time to generate release notes.)
  • size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants