
Conversation

@machadovilaca
Member

Currently, on token refresh, we recreate the secret, but Prometheus does not fetch the new secret and fails to access the metrics endpoint, receiving a 401 from the pod.

This PR fixes the issue by also removing the ServiceMonitor, letting the next reconcile loop re-create it, which forces Prometheus, after a few minutes, to re-fetch the token from the secret.
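
For context, a minimal, self-contained sketch of the intended flow, assuming controller-runtime and the Prometheus Operator client types; the package, constants, and the refreshMetricsToken function below are illustrative, not the actual HCO code:

```go
package alerts

import (
	"context"

	monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

const (
	// Illustrative names; the operator defines its own constants.
	secretName         = "metrics-bearer-token"
	serviceMonitorName = "hco-service-monitor"
)

// refreshMetricsToken recreates the token secret (existing behaviour) and, as
// added by this PR, also deletes the ServiceMonitor so that the next reconcile
// loop re-creates it and Prometheus re-reads the refreshed token.
func refreshMetricsToken(ctx context.Context, cl client.Client, namespace, newToken string) error {
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: secretName, Namespace: namespace},
		StringData: map[string]string{"token": newToken},
	}
	if err := cl.Delete(ctx, secret); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	if err := cl.Create(ctx, secret); err != nil {
		return err
	}

	// Without this deletion Prometheus keeps using the stale token and gets
	// 401s; once the ServiceMonitor is re-created it re-fetches the secret.
	sm := &monitoringv1.ServiceMonitor{
		ObjectMeta: metav1.ObjectMeta{Name: serviceMonitorName, Namespace: namespace},
	}
	if err := cl.Delete(ctx, sm); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```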

What this PR does / why we need it:

This is a manual cherry-pick of #3764

Reviewer Checklist

Reviewers should review the PR against each of the items below. Checking an item means the PR is either "OK" or "Not Applicable" for that item. All items should be checked before the PR is merged.

  • PR Message
  • Commit Messages
  • How to test
  • Unit Tests
  • Functional Tests
  • User Documentation
  • Developer Documentation
  • Upgrade Scenario
  • Uninstallation Scenario
  • Backward Compatibility
  • Troubleshooting Friendly

Jira Ticket:

https://issues.redhat.com/browse/OCPBUGS-59858

Release note:

On metric token refresh, also delete the ServiceMonitor

Currently, on token refresh, we recreate the secret, but Prometheus
does not fetch the new secret and fails to access the metrics endpoint,
receiving a 401 from the pod.

This PR fixes the issue by also removing the ServiceMonitor, letting the
next reconcile loop re-create it, which forces Prometheus, after a few
minutes, to re-fetch the token from the secret.

Signed-off-by: Nahshon Unna-Tsameret <[email protected]>
Co-authored-by: Nahshon Unna-Tsameret <[email protected]>
@kubevirt-bot added the release-note and dco-signoff: yes labels on Oct 1, 2025
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign nunnatsa for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sonarqubecloud

sonarqubecloud bot commented Oct 1, 2025


@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The ServiceMonitor absence assertion in the test uses MatchError incorrectly—switch to a direct apierrors.IsNotFound(err) check on the returned error instead of MatchError (see the sketch after this list).
  • Factory function names for secrets and ServiceMonitors mix Create/New prefixes and exported/unexported variants—consider standardizing these naming patterns for better readability.
  • The comment above UpdateExistingResource in secret.go still references the old secretReconciler name—update it to reflect SecretReconciler for clarity.
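
To make the first bullet concrete, here is a hypothetical Gomega snippet of the suggested assertion style; the fixtures cl, ctx, namespace, and serviceMonitorName are assumptions, not taken from the actual test:

```go
// Assumed imports, in addition to the Ginkgo/Gomega dot imports:
//   apierrors "k8s.io/apimachinery/pkg/api/errors"
//   monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
//   "sigs.k8s.io/controller-runtime/pkg/client"
It("should delete the ServiceMonitor on token refresh", func() {
	sm := &monitoringv1.ServiceMonitor{}
	err := cl.Get(ctx, client.ObjectKey{Namespace: namespace, Name: serviceMonitorName}, sm)
	Expect(err).To(HaveOccurred())
	// Check the NotFound condition directly instead of relying on MatchError.
	Expect(apierrors.IsNotFound(err)).To(BeTrue(), "expected the ServiceMonitor to be deleted")
})
```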
## Individual Comments

### Comment 1
<location> `controllers/alerts/serviceMonitor.go:49` </location>
<code_context>
 }

-func (r serviceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, resource client.Object, logger logr.Logger) (client.Object, bool, error) {
+func (r ServiceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, resource client.Object, logger logr.Logger) (client.Object, bool, error) {
 	found := resource.(*monitoringv1.ServiceMonitor)
+
</code_context>

<issue_to_address>
**issue (complexity):** Consider removing the Refresher indirection and helper method to simplify the update logic in ServiceMonitor reconciliation.

Here the new Refresher indirection and `deleteServiceMonitor` helper don’t really buy you much, and they obscure your real intent. You can flatten this back out by:

1. Dropping the `Refresher` field and the helper entirely.
2. Inlining the `cl.Delete` (and immediately `cl.Create` if you really need a full refresh) or even better, using a simple `Patch` to update just the changed fields.

For example, to recreate on every spec‐change:

```go
type ServiceMonitorReconciler struct {
    svc *monitoringv1.ServiceMonitor
}

func NewServiceMonitorReconciler(svc *monitoringv1.ServiceMonitor) *ServiceMonitorReconciler {
    return &ServiceMonitorReconciler{svc: svc}
}

func (r *ServiceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, existing client.Object, log logr.Logger) (client.Object, bool, error) {
    found := existing.(*monitoringv1.ServiceMonitor)

    // delete old
    if err := cl.Delete(ctx, found); err != nil {
        log.Error(err, "failed to delete ServiceMonitor")
        return nil, false, err
    }
    // recreate new
    newSM := r.svc.DeepCopy()
    if err := cl.Create(ctx, newSM); err != nil {
        log.Error(err, "failed to recreate ServiceMonitor")
        return nil, false, err
    }
    log.Info("refreshed ServiceMonitor")
    return newSM, true, nil
}
```

Or—if you only need to update fields in place—a simple patch is even clearer:

```go
func (r *ServiceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, existing client.Object, log logr.Logger) (client.Object, bool, error) {
    found := existing.(*monitoringv1.ServiceMonitor)
    original := found.DeepCopy()

    // sync just the Spec (and metadata) in place
    r.svc.Spec.DeepCopyInto(&found.Spec)
    updateCommonDetails(&r.svc.ObjectMeta, &found.ObjectMeta)

    if err := cl.Patch(ctx, found, client.MergeFrom(original)); err != nil {
        log.Error(err, "failed to patch ServiceMonitor")
        return nil, false, err
    }
    log.Info("successfully patched ServiceMonitor")
    return found, true, nil
}
```

Either approach removes the extra layer of indirection and makes your reconcile path much easier to follow.
</issue_to_address>

### Comment 2
<location> `controllers/alerts/refresher.go:7` </location>
<code_context>
+	"sync"
+)
+
+type Refresher interface {
+	setShouldRefresh()
+	refresh(f func() error) error
</code_context>

<issue_to_address>
**issue (complexity):** Consider removing the custom Refresher interface and mutex in favor of a simple boolean flag on the controller struct to track refresh state.

You don’t need a custom interface or mutex here—controller‐runtime’s reconcile loop is already single‐threaded, so you can just track “dirty” state with a bool on your Reconciler and inline the refresh. For example:

```go
// inside your controller struct
type AlertReconciler struct {
    client.Client
    Log logr.Logger

    needsRefresh bool
}

// in SetupWithManager, mark “dirty” on updates
func (r *AlertReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&alertsv1alpha1.Alert{}).
        WithEventFilter(predicate.Funcs{
            UpdateFunc: func(e event.UpdateEvent) bool {
                r.needsRefresh = true
                return true
            },
        }).
        Complete(r)
}

// in Reconcile, only run your refresh once when needed
func (r *AlertReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // ... your existing reconcile logic …

    if r.needsRefresh {
        if err := r.refreshAlerts(ctx); err != nil {
            return ctrl.Result{}, err
        }
        r.needsRefresh = false
    }

    return ctrl.Result{}, nil
}

// move your “f” body into a method
func (r *AlertReconciler) refreshAlerts(ctx context.Context) error {
    // existing f() logic
    return nil
}
```

This preserves exactly the same “only run once per change” semantics but removes the extra file, interface, and Mutex/flag indirection.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@coveralls
Collaborator

Pull Request Test Coverage Report for Build 18159384946

Details

  • 97 of 106 (91.51%) changed or added relevant lines in 4 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 72.475%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| controllers/alerts/refresher.go | 23 | 25 | 92.0% |
| controllers/alerts/serviceMonitor.go | 37 | 39 | 94.87% |
| controllers/alerts/secret.go | 33 | 38 | 86.84% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| controllers/operands/operandHandler.go | 3 | 86.14% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 17581526508 | 0.2% |
| Covered Lines | 6543 |
| Relevant Lines | 9028 |

💛 - Coveralls

@hco-bot
Collaborator

hco-bot commented Oct 1, 2025

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure

In response to this:

hco-e2e-kv-smoke-gcp lane succeeded.
/override ci/prow/hco-e2e-kv-smoke-azure

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@machadovilaca
Member Author

/retest

@openshift-ci

openshift-ci bot commented Oct 7, 2025

@machadovilaca: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/hco-e2e-operator-sdk-gcp | 30d102f | link | true | /test hco-e2e-operator-sdk-gcp |
| ci/prow/hco-e2e-operator-sdk-sno-azure | 30d102f | link | false | /test hco-e2e-operator-sdk-sno-azure |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-azure | 30d102f | link | true | /test hco-e2e-upgrade-prev-operator-sdk-azure |
| ci/prow/hco-e2e-operator-sdk-azure | 30d102f | link | true | /test hco-e2e-operator-sdk-azure |
| ci/prow/hco-e2e-upgrade-operator-sdk-azure | 30d102f | link | true | /test hco-e2e-upgrade-operator-sdk-azure |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-aws | 30d102f | link | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-aws |
| ci/prow/hco-e2e-upgrade-operator-sdk-sno-aws | 30d102f | link | false | /test hco-e2e-upgrade-operator-sdk-sno-aws |
| ci/prow/hco-e2e-operator-sdk-aws | 30d102f | link | true | /test hco-e2e-operator-sdk-aws |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-sno-azure | 30d102f | link | false | /test hco-e2e-upgrade-prev-operator-sdk-sno-azure |
| ci/prow/hco-e2e-upgrade-operator-sdk-aws | 30d102f | link | true | /test hco-e2e-upgrade-operator-sdk-aws |
| ci/prow/hco-e2e-upgrade-prev-operator-sdk-aws | 30d102f | link | true | /test hco-e2e-upgrade-prev-operator-sdk-aws |
| ci/prow/hco-e2e-upgrade-operator-sdk-sno-azure | 30d102f | link | false | /test hco-e2e-upgrade-operator-sdk-sno-azure |
| ci/prow/hco-e2e-operator-sdk-sno-aws | 30d102f | link | false | /test hco-e2e-operator-sdk-sno-aws |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hco-bot
Collaborator

hco-bot commented Oct 14, 2025

hco-e2e-operator-sdk-azure, hco-e2e-operator-sdk-aws lanes succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp

@kubevirt-bot
Contributor

@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-gcp

In response to this:

hco-e2e-operator-sdk-azure, hco-e2e-operator-sdk-aws lanes succeeded.
/override ci/prow/hco-e2e-operator-sdk-gcp

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sradco
Collaborator

sradco commented Oct 19, 2025

@machadovilaca should we not also backport #3756 to 1.14?


Labels

  • dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.)
  • release-note (Denotes a PR that will be considered when it comes time to generate release notes.)
  • size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants