[release-1.14] On metric token refresh, also delete the ServiceMonitor (#3790) #3795
base: release-1.14
Conversation
Currently, on token refresh, we re-create the secret, but Prometheus does not fetch the new secret and fails to access the metrics endpoint, receiving 401 from the pod. This PR fixes the issue by also removing the ServiceMonitor, letting the next reconcile loop re-create it, which forces Prometheus, after a few minutes, to re-fetch the token from the secret.
Signed-off-by: Nahshon Unna-Tsameret <[email protected]>
Co-authored-by: Nahshon Unna-Tsameret <[email protected]>
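As a rough illustration of the mechanism described above, here is a minimal sketch of the refresh path, assuming a controller-runtime client; the function name, object name, and namespace are hypothetical placeholders, not the operator's actual code:

```go
package alerts

import (
	"context"

	monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// dropServiceMonitorOnTokenRefresh is a hypothetical sketch of the idea above:
// after the bearer-token secret has been re-created, also delete the
// ServiceMonitor so the next reconcile loop re-creates it and Prometheus
// eventually re-reads the token from the new secret.
func dropServiceMonitorOnTokenRefresh(ctx context.Context, cl client.Client, namespace string) error {
	sm := &monitoringv1.ServiceMonitor{}
	key := types.NamespacedName{Namespace: namespace, Name: "hco-operator-metrics"} // illustrative name

	if err := cl.Get(ctx, key, sm); err != nil {
		// Already gone (or never created): nothing to do.
		if apierrors.IsNotFound(err) {
			return nil
		}
		return err
	}

	// Deleting the ServiceMonitor forces the next reconcile to re-create it,
	// which makes Prometheus reload the mounted token after a few minutes.
	if err := cl.Delete(ctx, sm); err != nil && !apierrors.IsNotFound(err) {
		return err
	}
	return nil
}
```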
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hey there - I've reviewed your changes - here's some feedback:
- The ServiceMonitor absence assertion in the test uses MatchError incorrectly—switch to a direct apierrors.IsNotFound(err) check on the returned error instead of MatchError (see the sketch after this list).
- Factory function names for secrets and ServiceMonitors mix Create/New prefixes and exported/unexported variants—consider standardizing these naming patterns for better readability.
- The comment above UpdateExistingResource in secret.go still references the old secretReconciler name—update it to reflect SecretReconciler for clarity.
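A minimal sketch of what the suggested assertion could look like, assuming a Gomega-based test with a controller-runtime client; the helper name and object key are hypothetical:

```go
import (
	"context"

	. "github.com/onsi/gomega"
	monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// expectServiceMonitorDeleted is a hypothetical helper: instead of asserting
// the error with MatchError, it checks the Get error with apierrors.IsNotFound.
func expectServiceMonitorDeleted(ctx context.Context, cl client.Client, key client.ObjectKey) {
	sm := &monitoringv1.ServiceMonitor{}
	err := cl.Get(ctx, key, sm)
	Expect(err).To(HaveOccurred())
	Expect(apierrors.IsNotFound(err)).To(BeTrue())
}
```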
## Individual Comments
### Comment 1
<location> `controllers/alerts/serviceMonitor.go:49` </location>
<code_context>
}
-func (r serviceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, resource client.Object, logger logr.Logger) (client.Object, bool, error) {
+func (r ServiceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, resource client.Object, logger logr.Logger) (client.Object, bool, error) {
found := resource.(*monitoringv1.ServiceMonitor)
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider removing the Refresher indirection and helper method to simplify the update logic in ServiceMonitor reconciliation.
Here the new Refresher indirection and `deleteServiceMonitor` helper don’t really buy you much, and they obscure your real intent. You can flatten this back out by:
1. Dropping the `Refresher` field and the helper entirely.
2. Inlining the `cl.Delete` (and immediately `cl.Create` if you really need a full refresh) or even better, using a simple `Patch` to update just the changed fields.
For example, to recreate on every spec‐change:
```go
type ServiceMonitorReconciler struct {
svc *monitoringv1.ServiceMonitor
}
func NewServiceMonitorReconciler(svc *monitoringv1.ServiceMonitor) *ServiceMonitorReconciler {
return &ServiceMonitorReconciler{svc: svc}
}
func (r *ServiceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, existing client.Object, log logr.Logger) (client.Object, bool, error) {
found := existing.(*monitoringv1.ServiceMonitor)
// delete old
if err := cl.Delete(ctx, found); err != nil {
log.Error(err, "failed to delete ServiceMonitor")
return nil, false, err
}
// recreate new
newSM := r.svc.DeepCopy()
if err := cl.Create(ctx, newSM); err != nil {
log.Error(err, "failed to recreate ServiceMonitor")
return nil, false, err
}
log.Info("refreshed ServiceMonitor")
return newSM, true, nil
}
```
Or—if you only need to update fields in place—a simple patch is even clearer:
```go
func (r *ServiceMonitorReconciler) UpdateExistingResource(ctx context.Context, cl client.Client, existing client.Object, log logr.Logger) (client.Object, bool, error) {
found := existing.(*monitoringv1.ServiceMonitor)
original := found.DeepCopy()
// sync just the Spec (and metadata) in place
r.svc.Spec.DeepCopyInto(&found.Spec)
updateCommonDetails(&r.svc.ObjectMeta, &found.ObjectMeta)
if err := cl.Patch(ctx, found, client.MergeFrom(original)); err != nil {
log.Error(err, "failed to patch ServiceMonitor")
return nil, false, err
}
log.Info("successfully patched ServiceMonitor")
return found, true, nil
}
```
Either approach removes the extra layer of indirection and makes your reconcile path much easier to follow.
</issue_to_address>
### Comment 2
<location> `controllers/alerts/refresher.go:7` </location>
<code_context>
+ "sync"
+)
+
+type Refresher interface {
+ setShouldRefresh()
+ refresh(f func() error) error
</code_context>
<issue_to_address>
**issue (complexity):** Consider removing the custom Refresher interface and mutex in favor of a simple boolean flag on the controller struct to track refresh state.
You don’t need a custom interface or mutex here—controller‐runtime’s reconcile loop is already single‐threaded, so you can just track “dirty” state with a bool on your Reconciler and inline the refresh. For example:
```go
// inside your controller struct
type AlertReconciler struct {
client.Client
Log logr.Logger
needsRefresh bool
}
// in SetupWithManager, mark “dirty” on updates
func (r *AlertReconciler) SetupWithManager(mgr ctrl.Manager) error {
return ctrl.NewControllerManagedBy(mgr).
For(&alertsv1alpha1.Alert{}).
WithEventFilter(predicate.Funcs{
UpdateFunc: func(e event.UpdateEvent) bool {
r.needsRefresh = true
return true
},
}).
Complete(r)
}
// in Reconcile, only run your refresh once when needed
func (r *AlertReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
// ... your existing reconcile logic …
if r.needsRefresh {
if err := r.refreshAlerts(ctx); err != nil {
return ctrl.Result{}, err
}
r.needsRefresh = false
}
return ctrl.Result{}, nil
}
// move your “f” body into a method
func (r *AlertReconciler) refreshAlerts(ctx context.Context) error {
// existing f() logic
return nil
}
```
This preserves exactly the same “only run once per change” semantics but removes the extra file, interface, and Mutex/flag indirection.
</issue_to_address>
Pull Request Test Coverage Report for Build 18159384946
💛 - Coveralls
hco-e2e-kv-smoke-gcp lane succeeded. |
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-kv-smoke-azure
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/retest
@machadovilaca: The following tests failed; say /retest to rerun them.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
hco-e2e-operator-sdk-azure, hco-e2e-operator-sdk-aws lanes succeeded. |
@hco-bot: Overrode contexts on behalf of hco-bot: ci/prow/hco-e2e-operator-sdk-gcp
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@machadovilaca should we not also backport #3756 to 1.14? |
What this PR does / why we need it:
Currently, on token refresh, we re-create the secret, but Prometheus does not fetch the new secret and fails to access the metrics endpoint, receiving 401 from the pod.
This PR fixes the issue by also removing the ServiceMonitor, letting the next reconcile loop re-create it, which forces Prometheus, after a few minutes, to re-fetch the token from the secret.
This is a manual cherry-pick of #3764
Reviewer Checklist
Jira Ticket:
Release note: