Contributor

@jklaw90 jklaw90 commented Jul 23, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

The server doesn't have a health check, so it receives traffic before it's ready, causing requests to fail open.

Which issue(s) this PR fixes:

Fixes #8344

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Updated manifests to add liveness and readiness checks to all VPA components.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-area area/vertical-pod-autoscaler and removed do-not-merge/needs-area labels Jul 23, 2025
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 23, 2025
@@ -151,6 +151,10 @@ func main() {
		as.Serve(w, r)
		healthCheck.UpdateLastActivity()
	})
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		healthCheck.ServeHTTP(w, r)
Contributor Author

@jklaw90 jklaw90 Jul 23, 2025

We could just respond with 200, but since we were already using the healthCheck struct, I assumed we should use it here.

Member

All three VPA components already have a /health-check endpoint.

Contributor Author

oh got it, will update

Contributor Author

@jklaw90 jklaw90 Jul 24, 2025

Would this cause an issue when no pods are admitted for a minute (idle cluster)? Would the probes start failing?

Member

Not as far as I can tell. Why? Have you seen evidence of that happening?

Member

Ah, the admission-controller never runs healthCheck.StartMonitoring(), so that situation will never happen.
The other components do run it, and they also operate on a loop, so they keep updating lastActivity.

Contributor Author

ahh missed that. so it should be good as is then. thank you!

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jul 23, 2025
@omerap12
Member

Thanks for the update. Since we are already updating the admission controller, can we also update the other components (the recommender and updater) at the same time?

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 29, 2025
livenessProbe:
  httpGet:
    path: /health-check
    port: prometheus
Member

The name of this port is weird, since it's mixed use.
I don't know if that happens much, though.

Contributor Author

Yeah, referencing the port by name seemed safer, since the port number is a flag in the code.

@adrianmoisey
Member

Could you update the release note?

@adrianmoisey
Member

/lgtm

cc @omerap12

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 30, 2025
Member

@omerap12 omerap12 left a comment

Thanks!
Agree on the port name, though.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jklaw90, omerap12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 30, 2025
@k8s-ci-robot k8s-ci-robot merged commit ff6e93b into kubernetes:master Jul 30, 2025
7 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/vertical-pod-autoscaler cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

VPA admission fails open on pod startup.
4 participants