
Conversation

HirazawaUi
Contributor

@HirazawaUi HirazawaUi commented Aug 23, 2025

  • One-line PR description: This KEP aims to ensure that a brief kubelet restart does not affect the status of Pods on the node.
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 23, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: HirazawaUi
Once this PR has been reviewed and has the lgtm label, please assign dchen1107 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 23, 2025
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Aug 23, 2025
@k8s-ci-robot k8s-ci-robot requested a review from mrunalp August 23, 2025 12:07
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 23, 2025
@HirazawaUi HirazawaUi force-pushed the kep-4781 branch 2 times, most recently from eca940b to acbdb7e Compare August 25, 2025 15:34
@HirazawaUi HirazawaUi changed the title [WIP] KEP-4781 restarting kubelet does not change pod status KEP-4781 restarting kubelet does not change pod status Aug 25, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 25, 2025
Because the old state is preserved without an immediate health check, there is a delay in recognizing containers that became unhealthy during or after the kubelet's downtime. Services that rely on Pod readiness for service discovery might continue directing traffic to Pods whose containers are no longer healthy but are still reported as Ready.
To reduce the risk caused by this delay, we plan to trigger a probe immediately after the restart.
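
Below is a minimal sketch, not the actual kubelet prober code, of the mitigation just described: a hypothetical per-container probe worker fires its first probe immediately when the container's pre-restart status was carried over, instead of waiting a full probe period. All names and fields are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// probeWorker is a hypothetical stand-in for the kubelet's per-container
// prober worker; the names and fields are illustrative, not the real API.
type probeWorker struct {
	containerName string
	period        time.Duration
	restoredState bool        // status was carried over across a kubelet restart
	probe         func() bool // returns true if the container is healthy
}

// run sketches the mitigation: when a container's pre-restart status was
// preserved, fire the first probe immediately instead of waiting a full
// period, so a container that became unhealthy during the kubelet's
// downtime is noticed as soon as possible.
func (w *probeWorker) run(stop <-chan struct{}) {
	if w.restoredState {
		w.report(w.probe()) // immediate probe right after the restart
	}
	ticker := time.NewTicker(w.period)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			w.report(w.probe())
		case <-stop:
			return
		}
	}
}

func (w *probeWorker) report(healthy bool) {
	fmt.Printf("container %q ready=%v\n", w.containerName, healthy)
}

func main() {
	stop := make(chan struct{})
	w := &probeWorker{
		containerName: "app",
		period:        10 * time.Second,
		restoredState: true,
		probe:         func() bool { return true },
	}
	go w.run(stop)
	time.Sleep(time.Second) // let the immediate probe fire, then stop
	close(stop)
}
```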

## Design Details
Contributor Author

@HirazawaUi HirazawaUi Aug 25, 2025


I did not follow the implementation approach of the previous KEP. After reviewing the POC PR related to that KEP, I found its implementation somewhat cumbersome, and it also presented some potential edge-case issues.

After tracing the pod status transition process, I adopted a new implementation method to achieve the goal: consistently relying on the detection results of the probeManager. This approach simplifies the implementation and helps us avoid certain edge cases. This section also analyzes the kubelet's behavioral differences under several scenarios. Could you please take a look?

My POC PR: kubernetes/kubernetes#133676

@SergeyKanzhelev @thockin


2. We ensure that if the `Started` field in the container status is true, the container is considered started (since the startupProbe only runs during container startup and will not execute again once completed); see the sketch after this list.

3. If the Kubelet restart occurs within the `nodeMonitorGracePeriod` and the Pod’s Ready condition is set to false, we will set the container’s ready status to false. It will remain in this state until subsequent probes reset it to true.
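
A minimal sketch of the rule in item 2, using the `k8s.io/api/core/v1` types; `needsStartupProbe` is a hypothetical helper for illustration, not code from the KEP or the PoC.

```go
package kubeletrestart

import (
	corev1 "k8s.io/api/core/v1"
)

// needsStartupProbe is a hypothetical helper illustrating item 2 above:
// the startup probe only runs while a container is starting and never runs
// again once it has succeeded, so if the recorded status already has
// Started == true, the kubelet can keep treating the container as started
// after its own restart instead of re-running the startup probe.
func needsStartupProbe(c corev1.Container, status corev1.ContainerStatus) bool {
	if c.StartupProbe == nil {
		return false // no startup probe configured, nothing to run
	}
	if status.Started != nil && *status.Started {
		return false // startup already completed before the kubelet restart
	}
	return true
}
```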

> If the Kubelet restart occurs within the `nodeMonitorGracePeriod`

Does this mean the case where the kubelet has been down for longer than nodeMonitorGracePeriod and then restarts afterward?

So basically, the Node Lifecycle Controller notices that the Lease hasn’t been updated past nodeMonitorGracePeriod, marks the Node as NotReady, and flips the Pods’ Ready condition to False. After the kubelet restarts, it fetches the Pod info for its own Node from the API server, and the prober manager simply carries over that Ready condition value, right?

Contributor Author

@HirazawaUi HirazawaUi Sep 5, 2025


The scenario here indeed warrants a more detailed explanation.

Since syncPod cannot wait for the prober manager to trigger a probe before updating the container status, the pod status update always happens before the probe. This means that when we first update the container status, we do not yet know the container's actual state.

  • For a short kubelet restart, we can confidently assume that the container's state has not changed. We therefore retain the container's state and let the prober manager trigger a probe to correctly update the state in the pod.
  • However, for a prolonged kubelet restart, when the node is already NotReady, we can no longer assume that the container's state in the pod is unchanged. In this case we follow the previous behavior and set the container's Ready field to false. (As mentioned in the KEP, before this change, when a pod is first added to the prober manager the probe result is set to an initial value; the initial value for the readiness probe is Failure, which sets the container's Ready field to false.) The probe then runs and the container's state is updated correctly.

In summary:

  • For a short kubelet restart, we inherit the container's state from before the restart.
  • For a prolonged kubelet restart, we follow the pre-change behavior: first set the container's Ready field to false, wait for the actual probe result, and eventually drive the container's state to its actual value. Compared to the pre-change behavior, this still has an advantage: it avoids unnecessary state transitions of the container's Ready field (true -> false -> true) and instead transitions directly from false -> true. This prevents meaningless state flapping and reduces reconciliation work for controllers that watch container status, such as the EndpointSlice controller or external controllers that depend on EndpointSlice (sketched below).
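
A minimal sketch of the seeding rule summarized above, using the `k8s.io/api/core/v1` types; `initialReady` and its parameters are hypothetical and not taken from the PoC implementation.

```go
package kubeletrestart

import (
	corev1 "k8s.io/api/core/v1"
)

// initialReady is a hypothetical helper illustrating the seeding rule:
//
//   - Short kubelet restart (node still Ready): inherit the container's
//     pre-restart Ready value and let the prober manager's next probe
//     confirm or correct it.
//   - Prolonged restart (node already marked NotReady because the downtime
//     exceeded nodeMonitorGracePeriod): seed Ready to false, matching the
//     pre-change behavior where the readiness probe's initial result is
//     Failure, and let the first real probe drive it from false -> true,
//     avoiding a true -> false -> true flap.
func initialReady(nodeMarkedNotReady bool, previous corev1.ContainerStatus) bool {
	if nodeMarkedNotReady {
		return false
	}
	return previous.Ready
}
```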


Thanks for the clear explanation!

> For a prolonged kubelet restart, we follow the pre-change behavior by first setting the container's Ready field to false, waiting for the actual probe result, and eventually driving the container's state to its actual value.

I think I finally understand the part I was a bit unclear about regarding how the UPDATE PodStatus step in your diagram determines readiness. If the kubelet is down longer than nodeMonitorGracePeriod, the container’s ready condition is set to false. In that case, in your PoC, the section below is where the ready state becomes false, right?
https://github.com/kubernetes/kubernetes/blob/d207ce94fe550ec35ff6a6b120faf759b8cb9fae/pkg/kubelet/prober/prober_manager.go#L336-L339

Contributor Author


Yes.

@thockin
Member

thockin commented Sep 9, 2025


I am strongly in favor of this KEP, but I leave the specific details for people most familiar with Kubelet to iron out :)

@HirazawaUi
Contributor Author

> Can we include the history of this?

I don’t have many ideas for now, so I’ve simply placed these links in the Motivation section. If you feel the wording needs further description or that some context should be added to the links, please let me know — I’ll be happy to make the necessary changes.
