Add a webhook to prevent eviction of pods on kosmos NotReady nodes #343

@wangyizhi1

What would you like to be added:
Add a webhook to prevent eviction of pods on kosmos NotReady nodes

Why is this needed:
When a kosmos node stays NotReady for more than 5 minutes, the node-controller in kube-controller-manager starts evicting its pods, which is equivalent to deleting them. This is not always appropriate: once the cluster reconnects, the evicted pods end up being restarted.

The NotReady state of a node is more likely due to a kosmos service outage or cross-cluster network issues rather than a physical node failure. Therefore, there is a need for a mechanism to prevent the node-controller from deleting pods.

Since deletion is irreversible, one proposed solution is to have the webhook intercept pod deletion requests made by system:serviceaccount:kube-system:node-controller. A request should be intercepted only when certain conditions are all met, such as utils.IsKosmosNode(node) && utils.IsNotReady(node) && v.needToPrevent(req.UserInfo.Username).
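
To make the proposed condition concrete, here is a minimal, stdlib-only sketch of the decision logic in Go. It is an illustration under stated assumptions, not the kosmos implementation: the `Node` and `DeleteRequest` types are simplified stand-ins for the Kubernetes API types, the `kosmos.io/node` label used to recognize a kosmos node is hypothetical, and the helper bodies are guesses at what `IsKosmosNode`, `IsNotReady`, and `needToPrevent` might check. A real webhook would receive an admission request, look up the pod's node, and return an admission response instead of a bool.

```go
package main

// nodeControllerUser is the identity named in the issue: the service account
// the node-controller uses when it deletes pods during eviction.
const nodeControllerUser = "system:serviceaccount:kube-system:node-controller"

// kosmosNodeLabel is a HYPOTHETICAL label assumed to mark kosmos nodes.
const kosmosNodeLabel = "kosmos.io/node"

// Node is a simplified stand-in for corev1.Node.
type Node struct {
	Labels map[string]string
	Ready  bool // true when the node's Ready condition is "True"
}

// DeleteRequest is a simplified stand-in for a pod DELETE admission request.
type DeleteRequest struct {
	Username string // requesting user, i.e. req.UserInfo.Username
	Node     *Node  // node hosting the pod; nil if it could not be resolved
}

// IsKosmosNode reports whether the node carries the assumed kosmos label.
func IsKosmosNode(n *Node) bool {
	return n != nil && n.Labels[kosmosNodeLabel] == "true"
}

// IsNotReady reports whether the node is known and currently not Ready.
func IsNotReady(n *Node) bool {
	return n != nil && !n.Ready
}

// needToPrevent reports whether deletions from this user should be blocked.
func needToPrevent(username string) bool {
	return username == nodeControllerUser
}

// AllowDelete denies the deletion only when all three conditions hold;
// every other request passes through unchanged.
func AllowDelete(req DeleteRequest) bool {
	blocked := IsKosmosNode(req.Node) &&
		IsNotReady(req.Node) &&
		needToPrevent(req.Username)
	return !blocked
}
```

With this shape, a deletion issued by the node-controller for a pod on a NotReady kosmos node is rejected, while deletions by other users, or for pods on Ready or non-kosmos nodes, are still allowed. Requiring all three conditions keeps the webhook from interfering with normal eviction on ordinary nodes.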
