-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-5673: add Graceful Deletion support for Custom Resources #5674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Welcome @dasionov! |
|
Hi @dasionov. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dasionov The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| - Potential for confusion about when grace periods apply (only with finalizers) | ||
|
|
||
| ## Alternatives | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we achieve the same aim without any in-tree code? If so, I recommend listing that as one of the alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by in-tree code?
I listed several alternatives, but in my opinion, none of them are as good as the main one.
If there’s another alternative you think might be a better fit that I’ve missed, I’d be glad to hear it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In-tree means code in the repository https://github.com/kubernetes/kubernetes
|
|
||
| **Decision**: Rejected. Grace periods should only apply when there's work to do | ||
| (i.e., finalizers present), consistent with Pod behavior. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How I'd implement grace period for custom resources? A CEL expression that maps a requested grace period to a decision:
- nope, no grace period
- the actual grace period (eg: clamped to 60 second minimum)
Does that seem feasible? The expression would be part of the CustomResourceDefinition API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main limitation I see with the CEL-based approach is that it only evaluates whether to apply graceful deletion and possibly determines how much grace time to allow — for example, returning 0 if there are no finalizers, or request.gracePeriodSeconds otherwise.
However, evaluation is only the first step. The second and more critical part is propagating the evaluated grace period value into the resource, so that relevant controllers can act on it. This propagation step cannot be handled by a CEL expression (and not even by a webhook), since webhooks cannot mutate resources during DELETE operations.
In our KubeVirt use case, the logic needs to:
Compute the effective grace period based on the user’s request, existing finalizers, etc.
Update the resource metadata with that value so controllers can rely on it for cleanup.
Propagate it through the resource hierarchy:
The VM receives the delete request and its grace period is updated (e.g. from 1h → 30s).
The VMI deletion then reflects the same grace period.
Finally, the virt-launcher Pod is terminated with that propagated grace period.
Only after this cascade completes should the VM, VMI, and Pod all be removed in order.
For this behavior to work reliably across all delete requests, we need native support for setting and reading the metadata.gracePeriod field — something we currently can’t achieve via a custom webhook or CEL evaluation alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are writing the code for Kubernetes, the in-tree CEL evaluation absolutely can mutate metadata during handling of a DELETE.
We would still need the metadata field; the difference is that we do not directly set it to an arbitrary, client-provided value.
| - "@dasionov" | ||
| owning-sig: sig-api-machinery | ||
| participating-sigs: | ||
| - sig-apimachinery |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this doesn't look right. I would have an empty list here.
| # The target maturity stage in the current dev cycle for this KEP. | ||
| # This KEP proposes a stable implementation without feature gate as the change | ||
| # is backward compatible and low-risk. | ||
| stage: stable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alpha beta perhaps?
| for honoring the grace period) | ||
| - Supporting grace periods for Custom Resources without finalizers (consistent | ||
| with Pod behavior where grace periods only apply when there's cleanup work) | ||
| - Changing the CRD API or requiring CRD authors to explicitly opt-in to this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might need a reword. I think we are planning to change the API for CustomResourceDefinitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so ?
The only thing I tend to achieve in this KEP is a mutation of metadata.deletionGracePeriodSeconds existing field on delete, if I understand CEL implamantation correctly I don't see how CRD api change is needed
| behavior. | ||
|
|
||
| This KEP proposes implementing this as stable behavior without a feature gate, | ||
| as the change is purely additive, backward compatible, and poses minimal risk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should still be a way to opt out (I suggest).
| - Verify resource still exists with grace period | ||
| - Remove second finalizer | ||
| - Verify resource is deleted | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way for a CRD to opt out of graceful deletion (eg using CEL)? If so, let's test that the opt out works how we expect it would.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not fully implementing the graceful termination mechanism in Kubernetes as it works for Pods yet. Instead, we currently only consider the user’s request to shorten or extend the existing graceful termination period in valid cases — for example, when finalizers are present to handle the termination process or under similar conditions.
The responsibility for implementing graceful termination lies with the CRD controllers, so there’s no need to opt out when no graceful termination logic is implemented in the first place.
| alpha/beta graduation process. This is justified because: | ||
|
|
||
| 1. **Backward Compatible**: The change only adds information (a metadata field) | ||
| that existing controllers can safely ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we tested all existing controllers to ensure they don't break?
If not, we should caveat "safely".
| - Will enabling / disabling the feature require downtime or reprovisioning of a node? No | ||
|
|
||
| **Rationale for no feature gate**: This change is purely additive and backward | ||
| compatible. It only sets an additional metadata field that existing controllers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that we have tested all API clients to verify this.
| **Rationale for no feature gate**: This change is purely additive and backward | ||
| compatible. It only sets an additional metadata field that existing controllers | ||
| can safely ignore. The worst-case scenario is that the field is set but not used, | ||
| which poses no risk to cluster stability or functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[citation needed]
|
|
||
| ###### How can an operator determine if the feature is in use by workloads? | ||
|
|
||
| 1. Check if CRs have `deletionGracePeriodSeconds` set: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work if there are no recent deletes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a pointer, it would be nil if there is no value.
c6355aa to
7962067
Compare
This KEP proposes adding support for graceful deletion to Custom Resources (CRs) by honoring the `--grace-period` flag (and `DeleteOptions.GracePeriodSeconds`) during deletion operations. Currently, when users delete a Custom Resource with a specified grace period, the value is ignored, leading to unexpected behavior for controllers and operators that rely on finalizers and graceful shutdown logic. This enhancement will ensure that when a Custom Resource is deleted with a grace period, the `metadata.deletionGracePeriodSeconds` field is properly set, allowing finalizers to observe this value and implement appropriate graceful shutdown behavior. This KEP proposes implementing this as stable behavior without a feature gate, as the change is purely additive, backward compatible, and poses minimal risk. Signed-off-by: Daniel Sionov <[email protected]>
7962067 to
76c9509
Compare
Add support for graceful deletion of Custom Resources by honoring --grace-period and setting metadata.deletionGracePeriodSeconds.