[controller] Reschedule provisioning if node is missing #1438
GenerateAccessibilityRequirements tries to get the Node and CSINode objects. If they are missing (because they were deleted), provisioning fails with ProvisioningNoChange, which means it can be retried forever if the node never comes back, because nothing removes the selected-node annotation anymore. This commit catches Not Found API errors and, when one occurs, returns ProvisioningReschedule to tell the scheduler to try a new node. This matches the previous implementation in the external-provisioner lib (https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/pull/194/files#diff-3c5bb5f48211873c58fcba055dcae2ac7b1958969219e06e1508d76d485dace7L1496-L1498)
Signed-off-by: Baptiste Girard-Carrabin <[email protected]>
/ok-to-test
    if err != nil {
        if apierrors.IsNotFound(err) {
            // The node or CSINode object can't be found, ask the scheduler for a reschedule
            return nil, controller.ProvisioningReschedule, err
Does GenerateAccessibilityRequirements properly return a NotFound error now?
Because if the cache still has the CSINode info, GenerateAccessibilityRequirements will not even return a NotFound error.
Under what condition would provisioning be retried forever when the node never comes back? If it is a long provisioning, the PV object will eventually be created or fail with a final error, so it should not be forever?
Did you test end to end that the fix works as expected?
What type of PR is this?
/kind bug
What this PR does / why we need it:
GenerateAccessibilityRequirements tries to get the Node and CSINode objects. If they are missing (because they were deleted), provisioning fails with ProvisioningNoChange, which means it can be retried forever if the node never comes back, because nothing removes the selected-node annotation anymore. This commit catches Not Found API errors and, when one occurs, returns ProvisioningReschedule to tell the scheduler to try a new node. This matches the previous implementation in the external-provisioner lib (https://github.com/kubernetes-sigs/sig-storage-lib-external-provisioner/pull/194/files#diff-3c5bb5f48211873c58fcba055dcae2ac7b1958969219e06e1508d76d485dace7L1496-L1498)
Which issue(s) this PR fixes:
Fixes #1437
Special notes for your reviewer:
Does this PR introduce a user-facing change?: