Hostpath PVs remain dangling when the node is deleted before OpenEBS can reconcile #4089
Tristan-Otterpohl-Forter asked this question in General
We've had a similar scenario here.
Context
We are running Elasticsearch on Kubernetes using the OpenEBS Dynamic LocalPV (hostpath) provisioner.
Under normal circumstances, when a PVC and its associated pod are deleted, OpenEBS deletes the PV and the hostpath data from the node as expected.
However, when a node is deleted before OpenEBS reconciles the PV deletion, the PV and PVC remain in a dangling state. The provisioner logs errors similar to the following:
This happens because the node object no longer exists in the Kubernetes API, and the provisioner cannot determine where the hostpath volume resides.
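To make the failure mode concrete: a hostpath PV typically records the node holding its data in `spec.nodeAffinity`, and the data can only be removed on that node. The Python sketch below is a minimal illustration, not OpenEBS's actual code. It reads the pinned hostname from a PV manifest (in the shape returned by `kubectl get pv -o json`) and checks whether that node is still registered. It assumes the PV is pinned with a `kubernetes.io/hostname` `In` expression. `pinned_node`, `can_clean_up`, and the example PV and node names are hypothetical.

```python
from typing import Optional

# Assumption: the PV is pinned to its node with this label key. OpenEBS can be
# configured to use other node labels, in which case this key would differ.
HOSTNAME_KEY = "kubernetes.io/hostname"


def pinned_node(pv: dict) -> Optional[str]:
    """Return the hostname a local PV is pinned to via required nodeAffinity, or None."""
    required = pv.get("spec", {}).get("nodeAffinity", {}).get("required", {})
    for term in required.get("nodeSelectorTerms", []):
        for expr in term.get("matchExpressions", []):
            if (
                expr.get("key") == HOSTNAME_KEY
                and expr.get("operator") == "In"
                and expr.get("values")
            ):
                return expr["values"][0]
    return None


def can_clean_up(pv: dict, live_hostnames: set[str]) -> bool:
    """Hostpath data lives on one node, so cleanup is only possible while that node exists."""
    node = pinned_node(pv)
    return node is not None and node in live_hostnames


# Hypothetical PV, trimmed to the fields used above.
example_pv = {
    "metadata": {"name": "pvc-example"},
    "spec": {
        "storageClassName": "openebs-hostpath",
        "nodeAffinity": {
            "required": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "In",
                                "values": ["ip-10-0-1-23"],
                            }
                        ]
                    }
                ]
            }
        },
    },
}

print(can_clean_up(example_pv, {"ip-10-0-1-23"}))  # True: node still registered
print(can_clean_up(example_pv, set()))             # False: Node object already deleted
```

Once Karpenter has removed the Node object, the hostname recorded on the PV points at nothing, and that is the state the provisioner errors on.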
Expected vs Actual Behavior
Environment
openebs-hostpath
This typically occurs during node rotations or Kubernetes upgrades, when Karpenter replaces EC2 instances and deletes their corresponding Node objects quickly after draining.
Observations
Existing Mitigation
AppsFlyer has built a tool called the PVC Releaser, which is designed to handle dangling PVCs automatically in these scenarios. However, nothing manages dangling PVs, so it may be useful to have that functionality built into OpenEBS.
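As a rough illustration of what PV-side cleanup could look like, here is a hypothetical Python sketch. It is not part of OpenEBS or the PVC Releaser. It scans `kubectl get pv -o json` output and lists PVs that belong to the `openebs-hostpath` StorageClass, are in the `Released` or `Failed` phase, and are pinned to a node that no longer exists. The phase filter, the `kubernetes.io/hostname` lookup, and the function names are assumptions about how such a tool might decide which PVs to report.

```python
import json
from typing import Optional

# Assumptions: PVs are pinned with a kubernetes.io/hostname "In" expression and
# use the "openebs-hostpath" StorageClass, as in the environment described above.
HOSTNAME_KEY = "kubernetes.io/hostname"


def pinned_node(pv: dict) -> Optional[str]:
    """Hostname the PV is pinned to via required nodeAffinity, or None."""
    required = pv.get("spec", {}).get("nodeAffinity", {}).get("required", {})
    for term in required.get("nodeSelectorTerms", []):
        for expr in term.get("matchExpressions", []):
            if expr.get("key") == HOSTNAME_KEY and expr.get("operator") == "In" and expr.get("values"):
                return expr["values"][0]
    return None


def dangling_pvs(pv_list_json: str, live_hostnames: set[str],
                 storage_class: str = "openebs-hostpath") -> list[str]:
    """Return names of released/failed hostpath PVs whose pinned node no longer exists.

    pv_list_json is the output of `kubectl get pv -o json`; live_hostnames are
    the kubernetes.io/hostname label values of the Node objects that still exist.
    """
    dangling = []
    for pv in json.loads(pv_list_json).get("items", []):
        if pv.get("spec", {}).get("storageClassName") != storage_class:
            continue
        # Skip PVs still bound to a claim; only volumes whose PVC is already gone
        # (for example removed by a PVC cleanup tool) are candidates here.
        if pv.get("status", {}).get("phase") not in ("Released", "Failed"):
            continue
        node = pinned_node(pv)
        if node is not None and node not in live_hostnames:
            dangling.append(pv["metadata"]["name"])
    return dangling
```

The function only reports candidates. Deleting them (for example with `kubectl delete pv <name>`) stays an explicit operator decision, because once the node is gone the provisioner can no longer remove the hostpath data itself.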
Suggested Actions
This doesn’t necessarily need to be “fixed” since it’s a logical limitation, but it would help if this behavior were better documented.
Possible improvements:
Reproduce Issue
This is easiest to reproduce by deleting a node without draining/cordoning.
Summary
When a node is deleted before OpenEBS can reconcile a hostpath PV deletion, the PV and PVC remain dangling because the provisioner cannot locate the node to perform cleanup.
This behavior is expected but not clearly documented, and making it more visible would help operators running node-local storage on clusters with frequent node rotations.