WIP: Allow for control-plane node replacement in podman-etcd #2061
base: main
Can one of the admins check and authorise this run, please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2061/1/input
return "$OCF_ERR_GENERIC"
fi

# When only the local revision exists, it can start normally. This ensures that it can start during node replacement events.
I think this is too risky, as it would be part of any normal startup, not only when an admin performs a node replacement.
There is already an alternative: the admin can set the "force_new_cluster" property on the node, which will ignore the revisions.
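The reviewer's alternative can be sketched as follows: classify the local and peer revisions, and take the "only local revision" path only when the admin has explicitly opted in, rather than on every startup. This is a minimal sketch, not the agent's code. The function names `compare_revisions` and `can_start_with_only_local` are hypothetical, only the `only_local` result string comes from the PR (the others are assumed), revisions are assumed to be plain integers, and the `force_new_cluster` value is passed in as an argument instead of being read from the node's cluster attribute.

```shell
#!/bin/bash
# Hypothetical sketch: classify the local/peer etcd revisions.
# "only_local" mirrors the PR; the other result names are assumptions.
# Revisions are assumed to be integers (empty string = no revision).
compare_revisions() {
    local local_rev="$1" peer_rev="$2"
    if [ -z "$local_rev" ] && [ -z "$peer_rev" ]; then
        echo "none"
    elif [ -z "$peer_rev" ]; then
        echo "only_local"
    elif [ -z "$local_rev" ]; then
        echo "only_peer"
    elif [ "$local_rev" -gt "$peer_rev" ]; then
        echo "newer"
    elif [ "$local_rev" -lt "$peer_rev" ]; then
        echo "older"
    else
        echo "equal"
    fi
}

# Hypothetical gate: with only a local revision, allow the start only when
# the admin explicitly set force_new_cluster (passed here as "true"/"false"),
# so a normal startup never silently takes this path.
can_start_with_only_local() {
    local force_new_cluster="$1"
    [ "$force_new_cluster" = "true" ]
}
```

With this shape, an unexpected `only_local` result during an ordinary restart fails closed, and node replacement still works through the explicit admin action the reviewer describes.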
return "$OCF_SUCCESS"
fi

# When only the peer revision exists, the local node must join as a learner. This ensures that it can start during node replacement events.
This is less risky, as the local revision comes directly from disk, so this might occur only if the node has no etcd data at all.
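The reviewer's point, that this path should only be reachable when the node has no etcd data at all, can be made explicit by checking the data directory before choosing the learner path. This is a minimal sketch under stated assumptions. `should_join_as_learner` is a hypothetical helper, the `only_peer` result string is assumed, and the check relies on the standard etcd layout where a member's WAL and snapshots live under `<data-dir>/member`.

```shell
#!/bin/bash
# Hypothetical sketch: join as a learner only if the peer alone reported a
# revision AND the local node really has no etcd data on disk.
# Assumes the standard etcd layout: member data lives in <data-dir>/member.
should_join_as_learner() {
    local compare_result="$1" data_dir="$2"
    # Only the peer has a revision ("only_peer" is an assumed result name).
    [ "$compare_result" = "only_peer" ] || return 1
    # No member directory means no local WAL or snapshots.
    [ ! -d "$data_dir/member" ]
}
```

If the member directory exists but no local revision was recorded, the sketch refuses the learner path, which keeps the "no etcd data at all" assumption honest. Once joined, an etcd learner is typically added with `etcdctl member add --learner` and later promoted with `etcdctl member promote`; how podman-etcd drives those steps is not shown here.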
ocf_log err "local revision is older and peer is not starting: cannot start"
ocf_exit_reason "local revision is older and peer is not starting: cannot start"
ocf_log err "local revision is older or empty and peer is not starting: cannot start"
ocf_exit_reason "local revision is older or emptyand peer is not starting: cannot start"
Missing a space before "and" ("emptyand").
else
if [ "$(attribute_node_cluster_id get)" = "$(attribute_node_cluster_id_peer)" ]; then
ocf_log info "same cluster_id and revision: start normal"
else if [ "$revision_compare_result" = "only_local" ] && [ "$(attribute_node_cluster_id get)" != ""] && [ "$(attribute_node_cluster_id get)" != "null" ]; then
A couple of minor notes on this:
- you can use elif here
- there's a missing space before a closing bracket "]"
Moreover, this seems wrong to me: here we have two starting resources with different cluster IDs, and they cannot start normally.
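The three review notes can be combined into one sketch: `elif` instead of a nested `else if`, correctly spaced test brackets, and a refusal to start when the two cluster IDs differ. This is a hypothetical rewrite, not the agent's code. `check_cluster_id` is an illustrative name, its two arguments stand in for the values of `attribute_node_cluster_id get` and `attribute_node_cluster_id_peer`, and the OCF exit codes are defaulted locally (they normally come from ocf-shellfuncs).

```shell
#!/bin/bash
# OCF exit codes, normally provided by ocf-shellfuncs.
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"

# Hypothetical sketch of the reviewed branch. local_id/peer_id stand in for
# "attribute_node_cluster_id get" and "attribute_node_cluster_id_peer".
check_cluster_id() {
    local local_id="$1" peer_id="$2"
    if [ "$local_id" = "$peer_id" ]; then
        echo "same cluster_id and revision: start normal"
        return "$OCF_SUCCESS"
    elif [ "$local_id" != "" ] && [ "$local_id" != "null" ]; then
        # Two starting resources with different cluster IDs: as the
        # reviewer notes, they cannot start normally.
        echo "cluster_id mismatch (local=$local_id peer=$peer_id): cannot start" >&2
        return "$OCF_ERR_GENERIC"
    else
        echo "local cluster_id is empty or null: cannot start" >&2
        return "$OCF_ERR_GENERIC"
    fi
}
```

Both non-matching branches fail; keeping them separate only gives the admin a more specific log message.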
This is a collection of tweaks intended to adjust the start-up properties of podman-etcd to allow for control-plane node replacement.