
jaypoulz

This is a collection of tweaks intended to adjust the start-up properties of podman-etcd to allow for control-plane node replacement.

@jaypoulz jaypoulz marked this pull request as draft July 18, 2025 19:16

knet-jenkins bot commented Jul 18, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2061/1/input

@clobrano clobrano self-requested a review July 22, 2025 06:08
return "$OCF_ERR_GENERIC"
fi

# When only the local revision exists, it can start normally. This ensures that it can start during node replacement events.
Collaborator

I think this is too risky, as it would be part of any normal startup, not only when an admin performs a node replacement.
There is already an alternative: the admin can set the "force_new_cluster" property on the node, which makes it ignore the revisions.

return "$OCF_SUCCESS"
fi

# When only the peer revision exists, the local node must join as a learner. This ensures that it can start during node replacement events.
Collaborator

This is less risky, as the local revision comes directly from disk, so this might occur only if the node has no etcd data at all.
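To make the cases concrete, here is a rough sketch of how the two revisions could be classified. It is not the agent's actual code: `classify_revisions` is a hypothetical helper, and every result name except "only_local" (which appears in this PR) is an assumption made for illustration.

```shell
# Hypothetical helper, not part of podman-etcd: classify the local and peer
# revisions. An empty string stands for "no revision available".
classify_revisions() {
    local local_rev="$1" peer_rev="$2"

    if [ -z "$local_rev" ] && [ -z "$peer_rev" ]; then
        echo "none"          # assumed name: neither side has a revision
    elif [ -z "$peer_rev" ]; then
        echo "only_local"    # name used in this PR
    elif [ -z "$local_rev" ]; then
        echo "only_peer"     # assumed name: local node has no etcd data
    elif [ "$local_rev" -gt "$peer_rev" ]; then
        echo "newer"         # assumed name
    elif [ "$local_rev" -lt "$peer_rev" ]; then
        echo "older"         # assumed name
    else
        echo "equal"         # assumed name
    fi
}
```

With this shape, `classify_revisions "" 42` lands in the "only_peer" branch, which is the case where the local node would join as a learner.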

- ocf_log err "local revision is older and peer is not starting: cannot start"
- ocf_exit_reason "local revision is older and peer is not starting: cannot start"
+ ocf_log err "local revision is older or empty and peer is not starting: cannot start"
+ ocf_exit_reason "local revision is older or emptyand peer is not starting: cannot start"
Collaborator

Missing a space before "and" in the exit reason ("emptyand" should be "empty and").
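One possible way to keep the two messages from drifting apart is to build the string once and pass it to both calls. This is only a suggestion, sketched below; the `ocf_log` and `ocf_exit_reason` definitions here are simplified stand-ins so the snippet runs on its own, and the `msg` variable name is an assumption.

```shell
# Simplified stand-ins for the helpers that the real agent gets from
# ocf-shellfuncs; they only echo their arguments here.
ocf_log() { echo "$1: $2"; }
ocf_exit_reason() { echo "exit reason: $*"; }

# Define the message once so the log line and the exit reason always match.
msg="local revision is older or empty and peer is not starting: cannot start"
ocf_log err "$msg"
ocf_exit_reason "$msg"
```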

else
if [ "$(attribute_node_cluster_id get)" = "$(attribute_node_cluster_id_peer)" ]; then
ocf_log info "same cluster_id and revision: start normal"
else if [ "$revision_compare_result" = "only_local" ] && [ "$(attribute_node_cluster_id get)" != ""] && [ "$(attribute_node_cluster_id get)" != "null" ]; then
Collaborator

A couple of minor notes on this:

  • you can use elif here
  • there's a missing space before the closing bracket "]" in the second test

More importantly, this seems wrong to me: we have two starting resources with different cluster-ids, so they cannot start normally.
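For the two syntax notes only, something along these lines would work. It shows the `elif` form and the bracket spacing, and says nothing about whether this branch should exist at all. `describe_start` and `is_cluster_id_set` are hypothetical names, and the cluster ID is passed in as a plain argument instead of being read from the node attribute.

```shell
# Hypothetical helper: true when the given cluster ID is neither empty nor "null".
# Note the space before each closing "]"; without it the test is malformed.
is_cluster_id_set() {
    [ "$1" != "" ] && [ "$1" != "null" ]
}

# Hypothetical wrapper around the branch under review. "elif" avoids the
# extra nested "fi" that a separate "else if" would require.
describe_start() {
    local same_id="$1" revision_compare_result="$2" cluster_id="$3"

    if [ "$same_id" = "yes" ]; then
        echo "start normal"
    elif [ "$revision_compare_result" = "only_local" ] && is_cluster_id_set "$cluster_id"; then
        echo "only_local with cluster_id"
    else
        echo "other"
    fi
}
```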
