Skip to content

Conversation

clobrano
Copy link
Collaborator

@clobrano clobrano commented Sep 30, 2025

Automatically promote etcd learner members to voting members when detected.
Includes refactored member management functions and improved validation.

See https://issues.redhat.com/browse/OCPBUGS-42808
Coincidentally, it also adresses https://issues.redhat.com/browse/OCPBUGS-57467

…ction

The check_peers function is broken up into smaller, more manageable
functions. This refactoring separates the logic for detecting a loss of
cluster leadership from the logic for managing peer membership.

The main function is renamed to check_peer as there is only 1 peer to
check (it was check_peers).
@clobrano clobrano marked this pull request as ready for review September 30, 2025 16:40
Copy link

knet-jenkins bot commented Sep 30, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2078/1/input


# test decimal input: the variable expansion removes from the input any string
# with at least one non-digit, so valid decimal inputs won't result in empty strings.
if [ -z "${dec##*[!0-9]*}" ] 2>/dev/null; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested this, and it works, but I can't explain why.
"##" is a greedy prefix match.
"[!0-9]
" = A prefix ended in zero or more non decimal digits
String I thought would break this: "A12"

In my head, the prefix match would take the longest string ending in a non-decimal digit, or 'A' and leave you with 12.

It works that way if we omit the last '*', because the greedy match stops at the A.

Can you explain this?

Copy link
Collaborator Author

@clobrano clobrano Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the catch is that *[!0-9]* matches any string that contains at least one non-digit character, not just the non-digit part. So it will catch "A12" completely, returning a zero string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the comment with something clearer

Copy link
Collaborator Author

@clobrano clobrano Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, it can be much simplified with grep

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could do if ! echo "$dec" | grep -q "^[1-9][0-9]*$"; then instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, I'm testing something similar, but I wasn't thinking of forcing the first digit to be 1 👍

Copy link
Contributor

@oalbrigt oalbrigt Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking for 1-9 as the first digit, so it wont match 0.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's a good idea :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and forgot to add it. I'm on it

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AH gotcha - it's glob rules, not regex rules.
So * is anymatch on both sides. [!0-9] is a non-decimal digit :)

ocf_log info "could not promote member $learner_member_id_hex, error code: $?"
return $OCF_SUCCESS
fi
ocf_log info "success promote member $learner_member_id_hex"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: successfully promoted member...

endpoint_status_json=$(get_endpoint_status_json)
ocf_log info "endpoint status: $endpoint_status_json"

count_endpoints=$(printf "%s" "$endpoint_status_json" | jq -r ".[].Endpoint" | wc -l)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra space in the printf ?

ocf_exit_reason "$NODENAME must force a new cluster"
return $OCF_ERR_GENERIC
fi
if [ "$endpoint_status_errors" != "null" ]; then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised jq returns null for this. I would expect an empty object. But if it's working 🤷‍♂️

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is not new code, just moved.
It returns null because the .Status.errors field is not always defined, so if an endpoint doesn't have the field, jq returns null.

Copy link

@jaypoulz jaypoulz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. I assume this has been tested?

@clobrano
Copy link
Collaborator Author

clobrano commented Oct 1, 2025

I assume this has been tested?

Yes, manually and with local openshift/origin runs, which I'm already adjusting for the new timings

Automatically promote etcd learner members to voting members when detected.
Includes refactored member management functions and improved validation.
@clobrano clobrano force-pushed the OCPBUGS-42808-promote-etcd-member branch from e29bc7a to de7c73a Compare October 1, 2025 15:48
Copy link

knet-jenkins bot commented Oct 1, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2078/2/input

@jaypoulz
Copy link

jaypoulz commented Oct 2, 2025

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants