-
Notifications
You must be signed in to change notification settings - Fork 603
OCPBUGS-42808: podman-etcd: add automatic learner member promotion #2078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
OCPBUGS-42808: podman-etcd: add automatic learner member promotion #2078
Conversation
…ction The check_peers function is broken up into smaller, more manageable functions. This refactoring separates the logic for detecting a loss of cluster leadership from the logic for managing peer membership. The main function is renamed to check_peer as there is only 1 peer to check (it was check_peers).
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2078/1/input |
heartbeat/podman-etcd
Outdated
|
||
# test decimal input: the variable expansion removes from the input any string | ||
# with at least one non-digit, so valid decimal inputs won't result in empty strings. | ||
if [ -z "${dec##*[!0-9]*}" ] 2>/dev/null; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested this, and it works, but I can't explain why.
"##" is a greedy prefix match.
"[!0-9]" = A prefix ended in zero or more non decimal digits
String I thought would break this: "A12"
In my head, the prefix match would take the longest string ending in a non-decimal digit, or 'A' and leave you with 12.
It works that way if we omit the last '*', because the greedy match stops at the A.
Can you explain this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the catch is that *[!0-9]*
matches any string that contains at least one non-digit character, not just the non-digit part. So it will catch "A12" completely, returning a zero string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the comment with something clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, it can be much simplified with grep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you could do if ! echo "$dec" | grep -q "^[1-9][0-9]*$"; then
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, I'm testing something similar, but I wasn't thinking of forcing the first digit to be 1 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's looking for 1-9 as the first digit, so it wont match 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it's a good idea :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and forgot to add it. I'm on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AH gotcha - it's glob rules, not regex rules.
So * is anymatch on both sides. [!0-9] is a non-decimal digit :)
heartbeat/podman-etcd
Outdated
ocf_log info "could not promote member $learner_member_id_hex, error code: $?" | ||
return $OCF_SUCCESS | ||
fi | ||
ocf_log info "success promote member $learner_member_id_hex" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: successfully promoted member...
heartbeat/podman-etcd
Outdated
endpoint_status_json=$(get_endpoint_status_json) | ||
ocf_log info "endpoint status: $endpoint_status_json" | ||
|
||
count_endpoints=$(printf "%s" "$endpoint_status_json" | jq -r ".[].Endpoint" | wc -l) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra space in the printf ?
ocf_exit_reason "$NODENAME must force a new cluster" | ||
return $OCF_ERR_GENERIC | ||
fi | ||
if [ "$endpoint_status_errors" != "null" ]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised jq returns null for this. I would expect an empty object. But if it's working 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is not new code, just moved.
It returns null
because the .Status.errors
field is not always defined, so if an endpoint doesn't have the field, jq returns null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me. I assume this has been tested?
Yes, manually and with local openshift/origin runs, which I'm already adjusting for the new timings |
Automatically promote etcd learner members to voting members when detected. Includes refactored member management functions and improved validation.
e29bc7a
to
de7c73a
Compare
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2078/2/input |
/lgtm |
Automatically promote etcd learner members to voting members when detected.
Includes refactored member management functions and improved validation.
See https://issues.redhat.com/browse/OCPBUGS-42808
Coincidentally, it also adresses https://issues.redhat.com/browse/OCPBUGS-57467