feat: add cephadm maintenance playbooks #696
Is there a way to determine which hosts are not in maintenance and select one of them? That would make things a lot simpler.
I suppose it is possible. However, I see this as no different from how we handle controllers and the VIP, where we intentionally avoid the VIP until the end. My concern is that if a host is in maintenance, the command gets trapped: it will proceed to authenticate with the cluster and silently fail, so it would involve timeouts and other workarounds.
Does it error gracefully if the node is in maintenance? If not, it might be worth adding a "precheck" task to verify.
I will have to check, but I think ceph has a tendency to return 0 regardless of whether the command succeeded.
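For illustration, a precheck along these lines could inspect the reported host status instead of trusting the exit code. This is a minimal sketch, not what the PR implements: it assumes the JSON from `ceph orch host ls --format json` is a list of objects with a `hostname` field and a `status` field that reads `maintenance` for hosts in maintenance mode. The helper names `hosts_not_in_maintenance` and `pick_delegate`, and the sample data, are hypothetical.

```python
import json


def hosts_not_in_maintenance(host_ls_json):
    """Return hostnames whose status is not 'maintenance'.

    Assumption: each entry has 'hostname' and an optional 'status' key,
    as in the output of `ceph orch host ls --format json`.
    """
    hosts = json.loads(host_ls_json)
    return [
        h["hostname"]
        for h in hosts
        if (h.get("status") or "").lower() != "maintenance"
    ]


def pick_delegate(host_ls_json, candidates):
    """Return the first candidate (e.g. a mon) that is not in maintenance."""
    available = set(hosts_not_in_maintenance(host_ls_json))
    for name in candidates:
        if name in available:
            return name
    # Fail loudly rather than relying on a zero exit code from ceph.
    raise RuntimeError("no candidate host is out of maintenance")


# Hypothetical sample data shaped like the assumed JSON output.
SAMPLE = json.dumps([
    {"hostname": "mon1", "status": "maintenance"},
    {"hostname": "mon2", "status": ""},
    {"hostname": "mon3"},
])
```

With the sample above, `pick_delegate(SAMPLE, ["mon1", "mon2", "mon3"])` skips `mon1` and selects `mon2`; raising when no candidate is available avoids the silent-failure case described above.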
name: stackhpc.cephadm.commands
vars:
  cephadm_commands:
    - "orch host maintenance enter {{ ansible_facts.nodename }}"
This won't be possible for any host holding RGW services. It gets:
WARNING: Removing RGW daemons can cause clients to lose connectivity.
Note: Warnings can be bypassed with the --force flag
Of course, --force defeats the purpose of the other checks and is not viable here.
I reworked these into roles in the cephadm collection: stackhpc/ansible-collection-cephadm#153. Once that merges, I'll propose some playbooks in SKC.
Add two playbooks for entering and exiting maintenance mode for a given Ceph node.
Note: these playbooks use stackhpc.cephadm.commands, which will delegate the command to the first mon within your inventory. If this node is in maintenance, you must specify --cephadm_delegate_host and provide another mon.
Note: this relies on something such as stackhpc/ansible-collection-cephadm/pull/109 being merged with some additional changes.