
feat: add cephadm maintenance playbooks #696

Closed
jackhodgkiss wants to merge 4 commits from the ceph-maintenance-playbook branch.

Conversation

jackhodgkiss
Contributor

Add two playbooks for entering and exiting maintenance mode for a given Ceph node.

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-enter-maintenance.yml --limit ceph-mon-01

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-exit-maintenance.yml --limit ceph-mon-01

Note that these playbooks use stackhpc.cephadm.commands, which delegates the command to the first mon in your inventory. If that node is in maintenance, you must set cephadm_delegate_host to another mon.

kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/cephadm-exit-maintenance.yml --limit ceph-mon-01 -e cephadm_delegate_host=ceph-mon-02

Note: this relies on something such as stackhpc/ansible-collection-cephadm/pull/109 being merged with some additional changes.
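
For context, a minimal sketch of what cephadm-enter-maintenance.yml could look like; the play layout and the ceph host group are assumptions here, while the role name and the cephadm_commands variable come from stackhpc.cephadm.commands:

---
- name: Enter Ceph maintenance mode
  # Host group is an assumption; run with --limit to target a single node.
  hosts: ceph
  tasks:
    - name: Place the host into maintenance
      ansible.builtin.import_role:
        name: stackhpc.cephadm.commands
      vars:
        cephadm_commands:
          - "orch host maintenance enter {{ ansible_facts.nodename }}"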

@Alex-Welsh
Member

Is there a way of determining the hosts that are not in maintenance and selecting one of them? That would make things a lot simpler.

@jackhodgkiss force-pushed the ceph-maintenance-playbook branch from 007e649 to 0522985 on October 9, 2023 at 13:12
@jackhodgkiss
Contributor Author

I suppose it is possible. However, I see this as no different from how we handle the controllers and the VIP, where we intentionally avoid the VIP until the end.

My concern is that if a host is in maintenance the command gets trapped: it will proceed to authenticate with the cluster and silently fail, so it would involve timeouts and other workarounds.
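
For illustration only, automatic selection of a delegate might look something like the sketch below, filtering ceph orch host ls output for hosts that are not in maintenance; the task layout, become usage, and the exact status string are assumptions, and this is precisely the extra machinery being weighed up here:

- name: List cluster hosts
  ansible.builtin.command: cephadm shell -- ceph orch host ls --format json
  register: cephadm_host_ls
  changed_when: false
  become: true

- name: Pick the first host that is not in maintenance
  ansible.builtin.set_fact:
    # The status string for maintenance hosts may differ between Ceph releases,
    # and in practice this would need further filtering (e.g. to mon hosts only).
    cephadm_delegate_host: >-
      {{ cephadm_host_ls.stdout | from_json
         | rejectattr('status', 'match', '(?i)maintenance')
         | map(attribute='hostname')
         | first }}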

@Alex-Welsh
Member

Does it error gracefully if the node is in maintenance? If not, it might be worth adding a "precheck" task to verify.

@jackhodgkiss
Contributor Author

I will have to check, but I think ceph has a tendency to return 0 regardless of whether the command was successful or not.
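
A hedged sketch of the kind of precheck suggested above, which avoids relying on the exit code by inspecting the reported host status instead; the status string and task layout are assumptions:

- name: Fail early if the host is already in maintenance
  ansible.builtin.command: cephadm shell -- ceph orch host ls --format json
  register: cephadm_host_ls
  changed_when: false
  become: true
  # Check the reported status rather than trusting the return code of the
  # maintenance command itself.
  failed_when: >-
    cephadm_host_ls.stdout | from_json
    | selectattr('hostname', 'equalto', ansible_facts.nodename)
    | selectattr('status', 'match', '(?i)maintenance')
    | list | length > 0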

name: stackhpc.cephadm.commands
vars:
  cephadm_commands:
    - "orch host maintenance enter {{ ansible_facts.nodename }}"
Member

This won't be possible for any host holding RGW services, as it gets:

WARNING: Removing RGW daemons can cause clients to lose connectivity.
Note: Warnings can be bypassed with the --force flag

Of course --force defeats the purpose of other checks and is not viable here.

@markgoddard
Contributor

I reworked these into roles in the cephadm collection: stackhpc/ansible-collection-cephadm#153. Once that merges I'll propose some playbooks in SKC.
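
Once those roles merge, the SKC playbooks could presumably become thin wrappers along these lines; the role name below is a placeholder guess, not the actual name exposed by stackhpc/ansible-collection-cephadm#153:

---
- name: Enter Ceph maintenance mode
  hosts: ceph
  tasks:
    - name: Place the host into maintenance via the collection role
      ansible.builtin.import_role:
        # Hypothetical role name; substitute whatever the collection provides.
        name: stackhpc.cephadm.enter_maintenance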

@jackhodgkiss deleted the ceph-maintenance-playbook branch on January 15, 2025 at 20:54