Destroy cluster doesn't fail gracefully. #146

@PC-Admin

Hello,

First off, awesome Ansible collection, thank you for making it available!

I'm having a small issue when using this collection with -e "cephadm_recreate=true". Because the previous run failed to add 2 of the 3 hosts to my cluster, the 'Destroy cluster' task fails like so:

TASK [stackhpc.cephadm.cephadm : Destroy cluster] ***********************************************************************************************************************************************
fatal: [index-16-09078]: FAILED! => {"changed": true, "cmd": ["cephadm", "rm-cluster", "--fsid", "53d7c6cc-2229-11ef-a94c-b1f216e39593", "--force"], "delta": "0:00:00.499584", "end": "2024-06-04 05:06:01.772498", "msg": "non-zero return code", "rc": 1, "start": "2024-06-04 05:06:01.272914", "stderr": "Traceback (most recent call last):\n  File \"/usr/lib/python3.10/runpy.py\", line 196, in _run_module_as_main\n    return _run_code(code, main_globals, None,\n  File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10700, in <module>\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10688, in main\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7989, in command_rm_cluster\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 8047, in _rm_cluster\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7979, in get_ceph_cluster_count\nFileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph'", "stderr_lines": ["Traceback (most recent call last):", "  File \"/usr/lib/python3.10/runpy.py\", line 196, in _run_module_as_main", "    return _run_code(code, main_globals, None,", "  File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code", "    exec(code, run_globals)", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10700, in <module>", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10688, in main", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7989, in command_rm_cluster", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 8047, in _rm_cluster", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7979, in get_ceph_cluster_count", "FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph'"], "stdout": "Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593", "stdout_lines": ["Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593"]}
fatal: [storage-16-09074]: FAILED! => {"changed": true, "cmd": ["cephadm", "rm-cluster", "--fsid", "53d7c6cc-2229-11ef-a94c-b1f216e39593", "--force"], "delta": "0:00:00.513107", "end": "2024-06-04 05:06:01.810504", "msg": "non-zero return code", "rc": 1, "start": "2024-06-04 05:06:01.297397", "stderr": "Traceback (most recent call last):\n  File \"/usr/lib/python3.10/runpy.py\", line 196, in _run_module_as_main\n    return _run_code(code, main_globals, None,\n  File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code\n    exec(code, run_globals)\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10700, in <module>\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10688, in main\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7989, in command_rm_cluster\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 8047, in _rm_cluster\n  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7979, in get_ceph_cluster_count\nFileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph'", "stderr_lines": ["Traceback (most recent call last):", "  File \"/usr/lib/python3.10/runpy.py\", line 196, in _run_module_as_main", "    return _run_code(code, main_globals, None,", "  File \"/usr/lib/python3.10/runpy.py\", line 86, in _run_code", "    exec(code, run_globals)", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10700, in <module>", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 10688, in main", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7989, in command_rm_cluster", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 8047, in _rm_cluster", "  File \"/tmp/tmpwf3vvwn_.cephadm.build/__main__.py\", line 7979, in get_ceph_cluster_count", "FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/ceph'"], "stdout": "Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593", "stdout_lines": ["Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593"]}
changed: [storage-14-09034] => {"changed": true, "cmd": ["cephadm", "rm-cluster", "--fsid", "53d7c6cc-2229-11ef-a94c-b1f216e39593", "--force"], "delta": "0:00:07.164754", "end": "2024-06-04 05:06:08.185614", "rc": 0, "start": "2024-06-04 05:06:01.020860", "stderr": "", "stderr_lines": [], "stdout": "Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593", "stdout_lines": ["Deleting cluster with fsid: 53d7c6cc-2229-11ef-a94c-b1f216e39593"]}

TASK [stackhpc.cephadm.cephadm : Remove ssh keys] ***********************************************************************************************************************************************
changed: [storage-14-09034] => (item=/etc/ceph/cephadm.id) => {"ansible_loop_var": "item", "changed": true, "item": "/etc/ceph/cephadm.id", "path": "/etc/ceph/cephadm.id", "state": "absent"}
changed: [storage-14-09034] => (item=/etc/ceph/cephadm.pub) => {"ansible_loop_var": "item", "changed": true, "item": "/etc/ceph/cephadm.pub", "path": "/etc/ceph/cephadm.pub", "state": "absent"}

TASK [stackhpc.cephadm.cephadm : Run prechecks] *************************************************************************************************************************************************
included: /home/mcollins1/.ansible/collections/ansible_collections/stackhpc/cephadm/roles/cephadm/tasks/prechecks.yml for storage-14-09034

As a result, the subsequent tasks are not applied to the failed hosts; only storage-14-09034 continues.

Perhaps an ignore_errors: true here would be appropriate.
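
To illustrate the suggestion, here is a rough sketch of what a more tolerant task might look like. It is not the collection's actual task definition: the argv form is reconstructed from the "cmd" list in the log above, and the variable name cephadm_fsid, the register name destroy_result, and the use of become are assumptions made for illustration. Instead of a blanket ignore_errors: true, it uses failed_when so that only the missing /var/lib/ceph case is tolerated and other failures still stop the host.

```yaml
# Sketch only; names marked "assumed" are not taken from the collection.
- name: Destroy cluster
  ansible.builtin.command:
    argv:
      - cephadm
      - rm-cluster
      - --fsid
      - "{{ cephadm_fsid }}"        # assumed variable name
      - --force
  become: true                      # assumed; cephadm normally needs root
  register: destroy_result          # assumed register name
  # List items are ANDed: the task fails only when the command returned
  # non-zero AND stderr does not mention the missing /var/lib/ceph directory.
  failed_when:
    - destroy_result.rc != 0
    - "'/var/lib/ceph' not in destroy_result.stderr"
```

The simpler alternative is to add ignore_errors: true to the task, which lets every host continue regardless of why the command failed, at the cost of also hiding unrelated errors.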

Labels: bug (Something isn't working)