Commit 261941b

Merge branch 'stackhpc/2024.1' into alertmanager-0.28.1
2 parents c51aa0f + 4c34b67

File tree

6 files changed: +90 −3 lines

.github/workflows/stackhpc-pull-request.yml

Lines changed: 3 additions & 0 deletions

```diff
@@ -18,6 +18,9 @@ jobs:
       pull-requests: read
     name: Check changed files
     if: github.repository == 'stackhpc/stackhpc-kayobe-config'
+    needs:
+      - lint
+      - tox
     outputs:
       aio: ${{ steps.changes.outputs.aio }}
       build-kayobe-image: ${{ steps.changes.outputs.build-kayobe-image }}
```
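For context, `needs` in GitHub Actions makes a job wait for the listed jobs and run only if they succeed, so this change gates the changed-files check behind lint and tox. A minimal hedged sketch of the mechanism; the job IDs and steps below are illustrative assumptions, not the real workflow:

```yaml
# Hypothetical minimal workflow: check-changed-files starts only after
# both lint and tox have completed successfully.
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - run: echo "lint"
  tox:
    runs-on: ubuntu-latest
    steps:
      - run: echo "tox"
  check-changed-files:
    needs:
      - lint
      - tox
    runs-on: ubuntu-latest
    steps:
      - run: echo "runs only after lint and tox pass"
```

If either upstream job fails, `check-changed-files` is skipped rather than run against an unvalidated tree.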

doc/source/contributor/ofed.rst

Lines changed: 2 additions & 2 deletions

```diff
@@ -78,12 +78,12 @@ a package update, which can also be limited to hosts in the ``mlnx`` group.
 
    kayobe overcloud host package update --packages "*" --limit mlnx
 
-To ensure the latest kernel is the default on boot, the bootloader entires will need
+To ensure the latest kernel is the default on boot, the bootloader entries will need
 to be reset before rebooting.
 
 .. code-block:: console
 
-   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reset-bls-entires.yml -e reset_bls_host=mlnx
+   kayobe playbook run $KAYOBE_CONFIG_PATH/ansible/reset-bls-entries.yml -e reset_bls_host=mlnx
 
 The hosts can now be rebooted to use the latest kernel, a rolling reboot may be applicable
 here to reduce distruptions. See the `package updates documentation <package-updates>`.
```

etc/kayobe/cephadm.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -3,7 +3,7 @@
 # Cephadm deployment configuration.
 
 # Ceph release name.
-cephadm_ceph_release: "{{ 'squid' if (ansible_facts['distribution_release'] == 'noble') else 'reef' }}"
+cephadm_ceph_release: "{{ 'squid' if os_release == 'noble' else 'reef' }}"
 
 # Ceph FSID.
 #cephadm_fsid:
```
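The point of the change is that `os_release` is ordinary Kayobe configuration and is always defined, whereas `ansible_facts['distribution_release']` exists only after fact gathering, so templating on it fails in fact-less runs. The selection logic itself can be sketched in plain shell; the value of `os_release` below is an assumption for illustration:

```shell
#!/usr/bin/env bash
# Sketch of the new release-selection logic. os_release is assumed to be
# 'noble' here, as it would be on Ubuntu Noble hosts.
os_release="noble"

if [ "$os_release" = "noble" ]; then
    cephadm_ceph_release="squid"
else
    cephadm_ceph_release="reef"
fi

echo "$cephadm_ceph_release"
```

On any other release value the expression falls back to `reef`, matching the Jinja conditional in the diff.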
Lines changed: 6 additions & 0 deletions

```diff
@@ -0,0 +1,6 @@
+---
+fixes:
+  - |
+    The Ceph version is now determined by ``os_release``, rather
+    than Ansible facts. Using Ansible facts caused playbooks to fail when
+    facts are not gathered.
```
Lines changed: 6 additions & 0 deletions

```diff
@@ -0,0 +1,6 @@
+---
+features:
+  - |
+    Added a new script, ``rabbitmq-queue-migration.sh``, which will migrate to
+    the new RabbitMQ durable queues. This is intended for use prior to an
+    upgrade to Epoxy.
```

tools/rabbitmq-queue-migration.sh

Lines changed: 72 additions & 0 deletions

```bash
#! /usr/bin/bash

set -ex

RED='\033[0;31m'
GREEN='\033[0;32m'

RABBITMQ_SERVICES_TO_RESTART=barbican,blazar,cinder,cloudkitty,designate,heat,ironic,keystone,magnum,manila,neutron,nova,octavia
RABBITMQ_CONTAINER_NAME=rabbitmq

if [[ ! $KAYOBE_CONFIG_PATH ]]; then
    echo "${RED}Environment variable \$KAYOBE_CONFIG_PATH is not defined"
    echo "${RED}Ensure your environment is set up to run kayobe commands"
    exit 2
fi

if [[ ! "$1" = "--skip-checks" ]]; then
    # Fail if clocks are not synced
    if ! ( kayobe overcloud host command run -l controllers -b --command "timedatectl status | grep 'synchronized: yes'" ); then
        echo "${RED}Failed precheck: Time not synced on controllers"
        echo "${RED}Use 'timedatectl status' to check sync state"
        echo "${RED}Either wait for sync or use 'chronyc makestep'"
        exit 1
    fi
    kayobe overcloud service configuration generate --node-config-dir /tmp/rabbit-migration --kolla-tags none
    # Fail if any new feature flags are not set
    if ! ( grep 'om_enable_queue_manager: true' $KOLLA_CONFIG_PATH/globals.yml && \
           grep 'om_enable_rabbitmq_quorum_queues: true' $KOLLA_CONFIG_PATH/globals.yml && \
           grep 'om_enable_rabbitmq_transient_quorum_queue: true' $KOLLA_CONFIG_PATH/globals.yml && \
           grep 'om_enable_rabbitmq_stream_fanout: true' $KOLLA_CONFIG_PATH/globals.yml ); then
        echo "${RED}Failed precheck: The following must be enabled: om_enable_queue_manager, om_enable_rabbitmq_quorum_queues, om_enable_rabbitmq_transient_quorum_queue, om_enable_rabbitmq_stream_fanout"
        exit 1
    fi
fi

# Generate new config, stop services using rabbit, and reset rabbit state
kayobe overcloud service configuration generate --node-config-dir /etc/kolla --kolla-skip-tags rabbitmq-ha-precheck
kayobe kolla ansible run "stop --yes-i-really-really-mean-it" -kt $RABBITMQ_SERVICES_TO_RESTART
kayobe kolla ansible run rabbitmq-reset-state

if [[ ! "$1" = "--skip-checks" ]]; then
    # Fail if any queues still exist
    sleep 20
    # Note(mattcrees): We turn the text grey here so the failed Ansible calls don't freak anyone out
    CURRENTTERM=${TERM}
    export TERM=xterm-mono
    if ( kayobe overcloud host command run -l controllers -b --command "docker exec $RABBITMQ_CONTAINER_NAME rabbitmqctl list_queues name --silent | grep -v '^$'" ); then
        export TERM=${CURRENTTERM}
        echo -e "${RED}Failed check: RabbitMQ has not stopped properly, queues still exist"
        exit 1
    fi
    # Fail if any exchanges still exist (excluding those starting with 'amq.')
    if ( kayobe overcloud host command run -l controllers -b --command "docker exec $RABBITMQ_CONTAINER_NAME rabbitmqctl list_exchanges name --silent | grep -v '^$' | grep -v '^amq.'" ); then
        export TERM=${CURRENTTERM}
        echo -e "${RED}Failed check: RabbitMQ has not stopped properly, exchanges still exist"
        exit 1
    fi
    export TERM=${CURRENTTERM}
fi

# Redeploy with all durable-type queues enabled
kayobe kolla ansible run deploy-containers -kt $RABBITMQ_SERVICES_TO_RESTART

if [[ ! "$1" = "--skip-checks" ]]; then
    sleep 60
    # Assert that all queues are durable
    if ! ( kayobe overcloud host command run -l controllers -b --command "docker exec $RABBITMQ_CONTAINER_NAME rabbitmqctl list_queues durable --silent | grep false" > /dev/null 2>&1 ); then
        echo -e "${GREEN}Queues migrated successfully"
    else
        echo -e "${RED}Failed post-check: A controller has non-durable queues"
    fi
fi
```
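The script's final post-check leans on `grep` exit codes: `grep false` succeeds only if some queue reports `durable: false`, so negating it asserts that every queue is durable. A standalone sketch of that pattern with canned input; the sample lines are assumptions standing in for real `rabbitmqctl list_queues durable --silent` output:

```shell
#!/usr/bin/env bash
# Canned stand-in for: rabbitmqctl list_queues durable --silent
sample_output=$'true\ntrue\ntrue'

# grep -q false exits non-zero when no queue is non-durable, so the
# negated test means "all queues are durable".
if ! printf '%s\n' "$sample_output" | grep -q false; then
    result="all queues durable"
else
    result="non-durable queues remain"
fi

echo "$result"
```

Changing any sample line to `false` flips the result, which is exactly the condition that triggers the script's "Failed post-check" branch.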
