[DO NOT MERGE] test for reconnection to RabbitMQ #4736

battermann · 2025-08-28T11:52:48Z

Checklist

Add a new entry in an appropriate subdirectory of changelog.d
Read and follow the PR guidelines

jschaul · 2025-08-28T15:47:37Z

run the docker-compose services, either on the latest commit or on a previous commit using toxiproxy
start the test: TEST_INCLUDE=testRabbitMQConnection make ci-safe package=integration
when the prompt appears, break the connection in some way. Use either ./toxiproxy-rabbitmq-terminate.sh or haproxy (for haproxy you first need to fix the configuration; somehow the haproxy setup isn't quite working yet on the latest commit).
re-establish a connection (press enter in the toxiproxy script)
press enter in the integration test so it attempts to send another message. If the test is green then, well, you could not reproduce an issue.
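For readers without the branch checked out, the flow of these steps can be sketched roughly as below. This is only an illustration, not the code in integration/test/Test/Demo.hs: the `Broker` record, `newInMemoryBroker`, `roundTrip` and `reconnectScenario` are hypothetical names, and an in-memory queue stands in for RabbitMQ so the sketch runs on its own.

```haskell
import Control.Monad (unless)
import Data.IORef (atomicModifyIORef', modifyIORef', newIORef)

-- Hypothetical stand-in for the RabbitMQ helpers a real test would use.
data Broker = Broker
  { publish :: String -> IO ()
  , consume :: IO (Maybe String)
  }

-- In-memory FIFO queue so the sketch runs without a broker (assumption).
newInMemoryBroker :: IO Broker
newInMemoryBroker = do
  ref <- newIORef []
  pure Broker
    { publish = \m -> modifyIORef' ref (++ [m])
    , consume = atomicModifyIORef' ref $ \q -> case q of
        [] -> ([], Nothing)
        (m : rest) -> (rest, Just m)
    }

-- Publish one message and check that the same message comes back.
roundTrip :: Broker -> String -> IO Bool
roundTrip b msg = do
  publish b msg
  got <- consume b
  pure (got == Just msg)

-- Round trip, pause for manual intervention, round trip again.
reconnectScenario :: Broker -> IO () -> IO (Bool, Bool)
reconnectScenario b pause = do
  before <- roundTrip b "before outage"
  pause
  after <- roundTrip b "after outage"
  pure (before, after)

main :: IO ()
main = do
  b <- newInMemoryBroker
  result <- reconnectScenario b $ do
    putStrLn "Break and restore the connection, then press enter."
    _ <- getLine
    pure ()
  unless (result == (True, True)) $ error "a round trip failed"
  putStrLn "both round trips succeeded"
```

The pause is passed in as an action so that the manual prompt can be swapped for an automated outage later.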

lwille · 2025-08-28T16:08:08Z

integration/test/Test/Demo.hs

In this test, we seem to be doing things in discrete steps:

sending + consuming
(kill connection + wait for reconnect)
sending + consuming

I think that's not really what happens in the background worker, or in gundeck when forwarding notifications. Those processes would be waiting for new AMQP messages all the time, and suddenly the disconnect would kick in.

What happens if the RabbitMQ connection is killed while sending/consuming messages?

battermann · 2025-08-29

It's correct that we test that sending and receiving work, then kill RabbitMQ, restart it, wait for the reconnect, and then test sending and receiving again.

But I don't understand what you mean by "that's not what happens in background worker". The background worker is always connected to the queue, including when the outage starts. Or more precisely, it's a thread in the background worker that should constantly and indefinitely try to reconnect, or restart if killed.
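To make the expected behaviour concrete, here is a minimal sketch of a thread that keeps restarting its worker indefinitely. It is not the background worker's actual code: `keepRunning` is a hypothetical name, the failing worker in `main` merely simulates a refused connection, and the fixed delay is an assumption (a real implementation might use backoff).

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception
  ( ErrorCall (..), SomeAsyncException (..), SomeException
  , fromException, throwIO, try )
import Control.Monad (forever, when)
import Data.IORef (atomicModifyIORef', newIORef)

-- Run the worker; whenever it throws or returns, wait and start it again.
-- Asynchronous exceptions (e.g. killThread) are rethrown so that the
-- supervising thread itself can still be stopped from outside.
keepRunning :: Int -> IO () -> IO a
keepRunning delayMicros worker = forever $ do
  result <- try worker :: IO (Either SomeException ())
  case result of
    Left err
      | Just (SomeAsyncException _) <- fromException err -> throwIO err
      | otherwise -> putStrLn ("worker died: " ++ show err)
    Right () -> putStrLn "worker returned; restarting"
  threadDelay delayMicros

main :: IO ()
main = do
  attempts <- newIORef (0 :: Int)
  connected <- newEmptyMVar
  -- Simulated worker: the first two attempts fail, the third "connects"
  -- and then blocks like a long-lived consumer would.
  let worker = do
        k <- atomicModifyIORef' attempts (\c -> (c + 1, c + 1))
        when (k < 3) $ throwIO (ErrorCall "connection refused")
        putMVar connected k
        forever (threadDelay 1000000)
  supervisor <- forkIO (keepRunning 10000 worker)
  n <- takeMVar connected
  putStrLn ("connected on attempt " ++ show n)
  killThread supervisor
```

If the real thread stops retrying after the first failed attempt, or swallows the failure without restarting, the process would end up stuck in the way the logs suggest.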

Gundeck is not involved: from the logs we can see that it's the notification-pusher that gets stuck while trying to establish a connection to RabbitMQ.

Trying to kill RabbitMQ while the bg worker is consuming messages is an interesting idea. What we do is kill the broker while the bg worker is connected; however, no messages are being processed when the connection is killed, because the queue is empty. Killing the broker while the queue contains unprocessed messages is technically a bit difficult, because when we take RabbitMQ down we also cannot produce messages. Still, maybe it's worthwhile to try to find a workaround and test this?
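As a rough sketch of the property such a test could check: pre-fill the queue, cut the connection partway through consumption, reconnect, and assert that every message is eventually processed. Everything below is hypothetical (`Queue`, `consumeSession`, `drainWithOutage`) and purely in-memory; it only models a dropped connection between messages, not a broker restart, where message and queue durability would also matter.

```haskell
import Control.Exception (ErrorCall (..), throwIO, try)
import Control.Monad (unless)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Hypothetical in-memory queue: a message stays queued until it is acked.
newtype Queue = Queue (IORef [Int])

-- Process messages one at a time. `dropAfter` simulates the connection
-- dying once that many messages have been handled in this session.
consumeSession :: Queue -> IORef [Int] -> Maybe Int -> IO ()
consumeSession (Queue q) processed dropAfter = go 0
  where
    go :: Int -> IO ()
    go done = do
      pending <- readIORef q
      case pending of
        [] -> pure ()
        (m : rest)
          | Just done == dropAfter -> throwIO (ErrorCall "connection lost")
          | otherwise -> do
              modifyIORef' processed (++ [m]) -- handle the message
              writeIORef q rest               -- ack: remove it from the queue
              go (done + 1)

-- Consume with one simulated outage, then resume in a second session.
drainWithOutage :: [Int] -> Int -> IO [Int]
drainWithOutage msgs dropAt = do
  q <- newIORef msgs
  processed <- newIORef []
  firstRun <- try (consumeSession (Queue q) processed (Just dropAt))
  case firstRun of
    Left (ErrorCall _) -> consumeSession (Queue q) processed Nothing
    Right () -> pure ()
  readIORef processed

main :: IO ()
main = do
  out <- drainWithOutage [1 .. 5] 2
  unless (out == [1, 2, 3, 4, 5]) $ error "messages were lost or reordered"
  print out
```

One possible workaround for the "cannot produce while RabbitMQ is down" problem would be to publish the backlog before the consumer starts and only then break the connection, but that is an idea to try, not something this PR does.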

zebot added the ok-to-test label (approved for running tests in CI; overrides not-ok-to-test if both labels exist) on Aug 28, 2025

jschaul force-pushed the WPB-19422-rabbit-mq-connection-loss-leads-to-backend-notification-pusher-getting-stuck-2 branch from 37a7252 to 835591a on August 28, 2025 15:49

lwille reviewed Aug 28, 2025

battermann and others added 11 commits September 1, 2025 12:52

79e33ac test
37dba74 WIP: toxiproxy
19aade5 WIP toxiproxy script
ff179b4 WIP
49fdfe1 WIP
5fac655 WIP
5432b35 try with tcp, not tls
1b90c08 sleep 2 sec
ea1a2ce haproxy WIP
823dc66 updated haproxy.cfg
082947a wip haproxy

battermann force-pushed the WPB-19422-rabbit-mq-connection-loss-leads-to-backend-notification-pusher-getting-stuck-2 branch from 835591a to 082947a on September 1, 2025 13:43
