I've noticed that on an offline node, if a transaction context is cancelled, the transaction is not terminated. It only terminates when the node gets back online.
More specifically, this is how I noticed this behavior:
- Create a cluster of 3 nodes
- Disconnect node1 from the other nodes (I did this with `ip link set enp5s0 down` on the node1 VM, where enp5s0 was the only network interface used for the dqlite connection)
- Initiate a write transaction from this node (the specific SQL command was `UPDATE operations SET updated_at = ?, status_code = ?, metadata = ?, error = ?, error_code = ? WHERE uuid = ?`)
- Cancel the context used for the transaction
- Notice that the transaction doesn't end immediately
It doesn't matter whether node1 was initially the database leader or not. I see the same behavior with both the leader and a voter node.
I've only reproduced this with LXD, using custom debug output. The relevant debug output after the node was disconnected from the rest of the cluster:
# Initiate the write transaction
ERROR [2026-03-30T19:06:53Z] Committing operation metadata to database with context context="&{0x9cf01459dc0 0x109ad00}" operation=019d4024-38c2-7a93-a643-e1330f3ed689
WARNING[2026-03-30T19:07:03Z] Transaction timed out. Retrying once err="Failed beginning transaction: context deadline exceeded" member=3
ERROR [2026-03-30T19:07:07Z] Heartbeat not received in time, cancelling durable operations
# This is when the transaction context is cancelled
ERROR [2026-03-30T19:07:07Z] Cancelling durable operation due to missed heartbeat context operation=019d4024-38c2-7a93-a643-e1330f3ed689
# This is when the transaction actually terminates.
ERROR [2026-03-30T19:07:13Z] Finished committing operation metadata to database with context context="&{0x9cf01459dc0 0x109ad00}" err="Failed updating operation \"019d4024-38c2-7a93-a643-e1330f3ed689\" record: Failed beginning transaction: failed to create dqlite connection: no available dqlite leader server found" operation=019d4024-38c2-7a93-a643-e1330f3ed689
Note the roughly 6s difference in time (19:07:07 until 19:07:13), and the error message ("failed to create dqlite connection: no available dqlite leader server found"). This shows that the transaction was not terminated because the context was cancelled, but because it failed to reach the cluster leader.
Is this intended behavior? Should the transaction terminate immediately when the context is cancelled?