Callback on slow fsyncs called after libuv closed #7066

Open
@cjen1-msft

Description

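When an fsync is slow and the node is terminated before it completes, the libuv worker thread running the fsync may access already-deleted objects once the fsync returns. In the log below, the snapshot fsync starts at 16:14:49.028663, the node then shuts down with no sign of that fsync completing, and closing the uv loop fails with EBUSY.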

2025-06-19T16:14:49.026392Z        100 [info ] src/snapshots/snapshot_manager.h:198 | New snapshot file written to snapshot_46_47 [189779 bytes] (unsynced)
2025-06-19T16:14:49.028663Z        100 [info ] src/snapshots/snapshot_manager.h:111 | Start fsync
2025-06-19T16:14:49.028693Z -0.016 0   [trace] /ccf/src/node/history.h:487          | mt_flush_to index=48
2025-06-19T16:14:49.028770Z -0.016 0   [trace] /ccf/src/node/history.h:91           | History [3] <sha256 100d1c49088435484be4b4202a124d8c41f1d8d4f8a36078e88eea41f71f9e79>
2025-06-19T16:14:49.028838Z        100 [debug] /ccf/src/host/ledger.h:1489          | Ledger commit: 48/48
2025-06-19T16:14:49.028961Z        100 [debug] /ccf/src/host/ledger.h:698           | Committed ledger file ledger_47-48.committed
2025-06-19T16:14:49.029085Z -0.016 0   [debug] /ccf/src/consensus/aft/raft.h:2373   | Commit on n[689f10e149371dc149addf8fc097736b8de4fe307623d3e77e15b79ddf1fcb09]: 48
2025-06-19T16:14:49.029156Z -0.016 0   [debug] /ccf/src/enclave/rpc_sessions.h:540  | Closing a session inside the enclave: 27
2025-06-19T16:14:49.029222Z        100 [debug] /ccf/src/host/rpc_connections.h:417  | rpc closed from enclave 27
2025-06-19T16:14:49.029356Z -0.016 0   [info ] /ccf/src/enclave/enclave.h:453       | Enclave stopped successfully. Stopping host...
2025-06-19T16:14:49.029433Z        100 [info ] cf/src/host/handle_ring_buffer.h:100 | Host stopped successfully
2025-06-19T16:14:49.029502Z        100 [info ] /ccf/src/host/main.cpp:992           | Exited event loop
2025-06-19T16:14:49.030333Z        100 [info ] /ccf/src/host/main.cpp:1011          | Ran an extra 1000 cleanup iteration(s)
2025-06-19T16:14:49.030596Z        100 [info ] /ccf/src/host/main.cpp:1017          | Failed to close uv loop, walking now
2025-06-19T16:14:49.030657Z        100 [fail ] /ccf/src/host/main.cpp:1019          | Failed to close uv loop cleanly: EBUSY
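
The last two log lines can be modelled in a few lines of C++. This is a hypothetical sketch, not libuv or CCF code: `ModelLoop`, `try_close` and `close_during_and_after_slow_fsync` are invented names. It assumes only the documented `uv_loop_close` contract (the call fails with `UV_EBUSY` while requests are still active) and replaces the slow `fsync` with a `std::promise` so that the in-flight window is deterministic.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <future>
#include <thread>
#include <utility>

// Hypothetical model (not libuv or CCF code): counts requests handed to a
// worker thread whose completion has not yet been processed.
class ModelLoop {
 public:
  void begin_request() { ++active_; }
  void finish_request() { --active_; }

  // Mirrors the documented uv_loop_close() contract: closing fails while any
  // request is still active. (libuv itself returns the negative UV_EBUSY.)
  int try_close() const { return active_.load() == 0 ? 0 : EBUSY; }

  // Tearing down state while a request can still complete is the hazard in
  // this issue; the model flags it in debug builds.
  ~ModelLoop() { assert(active_.load() == 0 && "loop destroyed with work in flight"); }

 private:
  std::atomic<int> active_{0};
};

// Simulates a slow fsync: the worker blocks until `release` is fulfilled, so
// the main thread deterministically observes the in-flight state.
// Returns the result of try_close() before and after the fsync completes.
std::pair<int, int> close_during_and_after_slow_fsync() {
  ModelLoop loop;
  std::promise<void> release;
  std::future<void> released = release.get_future();

  loop.begin_request();
  std::thread worker([&loop, &released] {
    released.wait();        // stands in for a slow fsync() call
    loop.finish_request();  // completion has been processed
  });

  int during = loop.try_close();  // fsync still in flight: EBUSY
  release.set_value();            // let the "fsync" finish
  worker.join();                  // wait for its completion before closing
  int after = loop.try_close();   // nothing pending: 0
  return {during, after};
}
```

`try_close` succeeds only once the worker's completion has been processed, which is the ordering the host needs before it destroys anything a completion callback might touch.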

The right fix is probably for the worker thread performing the fsync to notify the main thread when it has completed, so that we can shut down cleanly.
This would likely involve setting a global flag on the main thread at shutdown, checking it in the completion callbacks on the worker threads, and ensuring those callbacks do not enqueue more work once it is set.
Then the main thread can simply keep running until all outstanding work has completed.
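
A minimal sketch of this proposal, assuming plain `std::thread` workers instead of libuv's threadpool; `FsyncScheduler`, `enqueue`, `stopping`, `shutdown` and `demo_shutdown_during_slow_fsync` are hypothetical names, not CCF or libuv APIs. `stopping` plays the role of the global flag: completion callbacks check it before scheduling follow-up work, `enqueue` refuses new work once it is set, and `shutdown` blocks until every accepted job and its completion callback have finished.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical scheduler: each job (e.g. a slow fsync) runs on its own worker
// thread, followed by its completion callback on that same thread.
class FsyncScheduler {
 public:
  // The "global flag": set once by the main thread when shutdown begins.
  std::atomic<bool> stopping{false};

  // Returns false, without scheduling anything, once shutdown has begun.
  bool enqueue(std::function<void()> job, std::function<void()> on_complete) {
    std::lock_guard<std::mutex> lock(mu_);
    if (stopping.load()) return false;
    ++in_flight_;
    workers_.emplace_back([this, job = std::move(job),
                           on_complete = std::move(on_complete)] {
      job();
      on_complete();  // may call enqueue(); refused once stopping is set
      std::lock_guard<std::mutex> done_lock(mu_);
      if (--in_flight_ == 0) cv_.notify_all();
    });
    return true;
  }

  // Stops new work, then blocks until every in-flight job and its completion
  // callback have finished, so no callback outlives the state it uses.
  void shutdown() {
    std::unique_lock<std::mutex> lock(mu_);
    stopping.store(true);  // set under mu_, so enqueue() cannot race past it
    cv_.wait(lock, [this] { return in_flight_ == 0; });
    lock.unlock();
    for (auto& t : workers_) t.join();
    workers_.clear();
  }

  ~FsyncScheduler() { shutdown(); }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int in_flight_ = 0;
  std::vector<std::thread> workers_;
};

// Usage: the "fsync" below is still running when shutdown() begins, so its
// completion sees `stopping` and does not enqueue follow-up work.
// Returns the number of completion callbacks that ran (1).
int demo_shutdown_during_slow_fsync() {
  FsyncScheduler sched;
  std::atomic<int> completions{0};
  bool accepted = sched.enqueue(
      [&sched] {
        // Stand-in for a slow fsync: keeps running until shutdown has begun.
        while (!sched.stopping.load()) {
          std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
      },
      [&] {
        ++completions;
        if (!sched.stopping.load()) {  // check the flag before more work
          sched.enqueue([] {}, [&] { ++completions; });
        }
      });
  assert(accepted);
  (void)accepted;
  sched.shutdown();                      // returns only after the completion ran
  assert(!sched.enqueue([] {}, [] {}));  // new work is refused after shutdown
  return completions.load();
}
```

Setting the flag under the same mutex that `enqueue` takes means no job can slip in after `shutdown` starts waiting. In the libuv-based host, the analogue of `shutdown` would presumably be to keep calling `uv_run` until no requests remain active, and only then call `uv_loop_close`.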

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests