Skip to content

src/vfs.c:1271: vfsMainFileClose: Assertion '(f->database->shm.lock[0] == 0)' #855

@simondeziel

Description

@simondeziel

In the LXD test suite, during the clustering_handover, an assertion failure in dqlite was found:

2025-10-09T23:08:03.9744500Z + lxd shutdown
2025-10-09T23:08:03.9761816Z time="2025-10-09T23:08:03Z" level=info msg="Found cluster member whose role needs to be changed" candidateAddress="100.64.1.103:8443" local="100.64.1.101:8443" newRole=spare
2025-10-09T23:08:03.9765596Z time="2025-10-09T23:08:03Z" level=info msg="Demoting offline member during rebalance" candidateAddress="100.64.1.103:8443"
2025-10-09T23:08:03.9890740Z time="2025-10-09T23:08:03Z" level=info msg="Completed heartbeat round" duration=142.222769ms local="100.64.1.101:8443"
2025-10-09T23:08:03.9943000Z time="2025-10-09T23:08:03Z" level=info msg="Starting heartbeat round" local="100.64.1.101:8443" mode=initial
2025-10-09T23:08:03.9952887Z time="2025-10-09T23:08:03Z" level=warning msg="Failed heartbeat" err="Failed to send heartbeat request: Put \"https://100.64.1.103:8443/internal/database\": dial tcp 100.64.1.103:8443: connect: connection refused" remote="100.64.1.103:8443"
2025-10-09T23:08:04.0008186Z time="2025-10-09T23:08:04Z" level=info msg="Partial heartbeat received" local="100.64.1.104:8443"
2025-10-09T23:08:04.0011674Z time="2025-10-09T23:08:04Z" level=info msg="Partial heartbeat received" local="100.64.1.102:8443"
2025-10-09T23:08:04.0025965Z time="2025-10-09T23:08:04Z" level=info msg="Asked to shutdown by API" force=false
2025-10-09T23:08:04.0029878Z time="2025-10-09T23:08:04Z" level=info msg="Starting shutdown sequence" signal="power failure"
2025-10-09T23:08:04.0031115Z time="2025-10-09T23:08:04Z" level=info msg="Unwatching path" driver=inotify path=/var/lib/ubuntu-advantage/interfaces/lxd-config.json
2025-10-09T23:08:04.0032123Z time="2025-10-09T23:08:04Z" level=info msg="Transferring leadership" address="100.64.1.101:8443"
2025-10-09T23:08:04.0050157Z time="2025-10-09T23:08:04Z" level=warning msg="Failed heartbeat" err="Failed to send heartbeat request: Put \"https://100.64.1.103:8443/internal/database\": dial tcp 100.64.1.103:8443: connect: connection refused" remote="100.64.1.103:8443"
2025-10-09T23:08:04.0089964Z lxd: src/vfs.c:1271: vfsMainFileClose: Assertion `(f->database->shm.lock[0] == 0)' failed.
2025-10-09T23:08:04.0119308Z time="2025-10-09T23:08:04Z" level=warning msg="Failed adding member event listener client" err="dial tcp 100.64.1.103:8443: connect: connection refused" local="100.64.1.104:8443" remote="100.64.1.103:8443"
2025-10-09T23:08:04.0123219Z time="2025-10-09T23:08:04Z" level=warning msg="Failed adding member event listener client" err="dial tcp 100.64.1.103:8443: connect: connection refused" local="100.64.1.102:8443" remote="100.64.1.103:8443"
2025-10-09T23:08:04.0125165Z time="2025-10-09T23:08:04Z" level=info msg="Dqlite connected outbound" local="100.64.1.102:40880" name=raft remote="100.64.1.104:8443"
2025-10-09T23:08:04.0140110Z time="2025-10-09T23:08:04Z" level=info msg="Dqlite proxy started" local="100.64.1.104:8443" name=dqlite remote="100.64.1.1:40880"
2025-10-09T23:08:04.0143688Z time="2025-10-09T23:08:04Z" level=info msg="Dqlite proxy started" local="100.64.1.102:40880" name=raft remote="100.64.1.104:8443"
2025-10-09T23:08:04.0172181Z time="2025-10-09T23:08:04Z" level=info msg="Dqlite connected outbound" local="100.64.1.104:33382" name=raft remote="100.64.1.102:8443"
2025-10-09T23:08:04.0187323Z time="2025-10-09T23:08:04Z" level=info msg="Dqlite proxy started" local="100.64.1.102:8443" name=dqlite remote="100.64.1.1:33382"
2025-10-09T23:08:04.0191223Z time="2025-10-09T23:08:04Z" level=info msg="Dqlite proxy started" local="100.64.1.104:33382" name=raft remote="100.64.1.102:8443"
2025-10-09T23:08:05.2390792Z Error: Put "http://unix.socket/internal/shutdown?force=false": EOF
2025-10-09T23:08:05.2392846Z time="2025-10-09T23:08:05Z" level=info msg="Dqlite proxy stopped" local="100.64.1.104:8443" name=dqlite remote="100.64.1.1:52886"
2025-10-09T23:08:05.2394393Z time="2025-10-09T23:08:05Z" level=info msg="Dqlite proxy stopped" local="100.64.1.102:8443" name=dqlite remote="100.64.1.1:40832"
2025-10-09T23:08:05.2395726Z time="2025-10-09T23:08:05Z" level=warning msg="Dqlite proxy failed" err="first: remote -> local: read tcp 100.64.1.102:42778->100.64.1.101:8443: read: connection reset by peer" local="100.64.1.102:42778" name=raft remote="100.64.1.101:8443"
2025-10-09T23:08:05.2396937Z time="2025-10-09T23:08:05Z" level=info msg="Dqlite proxy stopped" local="100.64.1.102:42778" name=raft remote="100.64.1.101:8443"
2025-10-09T23:08:05.2397788Z time="2025-10-09T23:08:05Z" level=info msg="Dqlite proxy stopped" local="100.64.1.104:33280" name=raft remote="100.64.1.101:8443"
2025-10-09T23:08:05.2404614Z suites/clustering.sh: line 2820: 32574 Aborted                 (core dumped) nsenter -n -m -t "${pid}" lxd --logfile "${LXD_DIR}/lxd.log" "${SERVER_DEBUG-}" "$@" 2>&1
2025-10-09T23:08:05.2405206Z + cleanup

There are many failures in the other tests ran with that PR but they are unrelated to this rare dqlite issue.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions