multi-viewer: synchronise shared-state mutations + skip empty render cycles #171
Open
mushogenshin wants to merge 4 commits into
Conversation
…ing endFrame

Filament's `Engine::destroy(SwapChain*)` is documented as not synchronising with the GPU; explicit guidance is to call `flushAndWait()` first if a frame might still be in flight. `destroySwapChain` previously called `detachAll(swapChain)` (which takes RenderManager's mutex and removes the swap chain from the C++ `mViewAttachments` synchronously, so no further `RenderManager::render` iteration touches it) and then immediately `Engine_destroySwapChainRenderThread`.

That's enough on Thermion's own render thread, but it's not enough for Filament's *backend* command queue: Filament's BEGIN_FRAME / render / END_FRAME commands are processed on its backend thread independently of the caller. An endFrame queued during the last `RenderManager::render` before `detachAll` could still be sitting in the backend queue when the synchronous `engine->destroy(swapChain)` frees the SwapChain object. The backend then runs endFrame on freed memory and aborts at Renderer.cpp:490 — "SwapChain must remain valid until endFrame is called."

Symptom: SIGABRT on multi-viewer dispose / mount toggles in Flutter apps. Single-viewer is unaffected because the frame scheduler doesn't have anything queued to drain at dispose time. The earlier hypothesis (Issue nmfisher#167) was that the precondition itself was the bug; that turned out to be a separate list-aliasing issue. With nmfisher#167 / nmfisher#168 / nmfisher#170 / nmfisher#171 in place, this surfaced as the real underlying race that all four were downstream of.

Fix: insert `await flush()` (which calls `Engine_flushAndWaitRenderThread`) between `detachAll` and `Engine_destroySwapChainRenderThread`. flushAndWait blocks until the backend has processed every command submitted up to this point, including any pending endFrame referencing the swap chain. Then destroy is safe. flush() is heavy but acceptable on a destroy path that already implies the surface is going away.
…ched views

Filament's `Renderer` class is documented as a per-window primitive (see Renderer.h: "endFrame() schedules the current frame to be displayed on the Renderer's window."), but Thermion's RenderManager shares a single Renderer across every SwapChain in the Engine, calling `beginFrame(SC_i)` / `endFrame()` for each entry in `mViewAttachments` inside one render() call. That works fine when each entry has views — Filament's internal `mSwapChain` is set, read, and reset cleanly per pair. It misbehaves when an entry has ZERO views attached: the begin/end pair fires the same state machine as a real frame but does nothing useful, opening a window for SwapChain validity assertions to trip during the dispose / mount transitions multi-viewer apps create.

Empty-views entries arise easily: `setRendering(false)` and `renderManager.detach(view)` both remove the view from the entry's view list but leave the SwapChain entry in `mViewAttachments` for later reattachment. Each render iteration then ran a wasted begin/end against that empty entry, while concurrent destroy / attach traffic from the disposing or mounting viewer hit Filament's internal state. On Android and Windows this reliably fired `endFrame:490 — SwapChain must remain valid until endFrame is called.` on multi-viewer count toggles. Single-viewer never triggered it because there was never an empty entry to iterate.

Fix: scan `attachment.views` before calling beginFrame and return false (skip) if all entries are null. The begin/end pair only happens when there's actual rendering work, which is what Filament's Renderer expects. Single-viewer behaviour is unchanged. Multi-viewer dispose / mount no longer races against empty begin/end cycles.
Two fixes for the residual `endFrame:490 — SwapChain must remain valid` precondition that fires on mass dispose / mount in multi-viewer apps (8 viewers in particular):

1) **`FFIRenderManager._syncViews` snapshots before iterating.** The old loop iterated `_attachments.keys` and `_attachments[key]` directly. Each iteration awaits a render-thread round-trip, and while awaiting, concurrent `attach` / `detach` calls (common when multiple viewers mount/dispose at once) mutate the live map. The iterator state across awaits ends up stale: some swap chains get skipped, others written with old data. The C++ RenderManager then sees inconsistent attachments and the next render tickles the SwapChain validity assertion. Snapshot the keys + per-key view list at function entry, then iterate the snapshot. This tolerates concurrent modification: if an entry is removed by the time we get back to it, the views list is null and we skip it cleanly.

2) **`FFIFilamentApp.destroySwapChain` serialises through a static Future chain.** With 8 simultaneous viewers disposing, eight `destroySwapChain` calls run concurrently. Each calls `flush()` (Engine::flushAndWait) between detach and engine->destroy — but if other destroys are queueing more commands on Filament's backend at the same instant, no individual flush guarantees a quiet backend. The shared Renderer's `mSwapChain` state ends up pointing at a SwapChain that another destroy is in the middle of freeing. Serialise: each destroy awaits the previous one before starting. The flush in each destroy now has a stable target — nothing else is queueing destroy work — so it truly drains, and the synchronous engine->destroy can proceed safely.

Together with the empty-views skip in `RenderManager::renderSwapChainAt` (52c33f7), these fixes close the timing windows for the shared-Renderer-with-multiple-SwapChains pattern. Filament's Renderer is documented as per-window, but Thermion's design shares it; both patches make the shared use safe under contention.

Tested scenario: stress-test screen toggle 1 → 4 → 8 → 1 → 4. The 4-viewer state had been working (with frame drops); 8-viewer was initially stable but crashed on extended use; 8 → 1 transitions crashed reliably. All three are race surfaces in the shared destroy/sync flow; serialisation + snapshotting eliminate them.
… Future chain

Previous fix (d261b8f) snapshotted `_syncViews` iteration so the loop wouldn't see stale state across awaits. That kept iteration self-consistent, but didn't address concurrent calls *between* sibling viewers: in multi-viewer apps where multiple viewers mount or dispose simultaneously, sibling viewers' `attach`, `detach`, and `detachAll` calls run in parallel. Each takes its own snapshot at a different moment — and one viewer's snapshot can capture another viewer's view that's about to be destroyed via `destroyView`. By the time the snapshot's `_syncViews` reaches `RenderManager_setRenderableRenderThread`, that view's native handle has been freed; Filament dereferences the dangling pointer and the process takes a SIGSEGV. (The previous SIGABRT preconditions all triggered on detected inconsistent state at the same boundary; SIGSEGV happens when the inconsistent state slips past the precondition checks and hits real memory.)

Fix: introduce a static `_opChain` Future and a `_serialize` helper that wraps each mutation of `_attachments`. attach, detach, and detachAll now each await the previous op before running — concurrent calls queue up cleanly, each runs to completion (including its `_syncViews` round-trips) before the next starts. View destruction (`destroyView` awaits `detach(view)` first) cannot interleave inside another viewer's sync. Combined with the destroySwapChain serialisation in ffi_filament_app.dart (also d261b8f), every shared-state mutation in the multi-viewer pipeline now runs serially.

Tested scenario was the same one that surfaced the SIGSEGV: toggle stress-test from 8 viewers back to 1. The dispose flow fires 8 simultaneous viewer.dispose chains plus 8 simultaneous ThermionWidget.dispose async closures, each touching the shared `_attachments` and View handle state. Without serialisation the attach for the new single viewer would race against destroyView for the eight outgoing viewers; with serialisation, the ordering is enforced at the boundary that matters.
Owner
Thanks for this (and all the other PRs), these look like some very valuable contributions. Unfortunately this week is looking quite busy so I can't promise when I can get to them, but as soon as I get a chance I'll come back to review.
Summary
Four fixes that together make Thermion's multi-viewer pipeline (multiple `SwapChain`s sharing one `Engine` / `Renderer`) safe under contention. Each addresses a distinct race exposed when several viewers mount, dispose, or render concurrently. Single-viewer is unaffected.
Background
Filament documents `Renderer` as a per-window primitive — the example in `Engine.h` shows one `Engine` ↔ one `SwapChain` ↔ one `Renderer` ↔ one `View`, and `Renderer.h:343` says "It is recommended to use the same swapChain for every call to beginFrame, failing to do so can result in losing all or part of the FrameInfo history." Thermion shares a single `mRenderer` across every `SwapChain` in the engine and calls `beginFrame(SC_i)` / `endFrame()` for each entry in `mViewAttachments` inside one `render()` call. That pattern works in single-viewer apps but has multiple race windows the moment more than one viewer is involved.
This PR closes those windows. (Filament's per-window assumption is itself worth addressing long-term via per-SwapChain Renderers, but the changes below make the shared-Renderer-with-multiple-SwapChains pattern reliable in the meantime.)
Fix 1 — `flushAndWait` before destroying a SwapChain
`Engine::destroy(SwapChain*)` is documented as not synchronising with the GPU; explicit guidance is to call `flushAndWait()` first if a frame might still be in flight. `destroySwapChain` previously called `detachAll` (which removes the swap chain from `mViewAttachments` synchronously) and then immediately `Engine_destroySwapChainRenderThread`. Filament's backend command queue could still have a `BEGIN_FRAME` / `END_FRAME` pair referencing the swap chain when the synchronous `engine->destroy(swapChain)` frees the underlying object. The backend then runs endFrame on freed memory and aborts at `Renderer.cpp:490` — "SwapChain must remain valid until endFrame is called."
Insert `await flush()` between `detachAll` and `Engine_destroySwapChainRenderThread`.
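As a rough sketch, the Dart side of the destroy path after the fix looks like this. `detachAll` and `flush` come from the PR text; `destroySwapChainOnRenderThread` and the exact types/signatures are illustrative assumptions, not the actual bindings:

```dart
// Sketch of FFIFilamentApp.destroySwapChain after the fix.
Future<void> destroySwapChain(FFISwapChain swapChain) async {
  // Synchronously removes the swap chain from C++ mViewAttachments, so
  // no further RenderManager::render iteration touches it.
  await renderManager.detachAll(swapChain);

  // NEW: drain Filament's backend. flush() invokes
  // Engine_flushAndWaitRenderThread, which blocks until every command
  // submitted so far, including any pending endFrame still referencing
  // this swap chain, has been processed.
  await flush();

  // Only now is the synchronous engine->destroy(swapChain) safe.
  await destroySwapChainOnRenderThread(swapChain);
}
```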
Fix 2 — skip begin/end cycle when SwapChain has no attached views
Both `setRendering(false)` and `renderManager.detach(view)` remove the view from a swap chain entry's view list but leave the swap chain entry itself in `mViewAttachments` for later reattachment. When `renderSwapChainAt` then iterated that entry, it called `mRenderer->beginFrame(...)` / `endFrame()` with no actual rendering work in between. That is merely wasted work in single-viewer apps, but dangerous in multi-viewer ones: the per-pair internal state on Filament's `mSwapChain` is set and cleared rapidly across the iteration, opening a race window for the validity assertion when concurrent destroy/attach traffic from a sibling viewer hits Filament's internal state.
Scan `attachment.views` before calling `beginFrame`. Return false if all are null. The begin/end pair only happens when there's actual rendering work — which is what Filament's Renderer expects.
Fix 3 — serialise `destroySwapChain` + snapshot `_syncViews` iteration
Two changes in one commit because they address overlapping symptoms in the same dispose path:
(a) `destroySwapChain` serialisation: in mass-dispose scenarios (e.g., a stress-test screen toggling 8 viewers down to 1), eight concurrent `destroySwapChain` calls each ran its own `detachAll → flush → destroy` sequence in unsynchronised parallel. Each call's `flush()` had no stable target — other destroys were concurrently queueing fresh commands on Filament's backend, so no individual flush guaranteed a quiet backend. The shared Renderer's `mSwapChain` could end up pointing at a SwapChain another destroy was in the middle of freeing. Add a `static Future _destroyChain` to `FFIFilamentApp` so each `destroySwapChain` awaits the previous one — flush() now has a quiet backend at the moment it runs; a sketch of the chaining pattern follows below.
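The chaining pattern might look like the following. `_destroyChain` is named in the PR; `_destroySwapChain` stands in for the detachAll → flush → destroy body sketched under Fix 1, and the error handling shown is an assumption:

```dart
// Seeded with a completed future; every destroy appends itself.
static Future<void> _destroyChain = Future.value();

Future<void> destroySwapChain(FFISwapChain swapChain) {
  // Each destroy waits for the previous one, so at most one
  // detachAll -> flush -> destroy sequence is in flight, and flush()
  // runs against a backend no sibling destroy is writing to.
  final op = _destroyChain.then((_) => _destroySwapChain(swapChain));
  // Keep errors out of the stored chain so one failed destroy cannot
  // wedge every later one; callers still observe the error via `op`.
  _destroyChain = op.catchError((_) {});
  return op;
}
```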
(b) `_syncViews` iteration snapshot: the loop iterated `_attachments.keys` directly, awaiting a render-thread round-trip on each iteration. Concurrent `attach` / `detach` from sibling viewers mutated the live map across the await, leaving the iterator with stale state — `_syncViews` then pushed inconsistent view lists to C++ RenderManager. Snapshot keys + per-key views at function entry; iterate the snapshot.
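A sketch of the snapshotting loop, assuming `_attachments` maps swap chains to view lists as the PR text implies; `_setRenderable` is a hypothetical stand-in for the per-view render-thread round-trip:

```dart
Future<void> _syncViews() async {
  // Snapshot keys and per-key view lists at entry; the live map may be
  // mutated by concurrent attach/detach while we await below.
  final keys = _attachments.keys.toList();
  final snapshot = {for (final k in keys) k: _attachments[k]?.toList()};

  for (final swapChain in keys) {
    // Entry removed by a concurrent detach before we reached it: skip
    // cleanly rather than push stale views to the C++ RenderManager.
    if (!_attachments.containsKey(swapChain)) continue;
    for (final view in snapshot[swapChain] ?? const []) {
      // Each render-thread round-trip suspends here; iterating the
      // snapshot keeps this loop's state valid across the await.
      await _setRenderable(swapChain, view);
    }
  }
}
```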
Fix 4 — serialise `attach` / `detach` / `detachAll` through a static Future chain
Fix 3 kept iteration self-consistent across its own awaits, but didn't synchronise between sibling viewers' attach/detach calls. Each viewer's snapshot was taken at its own moment; one viewer's snapshot could capture another viewer's view that was being concurrently destroyed via `destroyView` (which itself awaits `detach(view)`). By the time the snapshot's `_syncViews` reached `RenderManager_setRenderableRenderThread`, the captured view's native handle had been freed — Filament dereferenced the dangling pointer and the process took a SIGSEGV (past the precondition checks).
Introduce a static `_opChain` Future and a `_serialize` helper in `FFIRenderManager`. `attach`, `detach`, and `detachAll` now each await the previous op before running. Concurrent calls queue cleanly; view destruction can no longer interleave inside another viewer's sync.
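A minimal sketch of the serialisation helper: the `_opChain` / `_serialize` names come from the PR, while `_attachImpl` and friends are hypothetical stand-ins for the previous method bodies:

```dart
// All _attachments mutations funnel through this single chain.
static Future<void> _opChain = Future.value();

// Queue [op] behind every previously queued mutation; errors are kept
// out of the stored chain so one failure cannot wedge the queue.
static Future<T> _serialize<T>(Future<T> Function() op) {
  final result = _opChain.then((_) => op());
  _opChain = result.then((_) {}).catchError((_) {});
  return result;
}

Future<void> attach(FFIView view, FFISwapChain swapChain) =>
    _serialize(() => _attachImpl(view, swapChain));

Future<void> detach(FFIView view) => _serialize(() => _detachImpl(view));

Future<void> detachAll(FFISwapChain swapChain) =>
    _serialize(() => _detachAllImpl(swapChain));
```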
Result
Combined, every shared-state mutation in the multi-viewer pipeline runs serially. The spike app's stress-test screen on Android — which previously crashed on every viewer-count toggle — now mounts, renders, region-picks, and disposes cleanly across 1 / 4 / 8 viewer counts and arbitrary toggle sequences.
Verification
Related