android: getSwapChains() should return a snapshot, not the live list#167
android: getSwapChains() should return a snapshot, not the live list#167mushogenshin wants to merge 18 commits into
Conversation
The build hook in thermion_dart/hook/build.dart enumerates every
.cpp file under native/src/ and applies platform-specific exclusions
for `windows`, `d3d`, and `linux` paths. There is no analogous
exclusion for `vulkan/`, so on iOS the hook compiles
`native/src/vulkan/VulkanUtils.cpp` and
`native/src/vulkan/BaseVulkanTexture.cpp`. Both reference symbols in
the `bluevk::` namespace, but iOS is Metal-only and the platform
branch never adds `bluevk` to its `libs` list.
Result on iOS device builds (Xcode 15, Flutter master):
Undefined symbols for architecture arm64:
"bluevk::vkMapMemory", referenced from:
thermion::vulkan::readVkImageToBitmap(...) in VulkanUtils-*.o
"bluevk::vkFreeMemory", referenced from: ...
... (~50 more bluevk:: symbols)
ld: symbol(s) not found for architecture arm64
The whole `flutter build ios` fails inside `dart_build` because the
shared lib never links.
Fix: skip vulkan-prefixed source paths when targetOS is iOS, mirroring
the structure of the existing windows / linux exclusions a few lines
above. macOS still gets the vulkan sources (it links `bluevk`),
Android still gets them (also links `bluevk`), Linux still gets them.
Only iOS changes.
Verified: `flutter build ios --debug --no-codesign` now produces a
working `Runner.app` on top of upstream develop.
The mobile gesture handler in `_MobileListenerWidget` used a regular `GestureDetector` whose internal `ScaleGestureRecognizer` waits for movement to cross `kPanSlop` before claiming the gesture arena. When a Thermion view sits inside an ancestor `Scrollable` (e.g. ListView, PageView, CustomScrollView), the ancestor's `VerticalDragGesture- Recognizer` reaches its acceptance threshold first and wins the arena — touches starting on the viewer get interpreted as page scrolls, not viewport gestures. The result on iOS is "sometimes tumbles, sometimes scrolls" depending on drag direction and speed. Fix: switch to `RawGestureDetector` with a `_EagerScaleGesture- Recognizer` subclass that calls `resolve(GestureDisposition.accepted)` inside `addAllowedPointer`. The arena is claimed on PointerDown, so ancestor scrollables never get the chance to win. Single-finger orbit and pinch-zoom both resolve to the viewport regardless of how the viewer is composed. Tap and double-tap can no longer be detected via separate `TapGestureRecognizer` / `DoubleTapGestureRecognizer` entries (the eager scale wins arena before they ever accept), so they're synthesized inside the scale callbacks: a "tap" is a scale gesture whose total focal-point movement stayed below 8 px and whose duration stayed below 250 ms; a "double-tap" is two such taps within 300 ms of each other and within 8 px of each other. Both still feed `InputHandler` via the existing `TouchEvent(tap, ...)` / `TouchEvent( doubleTap, ...)` events, so callers see no API change. Verified: `flutter build ios --debug --no-codesign` succeeds; running on iPhone the multi-viewer stress test (8 viewers in a vertical ListView) now lets the parent scroll only when drags start in the gutter and routes every touch starting on a viewer to the viewport deterministically. Filing alongside the iOS vulkan-source exclusion (PR nmfisher#160).
The Windows branch of the build hook gathers cl.exe options (includes, defines, response file with sources) but stops short of adding the /link separator and /LIBPATH:$libDir. Without those, the linker has no idea where the Filament .lib artifacts that pub get downloaded actually live, so it fails with LNK1104 / LNK2019 on every Filament symbol reference and cl.exe exits 2. The library *inputs* themselves don't need to be added on the command line because native/include/ThermionWin32.h declares all of them via #pragma comment(lib, "filament.lib") and friends, and that header is transitively included by the Windows vulkan/d3d sources plus the generic c_api headers (TCamera.h, TRenderer.cpp, TRenderManager.cpp). The pragma directives emit linker directives that the linker honours automatically — but it still has to find the .lib files via a search path, hence /LIBPATH. Uncomment /link and /LIBPATH:$libDir; leave /DLL out (CBuilder.library already passes /LD which makes /DLL redundant) and leave the individual sources out (they're already in the response file). Verification needed on Windows: I don't have a Windows dev box. Filing this as a separate fork branch so it can be rebased / dropped independently of the iOS fixes (nmfisher#160 vulkan-source exclusion, nmfisher#163 eager scale).
Thermion's build hook routes all log output through a Logger that writes to .dart_tool/thermion_dart/log/build.log. native_toolchain_c captures subprocess (cl.exe, clang, ld) stdout/stderr and routes the stderr through `logger.severe(...)`. On a build failure runProcess throws a ProcessException whose message is just the command line and exit code — the actual compiler / linker output stays inside that build.log file in the pub cache, where it is invisible to CI logs and to anyone running `flutter build` who isn't already poking around in `.dart_tool/`. Result: compile / link failures look completely opaque downstream. The exception output shows the cl.exe command, exit code 2, and nothing else. Has been blocking diagnosis of the Windows native build for several CI iterations. Mirror SEVERE-level records to stderr inside the existing log handler so the real error reaches whoever's watching. Successful builds aren't noisier — compilers don't emit much stderr on success — and the build.log file behavior is unchanged.
native_toolchain_c.runCl already injects /Fe:, /LD, its own /link
separator, /MACHINE, and /LIBPATH (from libraryDirectories) when
dynamicLibrary is set. By manually adding our own '/link
/LIBPATH:$libDir' to the user flags list, cl.exe's parser stopped
reading compile-time flags at our separator — pushing the toolchain's
auto-injected /LD and /Fe: into LINK's argument vector, where LINK
ignored them as LNK4044. With /LD never reaching cl.exe, no /DLL was
forwarded to the linker, which then tried to produce an EXE and bailed
out with LNK1561 ("entry point must be defined").
Drop the redundant /link /LIBPATH and let `libraryDirectories: [libDir]`
do its job. Also drop /VERBOSE from the compile flag list — it's a
linker option that cl.exe parses as deprecated /V<string> (warning
D9035); if the verbose link map is ever needed, it belongs after
native_toolchain_c's own /link separator.
Diagnosed via the stderr-tee added in the prior commit; the smoking
gun was the cl.exe command dump in build.log showing two /link
separators and unrecognized cl flags trailing the first one.
Windows filesystems are case-insensitive, so the basename collision
between native/src/scene/Gizmo.cpp and native/include/material/gizmo.c
caused both source files to compile to the same gizmo.obj path. The
material was added to the source list AFTER the scene class, so its
.obj write-clobbered the class's, leaving thermion::Gizmo's
constructor, pick, highlight, and unhighlight symbols out of the link.
TGizmo.cpp's c_api wrappers then failed with four LNK2019s and a
final LNK1120 ("4 unresolved externals"). LINK's earlier LNK4042
warning ("object specified more than once; extras ignored") was the
breadcrumb.
Rename the material .c file to gizmo_material.c (gizmo.h stays put,
since only the .c basename collided). materialSources entry in
build.dart updated to match. The "gizmo" key still drives the
GIZMO_ENABLED=1 define so existing call sites and tests are
unaffected.
Verified with `flutter test` on Windows after both this and the
preceding cl/link fix: 500/500 tests pass, thermion_dart.dll
produced.
The link hook called CLinker.library(... LinkerOptions.manual(...))
with no `sources`. native_toolchain_c.runCl translates that into:
cl.exe /O2 /LD /Fe:thermion_dart.dll /link /OPT:REF /MACHINE:X64
/LIBPATH:<empty link/ output dir>
i.e. cl.exe with no inputs at all, which exits immediately with
`cl : Command line error D8003 : missing source filename`. Under
`flutter test` the link hook is not invoked, so this only surfaces
during a real `flutter build windows --release` — where it shows up
as the terse "Linking native assets failed" through MSBuild, with
the actual D8003 stranded inside the per-package build.log in the
pub cache.
The link phase here is optional: the build hook already produced
thermion_dart.dll and added a CodeAsset for it. Pass those
build-hook assets through unchanged on Windows. Keep the CLinker
call on platforms where it currently works.
…call
`thermion_flutter_plugin`'s own translation units include Filament
headers (transitively, via `WindowsVulkanContext.h` → `Platform.h`),
but `target_include_directories(... INTERFACE ...)` only exposes the
DART_PKG_HEADERS paths to consumers, not to the plugin itself. cl.exe
then could not find `<utils/...>` headers when compiling the plugin,
producing C3083 / C2039 / C4430 on filament headers.
Switch the scope to PUBLIC so the plugin's own compile and any
consumer both see the headers.
Also drop the immediately-following
include_directories(${PLUGIN_NAME} INTERFACE ...)
call. `include_directories()` accepts only path arguments — the
target name and `INTERFACE` keyword are silently treated as bogus
"directory" entries, so the call doesn't do what its author
intended; it just adds two non-existent paths to the directory-level
include set.
Filament's `backend/Platform.h` friend-declares
`utils::io::ostream& operator<<(...)` without first declaring the
nested namespace. Sibling Filament headers like
`backend/DriverEnums.h` carry the forward declaration
namespace utils::io { class ostream; }
so the rest of Filament compiles cleanly. The plugin's translation
unit reaches `Platform.h` (via `WindowsVulkanContext.h`) before any
of those siblings, leaving the namespace undeclared and cl.exe
rejecting the friend with C3083 ("the symbol to the left of '::'
must be a type"), C2039 ("'ostream': is not a member of 'utils'"),
and C4430.
Mirror Filament's own forward declaration in the plugin header,
just before the WindowsVulkanContext include. Pulling the full
`<utils/ostream.h>` would also work but drags in more than the
friend needs; the namespace + class fwd-decl matches what
DriverEnums.h does.
…ying its own swapchain
Real cause of the SIGABRT-on-mount on Android. Not a destroy-vs-
endFrame race (the previous flush() patch was wrong). It's a
straight-up API bug in FFIFilamentApp.getSwapChains():
Future<Iterable<SwapChain>> getSwapChains() async {
return _swapChains; // ← live reference, not a snapshot
}
In `thermion_flutter_plugin_native.dart` the Android branch of
createTextureAndBindToView reads this list, creates a new swap
chain, then destroys whatever was in the snapshot:
final swapChains = await FilamentApp.instance!.getSwapChains();
final swapChain = await FilamentApp.instance!.createSwapChain(...);
if (swapChains.isNotEmpty) {
await FilamentApp.instance!.destroySwapChain(swapChains.first);
}
await FilamentApp.instance!.renderManager.attach(view, swapChain);
The intent reads correctly: "snapshot existing chains; create the
new one; tear down old chains". But because getSwapChains returned
the live list reference rather than a copy, `swapChains` aliased
`_swapChains`. createSwapChain appended to _swapChains, so
swapChains.first immediately referred to the *new* swap chain we
just created. The plugin then destroyed its own new swap chain
and attached the view to a freed pointer. The next render hit
Filament's `SwapChain must remain valid until endFrame is called`
precondition and the process aborted with SIGABRT.
Reproduced on every viewer mount on Android — including
examples/flutter/quickstart, ARM64 emulator and physical device.
iOS / macOS / Windows are unaffected because their plugin paths
don't iterate getSwapChains().
Fix: return List.unmodifiable(_swapChains). Defensive at the API
layer; the call-site logic in plugin_native.dart is now correct
without modification.
The Android branch of `createTextureAndBindToView` (in `thermion_flutter_plugin_native.dart`) destroys the *previous* swap chain for the size-change case so the renderer doesn't keep an old, wrong-sized surface around. The previous version chose what to destroy by iterating FilamentApp's global swap chain list and picking `swapChains.first`. After the upstream getSwapChains() snapshot fix (nmfisher#167) the list is correctly snapshot, but in a multi-viewer app `swapChains.first` is whichever viewer's swap chain happens to be earliest in the list — usually a *different* viewer's, not this one's. So mounting viewer #N destroyed viewer #(N-1)'s swap chain. The renderer for #(N-1) then ran with a freed pointer; symptom on screen was middle viewers freezing while only the most recently mounted one (whose swap chain nothing destroyed) kept rendering. Fix: track the swap chain currently bound to each view in a `<View, SwapChain>` map and destroy *that* one, scoped to this view only. Other viewers' swap chains stay live. Order also rearranged: attach the new swap chain *before* destroying the old one, so the view never has a window of being unattached. Other ordering inside the function is unchanged. `_viewSwapChains` mirrors the pattern of the existing `_viewRenderTargets` map a few lines above. No cleanup hook on view-destroy, same as `_viewRenderTargets`'s existing convention; that's a separate lifecycle concern. Reproduces with the spike's stress-test screen at 4 / 8 viewers on Android emulator. Single-viewer is unaffected (map starts empty, no destroy on first mount). iOS / macOS / Windows are unaffected — only the Android plugin branch ever ran the destroy logic.
Fixes a bug introduced by this same patch: the synthesised tap was dispatched with `event.focalPoint`, which Flutter documents as *global* coordinates (relative to the screen / Flutter window). `TouchEvent.localPosition` consumers — most importantly `View.pick(x, y, …)` — expect coordinates relative to the rendered viewport, not the entire screen. Whenever the ThermionWidget wasn't anchored at (0, 0) on screen (i.e. every realistic layout: anything inside a Scaffold body, a ListView, a Padding, etc.), picks landed outside the viewport and the result handler always returned the background entity. Symptom on Android: region clicks did nothing in any layout that nested the viewer below an AppBar / inside a list. iOS / macOS hadn't been hit yet because the spike author's iOS test layout happened to have the viewer near the screen origin. Fix: capture `event.localFocalPoint` (widget-relative) into `_scaleStartFocal`, and compare with `event.localFocalPoint` in onUpdate so the tap-vs-drag movement threshold is also computed in widget-local space. The `inputHandler.handle(ScaleStart/Update Event(localFocalPoint: …))` calls keep passing `event.focalPoint` (global) since that matches the upstream convention used elsewhere in this file for those events — and those events are consumed by orbit-camera code that uses deltas, not absolute positions, so the global/local distinction doesn't affect them. Doesn't change behaviour for any caller that doesn't use the synthesised tap (orbit, pinch-zoom, double-tap detection's duration-based half is already correct).
After the previous patch (per-view swap chain tracking) the Android plugin correctly created and reused a SwapChain bound to each View. Disposal of the chain when the widget unmounted, however, was not wired up — the SwapChain stayed in `_viewSwapChains` and stayed attached to the RenderManager, while `ThermionWidget.dispose()` released the underlying SurfaceTexture (and thus the native window). Result on the next frame: Filament's render thread iterates the attached swap chains, hits the now-orphaned one, and calls `eglSwapBuffers` on the freed native window. The emulator's EGL backend logs: E/EGL_emulation: egl_window_surface_t::swapBuffers called with NULL buffer E/EGL_emulation: tid <n>: swapBuffers(764): error 0x300d (EGL_BAD_SURFACE) repeatedly until the next frame after the SwapChain is collected elsewhere. On the spike's stress-test screen at 4 / 8 viewers, even brief widget rebuilds (e.g. toggling viewer count) flood the log and cause visible frame drops because every render iteration attempts the bad swap. Fix: add a public `releaseTextureBindingForView(View view)` method to `ThermionFlutterPlugin` that releases plugin-side resources bound to a view (currently: the Android per-view SwapChain). The native plugin removes the entry from `_viewSwapChains` and calls `destroySwapChain` (which detaches from RenderManager and destroys the Filament SwapChain). Web plugin is a no-op — it doesn't track per-view bindings. `ThermionWidget.dispose()` now calls this BEFORE destroying the texture descriptor. The two are sequenced inside an unawaited async closure so dispose() stays synchronous for the framework but the swap chain destroy completes before the SurfaceTexture release. Other platforms aren't affected — releaseTextureBinding ForView is a no-op when no per-view binding exists.
`ThermionViewerFFI.dispose()` destroyed the underlying Filament
view without first removing it from `FFIRenderManager`'s Dart-side
`_attachments` map. The map keys each swap-chain entry to a list
of (renderOrder, View) tuples; if a view is destroyed while still
listed, any later `attach` / `detach` / `_syncViews` call from a
*different* viewer will iterate the now-stale tuple, retrieve the
freed view's native pointer, and pass it to
`RenderManager_setRenderableRenderThread`. Filament then does a
`handle_cast` and aborts with the generic
"Postcondition: corrupted heap Handle ... tag=(no tag)".
Reproduces reliably in multi-viewer Flutter apps when one viewer
is disposed concurrently with another viewer being mounted:
- Old viewer.dispose() runs (queued onto whatever the host
serialises on, e.g. a static `_createSerial` Future chain).
- New viewer's createTextureAndBindToView calls
renderManager.attach(newView, newSwapChain), which calls
`_syncViews()` — this walks every entry in `_attachments`,
including the entry containing the just-destroyed old view.
- The freed pointer hits Filament's handle table → SIGABRT.
Symptom on Android (which is the most multi-viewer-heavy platform
in practice) is an immediate crash on toggling viewer count.
iOS / macOS happen to work because their host plugins don't
require the same reattach pattern, but the bug is real on every
platform — the symptom is just less reproducible.
Fix: call `FilamentApp.instance!.renderManager.detach(view)`
before `View_setScene(view, nullptr)` and `destroyView(view)`.
`detach(view)` (no swap chain argument) iterates `_attachments`
and removes the view from every entry, then runs `_syncViews()`
to push the cleaned state to C++ RenderManager. After that,
destroying the view is safe — no other code path will try to
dereference its handle.
…ing endFrame Filament's `Engine::destroy(SwapChain*)` is documented as not synchronising with the GPU; explicit guidance is to call `flushAndWait()` first if a frame might still be in flight. `destroySwapChain` previously called `detachAll(swapChain)` (which takes RenderManager's mutex and removes the swap chain from C++ mViewAttachments synchronously, so no further RenderManager::render iteration touches it) and then immediately `Engine_destroySwapChainRenderThread`. That's enough on Thermion's own render thread, but it's not enough for Filament's *backend* command queue: Filament's BEGIN_FRAME / render / END_FRAME commands are processed on its backend thread independently of the caller. An endFrame queued during the last RenderManager::render before detachAll could still be sitting in the backend queue when the synchronous `engine->destroy(swapChain)` frees the SwapChain object. The backend then runs endFrame on freed memory and aborts at Renderer.cpp:490 — "SwapChain must remain valid until endFrame is called." Symptom: SIGABRT on multi-viewer dispose / mount toggles in Flutter apps. Single-viewer is unaffected because the frame scheduler doesn't have anything queued to drain at dispose time. The earlier hypothesis (Issue nmfisher#167) was that the precondition itself was the bug; that turned out to be a separate list-aliasing issue. With nmfisher#167 / nmfisher#168 / nmfisher#170 / nmfisher#171 in place, this surfaced as the real underlying race that all four were downstream of. Fix: insert `await flush()` (which calls Engine_flushAndWaitRenderThread) between detachAll and Engine_destroySwapChainRenderThread. flushAndWait blocks until the backend has processed every command submitted up to this point, including any pending endFrame referencing the swap chain. Then destroy is safe. flush() is heavy but acceptable on a destroy path that already implies the surface is going away.
…ched views Filament's `Renderer` class is documented as a per-window primitive (see Renderer.h: "endFrame() schedules the current frame to be displayed on the Renderer's window."), but Thermion's RenderManager shares a single Renderer across every SwapChain in the Engine, calling `beginFrame(SC_i)` / `endFrame()` for each entry in `mViewAttachments` inside one render() call. That works fine when each entry has views — Filament's internal `mSwapChain` is set, read, and reset cleanly per pair. It misbehaves when an entry has ZERO views attached: the begin/end pair fires the same state machine as a real frame but does nothing useful, opening a window for SwapChain validity assertions to trip during the dispose / mount transitions multi-viewer apps create. Empty-views entries arise easily: `setRendering(false)` and `renderManager.detach(view)` both remove the view from the entry's view list but leave the SwapChain entry in `mViewAttachments` for later reattachment. Each render iteration then ran a wasted begin/end against that empty entry, while concurrent destroy / attach traffic from the disposing or mounting viewer hit Filament's internal state. On Android and Windows this reliably fired `endFrame:490 — SwapChain must remain valid until endFrame is called.` on multi-viewer count toggles. Single-viewer never triggered it because there was never an empty entry to iterate. Fix: scan `attachment.views` before calling beginFrame and return false (skip) if all entries are null. The begin/end pair only happens when there's actual rendering work, which is what Filament's Renderer expects. Single-viewer behaviour is unchanged. Multi-viewer dispose / mount no longer races against empty begin/end cycles.
Two fixes for the residual `endFrame:490 — SwapChain must remain valid` precondition that fires on mass dispose / mount in multi- viewer apps (8 viewers in particular): 1) **`FFIRenderManager._syncViews` snapshots before iterating.** The old loop iterated `_attachments.keys` and `_attachments[key]` directly. Each iteration awaits a render-thread round-trip, and while awaiting, concurrent `attach` / `detach` calls (common when multiple viewers mount/dispose at once) mutate the live map. The iterator state across awaits ends up stale: some swap chains get skipped, others written with old data. The C++ RenderManager then sees inconsistent attachments and the next render tickles the SwapChain validity assertion. Snapshot the keys + per-key view list at function entry, then iterate the snapshot. Tolerates concurrent modifications: if an entry is removed by the time we get back to it, the views list is null and we skip it cleanly. 2) **`FFIFilamentApp.destroySwapChain` serialises through a static Future chain.** With 8 simultaneous viewers disposing, eight `destroySwapChain` calls run concurrently. Each calls `flush()` (Engine::flushAndWait) between detach and engine->destroy — but if other destroys are queueing more commands on Filament's backend at the same instant, no individual flush guarantees a quiet backend. The shared Renderer's `mSwapChain` state ends up pointing at a SwapChain another destroy is in the middle of freeing. Serialise: each destroy awaits the previous one before starting. The flush in each destroy now has a stable target — nothing else is queueing destroy work — so it truly drains, and the synchronous engine->destroy can proceed safely. Together with the empty-views skip in `RenderManager::renderSwap- ChainAt` (52c33f7), these fixes close the timing windows for the shared-Renderer-with-multiple-SwapChains pattern. Filament's Renderer is documented as per-window, but Thermion's design shares it; both patches make the shared use safe under contention. Tested scenario: stress-test screen toggle 1 → 4 → 8 → 1 → 4. The 4-viewer state had been working (with frame drops); 8-viewer was stable but crashed on extended use; 8 → 1 transitions crashed reliably. All three are race surfaces with shared destroy/sync flow; serialisation + snapshotting eliminate them.
… Future chain Previous fix (d261b8f) snapshotted `_syncViews` iteration so the loop wouldn't see stale state across awaits. That kept iteration self-consistent, but didn't address concurrent calls *between* sibling viewers: in multi-viewer apps where multiple viewers mount or dispose simultaneously, sibling viewers' `attach`, `detach`, and `detachAll` calls run in parallel. Each takes its own snapshot at a different moment — and one viewer's snapshot can capture another viewer's view that's about to be destroyed via `destroyView`. By the time the snapshot's `_syncViews` reaches `RenderManager_setRenderableRenderThread`, that view's native handle has been freed; Filament dereferences the dangling pointer and the process takes a SIGSEGV. (The previous SIGABRT preconditions all triggered on detected inconsistent state at the same boundary; SIGSEGV happens when the inconsistent state slips past the precondition checks and hits real memory.) Fix: introduce a static `_opChain` Future and a `_serialize` helper that wraps each mutation of `_attachments`. attach, detach, and detachAll now each await the previous op before running — concurrent calls queue up cleanly, each runs to completion (including its `_syncViews` round-trips) before the next starts. View destruction (`destroyView` awaits `detach(view)` first) cannot interleave inside another viewer's sync. Combined with the destroySwapChain serialisation in ffi_filament_app.dart (also d261b8f), every shared-state mutation in the multi-viewer pipeline now runs serially. Tested scenario was the same one that surfaced the SIGSEGV: toggle stress-test from 8 viewers back to 1. The dispose flow fires 8 simultaneous viewer.dispose chains plus 8 simultaneous ThermionWidget.dispose async closures, each touching the shared `_attachments` and View handle state. Without serialisation the attach for the new single viewer would race against destroyView for the eight outgoing viewers; with serialisation, the ordering is enforced at the boundary that matters.
|
Closing in favour of more focused PRs. This branch was originally about a single bug (`getSwapChains()` returning a live list reference) but accumulated nine commits as we kept finding adjacent races in the multi-viewer dispose / mount path on Android. That makes for an unwieldy review. I've split the contents into three independently-reviewable PRs against `develop`, each with a single coherent narrative:
Also: #168 — follow-up to the merged #163 fixing the synthesised tap to use `event.localFocalPoint` (Android picking) — the cherry-pick of that fix was on this branch as ec6b305; it's now its own PR against develop since it logically belongs with #163. The branch (`fix/android-swapchain-self-destroy`) is preserved on the fork for spike continuity but is no longer the upstream PR. |
Problem
On Android, every `ThermionViewer` mount aborts the process with:
```
E/Filament: Precondition
E/Filament: in endFrame:490
E/Filament: reason: SwapChain must remain valid until endFrame is called.
F/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE)
```
Reproducible with `examples/flutter/quickstart` unmodified, both on the emulator (`sdk gphone64 arm64`, API 35) and on a physical Android device. iOS, macOS, and Windows are unaffected.
Cause
The crash is not a destroy-vs-`endFrame` race. `Filament` complains because the SwapChain pointer was freed before `endFrame` ran. How that happened is a list-aliasing bug in `FFIFilamentApp.getSwapChains()`:
```dart
// thermion_dart/lib/src/filament/src/implementation/ffi_filament_app.dart
final _swapChains = [];
Future<Iterable> getSwapChains() async {
return _swapChains; // ← live reference, not a snapshot
}
```
The Android branch of `thermion_flutter_plugin_native.dart` (`createTextureAndBindToView`) reads the swap chains, creates a new one for the surface texture, then destroys whatever was in the snapshot:
```dart
if (Platform.isAndroid) {
final swapChains = await FilamentApp.instance!.getSwapChains(); // aliases _swapChains
final swapChain = await FilamentApp.instance!.createSwapChain( // appends to _swapChains
Pointer.fromAddress(descriptor.windowHandle!));
if (swapChains.isNotEmpty) { // now contains [newSwapChain]
await FilamentApp.instance!.destroySwapChain(swapChains.first); // destroys the NEW one
}
await FilamentApp.instance!.renderManager.attach(view, swapChain);
// `swapChain` now points to freed memory.
}
```
The intent reads cleanly — "snapshot the existing chains, create the new one, tear down old chains" — but because `getSwapChains()` returned the live list reference, `createSwapChain` mutated the very list the plugin was iterating. `swapChains.first` was the swap chain the plugin had just created. The plugin destroyed it, attached the view to a freed pointer, and the next render hit the precondition.
iOS, macOS, and Windows don't reach this code path (their plugin branches don't iterate `getSwapChains()`).
Reproduction
```bash
git clone https://github.com/nmfisher/thermion
cd thermion/examples/flutter/quickstart
flutter pub get
flutter run -d
```
Crashes on first viewer mount, before any user interaction. Log signature:
```
[INFO] FFIFilamentApp: Created swapchain from window
[INFO] FFIFilamentApp: Destroying swapchain
[INFO] FFIFilamentApp: Destroyed swapchain
E/Filament: Precondition in endFrame:490 — SwapChain must remain valid until endFrame is called.
F/libc: Fatal signal 6 (SIGABRT)
```
Fix
One-line change in `getSwapChains` to return an unmodifiable snapshot:
```dart
Future<Iterable> getSwapChains() async {
return List.unmodifiable(_swapChains);
}
```
Defensive at the API layer — fixes the Android plugin's logic without modifying the plugin, and protects any future caller from the same aliasing pitfall.
Verification
Related fork patches
This is the fourth Android-/iOS-/Windows-targeted fix on the mushogenshin/thermion fork. The other three are filed as #160 (iOS vulkan exclusion), #163 (eager-scale gesture for arena conflict), and #164 (Windows native build hook + plugin includes). All independent of this one.