fix(rade): restore slice state when RADE engine fails to start#2861
fix(rade): restore slice state when RADE engine fails to start#2861NF0T wants to merge 1 commit into
Conversation
If rade_open() or lpcnet_encoder_create() fails in RADEEngine::start(), activateRADE() previously returned bare, leaving the slice stranded: mode stuck at DIGU/DIGL, audio_mute=1, m_radeSliceId set, but m_digitalVoiceTxSliceId never assigned. Any PTT attempt then hit the DIGU/DIGL transmit guard with no way to recover short of manually changing the slice mode. Call deactivateRADE() on failure — it handles null m_rade safely and restores audio_mute via m_radePrevMute. Also save prevMode before setMode() and restore it after cleanup so the slice returns to its pre-RADE state rather than stranding the user in DIGU/DIGL. Fix qWarning() → qCWarning(lcRade) so the failure appears in the aether.rade support bundle category where it is actionable. Investigated via aethersdr#2856 (RADE unusable on FLEX-6400M) — this cleanup gap is a confirmed latent bug but may not be the root cause of that report, which remains open pending a support bundle. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Thanks @NF0T — this is a clean, well-scoped fix. Traced the failure path and the claims in the description check out:
prevModeis captured beforesetMode(DIGU/DIGL), so the restore at the end uses the actual pre-RADE mode. ✓- By the time
start()returns false,m_radeSliceId(13802),m_radePrevMute(13803),m_audio_mute=true(13804), themodeChangedconnection (13809), and the engine + worker thread (13814–13821) have all been established.deactivateRADE()correctly unwinds each: disconnectmodeChanged, restoreaudioMute(m_radePrevMute), stop the engine on the worker thread, quit/wait/delete the thread, nullm_radeEngine. ✓ RADEEngine::stop()is safe on a non-started engine —src/core/RADEEngine.cpp:100early-returns whenm_radeis null, so theBlockingQueuedConnectionis near-zero cost as you described.start()itself also callsrade_finalize()onrade_open()failure (RADEEngine.cpp:48), so we're not leaving librade state half-initialized either. ✓- The
modeChangedconnection is disconnected insidedeactivateRADE()before we calls->setMode(prevMode)on the slice, so the final restore can't recursively re-triggeronRadeSliceModeChanged → deactivateRADE. ✓ - Re-entry safety:
m_radeEngineis nulled bydeactivateRADE, so a subsequentactivateRADEallocates a fresh engine and thread. Matches your test #3.
One observation, not a blocker — purely a follow-up question:
Filter width and TX-slice promotion aren't reverted. setFilterWidth(±3500, 0/0, 3500) (13796–13799) and setTxSlice(true) (13782) still execute before start(). On the failure path those persist after setMode(prevMode). The filter width is the more visible artifact — if a slice was on (say) USB with a 2.4 kHz filter, a failed RADE attempt leaves it on USB with a 3.5 kHz filter. Probably acceptable since the slice is back in a usable voice mode and the user is unlikely to notice, but worth flagging in case you want to extend the restore. Reasonable to defer.
Also a nice incidental touch: switching the failure log from qWarning() to qCWarning(lcRade) lands it in the support-bundle category where it's actually findable. And the honesty in the "Relation to #2856" section ("we cannot confirm this is the root cause … do not close #2856 based on this PR") is appreciated — exactly the right call given the reporter's bundle is still outstanding.
LGTM as a focused latent-bug fix.
There was a problem hiding this comment.
Thanks @NF0T — clean, well-scoped fix, and the PR description is exemplary (clear root cause, honest caveat about #2856, re-entry test included). The reuse of deactivateRADE() is the right call: it's idempotent over the fields that haven't been set yet (m_audio->setRadeMode(false), setDigitalVoiceTxSlice(-1), the disconnect() calls on never-made connections, etc.), so there's no harm in walking the full teardown.
I verified the recovery sequence at MainWindow.cpp:13776–13830 and it does what you describe — m_radePrevMute is captured before setAudioMute(true) so the restore is faithful, the modeChanged signal is disconnected by deactivateRADE() before you restore prevMode, and the early-entry guards (m_radeSliceId == sliceId && m_radeEngine && ...) are happy on retry because m_radeEngine is nulled.
Two residual gaps in the same activation path that this PR does not cover, both with the same shape as the bug you're fixing — activateRADE() mutates pre-start() slice state that the failure path doesn't roll back:
-
Filter width not restored (
MainWindow.cpp:13796–13799). When mode is restored toprevMode, the filter widths stay clamped at(-3500, 0)or(0, 3500). The slice ends up in, e.g., LSB but with RADE-shaped filter edges — visible to the user and visible on-radio. Could be addressed by capturingprevLow, prevHigh = s->filterWidth()alongsideprevModeand restoring both, or by letting the mode change naturally drive filter defaults (depending onSliceModelsemantics here). -
setTxSlice(true)not reverted (MainWindow.cpp:13781–13782). If the slice wasn't already the TX slice, the TX badge has been moved by the timestart()fails. The PR's recovery leaves it on the wrong slice. CapturingprevTxSlice = s->isTxSlice()and restoring it in the failure branch would close that gap.
Neither is critical and neither is a regression from this PR — they're pre-existing cleanup gaps that became visible once you started reasoning about the failure path. Happy for these to land in a follow-up if you'd rather keep this PR tightly scoped.
LGTM otherwise.
Summary
When
RADEEngine::start()fails (e.g.rade_open()orlpcnet_encoder_create()returns null),activateRADE()previously returned bare without cleanup, leaving the slice in a broken half-initialized state:audio_mute=1sent to radio — not restoredm_radeSliceIdset — not clearedm_radeEngineallocated with a running idle thread — not torn downm_digitalVoiceTxSliceIdnever set (that line comes afterstart())The practical result: every PTT attempt hit the DIGU/DIGL voice transmit guard (
localPttInterlockMessage/radioInterlockNotificationMessage BAD_MODE) with no user-visible indication of why, and no recovery path short of manually changing the slice mode.Changes
src/gui/MainWindow.cpp—activateRADE()onlyprevMode = s->mode()beforesetMode(DIGU/DIGL)so the failure path can restore it.!ok: calldeactivateRADE()instead of barereturn.deactivateRADE()already handles nullm_radesafely —RADEEngine::stop()returns immediately onif (!m_rade) return, so the blocking invocation is near-zero cost. It restoresaudio_muteviam_radePrevMute, clearsm_radeSliceId, and tears down the engine and thread cleanly.deactivateRADE(), restoreprevModeso the slice returns to its pre-RADE mode rather than stranding the user in DIGU/DIGL.qWarning()→qCWarning(lcRade)so the failure is visible in theaether.radesupport bundle category where it is actionable.Relation to issue #2856
This fix was identified while triaging #2856 (RADE unusable on FLEX-6400M fw 4.13 — "cannot transmit in DIGI mode"). The stranded-state symptom described there is consistent with this code path. However, we cannot confirm this is the root cause of #2856 — RADE engine startup succeeds in our test environment (Windows, FLEX-8400, fw 4.2.x) and the reporter has not yet provided a support bundle with RADE logging enabled. Do not close #2856 based on this PR. That issue remains open awaiting the reporter's log.
This cleanup gap is a latent bug independently of #2856 and warrants fixing regardless of what the reporter's bundle ultimately shows.
Test plan
falsefromRADEEngine::start(), selected RADE — slice mode reverted to pre-RADE mode, audio unmuted, no transmit block,aether.radelog showsRADE engine failed to start — restoring slice state🤖 Generated with Claude Code