Queue UI shows stale queued state after tab/app regains focus #333

@Th0rgal

Summary

When a user unfocuses the browser tab (web) or backgrounds the app (iOS), queued messages that were processed while the window was inactive still appear as "Queued" in the UI when the user returns. The correct state (processed / in chat) only appears after the async reconciliation completes, causing a brief but confusing stale state.

Reproduction

  1. Start a mission and wait for the agent to begin working
  2. Send a second message while the agent is busy — it appears in the queue strip with the "Queued" badge
  3. Switch to another tab/app and wait for the agent to finish the first message and process the queued one
  4. Switch back to the sandboxed.sh tab — the message still shows as queued for a moment before the reload catches up

Root cause analysis

The dashboard has four sync mechanisms for queue state:

| Mechanism | Trigger | What it does |
| --- | --- | --- |
| SSE user_message | Real-time event with queued: false | Updates item's queued flag inline (control-client.tsx:5007-5025) |
| SSE status | queue_len decreases | Calls syncQueueForMission() → fetches fresh queue from /api/control/queue (control-client.tsx:4967-4969) |
| Visibility change | document.visibilityState === "visible" | Calls reloadMissionHistory() → full parallel fetch of mission + events + queue (control-client.tsx:6258-6266) |
| Periodic poll | Every 15 seconds (running missions only) | Same reloadMissionHistory() (control-client.tsx:6269-6277) |

The problem manifests because:

Web (Next.js dashboard)

  1. Browser throttling of background tabs. Chrome and other browsers aggressively throttle background tabs: timers are clamped to fire at most about once per second (and Chrome can throttle chained timers further in tabs hidden for several minutes), requestAnimationFrame callbacks are paused, and while SSE connections stay open, React state updates from event handlers may be batched/deferred. The user_message { queued: false } SSE event arrives, but React may not commit the state update to the DOM until the tab regains focus.

  2. Stale-then-fresh flash. When the user refocuses, the visibility change handler fires reloadMissionHistory() which is async. Between the moment the tab becomes visible (showing the stale queued state from the last committed render) and when the API response arrives and setItems() runs, the user sees the outdated "Queued" badge. This window can be 200-500ms depending on API latency.

  3. syncQueueForMission guard. The syncingQueueRef mutex (control-client.tsx:6164) prevents concurrent queue syncs. If a status event fires at roughly the same time as the visibility change reload, one of them is skipped, potentially extending the stale window.
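One way to keep the guard without dropping the later request is to coalesce overlapping calls into a single trailing re-run. The following is a minimal sketch, not the existing code: `coalesce` is a hypothetical helper, and it assumes the wrapped function (for example, a queue sync) is safe to run again immediately after it finishes.

```typescript
// Hypothetical helper (not in the codebase): wraps an async function so that
// calls arriving while a run is in flight schedule ONE trailing re-run
// instead of being skipped.
function coalesce(run: () => Promise<void>): () => Promise<void> {
  let inFlight: Promise<void> | null = null;
  let rerunRequested = false;

  return function trigger(): Promise<void> {
    if (inFlight) {
      // A run is already active: remember that fresher data was requested.
      rerunRequested = true;
      return inFlight;
    }
    const p = (async () => {
      do {
        rerunRequested = false;
        await run();
      } while (rerunRequested);
    })();
    inFlight = p;
    // Clear the guard whether the run succeeded or failed.
    const clear = () => {
      inFlight = null;
    };
    p.then(clear, clear);
    return p;
  };
}
```

With this shape, a status event and a visibility reload that land together produce two sequential fetches rather than one fetch and one skipped call, so the last state written comes from a request issued after both triggers.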

iOS

  1. URLSession stream suspension. When the app enters the background, iOS suspends the app shortly afterwards, which stalls the URLSession data task and stops SSE delivery entirely. Events emitted during this time are lost, and the TCP connection may even be torn down.

  2. Foreground recovery relies solely on scenePhase. The .onChange(of: scenePhase) handler (ControlView.swift:315-326) calls reloadMissionFromServer() when the app becomes active, but this has the same stale-then-fresh flash as the web since it's async. Additionally, if the SSE connection was torn down, the reconnection and the foreground reload race against each other.

Proposed fixes (ordered by impact)

1. Optimistic queue state invalidation on focus (quick win)

When the tab/app regains focus, immediately clear the queued flag on all items before the async reload confirms. Since the user was away and the agent was working, the overwhelmingly common case is that queued items have been processed.

// In visibility change handler (control-client.tsx:6259)
const handleVisibilityChange = () => {
  if (document.visibilityState === "visible" && viewingMissionId) {
    // Immediately mark all items as not-queued to avoid stale flash
    setItems((prev) =>
      prev.map((item) =>
        item.kind === "user" && item.queued ? { ...item, queued: false } : item
      )
    );
    reloadMissionHistory(viewingMissionId);
  }
};

The subsequent reloadMissionHistory will re-apply correct queued flags if any items are still genuinely queued. The brief false negative (showing a still-queued item as not queued for the 200-500ms the reload takes) is far less confusing than the current false positive.
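The two steps (optimistic clear, then reconciliation against the server's queue) can be sketched as pure functions. This is an illustration under assumptions: the `ChatItem` shape, its `id` field, and the `applyServerQueue` helper are hypothetical; only `kind === "user"` and `queued` come from the snippet above.

```typescript
// Assumed item shape; the real type in control-client.tsx may differ.
type ChatItem = {
  id: string;
  kind: "user" | "assistant";
  queued?: boolean;
};

// Step 1: optimistic invalidation on focus.
function clearQueuedFlags(items: ChatItem[]): ChatItem[] {
  return items.map((item) =>
    item.kind === "user" && item.queued ? { ...item, queued: false } : item
  );
}

// Step 2: once the reload returns, restore the flag for items the server
// still reports as queued. Unchanged items keep their object identity.
function applyServerQueue(
  items: ChatItem[],
  queuedIds: ReadonlySet<string>
): ChatItem[] {
  return items.map((item) => {
    if (item.kind !== "user") return item;
    const queued = queuedIds.has(item.id);
    return Boolean(item.queued) === queued ? item : { ...item, queued };
  });
}
```

Returning the original object for unchanged items keeps referential equality, which may help memoized row components skip re-rendering.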

2. SSE reconnect with event replay / sequence tracking (robust fix)

Add a monotonic sequence number to SSE events. On reconnection or focus-recovery, the client sends Last-Event-ID and the backend replays missed events. This is the standard SSE spec mechanism (EventSource supports it natively, but the current fetch-based streaming in api.ts:475-630 doesn't use it).

Backend changes:

  • Add an incrementing id: field to each SSE event in the stream response
  • Keep a bounded ring buffer of recent events per mission (last ~100)
  • On reconnect with Last-Event-ID, replay missed events before resuming live stream
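The backend bullets above could look roughly like the following. The sketch is written in TypeScript purely for illustration (the actual backend lives in src/api/control.rs), and `EventRingBuffer`, `replayAfter`, and `formatSseFrame` are hypothetical names. It assumes IDs start at 1 and that a client which has fallen out of the buffer falls back to a full reload.

```typescript
interface SseEvent {
  id: number;
  data: string;
}

// Bounded per-mission buffer of recent events with monotonically increasing IDs.
class EventRingBuffer {
  private events: SseEvent[] = [];
  private nextId = 1;
  private readonly capacity: number;

  constructor(capacity = 100) {
    this.capacity = capacity;
  }

  push(data: string): SseEvent {
    const event: SseEvent = { id: this.nextId++, data };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift(); // evict oldest
    return event;
  }

  // Events the client missed after `lastEventId`, or null when some of them
  // were already evicted (the caller should then force a full reload).
  replayAfter(lastEventId: number): SseEvent[] | null {
    const oldest = this.events[0]?.id ?? this.nextId;
    if (lastEventId < oldest - 1) return null;
    return this.events.filter((e) => e.id > lastEventId);
  }
}

// Wire format for one event, including the `id:` field.
function formatSseFrame(event: SseEvent): string {
  return `id: ${event.id}\ndata: ${event.data}\n\n`;
}
```

Returning null for an uncoverable gap makes the fallback explicit, so a client that was away longer than the buffer covers does not silently miss events.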

Frontend changes:

  • Track the last received event ID
  • On reconnect, send it as Last-Event-ID header
  • Process replayed events through the existing handler
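For the frontend side, a fetch-based stream has to track the ID itself, since only EventSource does so automatically. Below is a simplified sketch: `SseParser` and `reconnectHeaders` are hypothetical names, and the parser assumes `\n` line endings and only the `id:` and `data:` fields, which is narrower than the full SSE format.

```typescript
type ParsedEvent = { id: string | null; data: string };

// Strip the field name and, per the SSE format, one optional leading space.
function fieldValue(line: string, prefixLength: number): string {
  const value = line.slice(prefixLength);
  return value.startsWith(" ") ? value.slice(1) : value;
}

class SseParser {
  lastEventId: string | null = null;
  private buffer = "";

  // Feed decoded text chunks as they arrive; returns the completed events.
  feed(chunk: string): ParsedEvent[] {
    this.buffer += chunk;
    const out: ParsedEvent[] = [];
    let sep = this.buffer.indexOf("\n\n");
    while (sep !== -1) {
      const frame = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      const dataLines: string[] = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith("id:")) this.lastEventId = fieldValue(line, 3);
        else if (line.startsWith("data:")) dataLines.push(fieldValue(line, 5));
      }
      if (dataLines.length > 0) {
        out.push({ id: this.lastEventId, data: dataLines.join("\n") });
      }
      sep = this.buffer.indexOf("\n\n");
    }
    return out;
  }

  // Headers for the next (re)connection request.
  reconnectHeaders(): Record<string, string> {
    const headers: Record<string, string> = { Accept: "text/event-stream" };
    if (this.lastEventId !== null) headers["Last-Event-ID"] = this.lastEventId;
    return headers;
  }
}
```

On reconnect or focus recovery, the result of `reconnectHeaders()` would be passed as the `headers` option of the `fetch` call that opens the stream, and replayed events would then flow through `feed` like live ones.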

3. Periodic queue heartbeat while running (incremental)

Reduce the 15-second periodic sync interval to 5 seconds while the agent is actively running and the queue is non-empty. This narrows the window for stale state even without tab focus changes.

const syncInterval = queueLen > 0 ? 5_000 : 15_000;

4. iOS-specific: Background task for SSE keepalive

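Use BGAppRefreshTask or URLSessionConfiguration.background to narrow the gap while the app is in the background. Neither keeps a streaming SSE connection open: background URLSession configurations support only upload and download tasks, and BGAppRefreshTask grants short execution windows at the system's discretion. A realistic variant is to use those windows to fetch a fresh queue snapshot, so that the state shown when the app returns to the foreground is closer to current. This shrinks the gap but does not eliminate it.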

Affected files

  • dashboard/src/app/control/control-client.tsx — visibility handler, queue sync, SSE processing
  • dashboard/src/lib/api.ts — SSE streaming, reconnection logic
  • ios_dashboard/SandboxedDashboard/Views/Control/ControlView.swift — scenePhase handler, stream management
  • ios_dashboard/SandboxedDashboard/Services/APIService.swift — SSE connection lifecycle
  • src/api/control.rs — SSE event emission (if implementing sequence tracking)

Recommendation

Fix #1 (optimistic invalidation) is the lowest-effort change that eliminates the user-visible symptom. Fix #2 (SSE sequence tracking) is the principled long-term solution that also fixes other missed-event scenarios (network blips, mobile network switches, etc.). They can be done incrementally — #1 first for immediate relief, #2 as follow-up infrastructure.
