feat(extension): add get_status request to hub WS protocol #439

Open
ykswang wants to merge 2 commits into alibaba:main from ykswang:feat/hub-heartbeat-while-busy
@ykswang ykswang commented Apr 12, 2026

Updated per maintainer feedback. Original heartbeat-push proposal is preserved as commit 1; commit 2 pivots to a get_status request/response design. The discussion thread tracks the rationale.

Motivation

The hub-WS protocol currently gives a caller no way to confirm whether the hub still considers a task to be running. A caller can be stuck waiting for a result while the hub already considers itself idle (e.g. the result message was lost, or local state on the caller drifted), and the protocol offers no way to verify which side is right.

Recommended caller layering (from JSDoc)

| Layer | Purpose | Mechanism | Cost |
| --- | --- | --- | --- |
| 1 | Dead-connection detection (frozen tab, closed window, blocked main thread) | WS-level ws.ping(); browsers auto-respond at the protocol layer | Zero protocol changes; works today |
| 2 | Application-level state-drift check | get_status (this PR); sent on demand, not periodically | One round-trip when the caller is in doubt |

Layer 1 already works for any caller — the WS server can call ws.ping() and rely on the browser's automatic pong. So this PR only adds layer 2, and only the bits the protocol needs.
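Layer 1 can be sketched roughly as follows, assuming the caller runs its WS server on the Node `ws` package, whose sockets expose `ping()`, `terminate()`, and a `pong` event. `createLivenessMonitor` is an illustrative name and is not part of this PR.

```javascript
// Sketch only: `createLivenessMonitor` is a hypothetical helper, not code from this PR.
// `socket` is assumed to behave like a `ws` WebSocket: on("pong"), ping(), terminate().
function createLivenessMonitor(socket) {
  let alive = true;
  socket.on("pong", () => {
    alive = true; // the browser answered the previous ping
  });

  // Call once per interval. Returns false after terminating a socket
  // that did not answer the previous ping.
  return function tick() {
    if (!alive) {
      socket.terminate();
      return false;
    }
    alive = false;
    socket.ping();
    return true;
  };
}
```

A server would typically call `tick` from a per-connection `setInterval` (the period is the caller's choice) and clear that interval once `tick` returns false or the socket closes.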

Protocol additions

Inbound:  { type: "get_status" }
Outbound: { type: "status", busy: boolean }

busy reflects the hub's #busy flag — true while a task is in flight, false otherwise.
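A minimal sketch of how the hub side might answer the new request. `handleInbound` and the `isBusy` callback are hypothetical stand-ins; the actual hub reads its private `#busy` field, and its real message handling is not shown in this thread.

```javascript
// Sketch only: hypothetical handler, not the hub's actual implementation.
// `raw` is the inbound WS text frame; `isBusy` reports whether a task is in flight.
// Returns the serialized reply, or null when the message is not a get_status request.
function handleInbound(raw, isBusy) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null; // not JSON: nothing to answer
  }
  if (msg === null || typeof msg !== "object" || msg.type !== "get_status") {
    return null;
  }
  return JSON.stringify({ type: "status", busy: Boolean(isBusy()) });
}
```

Passing the busy state in as a callback keeps the sketch independent of how the hub stores it.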

Compatibility

Purely additive. Existing callers (e.g. packages/mcp/src/hub-bridge.js) silently ignore unknown message types and are unaffected.
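The compatibility claim rests on callers dropping message types they do not recognize. The sketch below shows that pattern in general terms; `dispatch` and the `handlers` map are hypothetical and are not taken from packages/mcp/src/hub-bridge.js.

```javascript
// Sketch only: a generic type-based dispatcher, not the code in hub-bridge.js.
// Returns true when a handler ran, false when the message was ignored.
function dispatch(raw, handlers) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return false; // malformed frame: ignore
  }
  if (msg === null || typeof msg !== "object") return false;
  const known = Object.prototype.hasOwnProperty.call(handlers, msg.type);
  if (!known || typeof handlers[msg.type] !== "function") {
    return false; // unknown type (such as a new "status" message): ignore silently
  }
  handlers[msg.type](msg);
  return true;
}
```

A caller written this way keeps working when the hub starts emitting `status`, because no handler is registered for it until the caller opts in.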

Commit history

  1. feat(extension): emit periodic heartbeat from hub while task is running — initial heartbeat-push proposal
  2. refactor(extension): replace heartbeat push with get_status request — pivot to pull-based per discussion below

Test plan

  • npm run typecheck passes
  • npx eslint clean
  • Manual: send {type:'get_status'} while hub idle → receive {type:'status', busy:false}
  • Manual: send {type:'get_status'} mid-task → receive {type:'status', busy:true}

The hub-WS protocol previously offered no signal between `execute` and
the final `result`/`error`, forcing callers to choose between a long
wall-clock timeout (false positives on slow tasks) and a short one
(false negatives when the hub is actually alive and progressing).

Send `{ type: "heartbeat", at: number }` every 5s while a task is
running so callers can reset an idle-based deadline on each tick and
detect a dead hub (throttled tab, blocked main thread) without
guessing a total-duration budget up front.

Backward-compatible: existing callers (e.g. packages/mcp/hub-bridge.js)
silently ignore unknown message types.
@gaomeng1900
Collaborator

Thanks for your contribution!

  • Is it possible for the client to call get_status to achieve the same result?
  • The semantics of a "heartbeat" can be confused with the ping/pong heartbeat in the WebSocket protocol that keeps the connection alive.

Per maintainer feedback on PR alibaba#439:

1. The "heartbeat" name overlaps with WS-level ping/pong and is
   confusing — the protocol's own framing already provides
   connection-level liveness.
2. A `get_status` request/response can let callers achieve the same
   "is the hub still working" check without the hub pushing periodic
   messages during every task.

This drops the periodic heartbeat in favor of a pull model:

  Inbound:  { type: "get_status" }
  Outbound: { type: "status", busy: boolean }

Recommended caller usage (kept in JSDoc):

- WS-level ping/pong (browsers auto-respond at the protocol layer) is
  the primary dead-connection check. No protocol changes needed; the
  WS server can just call `ws.ping()`.
- `get_status` is for application-level state-drift checks — e.g. when
  a `result` was lost and both sides disagree on whether a task is
  still running. Callers send it on demand, not periodically.

Net protocol change vs main: one inbound type, one outbound type.
Existing callers that ignore unknown types are unaffected.
@ykswang changed the title from "feat(extension): emit periodic heartbeat from hub while task is running" to "feat(extension): add get_status request to hub WS protocol" on Apr 14, 2026
@ykswang
Author

ykswang commented Apr 14, 2026

Pivoted in e063960 — heartbeat is gone, replaced with {type:'get_status'} / {type:'status', busy: boolean}. Both points addressed: the naming overlap with WS-level ping/pong is gone, and the hub no longer pushes anything periodically.

The JSDoc now spells out the recommended layering for callers:

  • Dead connection → WS-level ping/pong. Browsers auto-respond at the protocol layer, so the WS server can just call ws.ping() — no protocol change needed and it already works today.
  • State drift → get_status. Sent on demand only when the caller has a reason to doubt (e.g. ping/pong is fine but no result after a long time), so the cost on the hub is one round-trip when asked, not a 5-second push.

Title and description updated; both commits kept so this thread reads in order.
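For the state-drift layer described above, a caller-side check might look roughly like this. `requestStatus` and its timeout default are hypothetical; the sketch assumes a socket-like object with `addEventListener`, `removeEventListener`, and `send`, and message events that carry the text frame in `event.data`.

```javascript
// Sketch only: hypothetical caller-side helper, not part of this PR.
// Sends one get_status request and resolves with the hub's busy flag,
// or rejects if no status reply arrives within `timeoutMs`.
function requestStatus(socket, timeoutMs = 2000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error("get_status timed out"));
    }, timeoutMs);

    function cleanup() {
      clearTimeout(timer);
      socket.removeEventListener("message", onMessage);
    }

    function onMessage(event) {
      let msg;
      try {
        msg = JSON.parse(event.data);
      } catch {
        return; // unrelated or malformed frame: keep waiting
      }
      if (msg === null || typeof msg !== "object" || msg.type !== "status") return;
      cleanup();
      resolve(msg.busy === true);
    }

    socket.addEventListener("message", onMessage);
    socket.send(JSON.stringify({ type: "get_status" }));
  });
}
```

A caller that has waited unusually long for a result could await `requestStatus(socket)` once: a `false` answer suggests the result was lost, while `true` means the hub still considers the task in flight.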
