feature: make dashboard workflows and OpenClaw state restart-safe and server-owned #1609

@Xunzhuo

Describe the feature

Make dashboard workflow state restart-safe and server-owned for ML pipeline, model research, and OpenClaw collaboration or control surfaces, instead of relying on in-memory maps, workspace-local JSON, or browser-only state.

Primary layer

global level

Why this layer?

This crosses dashboard backend persistence, SSE and WebSocket delivery, browser assumptions, local workspace mounts, and operator-facing recovery behavior. It is a control-plane state problem, not one isolated UI or backend handler change.

Why do you need this feature?

Several dashboard-native workflows already look like product surfaces, but their state ownership is still fragmented:

  • dashboard/backend/mlpipeline/runner.go keeps jobs in an in-memory map and pushes progress through an in-memory channel
  • dashboard/backend/modelresearch/manager.go persists JSON snapshots, but still treats in-memory state as the source of truth for active campaigns and marks running work as failed after a dashboard restart
  • dashboard/backend/handlers/openclaw.go and openclaw_rooms.go keep registry, team, worker, room, and message state in workspace-local JSON files
  • OpenClaw room message appends rewrite whole JSON files, which will not scale to larger rooms or longer histories
  • the frontend still keeps some chat or auth state in localStorage

As a result, live connection state, durable workflow state, and browser convenience state are not cleanly separated today.

Additional context

Child of #1606.

Repository evidence:

  • docs/agent/tech-debt/td-034-runtime-and-dashboard-state-durability-and-telemetry-contract.md
  • docs/agent/state-taxonomy-and-inventory.md
  • dashboard/backend/{mlpipeline/runner.go,modelresearch/manager.go,auth/store.go}
  • dashboard/backend/handlers/{mlpipeline.go,evaluation.go,modelresearch.go,openclaw.go,openclaw_rooms.go}
  • dashboard/frontend/src/{hooks/useConversationStorage.ts,utils/authFetch.ts}

Related issues to coordinate with, not replace:

Suggested acceptance:

  • keep SSE and WebSocket client registries in memory only, but move workflow jobs, typed progress, terminal state, and collaboration entities into server-owned durable records
  • make ML pipeline and model research progress reconstructable after restart
  • move OpenClaw teams, workers, rooms, and room messages off ad hoc JSON files into a persistence seam that can support larger histories and multiple operators
  • decide explicitly which browser chat or auth surfaces remain demo-only or ephemeral and which become supported server-owned state
  • add at least one restart-aware dashboard workflow test and one typed progress/health contract that does not depend on log scraping
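One possible shape for the "typed progress" and "reconstructable after restart" items above is an ordered list of typed events per job, folded back into current state on startup. The sketch below assumes this event-log approach; `ProgressEvent`, `JobState`, `Reconstruct`, and the status strings are all hypothetical and do not correspond to existing types in mlpipeline/runner.go or modelresearch/manager.go.

```go
package main

import "fmt"

// EventKind enumerates the typed progress events a job can emit.
type EventKind string

const (
	EventStarted   EventKind = "started"
	EventProgress  EventKind = "progress"
	EventSucceeded EventKind = "succeeded"
	EventFailed    EventKind = "failed"
)

// ProgressEvent is a hypothetical durable record. Seq is assumed to be
// assigned by the server, starting at 1 and increasing per job.
type ProgressEvent struct {
	JobID   string
	Seq     int
	Kind    EventKind
	Percent int
	Message string
}

// JobState is the view a handler could serve after a restart.
type JobState struct {
	JobID    string
	Status   string
	Percent  int
	LastSeq  int
	Terminal bool
}

// Reconstruct folds persisted events into the current job state.
// Events for other jobs and events with an already-seen Seq are skipped,
// so replaying the same log twice yields the same result.
func Reconstruct(jobID string, events []ProgressEvent) (JobState, error) {
	st := JobState{JobID: jobID, Status: "pending"}
	for _, ev := range events {
		if ev.JobID != jobID || ev.Seq <= st.LastSeq {
			continue
		}
		if st.Terminal {
			return st, fmt.Errorf("job %s: event seq %d after terminal state", jobID, ev.Seq)
		}
		switch ev.Kind {
		case EventStarted:
			st.Status = "running"
		case EventProgress:
			st.Status = "running"
			st.Percent = ev.Percent
		case EventSucceeded:
			st.Status = "succeeded"
			st.Percent = 100
			st.Terminal = true
		case EventFailed:
			st.Status = "failed"
			st.Terminal = true
		default:
			return st, fmt.Errorf("job %s: unknown event kind %q", jobID, ev.Kind)
		}
		st.LastSeq = ev.Seq
	}
	return st, nil
}
```

With a fold like this, a job whose last persisted event is non-terminal shows up as `running` at its last known percentage after a restart, and the server can decide explicitly whether to resume it or mark it interrupted. A restart-aware test could persist a few events, discard all in-memory state, and assert on the output of `Reconstruct`.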
