feature: make router replay, response history, vector-store metadata, and startup status restart-safe #1608

@Xunzhuo

Describe the feature

Make the router-side state surfaces that already look product-facing restart-safe, and explicitly durable where appropriate, instead of leaving them dependent on process memory or temp-file conventions.

Primary layer

global level

Why this layer?

This spans router defaults, extproc runtime storage, vector-store metadata, file registries, startup status, and dashboard-facing operator APIs. The problem is intentionally cross-cutting rather than belonging to one signal, plugin, or single package.

Why do you need this feature?

The current router runtime still mixes durable-looking features with process-local ownership:

  • global.services.response_api.store_backend defaults to memory
  • global.services.router_replay.store_backend defaults to memory
  • src/semantic-router/pkg/vectorstore/manager.go keeps vector-store metadata in an in-memory registry even when backend collections persist elsewhere
  • src/semantic-router/pkg/vectorstore/filestore.go keeps file metadata in memory while file bytes live on disk
  • src/semantic-router/pkg/startupstatus/status.go falls back to a temp-file JSON document for runtime readiness and model download progress

That means restart behavior is inconsistent today: a router can retain some bytes or backend collections but still lose the metadata and control-plane truth required to recover cleanly.

Additional context

Child of #1606.

Repository evidence:

  • docs/agent/tech-debt/td-034-runtime-and-dashboard-state-durability-and-telemetry-contract.md
  • docs/agent/state-taxonomy-and-inventory.md
  • src/semantic-router/pkg/responsestore/{factory.go,memory_store.go}
  • src/semantic-router/pkg/routerreplay/store/{factory.go,memory.go}
  • src/semantic-router/pkg/extproc/router_storage.go
  • src/semantic-router/pkg/extproc/router_replay_setup.go
  • src/semantic-router/pkg/vectorstore/{manager.go,filestore.go}
  • src/semantic-router/pkg/startupstatus/status.go

Related issues to keep aligned, not duplicated:

Suggested acceptance:

  • define the intended durability contract for Response API storage, router replay, vector-store metadata, file metadata, and startup status
  • make router-visible metadata restart-safe without requiring process memory to remain intact
  • keep cache-like surfaces as caches; do not force semantic cache or RAG cache into relational storage when a shared cache is the correct abstraction
  • expose typed startup and recovery status through a documented persistence seam rather than temp-file-only behavior
  • add at least one restart-recovery test that proves router-side metadata survives a process restart
