Design doc: EpisodeResult ↔ trajectory format seam #242

@larstalian

Description

The design-doc-first half of #198, split out so it's independently
claimable. Specify how an OpenRange episode maps to an
open-trajectory-gym trajectory. No implementation — just the seam shape.

What OpenRange emits

From src/openrange/core/episode.py:

  • per step: Observation (visible_state, events) + the harness-supplied
    AgentTurn (message, tool_calls, tool_results)
  • terminal: EpisodeReport.episode_result: EpisodeResult
    {success, subgoals, reason}, structured and never a scalar
    (openrange-pack-sdk _types.py)
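
For orientation, the emitted shapes could be modelled roughly as below. Only the class names and field names come from the list above; every type annotation is an assumption made for illustration, not the actual definitions in episode.py or _types.py.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Observation:
    # Field names from the issue; the dict/list types are assumed.
    visible_state: dict[str, Any]
    events: list[dict[str, Any]] = field(default_factory=list)


@dataclass(frozen=True)
class AgentTurn:
    # Supplied by the harness, one per step; types are assumed.
    message: str
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    tool_results: list[dict[str, Any]] = field(default_factory=list)


@dataclass(frozen=True)
class EpisodeResult:
    # Structured terminal outcome: deliberately not a scalar.
    success: bool
    subgoals: dict[str, bool]  # assumed: subgoal name -> achieved
    reason: str


# Hypothetical example values, for illustration only.
result = EpisodeResult(
    success=False,
    subgoals={"login": True, "exfil": False},
    reason="step budget exhausted",
)
```

Keeping EpisodeResult as a record rather than a number is what lets the reward mapping stay outside the trajectory.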

To specify

  1. Observation encoding: what an observation is when the surface is
    HTTP routes / MCP tools / files (map visible_state + events).
  2. Action encoding: AgentTurn.tool_calls → trajectory action.
  3. Reward: defer to the RewardAdapter from #199 (EpisodeResult →
    scalar / vector); do not bake a scalar into the trajectory.
  4. done / terminal: from the episode's terminal_reason.
  5. Runtime shape: fresh-world-per-rollout vs. pooled; curriculum via
    auto_evolve(...).
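
One possible shape for points (1) to (4) is sketched below as a starting point for discussion. The record keys (`observation`, `action`, `done`, `steps`), the helper names, the example tool name and the `subgoal_fraction` adapter are all hypothetical; none of them are taken from open-trajectory-gym's trajectory format or from the RewardAdapter in #199, both of which the design doc still has to consult.

```python
from __future__ import annotations

from typing import Any, Optional


def encode_step(
    visible_state: dict[str, Any],
    events: list[dict[str, Any]],
    tool_calls: list[dict[str, Any]],
    terminal_reason: Optional[str] = None,
) -> dict[str, Any]:
    """Map one OpenRange step to a (hypothetical) trajectory record."""
    return {
        "observation": {"visible_state": visible_state, "events": list(events)},
        "action": {"tool_calls": list(tool_calls)},
        # done is derived from terminal_reason, never set independently.
        "done": terminal_reason is not None,
        "terminal_reason": terminal_reason,
    }


def encode_episode(
    steps: list[dict[str, Any]], episode_result: dict[str, Any]
) -> dict[str, Any]:
    """Attach the structured result; no scalar reward is stored."""
    return {"steps": steps, "episode_result": dict(episode_result)}


def subgoal_fraction(episode_result: dict[str, Any]) -> float:
    """Stand-in for a RewardAdapter: fraction of subgoals achieved."""
    subgoals = episode_result["subgoals"]
    if not subgoals:
        return float(episode_result["success"])
    return sum(subgoals.values()) / len(subgoals)


first = encode_step(
    {"route": "/login"},
    [{"kind": "http", "status": 200}],
    [{"name": "http_get", "args": {"path": "/login"}}],
)
last = encode_step({"route": "/admin"}, [], [], terminal_reason="success")
trajectory = encode_episode(
    [first, last],
    {"success": True, "subgoals": {"login": True, "flag": False}, "reason": "success"},
)
reward = subgoal_fraction(trajectory["episode_result"])
```

Under this sketch the reward is computed on read from the stored EpisodeResult, so swapping in a different adapter does not require regenerating trajectories.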

Acceptance

An ADR or DESIGN.md section answering (1)–(5), reviewed and merged.
Unblocks #198 (implementation); pairs with #199 (reward) and #200
(curriculum loop). Read open-trajectory-gym's trajectory format first.

Metadata

    Labels

      • design-needed: Needs a design pass before code
      • roadmap: Tracked on the public roadmap
      • training: Training pipeline