docs(roadmap): core-api rename, SlimeRunner plans committed; slime data-contract done#59
lliquid wants to merge 5 commits into
| Today | Renamed |
|---|---|
| `app.py` | `rollout_server.py` |
I think it would be better to rename app.py to rollout_executor.py. Conceptually, the AgentCoreRLApp class does not start a server that handles multiple rollout requests; it packs agent code into an app that is executed in every ACR session.
wait agentcore rl app does start a server?
@luyuzhe111 I think if we name it rollout_server, people may easily misunderstand it as working like an agent server, one that handles input requests and returns full agent rollout traces. It actually does not: rollout traces are collected from the model gateway.
makes sense. so the app.py naming comes from here, which holds the server app that AgentCoreRLApp inherits from.
@lliquid so for me personally i would just keep it as is because 1) it follows a similar bedrock sdk convention, and 2) app.py is not really public-facing, as users import AgentCoreRLApp directly, not via an app module.
lyzustc left a comment
I feel renaming app.py to rollout_server.py is not appropriate; other things look good to me.
Revised the PR, pls check.
Starts tracking docs/roadmap/committed/ for approved design plans. Drafts (docs/roadmap/draft/), PR-body scratchpads (docs/pr/), and personal-tooling output (docs/superpowers/) stay untracked via .gitignore so ad-hoc thinking doesn't accidentally land in the repo.

The core-api-rename plan covers the 0.2.0 breaking rename:
- app.py / client.py / reward_function.py -> rollout_server.py / rollout_launcher.py / reward.py
- RolloutClient -> RolloutLauncher (plus BatchResult/BatchItem renames)
- payload["_rollout"] -> payload["rollout_config"]

No shims; single PR when implemented.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Commits two approved plans to docs/roadmap/committed/:
- slime-data-contract.md: make the JSONL row's metadata dict the agent payload verbatim, so non-GSM8K agents train without slime-integration changes. Includes Appendix A on slime's group_index assignment.
- slime-runner-entrypoint.md: wrap train.sh + config.yaml behind a single SlimeRunner Python class as the primary entry point; train.sh stays as the escape hatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e-api-rename plan

AgentCoreRLApp is not a multi-request server: the ACR runtime loads the app and executes it once per session to produce a single rollout. "rollout_executor.py" captures those semantics better than "rollout_server.py", which implied a long-lived request handler.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…a-contract PR

Unit tests pin the _sample_to_payload contract in isolation, but an integration bug would still hit the first user. Add a required pre-PR smoke-test gate to the slime-data-contract plan:
- regenerate the JSONL
- run train.sh with NUM_ROLLOUT=10 on Qwen2.5-3B-Instruct
- confirm rollout metrics and training steps
- paste evidence in the PR body

Also adds a unit-test item for the shallow-copy invariant (mutations to the returned payload must not leak into Sample.metadata).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implemented in #61. Moves the plan out of committed/ (in-flight) into done/ (shipped), alongside the other completed plans.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fca42d8 to 6cd781b
Collapses _sample_to_payload to return a shallow copy of
Sample.metadata. Previously it synthesized a hybrid payload shape
(sample.prompt -> payload["prompt"], sample.label -> payload["answer"],
sample.metadata nested under payload["metadata"], plus a fall-through
copy of Sample fields), which locked the slime backend into the math
agent's shape and forced other agents (appworld, migration,
officebench) into workarounds.
After this change, the JSONL row's metadata dict is the agent payload
exactly, so each agent declares whatever payload shape it wants by
choosing what keys to put in metadata. The JSONL top-level prompt
field still drives slime's tokenizer and length filter.
Breaking change for existing math JSONLs: rows using {prompt, label}
now produce an empty payload. Regenerate with the updated SETUP.md
data-prep snippet which emits {prompt, metadata: {prompt, answer}}.
Also drops --label-key from train.sh (nothing reads sample.label
under the new rule).
Verified end-to-end on Qwen2.5-3B-Instruct + GSM8K with NUM_ROLLOUT=10:
raw_reward climbed 0.27 -> 0.63, train/loss and grad_norm move as
expected, no rollout failures.
Plan: docs/roadmap/committed/slime-data-contract.md (committed on
docs/core-api-rename-roadmap in PR awslabs#59).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
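The breaking change for existing math JSONLs can be illustrated with a small conversion sketch. This is not the SETUP.md data-prep snippet (which is not shown in this PR); `migrate_row` and `migrate_jsonl` are hypothetical helpers that assume old rows carry `prompt` and `label` keys and rewrite them into the `{prompt, metadata: {prompt, answer}}` shape described above.

```python
import json
from typing import Any


def migrate_row(row: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: rewrite an old {prompt, label} row into the new shape."""
    prompt = row["prompt"]
    return {
        # Top-level prompt is kept: it still drives slime's tokenizer and length filter.
        "prompt": prompt,
        # Under the new rule, metadata becomes the agent payload verbatim.
        "metadata": {"prompt": prompt, "answer": row["label"]},
    }


def migrate_jsonl(text: str) -> str:
    """Convert every non-empty JSONL line and return the new JSONL text."""
    rows = (json.loads(line) for line in text.splitlines() if line.strip())
    return "\n".join(json.dumps(migrate_row(r)) for r in rows)


old = '{"prompt": "What is 6 * 7?", "label": "42"}'
new_row = json.loads(migrate_jsonl(old))
print(new_row["metadata"])  # {'prompt': 'What is 6 * 7?', 'answer': '42'}
```

An old-style row has no `metadata` key, which is why it yields an empty payload under the new rule; after conversion the agent sees exactly the `metadata` dict.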
Summary
Tracks design plans in `docs/roadmap/`, split into `committed/` (approved, not yet implemented) and `done/` (shipped).
What's tracked:
Not tracked: roadmap drafts stay local until they're promoted to `committed/`; `done/` plans are committed after their implementation PR merges.
Plan scope at a glance
Test plan