docs(roadmap): core-api rename, SlimeRunner plans committed; slime data-contract done by lliquid · Pull Request #59 · awslabs/agentcore-rl-toolkit

lliquid · 2026-05-04T19:02:21Z

Summary

Tracks design plans in `docs/roadmap/`, split into `committed/` (approved, not yet implemented) and `done/` (shipped).

What's tracked:

- `docs/roadmap/committed/core-api-rename.md`: approved plan for the 0.2.0 rename (`app.py` / `client.py` / `reward_function.py` → `rollout_executor.py` / `rollout_launcher.py` / `reward.py`; `RolloutClient` → `RolloutLauncher`; `payload["_rollout"]` → `payload["rollout_config"]`; no shims, single PR when implemented).
- `docs/roadmap/committed/slime-runner-entrypoint.md`: approved plan to wrap `train.sh` + `config.yaml` behind a single `SlimeRunner` Python class as the primary entry point; `train.sh` stays as the low-level escape hatch.
- `docs/roadmap/done/slime-data-contract.md`: shipped in #61 (feat(slime): support arbitrary agent payload shapes in the training backend). Makes the JSONL row's `metadata` dict the agent payload verbatim, so non-GSM8K agents (migration / AppWorld / OfficeBench) train without slime-integration changes. Appendix A documents how slime's `group_index` is assigned, for future readers.
- `.gitignore` updates so drafts (`docs/roadmap/draft/`), PR-body scratchpads (`docs/pr/`), and personal tooling (`docs/superpowers/`) stay untracked.

Not tracked: roadmap drafts stay local until they're promoted to `committed/`; `done/` plans are committed after their implementation PR merges.

Plan scope at a glance

- core-api-rename: one PR, one version bump (0.2.0). No deprecation scaffolding: at this adoption stage, the cost of carrying shims outweighs the migration burden they would spare users. Three files rename, four classes rename, one payload key flips. Backend impact: slime (in-tree, bundled), rllm (out-of-tree, tracking issue only), verl (not integrated yet).
- slime-runner-entrypoint: additive convenience layer. `SlimeRunner(...).train()` replaces the `train.sh` + `config.yaml` + env-var combo. `train.sh` stays untouched.
- slime-data-contract (done): implemented in #61 (feat(slime): support arbitrary agent payload shapes in the training backend). The end-to-end smoke test verified that reward climbs from 0.27 to 0.63 over 10 rollouts on Qwen2.5-3B-Instruct + GSM8K.
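Since the rename plan ships without shims, callers holding payloads in the old shape would need to flip the key themselves. A minimal sketch of that flip, assuming payloads are plain dicts; `migrate_payload` and the contents of the rollout config (`exp_id`) are hypothetical and not part of the toolkit:

```python
from typing import Any

OLD_KEY = "_rollout"        # payload key before 0.2.0 (from the plan)
NEW_KEY = "rollout_config"  # payload key from 0.2.0 on (from the plan)


def migrate_payload(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `payload` with the rollout key renamed.

    Hypothetical helper for illustration only; the plan itself ships no shim.
    """
    migrated = dict(payload)  # leave the caller's dict untouched
    if OLD_KEY in migrated:
        if NEW_KEY in migrated:
            # Refuse to guess which of the two values should win.
            raise ValueError(f"payload has both {OLD_KEY!r} and {NEW_KEY!r}")
        migrated[NEW_KEY] = migrated.pop(OLD_KEY)
    return migrated


# The rollout config contents below are made up for the example.
old = {"prompt": "What is 2 + 3?", "_rollout": {"exp_id": "demo"}}
new = migrate_payload(old)
print(new)
```

Payloads that already use `rollout_config` pass through unchanged, so the helper is safe to run over a mix of old and new records.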

Test plan

This is a roadmap/documentation PR — no code changes, nothing to functionally test.
Reviewers confirm each plan reads clearly and the scope is correct.

lyzustc · 2026-05-04T23:41:44Z

+
+| Today | Renamed |
+|---|---|
+| `app.py` | `rollout_server.py` |


I think it would be better to rename app.py to rollout_executor.py. Conceptually, the AgentCoreRLApp class does not start a server that handles multiple rollout requests; it packs the agent code into an app that is executed in every ACR session.

Wait, the AgentCoreRLApp does start a server?

@luyuzhe111 I think that with the name rollout_server, people may easily misunderstand it as working like an agent server, one that handles input requests and returns full agent rollout traces. It does not: rollout traces are collected from the model gateway.

Makes sense. So the app.py naming comes from here, which holds the server app that AgentCoreRLApp inherits from.

@lliquid Personally, I would keep it as is because 1) it follows a similar Bedrock SDK convention, and 2) app.py is not really public-facing, since users import AgentCoreRLApp directly, not via an app module.

lyzustc

I feel renaming app.py to rollout_server.py is not appropriate, other things look good to me.

lliquid · 2026-05-05T16:50:05Z

> I feel renaming app.py to rollout_server.py is not appropriate, other things look good to me.

Revised the PR, please check.

Starts tracking docs/roadmap/committed/ for approved design plans. Drafts (docs/roadmap/draft/), PR-body scratchpads (docs/pr/), and personal-tooling output (docs/superpowers/) stay untracked by .gitignore so ad-hoc thinking doesn't accidentally land in the repo.

The core-api-rename plan covers the 0.2.0 breaking rename: app.py/client.py/reward_function.py -> rollout_server.py/rollout_launcher.py/reward.py, RolloutClient -> RolloutLauncher (plus BatchResult/BatchItem renames), and the payload["_rollout"] -> payload["rollout_config"] flip. No shims, single PR when implemented.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Commits two approved plans to docs/roadmap/committed/:

- slime-data-contract.md: make the JSONL row's metadata dict the agent payload verbatim, so non-GSM8K agents train without slime-integration changes. Includes Appendix A on slime's group_index assignment.
- slime-runner-entrypoint.md: wrap train.sh + config.yaml behind a single SlimeRunner Python class as the primary entry point; train.sh stays as the escape hatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…e-api-rename plan

AgentCoreRLApp is not a multi-request server: the ACR runtime loads the app and executes it once per session to produce a single rollout. "rollout_executor.py" captures those semantics better than "rollout_server.py", which implied a long-lived request handler.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…a-contract PR

Unit tests pin the _sample_to_payload contract in isolation, but an integration bug would still hit the first user. Add a required pre-PR smoke-test gate to the slime-data-contract plan: regenerate the JSONL, run train.sh with NUM_ROLLOUT=10 on Qwen2.5-3B-Instruct, confirm rollout metrics and training steps, paste evidence in the PR body.

Also adds a unit-test item for the shallow-copy invariant (mutations to the returned payload must not leak into Sample.metadata).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
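The shallow-copy invariant named above can be pinned with a test along these lines. This is a sketch under stated assumptions: `Sample` is a hypothetical stand-in for slime's class with only the fields used here, and `sample_to_payload` mirrors the described behavior of `_sample_to_payload` rather than reproducing the shipped code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Sample:
    """Hypothetical stand-in for slime's Sample; not the real class."""
    prompt: str = ""
    metadata: dict[str, Any] = field(default_factory=dict)


def sample_to_payload(sample: Sample) -> dict[str, Any]:
    """Assumed shape of the contract: the payload is a shallow copy of metadata."""
    return dict(sample.metadata)


def test_payload_mutation_does_not_leak() -> None:
    sample = Sample(prompt="q", metadata={"prompt": "q", "answer": "5"})
    payload = sample_to_payload(sample)

    # Top-level writes to the payload must not touch Sample.metadata.
    payload["answer"] = "changed"
    payload["extra"] = 1

    assert payload is not sample.metadata
    assert sample.metadata == {"prompt": "q", "answer": "5"}


test_payload_mutation_does_not_leak()
```

Because the copy is shallow, the invariant covers top-level keys only; a nested list or dict inside `metadata` is still shared between the payload and the sample.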

Implemented in #61. Moves the plan out of committed/ (in-flight) into done/ (shipped), alongside the other completed plans. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Collapses _sample_to_payload to return a shallow copy of Sample.metadata. Previously it synthesized a hybrid payload shape (sample.prompt -> payload["prompt"], sample.label -> payload["answer"], sample.metadata nested under payload["metadata"], plus a fall-through copy of Sample fields), which locked the slime backend into the math agent's shape and forced other agents (appworld, migration, officebench) into workarounds.

After this change, the JSONL row's metadata dict is the agent payload exactly, so each agent declares whatever payload shape it wants by choosing what keys to put in metadata. The JSONL top-level prompt field still drives slime's tokenizer and length filter.

Breaking change for existing math JSONLs: rows using {prompt, label} now produce an empty payload. Regenerate with the updated SETUP.md data-prep snippet, which emits {prompt, metadata: {prompt, answer}}. Also drops --label-key from train.sh (nothing reads sample.label under the new rule).

Verified end-to-end on Qwen2.5-3B-Instruct + GSM8K with NUM_ROLLOUT=10: raw_reward climbed 0.27 -> 0.63, train/loss and grad_norm move as expected, no rollout failures.

Plan: docs/roadmap/committed/slime-data-contract.md (committed on docs/core-api-rename-roadmap in PR awslabs#59).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
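To illustrate the breaking change, here is a rough sketch of converting old `{prompt, label}` rows into the `{prompt, metadata: {prompt, answer}}` shape. It is not the SETUP.md snippet: `convert_row` and `convert_jsonl` are hypothetical names, and mapping `label` to `answer` is an assumption based on the old `sample.label -> payload["answer"]` behavior described in the commit message.

```python
import json
from typing import Any


def convert_row(old_row: dict[str, Any]) -> dict[str, Any]:
    """Map an old {prompt, label} row to {prompt, metadata: {prompt, answer}}."""
    prompt = old_row["prompt"]
    return {
        # The top-level prompt stays: it still drives tokenization and length filtering.
        "prompt": prompt,
        # metadata becomes the agent payload verbatim under the new contract.
        "metadata": {"prompt": prompt, "answer": old_row["label"]},
    }


def convert_jsonl(text: str) -> str:
    """Convert JSONL text row by row, skipping blank lines."""
    rows = [
        json.dumps(convert_row(json.loads(line)))
        for line in text.splitlines()
        if line.strip()
    ]
    return "\n".join(rows) + "\n"


old_jsonl = '{"prompt": "What is 2 + 3?", "label": "5"}\n'
print(convert_jsonl(old_jsonl), end="")
```

An agent with a different payload shape would put its own keys under `metadata` instead of `prompt` and `answer`; only the top-level `prompt` is fixed by the backend.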

lyzustc reviewed May 4, 2026


lliquid changed the title ~~docs(roadmap): add committed core-api-rename plan~~ docs(roadmap): add committed core-API rename, slime data-contract, and SlimeRunner plans May 5, 2026

lliquid added the roadmap Design plans tracked in docs/roadmap/committed/ label May 5, 2026

lliquid changed the title ~~docs(roadmap): add committed core-API rename, slime data-contract, and SlimeRunner plans~~ docs(roadmap): core-api contract rename, slime data contract refactoring, add SlimeRunner class May 5, 2026

lliquid mentioned this pull request May 5, 2026

feat(slime): support arbitrary agent payload shapes in the training backend #61

Merged

3 tasks

lliquid changed the title ~~docs(roadmap): core-api contract rename, slime data contract refactoring, add SlimeRunner class~~ docs(roadmap): core-api rename, SlimeRunner plans committed; slime data-contract done May 5, 2026

lliquid mentioned this pull request May 5, 2026

feat(slime): add SlimeRunner as the primary Python entry point #62

Merged

3 tasks

lliquid and others added 5 commits May 6, 2026 00:11

docs(roadmap): move slime-data-contract plan to done/

6cd781b

Implemented in #61. Moves the plan out of committed/ (in-flight) into done/ (shipped), alongside the other completed plans. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

lliquid force-pushed the docs/core-api-rename-roadmap branch from fca42d8 to 6cd781b Compare May 6, 2026 00:11


lliquid wants to merge 5 commits into main from docs/core-api-rename-roadmap


Comment timeline: lyzustc (May 4, 2026), lliquid (May 5, 2026), luyuzhe111 (May 8, 2026), lyzustc (May 8, 2026), luyuzhe111 (May 8, 2026), luyuzhe111 (May 8, 2026).

3 participants
