docs(roadmap): core-api rename, SlimeRunner plans committed; slime data-contract done #59

Open
lliquid wants to merge 5 commits into main from docs/core-api-rename-roadmap

lliquid commented May 4, 2026

Summary

Tracks design plans in `docs/roadmap/`, split into `committed/` (approved, not yet implemented) and `done/` (shipped).

What's tracked:

  • `docs/roadmap/committed/core-api-rename.md` — approved plan for the 0.2.0 rename (`app.py` / `client.py` / `reward_function.py` → `rollout_executor.py` / `rollout_launcher.py` / `reward.py`; `RolloutClient` → `RolloutLauncher`; `payload["_rollout"]` → `payload["rollout_config"]`; no shims, single PR when implemented).
  • `docs/roadmap/committed/slime-runner-entrypoint.md` — approved plan to wrap `train.sh` + `config.yaml` behind a single `SlimeRunner` Python class as the primary entry point; `train.sh` stays as the low-level escape hatch.
  • `docs/roadmap/done/slime-data-contract.md` — shipped in feat(slime): support arbitrary agent payload shapes in the training backend #61. Makes the JSONL row's `metadata` dict the agent payload verbatim, so non-GSM8K agents (migration / AppWorld / OfficeBench) train without slime-integration changes. Appendix A documents how slime's `group_index` is assigned, for future readers.
  • `.gitignore` updates so drafts (`docs/roadmap/draft/`), PR-body scratchpads (`docs/pr/`), and personal tooling (`docs/superpowers/`) stay untracked.

Not tracked: roadmap drafts stay local until they're promoted to `committed/`; `done/` plans are committed after their implementation PR merges.
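The payload key flip described in the core-api-rename plan can be illustrated with a short sketch. This is only an assumption about how a caller might update payloads stored before 0.2.0; `migrate_payload` is a hypothetical one-off helper and not part of the plan, which deliberately ships no shims.

```python
from typing import Any

OLD_KEY = "_rollout"        # key used before the 0.2.0 rename
NEW_KEY = "rollout_config"  # key used after the rename


def migrate_payload(payload: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `payload` with the old rollout key renamed.

    Hypothetical helper for illustration only; the library itself is
    not planned to accept both spellings.
    """
    migrated = dict(payload)  # shallow copy, the caller's dict is untouched
    if OLD_KEY in migrated:
        if NEW_KEY in migrated:
            raise ValueError(f"payload has both {OLD_KEY!r} and {NEW_KEY!r}")
        migrated[NEW_KEY] = migrated.pop(OLD_KEY)
    return migrated
```

Payloads that already use `rollout_config` pass through unchanged, so the helper can be applied to a mixed set of stored payloads.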

Plan scope at a glance

  • core-api-rename: one PR, one version bump (0.2.0). No deprecation scaffolding — at this adoption stage the carrying cost outweighs the migration burden. Three files rename, four classes rename, one payload key flips. Backend impact: slime (in-tree, bundled), rllm (out-of-tree, tracking issue only), verl (not integrated yet).
  • slime-runner-entrypoint: additive convenience layer. `SlimeRunner(...).train()` replaces the train.sh + config.yaml + env-var combo. `train.sh` stays untouched.
  • slime-data-contract (done): implemented in feat(slime): support arbitrary agent payload shapes in the training backend #61. End-to-end smoke test verified reward climbs 0.27 → 0.63 over 10 rollouts on Qwen2.5-3B + GSM8K.
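To make the slime-runner-entrypoint idea concrete, here is a minimal sketch of a Python class that wraps `train.sh`. It is not the planned `SlimeRunner` API: the class name `SlimeRunnerSketch`, its constructor arguments, and the `CONFIG_PATH` variable are all assumptions made for illustration. Only `train.sh`, `config.yaml`, and the `NUM_ROLLOUT` environment variable appear in this PR.

```python
import os
import subprocess
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SlimeRunnerSketch:
    """Hypothetical wrapper; the real SlimeRunner signature may differ."""

    config_path: Path
    num_rollout: int = 10
    script: Path = Path("train.sh")
    extra_env: dict[str, str] = field(default_factory=dict)

    def build_env(self) -> dict[str, str]:
        # Start from the current environment so PATH etc. stay available.
        env = dict(os.environ)
        env["NUM_ROLLOUT"] = str(self.num_rollout)
        # Assumption: train.sh locates its config through this variable.
        env["CONFIG_PATH"] = str(self.config_path)
        env.update(self.extra_env)
        return env

    def train(self) -> int:
        # train.sh stays the low-level escape hatch; this only launches it.
        completed = subprocess.run(
            ["bash", str(self.script)], env=self.build_env(), check=True
        )
        return completed.returncode
```

Under these assumptions, `SlimeRunnerSketch(config_path=Path("config.yaml")).train()` would stand in for exporting the variables by hand and invoking `train.sh` directly.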

Test plan

  • This is a roadmap/documentation PR — no code changes, nothing to functionally test.
  • Reviewers confirm each plan reads clearly and the scope is correct.


| Today | Renamed |
|---|---|
| `app.py` | `rollout_server.py` |

Contributor

I think it would be better to rename app.py to rollout_executor.py. Conceptually, the AgentCoreRLApp class does not start a server that handles multiple rollout requests; it packs the agent code into an app that is executed in every ACR session.

Contributor Author

good idea.

Contributor

wait agentcore rl app does start a server?

Contributor

@luyuzhe111 I think if we name it rollout_server, people may easily assume it works like an agent server that handles input requests and returns full agent rollout traces. It actually does not: rollout traces are collected from the model gateway.

Contributor

makes sense. so the app.py naming comes from here, which holds the server app agentcore rl app inherits.

Contributor

@lliquid so for me personally i would just keep it as is due to 1) similar bedrock sdk convention, and 2) app.py is not really public-facing as users import the AgentCoreRLApp directly, not via an app module.

lyzustc left a comment

I feel renaming app.py to rollout_server.py is not appropriate, other things look good to me.

lliquid changed the title from "docs(roadmap): add committed core-api-rename plan" to "docs(roadmap): add committed core-API rename, slime data-contract, and SlimeRunner plans" May 5, 2026
lliquid added the roadmap label (Design plans tracked in docs/roadmap/committed/) May 5, 2026
lliquid changed the title from "docs(roadmap): add committed core-API rename, slime data-contract, and SlimeRunner plans" to "docs(roadmap): core-api contract rename, slime data contract refactoring, add SlimeRunner class" May 5, 2026

lliquid commented May 5, 2026

> I feel renaming app.py to rollout_server.py is not appropriate, other things look good to me.

Revised the PR, please check.

lliquid changed the title from "docs(roadmap): core-api contract rename, slime data contract refactoring, add SlimeRunner class" to "docs(roadmap): core-api rename, SlimeRunner plans committed; slime data-contract done" May 5, 2026
lliquid and others added 5 commits May 6, 2026 00:11
Starts tracking docs/roadmap/committed/ for approved design plans.
Drafts (docs/roadmap/draft/), PR-body scratchpads (docs/pr/), and
personal-tooling output (docs/superpowers/) stay untracked by
.gitignore so ad-hoc thinking doesn't accidentally land in the repo.

The core-api-rename plan covers the 0.2.0 breaking rename:
app.py/client.py/reward_function.py -> rollout_server.py/
rollout_launcher.py/reward.py, RolloutClient -> RolloutLauncher (plus
BatchResult/BatchItem renames), and the payload["_rollout"] ->
payload["rollout_config"] flip. No shims — single PR when implemented.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Commits two approved plans to docs/roadmap/committed/:
- slime-data-contract.md: make the JSONL row's metadata dict the agent
  payload verbatim, so non-GSM8K agents train without slime-integration
  changes. Includes Appendix A on slime's group_index assignment.
- slime-runner-entrypoint.md: wrap train.sh + config.yaml behind a
  single SlimeRunner Python class as the primary entry point; train.sh
  stays as the escape hatch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e-api-rename plan

AgentCoreRLApp is not a multi-request server — the ACR runtime loads
the app and executes it once per session to produce a single rollout.
"rollout_executor.py" captures that semantics better than
"rollout_server.py", which implied a long-lived request handler.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…a-contract PR

Unit tests pin the _sample_to_payload contract in isolation, but an
integration bug would still hit the first user. Add a required pre-PR
smoke-test gate to the slime-data-contract plan: regenerate the JSONL,
run train.sh with NUM_ROLLOUT=10 on Qwen2.5-3B-Instruct, confirm
rollout metrics and training steps, paste evidence in the PR body.

Also adds a unit-test item for the shallow-copy invariant (mutations
to the returned payload must not leak into Sample.metadata).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Implemented in #61. Moves the plan out of committed/ (in-flight) into
done/ (shipped), alongside the other completed plans.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
lliquid force-pushed the docs/core-api-rename-roadmap branch from fca42d8 to 6cd781b May 6, 2026 00:11
lliquid pushed a commit to lliquid/agentcore-rl-toolkit that referenced this pull request May 6, 2026
Collapses _sample_to_payload to return a shallow copy of
Sample.metadata. Previously it synthesized a hybrid payload shape
(sample.prompt -> payload["prompt"], sample.label -> payload["answer"],
sample.metadata nested under payload["metadata"], plus a fall-through
copy of Sample fields), which locked the slime backend into the math
agent's shape and forced other agents (appworld, migration,
officebench) into workarounds.

After this change, the JSONL row's metadata dict is the agent payload
exactly, so each agent declares whatever payload shape it wants by
choosing what keys to put in metadata. The JSONL top-level prompt
field still drives slime's tokenizer and length filter.

Breaking change for existing math JSONLs: rows using {prompt, label}
now produce an empty payload. Regenerate with the updated SETUP.md
data-prep snippet which emits {prompt, metadata: {prompt, answer}}.

Also drops --label-key from train.sh (nothing reads sample.label
under the new rule).

Verified end-to-end on Qwen2.5-3B-Instruct + GSM8K with NUM_ROLLOUT=10:
raw_reward climbed 0.27 -> 0.63, train/loss and grad_norm move as
expected, no rollout failures.

Plan: docs/roadmap/committed/slime-data-contract.md (committed on
docs/core-api-rename-roadmap in PR awslabs#59).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
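
The data-contract rule in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the code from #61: the `Sample` dataclass here is a hypothetical stand-in for slime's own type, and `sample_to_payload` only mirrors the described behavior of `_sample_to_payload` (return a shallow copy of `Sample.metadata`).

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Sample:
    """Hypothetical stand-in for slime's Sample; real fields may differ."""

    prompt: str
    label: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def sample_to_payload(sample: Sample) -> dict[str, Any]:
    # The agent payload is the row's metadata dict, copied shallowly so
    # that later mutations do not leak back into Sample.metadata.
    return dict(sample.metadata)


# New-style math row: the top-level prompt drives tokenization and length
# filtering, while metadata carries the agent payload.
row = {
    "prompt": "What is 2 + 3?",
    "metadata": {"prompt": "What is 2 + 3?", "answer": "5"},
}
sample = Sample(**json.loads(json.dumps(row)))
payload = sample_to_payload(sample)
payload["extra"] = True  # mutating the payload leaves sample.metadata alone

# Old-style {prompt, label} row: nothing in metadata, so the payload is empty.
old_payload = sample_to_payload(Sample(prompt="What is 2 + 3?", label="5"))
```

An agent with a different payload shape would only change which keys it writes under `metadata`; the function itself stays the same.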