feat(slime): add SlimeRunner as the primary Python entry point by lliquid · Pull Request #62 · awslabs/agentcore-rl-toolkit

lliquid · 2026-05-05T23:47:07Z

Goal

Replace the `train.sh` + `config.yaml` + env-vars combo with a single Python class. Users write:

```python
from agentcore_rl_toolkit.backends.slime import SlimeRunner

SlimeRunner(
    exp_id="gsm8k-3b-smoke",
    agent_runtime_arn="arn:aws:bedrock-agentcore:...",
    s3_bucket="my-bucket",
    model_dir="/path/to/Qwen2.5-3B-Instruct",
    data_path="/path/to/gsm8k_tiny.jsonl",
    model_type="qwen2.5-3B",
).train(num_rollout=10)
```

…instead of editing 121 lines of bash + 9 lines of YAML + exporting 4 env vars.

Design

`SlimeRunner` (`src/agentcore_rl_toolkit/backends/slime/runner.py`) — a dataclass with six required per-experiment fields (`exp_id`, `agent_runtime_arn`, `s3_bucket`, `model_dir`, `data_path`, `model_type`) and ~20 optional kwargs covering cluster, ACR/toolkit, hyperparameters, and wandb. An `extra_flags: list[str]` escape hatch passes any slime / Megatron-LM / SGLang CLI flag through verbatim.
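
To make the shape concrete, here is a minimal sketch of such a dataclass. It is not the code in `runner.py`: the class name `RunnerSketch`, the method `build_cli_args`, the two optional fields, and their defaults are assumptions chosen for illustration. Only the six required field names, `extra_flags`, and the three CLI flag names come from this PR description.

```python
from dataclasses import dataclass, field


@dataclass
class RunnerSketch:
    """Hypothetical stand-in for SlimeRunner; not the toolkit's real class."""

    # The six required per-experiment fields named in the PR description.
    exp_id: str
    agent_runtime_arn: str
    s3_bucket: str
    model_dir: str
    data_path: str
    model_type: str

    # Illustrative optional kwargs; names and defaults are assumptions.
    lr: float = 1e-6
    tensor_model_parallel_size: int = 1
    # Escape hatch: flags appended to the CLI exactly as given.
    extra_flags: list[str] = field(default_factory=list)

    def build_cli_args(self, num_rollout: int = 1) -> list[str]:
        """Translate fields into CLI flags, then append extra_flags verbatim."""
        args = [
            "--num-rollout", str(num_rollout),
            "--lr", str(self.lr),
            "--tensor-model-parallel-size", str(self.tensor_model_parallel_size),
        ]
        return args + list(self.extra_flags)


runner = RunnerSketch(
    exp_id="gsm8k-3b-smoke",
    agent_runtime_arn="arn:aws:bedrock-agentcore:...",
    s3_bucket="my-bucket",
    model_dir="/path/to/Qwen2.5-3B-Instruct",
    data_path="/path/to/gsm8k_tiny.jsonl",
    model_type="qwen2.5-3B",
    # Placeholder flag, not a real slime / Megatron-LM / SGLang option.
    extra_flags=["--hypothetical-flag", "value"],
)
cli_args = runner.build_cli_args(num_rollout=10)
```

Keeping `extra_flags` as a plain list appended last means the class never has to know about every downstream flag.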

`.train(num_rollout=1)` shells out to reproduce `train.sh` step-by-step:

1. `pkill -9 sglang` + `ray stop --force` to clear stale processes
2. `ray start --head`
3. Source `${slime_dir}/scripts/models/${model_type}.sh` to get `MODEL_ARGS`
4. Write toolkit config (ARN, bucket, exp_id, etc.) to a temp YAML
5. `ray job submit -- python3 ${slime_dir}/train.py` with `--custom-config-path` pointing at the temp YAML
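
The sequence above can be sketched as a function that only plans the commands, without running anything. The helper names `write_toolkit_config` and `plan_commands` are hypothetical, the YAML writer handles flat string values only, and the real job submission passes many more flags (including `MODEL_ARGS`) than shown here.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_toolkit_config(values: dict[str, str]) -> Path:
    """Write flat key/value pairs to a temp YAML file and return its path."""
    tmp = tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False)
    with tmp:
        for key, value in values.items():
            # A JSON string literal is also a valid YAML double-quoted scalar.
            tmp.write(f"{key}: {json.dumps(value)}\n")
    return Path(tmp.name)


def plan_commands(slime_dir: str, model_type: str, config_path: Path) -> list[str]:
    """Return the shell commands in order; nothing is executed here."""
    model_script = f"{slime_dir}/scripts/models/{model_type}.sh"
    return [
        "pkill -9 sglang",
        "ray stop --force",
        "ray start --head",
        # Sourcing only helps if the later command runs in the same shell.
        f"source {shlex.quote(model_script)}",
        "ray job submit -- python3 "
        f"{shlex.quote(slime_dir + '/train.py')} "
        f"--custom-config-path {shlex.quote(str(config_path))}",
    ]


config_path = write_toolkit_config({"exp_id": "gsm8k-3b-smoke", "s3_bucket": "my-bucket"})
commands = plan_commands("/opt/slime", "qwen2.5-3B", config_path)
```

Separating planning from execution keeps the command list easy to assert on in unit tests; `/opt/slime` is a placeholder path.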

`__init__.py` now re-exports `SlimeRunner` as the only public symbol. `integration/` and `patches/` stay where they are; `integration.rollout.generate_rollout` is still passed to slime as a dotted-path string.
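
For readers unfamiliar with dotted-path strings, the sketch below shows how such a string is commonly turned into a callable with `importlib`. This PR does not show how slime itself resolves the path, so `resolve_dotted_path` is a hypothetical helper, demonstrated on a standard-library function rather than on `integration.rollout.generate_rollout`.

```python
import importlib
from typing import Any


def resolve_dotted_path(path: str) -> Any:
    """Import `pkg.module.attr` and return the attribute it names."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Resolve a standard-library function by name and call it.
sqrt = resolve_dotted_path("math.sqrt")
root = sqrt(9.0)
```

Because the function is looked up by name at run time, moving `integration/` would break the string without any import error at definition time, which is one reason to leave the package where it is.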

`train.sh` stays in the repo, unchanged. It's the low-level escape hatch for users who need to debug the raw slime CLI, and serves as the reference for what the Python class replicates.

Plan: `docs/roadmap/committed/slime-runner-entrypoint.md` (on the PR #59 branch).

End-to-end smoke test

Ran `python train.py` (which calls `SlimeRunner(...).train(num_rollout=10)`) on Qwen2.5-3B-Instruct + 64-row GSM8K, 8 × H100, `slimerl/slime:latest` container:

| Rollout | raw_reward |
|---|---|
| 0 | 0.252 |
| 1 | 0.355 |
| 2 | 0.344 |
| 3 | 0.488 |
| 4 | 0.377 |
| 5 | 0.594 |
| 6 | 0.533 |
| 7 | 0.676 |
| 8 | 0.574 |
| 9 | 0.675 |

Reward climbs from 0.25 to 0.68 over the 10 rollouts, with dips along the way. This is in line with PR #61's `train.sh` smoke test (0.27 → 0.63) and consistent with the Python entry point producing the same training dynamics as bash. All 10 train steps logged `train/loss`, `train/grad_norm`, and `train/step`, and the steps progressed monotonically. No session failures; the only errors were cosmetic Ray atexit teardown tracebacks.
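
As a quick check on the trend, the arithmetic below summarizes the `raw_reward` values from the table above. It is only a reading aid for this single run, not a statistical test.

```python
from statistics import mean

# raw_reward per rollout, copied from the smoke-test table above.
rewards = [0.252, 0.355, 0.344, 0.488, 0.377, 0.594, 0.533, 0.676, 0.574, 0.675]

net_gain = rewards[-1] - rewards[0]
deltas = [b - a for a, b in zip(rewards, rewards[1:])]
num_drops = sum(d < 0 for d in deltas)
first_half = mean(rewards[:5])
second_half = mean(rewards[5:])

print(f"net gain: {net_gain:.3f}")                                # 0.423
print(f"rollout-to-rollout drops: {num_drops} of {len(deltas)}")  # 4 of 9
print(f"mean of rollouts 0-4: {first_half:.3f}")                  # 0.363
print(f"mean of rollouts 5-9: {second_half:.3f}")                 # 0.610
```

The reward falls on four of the nine transitions, so the climb is noisy, but the mean of the last five rollouts is well above the mean of the first five.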

Test plan

- Unit tests (`tests/test_slime_runner.py`, 6 tests) cover:
  - `_build_runtime_env` required keys (PYTHONPATH, CUDA_DEVICE_MAX_CONNECTIONS)
  - Wandb env forwarding (opt-in only when key is set)
  - `from_yaml` round-trip
  - Key kwargs reach the slime CLI (`--num-rollout`, `--lr`, `--tensor-model-parallel-size`, `--rollout-batch-size`, plus load-bearing integration dotted paths)
  - `extra_flags` escape hatch appends verbatim
  - Toolkit config YAML carries ACR pointers
- Full test suite passes (81 tests).
- End-to-end smoke (10-rollout training on Qwen2.5-3B, rewards climb, loss moves).
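
A test for the runtime-env behaviour could look roughly like the sketch below. `build_runtime_env` here is a hypothetical analogue of `_build_runtime_env`, not the toolkit's implementation: the `"1"` value, the use of `WANDB_API_KEY` as the opt-in key, and the function signature are all assumptions. The `env_vars` key follows the Ray `runtime_env` format.

```python
def build_runtime_env(environ: dict[str, str], pythonpath: str) -> dict:
    """Hypothetical analogue of `_build_runtime_env`; values are assumptions."""
    env_vars = {
        "PYTHONPATH": pythonpath,
        "CUDA_DEVICE_MAX_CONNECTIONS": "1",  # assumed value
    }
    # Opt-in: forward WANDB_* variables only when an API key is present.
    if environ.get("WANDB_API_KEY"):
        env_vars.update(
            {k: v for k, v in environ.items() if k.startswith("WANDB_")}
        )
    return {"env_vars": env_vars}


def test_runtime_env_keys_and_wandb_opt_in():
    # Without a key, only the always-present variables appear.
    base = build_runtime_env({"WANDB_PROJECT": "demo"}, "/opt/slime")["env_vars"]
    assert set(base) == {"PYTHONPATH", "CUDA_DEVICE_MAX_CONNECTIONS"}

    # With a key, every WANDB_* variable is forwarded.
    environ = {"WANDB_API_KEY": "dummy", "WANDB_PROJECT": "demo", "HOME": "/root"}
    with_key = build_runtime_env(environ, "/opt/slime")["env_vars"]
    assert with_key["WANDB_API_KEY"] == "dummy"
    assert with_key["WANDB_PROJECT"] == "demo"
    assert "HOME" not in with_key


test_runtime_env_keys_and_wandb_opt_in()
```

Passing the environment in as a dict, rather than reading `os.environ` directly, lets one test cover both the always-present keys and the opt-in forwarding without patching globals.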

🤖 Generated with Claude Code

Wraps train.sh + config.yaml + env vars behind a single Python class. Users instantiate SlimeRunner with a handful of per-experiment fields (exp_id, agent_runtime_arn, s3_bucket, model_dir, data_path, model_type) and call .train(num_rollout=N); everything else has a sensible default and an extra_flags escape hatch for slime / Megatron / SGLang CLI passthrough.

Internally, .train() shells out to reproduce train.sh step-by-step: stop stale sglang/ray, ray start --head, source the slime model script, submit the slime training job via ray job submit. The toolkit config (agent_runtime_arn, s3_bucket, exp_id, etc.) is written to a temp YAML and passed via --custom-config-path so SlimeArtConfig.from_args continues to read it.

train.sh stays in the repo as the low-level escape hatch and as the reference for what the class replicates. SETUP.md 3.4 now leads with the Python recipe; the bash recipe is kept as "advanced / debugging."

Verified end-to-end on Qwen2.5-3B-Instruct + GSM8K with .train(num_rollout=10): reward climbs 0.25 -> 0.68 over 10 rollouts, all 10 train steps healthy, no failures.

Plan: docs/roadmap/committed/slime-runner-entrypoint.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Both tests exercise the same method on the same code path; the only difference was whether WANDB_* env vars were set. Merged into one test covering both the always-present keys and the opt-in wandb forwarding.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

lliquid force-pushed the feat/slime-runner branch from 1b4c428 to efcad5d Compare May 5, 2026 23:48

lliquid self-assigned this May 5, 2026

lliquid and others added 2 commits May 6, 2026 00:10

lliquid force-pushed the feat/slime-runner branch from 722c63e to 76a690a Compare May 6, 2026 00:10

luyuzhe111 approved these changes May 6, 2026

lliquid merged commit e952dae into main May 6, 2026
5 checks passed

lliquid deleted the feat/slime-runner branch May 19, 2026 00:57

lliquid merged 2 commits into main from feat/slime-runner
