feat(slime): add SlimeRunner as the primary Python entry point #62
Merged
Wraps train.sh + config.yaml + env vars behind a single Python class. Users instantiate SlimeRunner with a handful of per-experiment fields (exp_id, agent_runtime_arn, s3_bucket, model_dir, data_path, model_type) and call .train(num_rollout=N); everything else has a sensible default and an extra_flags escape hatch for slime / Megatron / SGLang CLI passthrough.

Internally, .train() shells out to reproduce train.sh step-by-step: stop stale sglang/ray, ray start --head, source the slime model script, submit the slime training job via ray job submit. The toolkit config (agent_runtime_arn, s3_bucket, exp_id, etc.) is written to a temp YAML and passed via --custom-config-path so SlimeArtConfig.from_args continues to read it.

train.sh stays in the repo as the low-level escape hatch and as the reference for what the class replicates. SETUP.md 3.4 now leads with the Python recipe; the bash recipe is kept as "advanced / debugging."

Verified end-to-end on Qwen2.5-3B-Instruct + GSM8K with .train(num_rollout=10): reward climbs 0.25 -> 0.68 over 10 rollouts, all 10 train steps healthy, no failures.

Plan: docs/roadmap/committed/slime-runner-entrypoint.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both tests exercise the same method on the same code path; the only difference was whether WANDB_* env vars were set. Merged into one test covering both the always-present keys and the opt-in wandb forwarding.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
luyuzhe111 approved these changes on May 6, 2026
Goal
Replace the `train.sh` + `config.yaml` + env-vars combo with a single Python class. Users write:
```python
from agentcore_rl_toolkit.backends.slime import SlimeRunner
SlimeRunner(
exp_id="gsm8k-3b-smoke",
agent_runtime_arn="arn:aws:bedrock-agentcore:...",
s3_bucket="my-bucket",
model_dir="/path/to/Qwen2.5-3B-Instruct",
data_path="/path/to/gsm8k_tiny.jsonl",
model_type="qwen2.5-3B",
).train(num_rollout=10)
```
…instead of editing 121 lines of bash + 9 lines of YAML + exporting 4 env vars.
Design
`SlimeRunner` (`src/agentcore_rl_toolkit/backends/slime/runner.py`) — a dataclass with six required per-experiment fields (`exp_id`, `agent_runtime_arn`, `s3_bucket`, `model_dir`, `data_path`, `model_type`) and ~20 optional kwargs covering cluster, ACR/toolkit, hyperparameters, and wandb. An `extra_flags: list[str]` escape hatch passes any slime / Megatron-LM / SGLang CLI flag through verbatim.
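To make the shape of that interface concrete, here is a minimal sketch of a dataclass with the six required fields and the `extra_flags` passthrough. It is not the code in `runner.py`: the class name `SlimeRunnerSketch`, the `num_gpus` default, the `cli_args` helper, and the `--num-rollout` flag spelling are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SlimeRunnerSketch:
    """Illustrative stand-in for SlimeRunner; not the real class."""

    # The six required per-experiment fields named in this PR.
    exp_id: str
    agent_runtime_arn: str
    s3_bucket: str
    model_dir: str
    data_path: str
    model_type: str

    # Hypothetical optional kwarg; the real class has ~20 of these.
    num_gpus: int = 8

    # Escape hatch: flags forwarded verbatim to the underlying CLI.
    extra_flags: list[str] = field(default_factory=list)

    def cli_args(self, num_rollout: int = 1) -> list[str]:
        """Build an argument list, appending extra_flags unchanged at the end."""
        # "--num-rollout" is an assumed flag spelling, used only for illustration.
        args = ["--num-rollout", str(num_rollout)]
        return args + list(self.extra_flags)
```

Appending `extra_flags` last and untouched is one plausible way to keep the passthrough "verbatim": the class never has to know which slime, Megatron-LM, or SGLang options exist.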
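`.train(num_rollout=1)` shells out to reproduce `train.sh` step-by-step: stop stale sglang/ray processes, run `ray start --head`, source the slime model script, then submit the slime training job via `ray job submit`. The toolkit config (`agent_runtime_arn`, `s3_bucket`, `exp_id`, etc.) is written to a temp YAML and passed via `--custom-config-path` so `SlimeArtConfig.from_args` continues to read it.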
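As a rough sketch of that flow, the snippet below writes a flat toolkit config to a temp YAML file and builds the command sequence the commit message describes (stop stale processes, start the Ray head, source the model script, submit the job). It only plans the commands and runs nothing. The helper names, the `pkill` invocation, the `train.py` entry script, and the `--num-rollout` spelling are assumptions; only `ray start --head`, `ray job submit`, and `--custom-config-path` come from this PR.

```python
import json
import shlex
import tempfile


def write_toolkit_config(cfg: dict) -> str:
    """Write a flat key/value config to a temp YAML file and return its path."""
    # A JSON-encoded scalar is also a valid YAML scalar, so no YAML library is needed.
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
        for key, value in cfg.items():
            f.write(f"{key}: {json.dumps(value)}\n")
        return f.name


def plan_commands(
    config_path: str, model_script: str, num_rollout: int, extra_flags: list[str]
) -> list[str]:
    """Return the shell commands in order; nothing is executed here."""
    submit = [
        "ray", "job", "submit", "--",
        "python3", "train.py",              # assumed entry script
        "--num-rollout", str(num_rollout),  # assumed flag spelling
        "--custom-config-path", config_path,
        *extra_flags,                       # forwarded verbatim
    ]
    return [
        "pkill -f sglang || true",          # assumed way to stop stale sglang
        "ray stop --force",
        "ray start --head",
        f"source {shlex.quote(model_script)}",
        shlex.join(submit),
    ]
```

Because `source` only affects the shell it runs in, a real implementation would need to run the model script and the job submission in the same shell invocation (or capture the variables the script sets) rather than as independent subprocesses.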
`__init__.py` now re-exports `SlimeRunner` as the only public symbol. `integration/` and `patches/` stay where they are; `integration.rollout.generate_rollout` is still passed to slime as a dotted-path string.
`train.sh` stays in the repo, unchanged. It's the low-level escape hatch for users who need to debug the raw slime CLI, and serves as the reference for what the Python class replicates.
Plan: `docs/roadmap/committed/slime-runner-entrypoint.md` (on the PR #59 branch).
End-to-end smoke test
Ran `python train.py` (which calls `SlimeRunner(...).train(num_rollout=10)`) on Qwen2.5-3B-Instruct + 64-row GSM8K, 8 × H100, `slimerl/slime:latest` container:
Reward climbs 0.25 → 0.68 — matches PR #61's `train.sh` smoke (0.27 → 0.63), confirming the Python entry point produces the same training dynamics as bash. All 10 train steps logged `train/loss`, `train/grad_norm`, `train/step` and progressed monotonically. No session failures; only Ray atexit teardown tracebacks (cosmetic).
Test plan
🤖 Generated with Claude Code