
feat(slime): add SlimeRunner as the primary Python entry point #62

Merged
lliquid merged 2 commits into main from feat/slime-runner
May 6, 2026

Conversation

@lliquid lliquid commented May 5, 2026

Contributor

Goal

Replace the `train.sh` + `config.yaml` + env-vars combo with a single Python class. Users write:

```python
from agentcore_rl_toolkit.backends.slime import SlimeRunner

SlimeRunner(
    exp_id="gsm8k-3b-smoke",
    agent_runtime_arn="arn:aws:bedrock-agentcore:...",
    s3_bucket="my-bucket",
    model_dir="/path/to/Qwen2.5-3B-Instruct",
    data_path="/path/to/gsm8k_tiny.jsonl",
    model_type="qwen2.5-3B",
).train(num_rollout=10)
```

…instead of editing 121 lines of bash + 9 lines of YAML + exporting 4 env vars.

Design

`SlimeRunner` (`src/agentcore_rl_toolkit/backends/slime/runner.py`) — a dataclass with six required per-experiment fields (`exp_id`, `agent_runtime_arn`, `s3_bucket`, `model_dir`, `data_path`, `model_type`) and ~20 optional kwargs covering cluster, ACR/toolkit, hyperparameters, and wandb. An `extra_flags: list[str]` escape hatch passes any slime / Megatron-LM / SGLang CLI flag through verbatim.
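
A minimal sketch of the shape described above, assuming a plain standard-library dataclass. The six required fields and `extra_flags` come from this PR; the optional fields `lr` and `wandb_project`, their defaults, and the `cli_args` helper are hypothetical stand-ins for the ~20 real kwargs and are not taken from `runner.py`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlimeRunnerSketch:
    # Six required per-experiment fields (names taken from the PR description).
    exp_id: str
    agent_runtime_arn: str
    s3_bucket: str
    model_dir: str
    data_path: str
    model_type: str
    # Hypothetical optional kwargs; the real class has roughly 20 of these.
    lr: float = 1e-6
    wandb_project: Optional[str] = None
    # Escape hatch: flags appended to the CLI verbatim.
    extra_flags: list[str] = field(default_factory=list)

    def cli_args(self, num_rollout: int = 1) -> list[str]:
        """Build an illustrative argument list (the flag set is an assumption)."""
        args = ["--num-rollout", str(num_rollout), "--lr", str(self.lr)]
        # extra_flags go last and are not rewritten in any way.
        return args + list(self.extra_flags)
```

Keeping `extra_flags` as the final, untouched segment of the argument list is what lets a user reach any underlying flag without the class needing a dedicated kwarg for it.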

`.train(num_rollout=1)` shells out to reproduce `train.sh` step-by-step:

  1. `pkill -9 sglang` + `ray stop --force` to clear stale processes
  2. `ray start --head`
  3. Source `${slime_dir}/scripts/models/${model_type}.sh` to get `MODEL_ARGS`
  4. Write toolkit config (ARN, bucket, exp_id, etc.) to a temp YAML
  5. `ray job submit -- python3 ${slime_dir}/train.py` with `--custom-config-path` pointing at the temp YAML
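
The steps above can be sketched as a pure function that returns the commands without running them. This is an illustration under stated assumptions, not the code in `runner.py`: the helper names `write_toolkit_config` and `plan_commands` are hypothetical, `MODEL_ARGS` is assumed to be a bash array, and a real submission would likely carry more flags than the two shown.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_toolkit_config(config: dict[str, str]) -> Path:
    """Step 4: write flat key/value pairs to a temp YAML file."""
    # JSON strings are valid YAML double-quoted scalars, so json.dumps
    # gives safe quoting without needing a YAML library.
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as fh:
        for key, value in config.items():
            fh.write(f"{key}: {json.dumps(value)}\n")
    return Path(fh.name)


def plan_commands(slime_dir: str, model_type: str, config_path: Path,
                  num_rollout: int) -> list[str]:
    """Return the shell commands for steps 1, 2, 3 and 5, in order."""
    model_script = f"{slime_dir}/scripts/models/{model_type}.sh"
    train_cmd = shlex.join([
        "ray", "job", "submit", "--",
        "python3", f"{slime_dir}/train.py",
        "--num-rollout", str(num_rollout),
        "--custom-config-path", str(config_path),
    ])
    # Sourcing and submitting must share one shell, otherwise MODEL_ARGS
    # would be lost before the training command expands it.
    script = f'source {shlex.quote(model_script)} && {train_cmd} "${{MODEL_ARGS[@]}}"'
    return [
        "pkill -9 sglang",                   # step 1: clear stale SGLang processes
        "ray stop --force",                  # step 1: stop any old Ray cluster
        "ray start --head",                  # step 2
        shlex.join(["bash", "-c", script]),  # steps 3 and 5
    ]
```

Returning the plan as data rather than executing it makes the command construction easy to unit-test without a GPU or a Ray cluster.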

`__init__.py` now re-exports `SlimeRunner` as the only public symbol. `integration/` and `patches/` stay where they are; `integration.rollout.generate_rollout` is still passed to slime as a dotted-path string.
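
For readers unfamiliar with dotted-path strings, this is the usual way such a string is turned back into a callable. It is a generic sketch using `importlib`; how slime itself resolves the path is not shown in this PR, and `resolve_dotted_path` is a hypothetical name.

```python
import importlib
from typing import Any


def resolve_dotted_path(path: str) -> Any:
    """Import `pkg.module.attr` and return `attr`."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in target from the standard library; in this PR the string would
# end in `integration.rollout.generate_rollout`.
join = resolve_dotted_path("os.path.join")
```

Because the function is looked up by name at run time, moving `integration/` would silently break the string, which is one reason to leave the package layout untouched.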

`train.sh` stays in the repo, unchanged. It's the low-level escape hatch for users who need to debug the raw slime CLI, and serves as the reference for what the Python class replicates.

Plan: `docs/roadmap/committed/slime-runner-entrypoint.md` (on the PR #59 branch).

End-to-end smoke test

Ran `python train.py` (which calls `SlimeRunner(...).train(num_rollout=10)`) on Qwen2.5-3B-Instruct + 64-row GSM8K, 8 × H100, `slimerl/slime:latest` container:

| Rollout | raw_reward |
| --- | --- |
| 0 | 0.252 |
| 1 | 0.355 |
| 2 | 0.344 |
| 3 | 0.488 |
| 4 | 0.377 |
| 5 | 0.594 |
| 6 | 0.533 |
| 7 | 0.676 |
| 8 | 0.574 |
| 9 | 0.675 |

Reward climbs 0.25 → 0.68 — matches PR #61's `train.sh` smoke (0.27 → 0.63), confirming the Python entry point produces the same training dynamics as bash. All 10 train steps logged `train/loss`, `train/grad_norm`, `train/step` and progressed monotonically. No session failures; only Ray atexit teardown tracebacks (cosmetic).

Test plan

  • Unit tests (`tests/test_slime_runner.py`, 6 tests) cover:
    • `_build_runtime_env` required keys (PYTHONPATH, CUDA_DEVICE_MAX_CONNECTIONS)
    • Wandb env forwarding (opt-in only when key is set)
    • `from_yaml` round-trip
    • Key kwargs reach the slime CLI (`--num-rollout`, `--lr`, `--tensor-model-parallel-size`, `--rollout-batch-size`, plus load-bearing integration dotted paths)
    • `extra_flags` escape hatch appends verbatim
    • Toolkit config YAML carries ACR pointers
  • Full test suite passes (81 tests).
  • End-to-end smoke (10-rollout training on Qwen2.5-3B, rewards climb, loss moves).
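
As an illustration of the runtime-env test described above, here is a self-contained sketch. The real `_build_runtime_env` signature is not shown in this PR, so `build_runtime_env`, its parameters, the value `"1"` for `CUDA_DEVICE_MAX_CONNECTIONS`, and the choice of `WANDB_API_KEY` as the opt-in trigger are all assumptions.

```python
def build_runtime_env(environ: dict[str, str], pythonpath: str) -> dict[str, str]:
    """Hypothetical stand-in for the method under test."""
    # Keys that are always present.
    env = {"PYTHONPATH": pythonpath, "CUDA_DEVICE_MAX_CONNECTIONS": "1"}
    # Wandb variables are forwarded only when an API key is set (opt-in).
    if environ.get("WANDB_API_KEY"):
        env.update({k: v for k, v in environ.items() if k.startswith("WANDB_")})
    return env


def test_runtime_env_keys_and_wandb_opt_in() -> None:
    # Without a key: required keys present, nothing wandb-related leaks in.
    base = build_runtime_env({"WANDB_PROJECT": "p"}, "/opt/slime")
    assert {"PYTHONPATH", "CUDA_DEVICE_MAX_CONNECTIONS"} <= base.keys()
    assert not any(k.startswith("WANDB_") for k in base)

    # With a key: all WANDB_* variables are forwarded unchanged.
    with_key = build_runtime_env(
        {"WANDB_API_KEY": "k", "WANDB_PROJECT": "p", "HOME": "/root"},
        "/opt/slime",
    )
    assert with_key["WANDB_API_KEY"] == "k"
    assert with_key["WANDB_PROJECT"] == "p"
    assert "HOME" not in with_key
```

Passing the environment in as a plain dict, rather than reading `os.environ` inside the function, keeps the test free of monkeypatching.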

🤖 Generated with Claude Code

@lliquid lliquid force-pushed the feat/slime-runner branch from 1b4c428 to efcad5d Compare May 5, 2026 23:48
@lliquid lliquid self-assigned this May 5, 2026
lliquid and others added 2 commits May 6, 2026 00:10
Wraps train.sh + config.yaml + env vars behind a single Python class.
Users instantiate SlimeRunner with a handful of per-experiment fields
(exp_id, agent_runtime_arn, s3_bucket, model_dir, data_path,
model_type) and call .train(num_rollout=N); everything else has a
sensible default and an extra_flags escape hatch for slime / Megatron /
SGLang CLI passthrough.

Internally, .train() shells out to reproduce train.sh step-by-step:
stop stale sglang/ray, ray start --head, source the slime model
script, submit the slime training job via ray job submit. The toolkit
config (agent_runtime_arn, s3_bucket, exp_id, etc.) is written to a
temp YAML and passed via --custom-config-path so SlimeArtConfig.from_args
continues to read it.

train.sh stays in the repo as the low-level escape hatch and as the
reference for what the class replicates.

SETUP.md 3.4 now leads with the Python recipe; the bash recipe is
kept as "advanced / debugging."

Verified end-to-end on Qwen2.5-3B-Instruct + GSM8K with
.train(num_rollout=10): reward climbs 0.25 -> 0.68 over 10 rollouts,
all 10 train steps healthy, no failures.

Plan: docs/roadmap/committed/slime-runner-entrypoint.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Both tests exercise the same method on the same code path; the only
difference was whether WANDB_* env vars were set. Merged into one test
covering both the always-present keys and the opt-in wandb forwarding.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@lliquid lliquid force-pushed the feat/slime-runner branch from 722c63e to 76a690a Compare May 6, 2026 00:10
@lliquid lliquid merged commit e952dae into main May 6, 2026
5 checks passed
@lliquid lliquid deleted the feat/slime-runner branch May 19, 2026 00:57