feat(slime): multi-turn trace merging, validation at an input dataset and gateway config fix in custom rollout function by lyzustc · Pull Request #78 · awslabs/agentcore-rl-toolkit

lyzustc · 2026-07-01T00:25:51Z

This PR adds drift-free multi-turn trajectory merging to the slime custom rollout
function and fixes several issues that surfaced when running on full training /
validation datasets (eval episodes timing out and crashing the engine, connection-pool
churn, and unreadably noisy logs). It also brings SlimeRunner and the example scripts
in line with these changes.

Custom rollout function (`integration/`)

feat: auto-merge multi-turn conversations (traces.py). Added
merge_traces_to_samples: an episode's turns are folded into one training
Sample per contiguous prefix-extending segment (bridge tokens masked out,
completions trained), instead of one Sample per turn. Pairs with the gateway's
cumulative token mode; falls back to one Sample per turn when a turn breaks the
prefix.
feat: cumulative token mode plumbing (rollout.py, gateway.py). New
cumulative_token_mode / renderer_family config knobs, forwarded to the
gateway; merge_traces_to_samples is wired in as the rollout's sample builder.
feat: validation (rollout.py). Added periodic validation on an input validation
set, run at a set interval during training. In-flight agent sessions during
validation are bounded by max_concurrent in the yaml config.
fix: connection-pool churn (rollout.py). Added max_pool_connections, which
users can set >= max_concurrent to stop the boto3 client from logging
"Connection pool is full, discarding connection" under concurrent ACR sessions.
fix: log noise (rollout.py, gateway.py). Replaced per-session logging
with a single per-batch summary (episodes / succeeded / failed / sequences);
failed episodes log once at INFO with the full ACR session_id for CloudWatch
lookup. Added gateway_log_level (default warning) to silence the gateway's
uvicorn/httpx access logs.
fix: gateway always uses SGLang /generate (gateway.py). The slime
backend now enables use_sglang unconditionally and always passes the served
--model; removed the dead use_sglang config field. (The /generate
mechanics live in the rllm-model-gateway PR.)
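
The merging rule described for merge_traces_to_samples can be sketched roughly as follows. This is a minimal illustration, not the code in traces.py: the Turn and Sample dataclasses and the merge_turns name are hypothetical stand-ins, and it assumes each turn records the prompt token ids it saw and the completion token ids it produced. It also assumes the first prompt of each segment is masked the same way as bridge tokens.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Turn:
    """One model call (hypothetical shape): prompt seen and completion produced."""
    prompt_ids: list[int]
    completion_ids: list[int]


@dataclass
class Sample:
    """One training sequence; loss_mask[i] == 1 means tokens[i] is trained on."""
    tokens: list[int] = field(default_factory=list)
    loss_mask: list[int] = field(default_factory=list)


def merge_turns(turns: list[Turn]) -> list[Sample]:
    """Fold turns into one Sample per contiguous prefix-extending segment."""
    samples: list[Sample] = []
    current: Sample | None = None
    for turn in turns:
        # A turn extends the segment when its prompt starts with everything
        # accumulated so far (earlier prompt + completions + bridges).
        extends = (
            current is not None
            and turn.prompt_ids[: len(current.tokens)] == current.tokens
        )
        if extends:
            # Bridge tokens: new prompt content added between turns
            # (tool output, next user message). Present but not trained.
            bridge = turn.prompt_ids[len(current.tokens):]
            current.tokens += bridge
            current.loss_mask += [0] * len(bridge)
        else:
            # Prefix broken (or first turn): start a new Sample, prompt masked.
            current = Sample(list(turn.prompt_ids), [0] * len(turn.prompt_ids))
            samples.append(current)
        # Completion tokens are always trained.
        current.tokens += turn.completion_ids
        current.loss_mask += [1] * len(turn.completion_ids)
    return samples
```

For example, the turns ([1, 2] -> [3]), ([1, 2, 3, 4] -> [5]), ([9] -> [10]) yield two Samples: tokens [1, 2, 3, 4, 5] with mask [0, 0, 1, 0, 1], and tokens [9, 10] with mask [0, 1], because the third turn's prompt does not extend the accumulated prefix.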

Training scripts & runner

runner.py — exposed the new knobs on SlimeRunner
(sglang_tool_call_parser, sglang_reasoning_parser, cumulative_token_mode,
renderer_family, max_pool_connections, gateway_log_level,
sglang_context_length) and added cuda_home to pin CUDA_HOME/LD_LIBRARY_PATH
(incl. --train-env-vars for the Megatron actors), fixing TransformerEngine's
"Multiple libcudart" abort. The tool-call parser is now a field instead of a
hardcoded qwen25.
examples/math_agent/train.sh — ported the CUDA-toolchain pinning,
torch_memory_saver fixup, eval flags, log suppression, and checkpointing from
the working run; all paths/credentials stay env-var placeholders.
config.yaml.example — documented the new max_pool_connections,
cumulative_token_mode, and renderer_family settings.

This PR must be used together with rllm-org/rllm#715.
A follow-up PR adding a slime environment setup guide to the docs will come soon.

lliquid · 2026-07-01T05:29:37Z

+# But you must set model family name explicitly if actor_rollout_ref.model.path
+# is a local model path. Check supported model families in MODEL_RENDERER_MAP of
+# https://github.com/PrimeIntellect-ai/renderers/blob/main/renderers/base.py
+renderer_family: "auto"


The semantics of renderer_family are not clear to me as a user of the trainer class. If it can be inferred from the model name, we probably do not need to expose this hyperparameter?

Thanks for the feedback. The renderer family cannot be inferred from a local model path; in that case the user must set it explicitly. I changed the comments here to clarify.
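
To make the rule concrete, here is a small user-side guard, written as a sketch only. The function name resolve_renderer_family and the heuristic for what counts as a local path are assumptions for illustration, not the toolkit's behavior, and "qwen3" in the usage note is a placeholder rather than a confirmed key of MODEL_RENDERER_MAP.

```python
import os


def resolve_renderer_family(renderer_family: str, model_path: str) -> str:
    """Reject "auto" when the model path looks local (hypothetical helper)."""
    if renderer_family != "auto":
        # An explicit family always wins.
        return renderer_family
    looks_local = (
        os.path.isabs(model_path)
        or model_path.startswith(("./", "../", "~"))
        or os.path.isdir(model_path)
    )
    if looks_local:
        raise ValueError(
            "renderer_family must be set explicitly when the model path "
            f"is local: {model_path!r}"
        )
    # A hub-style model name: leave inference to the downstream component.
    return "auto"
```

With this check, resolve_renderer_family("auto", "/models/ckpt") raises ValueError, while resolve_renderer_family("qwen3", "/models/ckpt") returns the explicit value unchanged.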

lliquid · 2026-07-01T19:51:18Z

 acr_tps_limit: 25                   # ACR service TPS quota
-max_concurrent: 100                 # max concurrent ACR sessions (eval batching)
+max_concurrent: 100                 # max concurrent ACR sessions
+max_pool_connections: 256           # boto3 connection pool size


It feels like max_concurrent and max_pool_connections control similar things. Do we need to specify both arguments, or is the effective limit simply the lower of the two?

No, they are different. max_pool_connections is passed to the RolloutClient initializer here https://github.com/awslabs/agentcore-rl-toolkit/blob/main/src/agentcore_rl_toolkit/client.py#L387, where it controls the boto3 connection pool size, not the maximum number of concurrently running ACR agent sessions.
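
The distinction can be illustrated with a short sketch. It assumes in-flight sessions are bounded with an asyncio.Semaphore, which is one common approach and not necessarily what rollout.py does; run_bounded and make_boto_config are hypothetical names. botocore's Config(max_pool_connections=...) is the real setting that sizes the HTTP connection pool.

```python
import asyncio


async def run_bounded(jobs, max_concurrent: int):
    """Run coroutine factories with at most max_concurrent in flight.

    Returns the results plus the peak number of jobs observed running at once.
    """
    gate = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        async with gate:  # this is the max_concurrent limit
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


def make_boto_config(max_pool_connections: int):
    """Size the HTTP connection pool; this does not limit session count."""
    # Imported lazily so the concurrency sketch above runs without botocore.
    from botocore.config import Config

    return Config(max_pool_connections=max_pool_connections)
```

The semaphore decides how many sessions run at once, while the pool size only decides how many HTTP connections are kept for reuse. If the pool is smaller than the number of concurrent sessions, requests still go through, but the extra connections are discarded after use, which is what triggers the "Connection pool is full" warning.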

luyuzhe111 · 2026-07-02T06:03:03Z

        items = out.split(b"\0")
        return [x.decode() for x in items if x]

+    def _cuda_ld_library_path(self) -> str:


do we really need this at the runner level? how did slime handle this?

Good question, slime handles this the same way — its example launch scripts set LD_LIBRARY_PATH inline in the ray job submit --runtime-env-json env-vars block (e.g. scripts/run-deepseek-r1.sh hardcodes LD_LIBRARY_PATH). SlimeRunner just populates that same worker runtime_env, so it's consistent with slime's approach. But we compute the value instead of hardcoding it, because slime's scripts assume using the fixed docker with cu12.9 while we allow the user to run in arbitrary venvs where different cuda versions (12.9 or 13.0) could be installed. It's harmless and opt-in: the helper only runs when cuda_home is set. Left unset (the default), SlimeRunner inherits the ambient env with no pinning — identical to slime's default behavior.
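
As a rough sketch of the opt-in behavior described above: the helper below is hypothetical (cuda_env is not the name used in runner.py) and assumes the CUDA libraries live under lib64 inside cuda_home, which varies between installs.

```python
import os


def cuda_env(cuda_home, ambient=None):
    """Build env vars pinning one CUDA toolkit; empty dict when unset."""
    if not cuda_home:
        # Opt-in: with no cuda_home, inherit the ambient environment as is.
        return {}
    ambient = os.environ if ambient is None else ambient
    lib_dir = os.path.join(cuda_home, "lib64")  # assumed layout
    existing = ambient.get("LD_LIBRARY_PATH", "")
    # Put the pinned toolkit first and drop empty or duplicate entries.
    rest = [p for p in existing.split(":") if p and p != lib_dir]
    return {
        "CUDA_HOME": cuda_home,
        "LD_LIBRARY_PATH": ":".join([lib_dir] + rest),
    }
```

The returned mapping could then be merged into the env-vars block of the worker runtime_env, so the value follows whichever CUDA version the user's venv was built against instead of a hardcoded path.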

lliquid reviewed Jul 1, 2026

luyuzhe111 reviewed Jul 2, 2026

lyzustc closed this Jul 3, 2026

lyzustc force-pushed the main branch from 90b8276 to 5001471 Compare July 3, 2026 04:20

lyzustc had a problem deploying to github-pages July 3, 2026 04:40 — with GitHub Pages Failure

lyzustc had a problem deploying to github-pages July 3, 2026 04:43 — with GitHub Actions Failure

lyzustc wants to merge 0 commits into awslabs:main from lyzustc:main
