feat(slime): multi-turn trace merging, validation on an input dataset, and gateway config fix in custom rollout function #78
| # But you must set model family name explicitly if actor_rollout_ref.model.path | ||
| # is a local model path. Check supported model families in MODEL_RENDERER_MAP of | ||
| # https://github.com/PrimeIntellect-ai/renderers/blob/main/renderers/base.py | ||
| renderer_family: "auto" |
There was a problem hiding this comment.
The semantics of renderer family are not clear to me as a user of the trainer class. If this can be inferred from the model name, we probably do not need to expose this hyperparameter?
There was a problem hiding this comment.
Thanks for the feedback. The renderer family cannot be inferred from a local model path; in that case the user must set it explicitly. I changed the comments here to clarify.
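The resolution rule described in this reply can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `resolve_renderer_family`, the `EXAMPLE_FAMILIES` table, and its single entry are hypothetical, and matching by substring of the model name is only one way the inference could work. The real table is `MODEL_RENDERER_MAP` in the renderers package.

```python
import os

# Hypothetical fragment -> family table, for illustration only. The real
# entries live in MODEL_RENDERER_MAP and are not reproduced here.
EXAMPLE_FAMILIES = {"acme-chat": "acme"}


def resolve_renderer_family(model_path: str, renderer_family: str = "auto") -> str:
    """Return the renderer family to use for model_path (sketch)."""
    # An explicit setting always wins.
    if renderer_family != "auto":
        return renderer_family
    # A local directory carries no reliable model name, so "auto" cannot work.
    if os.path.isdir(model_path):
        raise ValueError(
            "renderer_family must be set explicitly when the model path is local"
        )
    # Assumed inference rule: look for a known fragment in the model name.
    name = model_path.lower()
    for fragment, family in EXAMPLE_FAMILIES.items():
        if fragment in name:
            return family
    raise ValueError(f"cannot infer renderer family from {model_path!r}")
```

With this shape, a hub-style name resolves automatically, while a local checkpoint directory fails fast with a message that tells the user which setting to fill in.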
| acr_tps_limit: 25 # ACR service TPS quota | ||
| max_concurrent: 100 # max concurrent ACR sessions (eval batching) | ||
| max_concurrent: 100 # max concurrent ACR sessions | ||
| max_pool_connections: 256 # boto3 connection pool size |
There was a problem hiding this comment.
It feels like max_concurrent and max_pool_connections control similar things. Do we need to specify both arguments, or is the effective limit simply the lower of the two?
There was a problem hiding this comment.
No, they are different. max_pool_connections is passed to the RolloutClient initializer here https://github.com/awslabs/agentcore-rl-toolkit/blob/main/src/agentcore_rl_toolkit/client.py#L387, controlling the boto3 connection pool size, not the maximum number of concurrently running ACR agent sessions.
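A minimal sketch of the distinction, assuming an asyncio-based rollout loop: a semaphore bounds how many sessions are in flight (the role of `max_concurrent`), while the connection pool is sized separately on the HTTP client. The `run_bounded` and `run_session` names are hypothetical and the session body is a placeholder sleep, not a real ACR call.

```python
import asyncio


async def run_bounded(n_sessions: int, max_concurrent: int) -> int:
    """Run n_sessions fake sessions, at most max_concurrent at a time.

    Returns the peak number of sessions that were in flight together.
    """
    sem = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def run_session(i: int) -> None:  # hypothetical stand-in for a session
        nonlocal in_flight, peak
        async with sem:  # blocks once max_concurrent sessions are running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # pretend to wait on the agent
            in_flight -= 1

    await asyncio.gather(*(run_session(i) for i in range(n_sessions)))
    return peak


# The connection pool is a separate knob on the HTTP client. With boto3 it is
# set through botocore.config.Config(max_pool_connections=...); sizing it at or
# above max_concurrent keeps connections from being discarded and re-opened.
peak = asyncio.run(run_bounded(n_sessions=20, max_concurrent=5))
```

The semaphore never lets more than `max_concurrent` sessions run, regardless of pool size; an undersized pool does not lower that limit, it only causes connections to be discarded and re-created.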
| items = out.split(b"\0") | ||
| return [x.decode() for x in items if x] | ||
|
|
||
| def _cuda_ld_library_path(self) -> str: |
There was a problem hiding this comment.
do we really need this at the runner level? how did slime handle this?
There was a problem hiding this comment.
Good question, slime handles this the same way — its example launch scripts set LD_LIBRARY_PATH inline in the ray job submit --runtime-env-json env-vars block (e.g. scripts/run-deepseek-r1.sh hardcodes LD_LIBRARY_PATH). SlimeRunner just populates that same worker runtime_env, so it's consistent with slime's approach. But we compute the value instead of hardcoding it, because slime's scripts assume using the fixed docker with cu12.9 while we allow the user to run in arbitrary venvs where different cuda versions (12.9 or 13.0) could be installed. It's harmless and opt-in: the helper only runs when cuda_home is set. Left unset (the default), SlimeRunner inherits the ambient env with no pinning — identical to slime's default behavior.
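The opt-in behaviour described in this reply could look roughly like the sketch below. It is not the PR's `_cuda_ld_library_path`: the `cuda_env` helper is hypothetical, and the `lib64` layout under the CUDA home is an assumption.

```python
from typing import Dict, Optional


def cuda_env(cuda_home: Optional[str], ambient_ld_path: str = "") -> Dict[str, str]:
    """Build env-var overrides that pin a CUDA toolchain (hypothetical helper).

    Returns an empty dict when cuda_home is unset, so the worker inherits the
    ambient environment unchanged.
    """
    if not cuda_home:
        return {}
    lib_dir = cuda_home.rstrip("/") + "/lib64"  # assumed library layout
    # Put the pinned directory first and drop duplicates of it further down.
    rest = [p for p in ambient_ld_path.split(":") if p and p != lib_dir]
    return {"CUDA_HOME": cuda_home, "LD_LIBRARY_PATH": ":".join([lib_dir] + rest)}
```

Placing the pinned directory first means the dynamic loader searches the chosen toolkit before any other copy of libcudart that may be on the ambient path.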
This PR adds drift-free multi-turn trajectory merging to the slime custom rollout
function and fixes several issues that surfaced when running on full training /
validation datasets (eval episodes timing out and crashing the engine, connection-pool
churn, and unreadably noisy logs). It also brings `SlimeRunner` and the example
scripts in line with these changes.
Custom rollout function (`integration/`)
- `traces.py`: Added `merge_traces_to_samples`: an episode's turns are folded into one training Sample per contiguous prefix-extending segment (bridge tokens masked out, completions trained), instead of one Sample per turn. Pairs with the gateway's cumulative token mode; falls back to one-Sample-per-turn when a turn breaks the prefix.
- `rollout.py`, `gateway.py`: New `cumulative_token_mode` / `renderer_family` config knobs, forwarded to the gateway; `merge_traces_to_samples` is wired in as the rollout's sample builder.
- `rollout.py`: Added support for periodic validation on an input validation set, run at a configurable interval during training. In-flight agent sessions during validation are bounded by `max_concurrent` in the YAML config.
- `rollout.py`: Added `max_pool_connections` so the user can set it `>= max_concurrent` to stop the boto3 client logging "Connection pool is full, discarding connection" under concurrent ACR sessions.
- `rollout.py`, `gateway.py`: Replaced per-session logging with a single per-batch summary (episodes / succeeded / failed / sequences); failed episodes log once at INFO with the full ACR `session_id` for CloudWatch lookup. Added `gateway_log_level` (default `warning`) to silence the gateway's uvicorn/httpx access logs.
- `/generate` (`gateway.py`): The slime backend now enables `use_sglang` unconditionally and always passes the served `--model`; removed the dead `use_sglang` config field. (The `/generate` mechanics live in the rllm-model-gateway PR.)
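The merging rule described for `merge_traces_to_samples` above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `Turn`, `MergedSample`, and `merge_turns` are hypothetical names, and the sketch assumes each turn records the full prompt token ids it was called with (as cumulative token mode would provide) plus the completion token ids it produced.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    """One model call: the full prompt it saw and the tokens it generated."""
    prompt_ids: List[int]
    completion_ids: List[int]


@dataclass
class MergedSample:
    """Hypothetical stand-in for a training sample (not slime's Sample type)."""
    tokens: List[int] = field(default_factory=list)
    loss_mask: List[int] = field(default_factory=list)  # 1 = trained, 0 = masked


def merge_turns(turns: List[Turn]) -> List[MergedSample]:
    """Fold turns into one sample per contiguous prefix-extending segment."""
    samples: List[MergedSample] = []
    current: Optional[MergedSample] = None
    for turn in turns:
        n = len(current.tokens) if current is not None else 0
        # A turn extends the segment if its prompt starts with everything so far.
        if current is not None and turn.prompt_ids[:n] == current.tokens:
            bridge = turn.prompt_ids[n:]  # e.g. tool output or template tokens
            current.tokens += bridge + turn.completion_ids
            current.loss_mask += [0] * len(bridge) + [1] * len(turn.completion_ids)
        else:
            # Prefix broken (or first turn): start a new sample for this turn.
            current = MergedSample(
                tokens=turn.prompt_ids + turn.completion_ids,
                loss_mask=[0] * len(turn.prompt_ids) + [1] * len(turn.completion_ids),
            )
            samples.append(current)
    return samples
```

For example, two turns with prompts `[1, 2]` and `[1, 2, 3, 4]` and completions `[3]` and `[5, 6]` merge into a single sample with tokens `[1, 2, 3, 4, 5, 6]` and loss mask `[0, 0, 1, 0, 1, 1]`; a third turn whose prompt does not start with those six tokens begins a new sample.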
Training scripts & runner
- `runner.py`: exposed the new knobs on `SlimeRunner` (`sglang_tool_call_parser`, `sglang_reasoning_parser`, `cumulative_token_mode`, `renderer_family`, `max_pool_connections`, `gateway_log_level`, `sglang_context_length`) and added `cuda_home` to pin `CUDA_HOME`/`LD_LIBRARY_PATH` (incl. `--train-env-vars` for the Megatron actors), fixing TransformerEngine's "Multiple libcudart" abort. The tool-call parser is now a field instead of a hardcoded `qwen25`.
- `examples/math_agent/train.sh`: ported the CUDA-toolchain pinning, `torch_memory_saver` fixup, eval flags, log suppression, and checkpointing from the working run; all paths/credentials stay env-var placeholders.
- `config.yaml.example`: documented the new `max_pool_connections`, `cumulative_token_mode`, and `renderer_family` settings.

This PR must work together with rllm-org/rllm#715.
A follow-up PR adding a slime environment setup guide to the docs will come soon.