[rollout] feat: enable Async RL for trtllm rollout #5631
hchings wants to merge 17 commits into verl-project:main
Conversation
Code Review
This pull request introduces abort and resume functionality for AsyncLLM in trtllm rollouts. The core logic changes in trtllm_async_server.py correctly implement this by mapping pause_generation and resume_generation calls and handling the cancelled finish reason. A new test file, test_trtllm_abort.py, is added to validate this functionality. My review focuses on the new test file, where I've identified a couple of areas for improvement to enhance its robustness and maintainability, particularly around configuration path resolution and Ray cluster cleanup.
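To make the pause/resume mapping concrete, here is a minimal, hedged sketch of how an async rollout server might map `pause_generation`/`resume_generation` onto an engine and surface the `cancelled` finish reason. The class and method bodies are illustrative assumptions, not verl's or TensorRT-LLM's actual API; only the call names and the finish reason come from the PR description.

```python
import asyncio


class RolloutServerSketch:
    """Illustrative sketch (not verl's implementation): gate generation
    behind an asyncio.Event so pause/resume can abort in-flight work."""

    def __init__(self):
        self._running = asyncio.Event()
        self._running.set()  # generation allowed initially

    async def pause_generation(self):
        # Block new requests; in the real server this would also abort
        # in-flight engine requests.
        self._running.clear()

    async def resume_generation(self):
        self._running.set()

    async def generate(self, prompt: str) -> dict:
        if not self._running.is_set():
            # Mirror the "cancelled" finish reason the PR handles.
            return {"text": "", "finish_reason": "cancelled"}
        await asyncio.sleep(0)  # stand-in for real engine work
        return {"text": prompt + " ...", "finish_reason": "stop"}
```

Under this sketch, a request issued while paused returns `finish_reason == "cancelled"`, and one issued after `resume_generation` completes normally.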
```python
config_dir = os.path.abspath("verl/verl/trainer/config")
if not os.path.exists(config_dir):
    config_dir = os.path.abspath("verl/trainer/config")
```
The method for locating the configuration directory is fragile as it depends on the current working directory from which pytest is executed. This can lead to test failures if run from a different directory (e.g., inside the tests/ directory). A more robust approach would be to determine the project's root directory programmatically and construct the path from there. This will make the test more reliable and easier to run in different environments.
Please also add from pathlib import Path to the top-level imports of the file.
```diff
- config_dir = os.path.abspath("verl/verl/trainer/config")
- if not os.path.exists(config_dir):
-     config_dir = os.path.abspath("verl/trainer/config")
+ from pathlib import Path
+ repo_root = Path(__file__).resolve().parents[4]
+ config_dir_path = repo_root / "verl/verl/trainer/config"
+ if not config_dir_path.exists():
+     config_dir_path = repo_root / "verl/trainer/config"
+ if not config_dir_path.exists():
+     raise FileNotFoundError(f"Config directory not found. Searched under {repo_root / 'verl'}")
+ config_dir = str(config_dir_path)
```
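An alternative to hard-coding `parents[4]` is to walk upward until a known config path appears, which keeps the test working even if the file is moved. The helper below is a hedged sketch: the candidate paths come from the snippet above, but the walk-up strategy and the function name `find_config_dir` are illustrative, not part of this PR.

```python
from pathlib import Path


def find_config_dir(
    start: Path,
    candidates=("verl/verl/trainer/config", "verl/trainer/config"),
) -> Path:
    """Walk up from `start` and return the first existing candidate
    config directory. Raises if none is found anywhere above `start`."""
    for root in (start, *start.parents):
        for rel in candidates:
            candidate = root / rel
            if candidate.is_dir():
                return candidate
    raise FileNotFoundError(f"Config directory not found above {start}")
```

A test would then call `find_config_dir(Path(__file__).resolve().parent)` and never depend on the current working directory or an exact nesting depth.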
```python
finally:
    ray.shutdown()
    subprocess.run(["ray", "stop"], capture_output=True)
```
The use of subprocess.run(["ray", "stop"]) is risky for test cleanup. It's a forceful command that can interfere with other parallel tests using Ray and depends on the ray CLI being in the PATH. The ray.shutdown() call on the previous line is the standard and safer way to clean up a local Ray cluster and should be sufficient. Relying on ray stop can mask resource leak issues that should be addressed directly.
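One way to keep cleanup scoped to a single test without shelling out to `ray stop` is a context manager that pairs `init` with `shutdown`. This is a hedged sketch of the pattern, not code from the PR; the `ray_module` parameter is injected only so the helper can be exercised without Ray installed.

```python
import contextlib


@contextlib.contextmanager
def ray_local_cluster(ray_module, **init_kwargs):
    """Scope a local Ray cluster to a `with` block, relying solely on
    ray.shutdown() for cleanup (no `ray stop` subprocess)."""
    ray_module.init(**init_kwargs)
    try:
        yield ray_module
    finally:
        # shutdown() runs even if the body raises, so no external
        # process kill is needed to reclaim the cluster.
        ray_module.shutdown()
```

In a real test this would be used as `with ray_local_cluster(ray, num_cpus=2): ...`, or wrapped in a pytest fixture, so a failing assertion still tears the cluster down.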
Force-pushed from cd53bf6 to f68e442.
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
What does this PR do?
Requires verl's trtllm version to be updated to include NVIDIA/TensorRT-LLM#12272 [Merged].
**Note that this MR only enables e2e async RL functionality for trtllm rollout (with convergence tested); it does not include perf optimizations for the rollout itself.** Currently, we do see Python overheads in `_update_requests` of trtllm rollout (especially with `TorchSampler`). We'll address rollout perf in separate MRs.

Checklist Before Starting
- Format the PR title as `[{modules}] {type}: {description}` (this will be checked by the CI):
  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off`, like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If the PR breaks any API, prepend `[BREAKING]` to the beginning of the title, e.g. `[BREAKING][fsdp, megatron] feat: dynamic batching`

Test
Convergence of `Qwen3-8B-base` and `Qwen3-30B-A3B-Base` with this MR aligns with vllm under the same config.

1. Convergence - Qwen3-8B-base

`bypass=False + rollout_IS=token`. e2e script. (One curve with `bypass=False + IS=token`, the other with `bypass=True and IS=null`.)

*Note that `token_multi_prob_error` is a tentative metric adopted from the nemo-rl doc.

2. Convergence - Qwen3-30B-A3B-Base
https://wandb.ai/nvidia/DAPO_fully_async_30B_erinh?nw=nwusererinh
Two trtllm curves and two vllm curves, for `bypass=False + rollout_IS=token` and `bypass=True + rollout_IS=null` respectively.

API and Usage Example
# Add code snippet or script demonstrating how to use this

Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` locally.
- Request CI in the `ci-request` channel in the `verl` Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If this PR changes the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.