
[rollout] feat: enable Async RL for trtllm rollout#5631

Draft
hchings wants to merge 17 commits intoverl-project:mainfrom
hchings:abort_resume

Conversation

Collaborator

@hchings hchings commented Mar 17, 2026

What does this PR do?

Requires verl's TensorRT-LLM version to be updated to include NVIDIA/TensorRT-LLM#12272 (merged).

**Note that this PR only enables e2e async RL functionality for trtllm rollout (with convergence tested); it does not include perf optimizations for the rollout itself. Currently, we do see Python overhead in `_update_requests` of trtllm rollout (especially with `TorchSampler`). We'll have separate PRs to address rollout perf.**

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, fully_async, one_step_off
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

Convergence of Qwen3-8B-Base and Qwen3-30B-A3B-Base with this PR aligns with vLLM under the same config.

1. Convergence - Qwen3-8B-base

`bypass=False` + `rollout_IS=token`. e2e script.

*Note that `token_multi_prob_error` is a tentative metric adopted from the nemo-rl docs.
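The authoritative definition lives in the nemo-rl docs; as a rough illustration only (the function name and exact formula here are assumptions, not verl's implementation), a token-level multiplicative probability error can be computed as the mean absolute deviation of the trainer/rollout probability ratio from 1:

```python
import math

def token_multi_prob_error(rollout_logprobs, trainer_logprobs):
    """Hypothetical sketch: per-token probability ratio between the trainer
    and rollout backends, measured as mean |exp(lp_train - lp_rollout) - 1|.
    A value of 0.0 means both backends assign identical token probabilities."""
    ratios = (math.exp(t - r) for r, t in zip(rollout_logprobs, trainer_logprobs))
    errors = [abs(x - 1.0) for x in ratios]
    return sum(errors) / len(errors)

# Identical log-probs from both backends -> zero error
print(token_multi_prob_error([-1.2, -0.3], [-1.2, -0.3]))  # 0.0
```

A persistently large value of this metric signals that the rollout engine's sampling distribution has drifted from the trainer's, which is exactly what the `rollout_IS=token` correction is meant to compensate for.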


2. Convergence - Qwen3-30B-A3B-Base

https://wandb.ai/nvidia/DAPO_fully_async_30B_erinh?nw=nwusererinh
Two trtllm curves and two vllm curves, for `bypass=False + rollout_IS=token` and `bypass=True + rollout_IS=null` respectively.


API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces abort and resume functionality for AsyncLLM in trtllm rollouts. The core logic changes in trtllm_async_server.py correctly implement this by mapping pause_generation and resume_generation calls and handling the cancelled finish reason. A new test file, test_trtllm_abort.py, is added to validate this functionality. My review focuses on the new test file, where I've identified a couple of areas for improvement to enhance its robustness and maintainability, particularly around configuration path resolution and Ray cluster cleanup.
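The PR carries the real implementation in `trtllm_async_server.py`; purely as a language-level sketch (class and method bodies below are assumptions, not the PR's actual code -- only the `pause_generation`/`resume_generation` names and the `cancelled` finish reason come from the review summary), the abort/resume flow looks roughly like:

```python
import asyncio

class RolloutServerSketch:
    """Hypothetical stand-in showing the flow the review describes:
    pause_generation cancels in-flight requests, which then finish with a
    'cancelled' reason instead of raising; resume_generation lifts the gate."""

    def __init__(self):
        self._accepting = asyncio.Event()
        self._accepting.set()          # set == new requests allowed
        self._inflight = set()

    async def pause_generation(self):
        self._accepting.clear()
        for task in list(self._inflight):
            task.cancel()              # abort everything currently decoding

    async def resume_generation(self):
        self._accepting.set()

    async def generate(self, prompt):
        await self._accepting.wait()   # block new requests while paused
        task = asyncio.ensure_future(self._decode(prompt))
        self._inflight.add(task)
        try:
            return {"text": await task, "finish_reason": "stop"}
        except asyncio.CancelledError:
            # aborted requests surface 'cancelled' so callers can resubmit
            return {"text": "", "finish_reason": "cancelled"}
        finally:
            self._inflight.discard(task)

    async def _decode(self, prompt):
        await asyncio.sleep(0.05)      # stand-in for actual token generation
        return prompt + " ..."

async def demo():
    srv = RolloutServerSketch()
    pending = asyncio.ensure_future(srv.generate("hello"))
    await asyncio.sleep(0.01)          # let decoding start
    await srv.pause_generation()       # abort mid-generation
    aborted = await pending
    await srv.resume_generation()
    resumed = await srv.generate("hello again")
    return aborted["finish_reason"], resumed["finish_reason"]

print(asyncio.run(demo()))  # ('cancelled', 'stop')
```

Treating an abort as a normal completion with a distinct finish reason (rather than an exception) is what lets the async RL trainer pause rollouts for a weight sync and then cleanly resubmit the interrupted requests.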

Comment on lines +66 to +68
config_dir = os.path.abspath("verl/verl/trainer/config")
if not os.path.exists(config_dir):
    config_dir = os.path.abspath("verl/trainer/config")

Severity: high

The method for locating the configuration directory is fragile as it depends on the current working directory from which pytest is executed. This can lead to test failures if run from a different directory (e.g., inside the tests/ directory). A more robust approach would be to determine the project's root directory programmatically and construct the path from there. This will make the test more reliable and easier to run in different environments.

Please also add `from pathlib import Path` to the top-level imports of the file.

Suggested change
- config_dir = os.path.abspath("verl/verl/trainer/config")
- if not os.path.exists(config_dir):
-     config_dir = os.path.abspath("verl/trainer/config")
+ from pathlib import Path
+ repo_root = Path(__file__).resolve().parents[4]
+ config_dir_path = repo_root / "verl/verl/trainer/config"
+ if not config_dir_path.exists():
+     config_dir_path = repo_root / "verl/trainer/config"
+ if not config_dir_path.exists():
+     raise FileNotFoundError(f"Config directory not found. Searched under {repo_root / 'verl'}")
+ config_dir = str(config_dir_path)


finally:
    ray.shutdown()
    subprocess.run(["ray", "stop"], capture_output=True)
The reason will be displayed to describe this comment to others. Learn more.

Severity: high

The use of subprocess.run(["ray", "stop"]) is risky for test cleanup. It's a forceful command that can interfere with other parallel tests using Ray and depends on the ray CLI being in the PATH. The ray.shutdown() call on the previous line is the standard and safer way to clean up a local Ray cluster and should be sufficient. Relying on ray stop can mask resource leak issues that should be addressed directly.
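The recommended shape is to make `shutdown()` the only cleanup and guarantee it runs in a `finally` block. A minimal runnable sketch of that pattern (`FakeCluster` is a stand-in so the example needs no Ray install; with pytest, the same shape becomes a fixture that yields inside `try`/`finally`):

```python
import contextlib

class FakeCluster:
    """Stand-in for a local Ray instance (avoids a Ray dependency here)."""
    def __init__(self):
        self.running = False
    def init(self):
        self.running = True
    def shutdown(self):
        self.running = False

@contextlib.contextmanager
def local_cluster(cluster):
    """Cleanup pattern the review recommends: rely on the in-process
    shutdown() call in a finally block instead of shelling out to
    `ray stop`, which could kill clusters owned by parallel tests."""
    cluster.init()
    try:
        yield cluster
    finally:
        cluster.shutdown()  # runs even if the test body raises

# Shutdown happens even when the test body fails
c = FakeCluster()
try:
    with local_cluster(c):
        raise RuntimeError("test body failed")
except RuntimeError:
    pass
print(c.running)  # False
```

Because the `finally` clause owns teardown, a leaked actor or hung worker surfaces as a visible failure in `shutdown()` rather than being silently reaped by an external `ray stop`.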

@hchings hchings changed the title [rollout][trtllm] Add abort and resume for AsyncLLM in trtllm rollout [rollout][trtllm] Add abort and resume in trtllm rollout for Async RL Mar 17, 2026
@hchings hchings changed the title [rollout][trtllm] Add abort and resume in trtllm rollout for Async RL [rollout][trtllm] enable Async RL for trtllm rollout Mar 25, 2026
@hchings hchings force-pushed the abort_resume branch 2 times, most recently from cd53bf6 to f68e442 Compare April 15, 2026 18:19
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 22, 2026

CLA assistant check
All committers have signed the CLA.

@hchings hchings requested a review from Superjomn April 28, 2026 00:53
@hchings hchings self-assigned this Apr 28, 2026
@hchings hchings marked this pull request as ready for review May 4, 2026 06:34
@hchings hchings changed the title [rollout][trtllm] enable Async RL for trtllm rollout [rollout] feat: enable Async RL for trtllm rollout May 4, 2026
@hchings hchings marked this pull request as draft May 4, 2026 06:36


2 participants