[trl] TRL v0.28 (and above) RL fixes #4156
Conversation
for more information, see https://pre-commit.ci
Remove sync/reload weights calls, remove vllm.LLM instantiation
for more information, see https://pre-commit.ci
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request updates Unsloth's reinforcement learning integration to align with recent changes in the TRL library, specifically addressing the restructuring of their RL pipelines in versions 0.28 and higher. The primary goal is to ensure seamless operation and performance optimization by adapting to TRL's new pipeline structure.

Highlights
Changelog
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c75d8aa4af
```python
f"{indent}# Skipping per-step vLLM sync_weights().\n"
)
function = sync_weights_block.sub(remove_sync_weights_block, function)
```
Keep vLLM weight sync for non-fast-inference runs
This change removes the self.vllm_generation.sync_weights() call from _generate_single_turn for all GRPO executions, not just when Unsloth fast-inference LoRA is active. In the non-fast-inference path (where _init_vllm creates a separate LLM(...) instance), that sync is what keeps vLLM generation weights aligned with the training model each step; skipping it causes rollouts to be generated from stale parameters, which can corrupt training signals. The later unsloth_fast_inference_lora guard in VLLMGeneration.sync_weights does not mitigate this because the call site is removed entirely.
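One way to address this would be to make the skip conditional instead of removing the call site outright. The sketch below illustrates that shape; the class, method, and attribute names (`TrainerSketch`, `maybe_sync_weights`, `unsloth_fast_inference_lora`) are illustrative assumptions, not TRL's or Unsloth's actual API.

```python
# Sketch: keep the per-step vLLM weight sync unless Unsloth fast-inference
# LoRA is active. All names here are illustrative, not the real API.

class TrainerSketch:
    def __init__(self, fast_inference_lora, vllm_generation):
        self.unsloth_fast_inference_lora = fast_inference_lora
        self.vllm_generation = vllm_generation

    def maybe_sync_weights(self):
        # Fast-inference LoRA: vLLM already generates with the live
        # adapter weights, so the per-step sync is redundant.
        if getattr(self, "unsloth_fast_inference_lora", False):
            return False
        # Separate LLM(...) instance: push the latest training weights,
        # otherwise rollouts come from stale parameters.
        self.vllm_generation.sync_weights()
        return True


class FakeVLLM:
    """Stand-in generation backend that just counts sync calls."""
    def __init__(self):
        self.synced = 0

    def sync_weights(self):
        self.synced += 1
```

With this guard, the non-fast-inference path keeps its sync while the fast-inference path skips it.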
Code Review
This pull request updates Unsloth to be compatible with TRL v0.28 and above by patching the new trl.generation.vllm_generation module. The changes correctly reuse existing vLLM instances and prevent redundant weight synchronization, following the established patching patterns in the codebase. The implementation is robust, utilizing regular expressions for source code transformation and linecache for better debugging of patched code. I've provided a couple of suggestions to refine exception handling to be more specific, which will improve code maintainability. Overall, the changes are well-executed and effectively address the compatibility requirements.
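The source-transformation pattern described here (fetch a method's source, rewrite it with a regex, re-exec it, and register the result with `linecache` so patched code shows up in tracebacks) can be sketched as follows. The helper name and signature are assumptions for illustration, not Unsloth's actual implementation.

```python
import inspect
import linecache
import re
import textwrap


def patch_method_source(cls, method_name, pattern, replacement):
    """Sketch of regex-based method patching with linecache registration.

    Illustrative helper, not Unsloth's real API.
    """
    # Fetch and dedent the original source of the method.
    source = textwrap.dedent(inspect.getsource(getattr(cls, method_name)))
    new_source = re.sub(pattern, replacement, source)
    # Register the rewritten source under a synthetic filename so
    # debuggers and tracebacks can display the patched code.
    filename = f"<patched {cls.__name__}.{method_name}>"
    lines = new_source.splitlines(keepends=True)
    linecache.cache[filename] = (len(new_source), None, lines, filename)
    # Re-exec the transformed source and install the new function.
    namespace = {}
    exec(compile(new_source, filename, "exec"), namespace)
    setattr(cls, method_name, namespace[method_name])
    return new_source
```

The `linecache.cache` entry mirrors what `linecache.updatecache` would build for a real file, which is what makes stepping through patched code possible.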
```python
if Version(importlib_version("trl")) < Version("0.28.0"):
    return
try:
```
The exception handling (ImportError, NameError, Exception) is redundant because Exception is a base class for both ImportError and NameError. It's better to be more specific about the exceptions you expect to catch. In this case, ImportError is the most likely exception if the module path is incorrect or the trl version is not as expected. Catching the broad Exception can mask other unexpected issues during the import process.
Suggested change:
```python
try:
except ImportError as e:
```
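A minimal sketch of the narrow-catch approach the review suggests, probing for the `trl.generation.vllm_generation` module mentioned above (the function name here is illustrative, not from the PR):

```python
import importlib


def get_vllm_generation_module():
    # Per the review: only ImportError is expected when the module path
    # is absent (older TRL, or trl not installed at all). Any other
    # exception is unexpected and should propagate rather than be masked.
    try:
        return importlib.import_module("trl.generation.vllm_generation")
    except ImportError:
        return None
```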
```python
logger.info(f"Unsloth: Could not find VLLMGeneration.{method_name}")
return False
try:
```
Catching the broad Exception class can hide unexpected errors. The inspect.getsource() function is documented to raise TypeError for unsupported object types and OSError if the source file cannot be retrieved. It is better practice to catch these specific exceptions to make the error handling more precise and robust.
Suggested change:
```python
try:
except (TypeError, OSError) as e:
```
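A small sketch of the narrower catch around `inspect.getsource` (the wrapper name is illustrative): `TypeError` covers objects with no retrievable source, such as builtins, and `OSError` covers a missing source file.

```python
import inspect


def safe_getsource(obj):
    # inspect.getsource raises TypeError for unsupported object types
    # (e.g. builtins) and OSError when the source file cannot be read,
    # so those are the only exceptions worth catching here.
    try:
        return inspect.getsource(obj)
    except (TypeError, OSError):
        return None
```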
for more information, see https://pre-commit.ci
Remove sync/reload weights calls, remove vllm.LLM instantiation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3fd2994dbf
unsloth/models/rl_replacements.py (Outdated)
```python
function,
)
<<<<<<< HEAD
```
Remove unresolved merge markers in rl_replacements
The file contains raw conflict markers (<<<<<<<, |||||||, >>>>>>>) inside grpo_trainer__generate_single_turn, which makes this module invalid Python and raises a SyntaxError as soon as unsloth.models.rl_replacements is imported. In environments that load RL patching (including PatchFastRL), this is a hard failure that prevents startup entirely.
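A cheap pre-check for this failure mode is to scan patched source for line-leading conflict markers before exec'ing it. The helper below is an illustrative sketch, not part of the PR; note that a line of seven-plus `=` characters in ordinary code would also match.

```python
import re

# Unresolved merge markers (<<<<<<<, |||||||, =======, >>>>>>>) at the
# start of a line make a module unparseable, so scanning for them lets
# a patcher fail fast with a clear message instead of a SyntaxError.
_CONFLICT_MARKER = re.compile(r"^(<{7}|\|{7}|={7}|>{7})", re.MULTILINE)


def has_conflict_markers(source: str) -> bool:
    return bool(_CONFLICT_MARKER.search(source))
```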
for more information, see https://pre-commit.ci
One comment that I have is that my PR needs to be merged first: Unsloth #4140
TRL restructured their RL pipelines. They now have a file called vllm_generation.py, which takes care of creating the vLLM instance and handling inference for RL. We need to patch it, as we have been doing so far for the RL trainer classes.
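The overall flow can be sketched as a version-gated patch entry point: bail out on TRL < 0.28.0 (mirroring the version check in the diff above), then import the new module and apply the patches. The function name and body below are illustrative, not Unsloth's actual code.

```python
from importlib.metadata import PackageNotFoundError
from importlib.metadata import version as importlib_version

from packaging.version import Version


def patch_vllm_generation() -> bool:
    """Illustrative sketch of a version-gated patching entry point."""
    # trl.generation.vllm_generation only exists in TRL >= 0.28.0.
    try:
        if Version(importlib_version("trl")) < Version("0.28.0"):
            return False
    except PackageNotFoundError:
        return False  # trl is not installed at all
    try:
        from trl.generation import vllm_generation  # noqa: F401
    except ImportError:
        return False
    # ... regex-based patches of VLLMGeneration methods would go here ...
    return True
```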