Adds parallel model weight loading for runai_streamer #21330
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the `ready` label to the PR. 🚀
Code Review

This pull request aims to add parallel model weight loading for `runai_streamer`. The implementation introduces a new version of `runai_safetensors_weights_iterator` that uses `streamer.stream_files` for parallel processing. However, there are a couple of significant issues:

- The old `runai_safetensors_weights_iterator` function was not removed, resulting in a duplicate function definition in the same file.
- The `tqdm` progress bar in the new function is misconfigured, which will lead to an incorrect progress display and issues in distributed environments.

I've provided comments with suggestions to fix these issues.
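For orientation, here is a minimal sketch of what such a parallel iterator could look like. The function name is illustrative, and the `SafetensorsStreamer.stream_files` / `get_tensors` API is taken from this PR's discussion rather than verified against the library:

```python
from collections.abc import Generator

import torch
from runai_model_streamer import SafetensorsStreamer


def parallel_safetensors_iterator(
    hf_weights_files: list[str],
) -> Generator[tuple[str, torch.Tensor], None, None]:
    """Sketch: stream tensors from all files at once, not file-by-file."""
    with SafetensorsStreamer() as streamer:
        # Submit every file up front; the streamer can then fetch
        # tensors from multiple files concurrently.
        streamer.stream_files(hf_weights_files)
        yield from streamer.get_tensors()
```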
```python
def runai_safetensors_weights_iterator(
    hf_weights_files: List[str],
    use_tqdm_on_load: bool,
) -> Generator[tuple[str, torch.Tensor], None, None]:
```
This pull request introduces a second definition for the function `runai_safetensors_weights_iterator`. The original implementation exists at lines 479-492, and this new implementation will shadow it. This is likely unintentional and makes the code confusing.

Please remove the old implementation and keep this new one, which seems to correctly implement parallel weight loading as intended by the PR.

I cannot suggest the deletion directly, as the original function is not part of this diff, but it should be removed to resolve this issue.
```python
tensor_iter = tqdm(
    streamer.get_tensors(),
    total=len(hf_weights_files),
    desc="Loading safetensors using Runai Model Streamer",
    bar_format=_BAR_FORMAT,
    disable=not use_tqdm_on_load,
)
```
The `tqdm` progress bar is misconfigured here, which can lead to incorrect behavior:

- **Incorrect `total`:** The `total` is set to `len(hf_weights_files)`, which is the number of files. However, `tqdm` is iterating over tensors from `streamer.get_tensors()`. Since there are usually multiple tensors per file, the progress bar will show incorrect percentages (e.g., > 100%). If the total number of tensors is not available, it is better to omit the `total` argument; `tqdm` will then display the iteration count instead of a percentage.
- **Incorrect `disable` logic for distributed environments:** The `disable` argument should be `not enable_tqdm(use_tqdm_on_load)` so that the progress bar is only displayed on the main process (rank 0) in a distributed setting. The current implementation, `not use_tqdm_on_load`, will cause progress bars to be printed from all worker processes.
Here is a suggested fix:

```python
tensor_iter = tqdm(
    streamer.get_tensors(),
    desc="Loading safetensors using Runai Model Streamer",
    bar_format=_BAR_FORMAT,
    disable=not enable_tqdm(use_tqdm_on_load),
)
```
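For readers unfamiliar with the helper: `enable_tqdm` gates the progress bar on the process's distributed rank. A minimal sketch of that behaviour, assuming `torch.distributed` is the rank source (vLLM's actual helper may differ in detail):

```python
import torch.distributed as dist


def enable_tqdm(use_tqdm_on_load: bool) -> bool:
    # Show the bar only when requested, and only on the main process
    # (rank 0) -- or when torch.distributed is not initialized at all.
    return use_tqdm_on_load and (not dist.is_initialized()
                                 or dist.get_rank() == 0)
```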
Signed-off-by: bbartels <[email protected]>
@DarkLight1337 Not sure whom to ping, but you have reviewed existing runai-related PRs. The pipeline seems to have failed for an unrelated reason. Any chance a full build can be kicked off?

Triggering full CI

The test failures are unrelated to this PR. Can you address my previous comment? Then we can merge the PR.

All done :)
Essential Elements of an Effective PR Description Checklist

- (Optional) Update `supported_models.md` and `examples` for a new model.

Purpose
The runai streamer library published a new way to load files in parallel, as opposed to one at a time. This allows for a much greater level of parallelism, especially when the files are small.
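To make the difference concrete, here is a hedged sketch contrasting the two strategies; the shard names are hypothetical, and the `stream_file` / `stream_files` calls follow the API named in this PR:

```python
from runai_model_streamer import SafetensorsStreamer

# Hypothetical shard names for illustration.
files = ["model-00001.safetensors", "model-00002.safetensors"]

# Before: stream and drain one file at a time (sequential).
with SafetensorsStreamer() as streamer:
    for f in files:
        streamer.stream_file(f)
        for name, tensor in streamer.get_tensors():
            pass  # hand the tensor to the model loader

# After: submit all files up front so the streamer can read them
# in parallel -- a big win when there are many small shards.
with SafetensorsStreamer() as streamer:
    streamer.stream_files(files)
    for name, tensor in streamer.get_tensors():
        pass  # hand the tensor to the model loader
```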
Test Plan
The existing tests cover the code change
Test Result
(Optional) Documentation Update