hetero pd2 #1980
base: libint/debug_ttft
remote_engine_id, len(meta.local_block_ids),
len(meta.remote_block_ids))
if self.use_host_buffer:
    is_hetero = True
What's the plan for setting this variable?
We need this flag to enable _recving_metadata.
Are you going to add it as an env flag?
Can we reuse the DECODE_TP_RATIO env variable?
That seems good.
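A minimal sketch of how an env flag could drive is_hetero (and therefore whether _recving_metadata is used), instead of setting it unconditionally whenever use_host_buffer is on. It assumes a hypothetical variable name, PD_HETERO_ENABLE, and a hypothetical helper, resolve_is_hetero; neither exists in vLLM. It does not reuse DECODE_TP_RATIO because this thread does not say how that variable is interpreted.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name used only for illustration; not an existing vLLM setting.
HETERO_ENV_VAR = "PD_HETERO_ENABLE"


def resolve_is_hetero(use_host_buffer: bool,
                      env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the host buffer is in use AND the env flag is set."""
    env = os.environ if env is None else env
    flag = env.get(HETERO_ENV_VAR, "0").strip().lower()
    # Accept common truthy spellings; anything else keeps the homogeneous path.
    return use_host_buffer and flag in ("1", "true", "yes")
```

In the connector this would become something like `is_hetero = resolve_is_hetero(self.use_host_buffer)`. Requiring both conditions keeps the heterogeneous path opt-in, so existing host-buffer deployments are unaffected unless the flag is set.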
Check whether this can be moved to hpu_model_runner.
self.vllm_config = vllm_config
self.block_size = vllm_config.cache_config.block_size

self.block_factor = 8 # A100.block_size/G2.block_size
Is this going to be a hardcoded value?
It's hardcoded for now. That may be OK, since this number won't change.
It's better to check the block size on both sides.
We can check whether remote.block_size has the expected value, but we don't know remote.block_size here because the handshake occurs afterwards.
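One way to reconcile the hardcoded factor with the "check on both sides" request: keep the configured value at init, then validate it once the handshake reports the remote block size. This is a sketch under the assumption, taken from the inline comment, that the remote side is the A100 and the local side is Gaudi2 (G2), so the factor is remote_block_size / local_block_size. The helper names compute_block_factor and validate_after_handshake are hypothetical, and the sizes in the usage line are illustrative only.

```python
def compute_block_factor(remote_block_size: int, local_block_size: int) -> int:
    """Number of local blocks per remote block (remote / local).

    Derives the ratio that the diff hardcodes as 8, and rejects sizes
    that do not divide evenly.
    """
    if remote_block_size <= 0 or local_block_size <= 0:
        raise ValueError("block sizes must be positive")
    if remote_block_size % local_block_size != 0:
        raise ValueError(
            f"remote block_size {remote_block_size} is not a multiple of "
            f"local block_size {local_block_size}")
    return remote_block_size // local_block_size


def validate_after_handshake(expected_factor: int, remote_block_size: int,
                             local_block_size: int) -> None:
    """Fail fast if the handshake implies a different factor than configured."""
    actual = compute_block_factor(remote_block_size, local_block_size)
    if actual != expected_factor:
        raise RuntimeError(
            f"block_factor mismatch: configured {expected_factor}, "
            f"handshake implies {actual}")
```

For example, `validate_after_handshake(self.block_factor, remote_block_size, self.block_size)` would run right after the handshake, when remote.block_size first becomes available; with illustrative sizes 128 and 16 it passes for a factor of 8.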
"Rank %s, get_finished: %s requests done sending "
"and %s requests done recving", self.tp_rank,
len(done_sending), len(done_recving))
#import remote_pdb; remote_pdb.set_trace()
Can you add a check that the remote side uses GPU attention?
Yes, I can add this.
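The requested check could look roughly like the sketch below. It assumes the handshake metadata carries an attention-backend name; the RemoteAgentInfo dataclass, its fields, and the set of accepted backend labels are all hypothetical stand-ins, since this PR does not show what the remote actually reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteAgentInfo:
    """Hypothetical subset of the metadata exchanged during the handshake."""
    engine_id: str
    attn_backend: str
    block_size: int


# Hypothetical labels for GPU attention backends; the real names depend on
# what the remote engine puts into its handshake metadata.
GPU_ATTN_BACKENDS = frozenset({"FLASH_ATTN", "FLASHINFER", "XFORMERS"})


def check_remote_is_gpu_attention(remote: RemoteAgentInfo, is_hetero: bool) -> None:
    """On the heterogeneous path, reject peers that do not use a GPU attention layout."""
    if is_hetero and remote.attn_backend.upper() not in GPU_ATTN_BACKENDS:
        raise RuntimeError(
            f"remote engine {remote.engine_id} uses attention backend "
            f"{remote.attn_backend!r}; heterogeneous P/D expects a GPU backend")
```

Running this next to the block-size validation after the handshake keeps both remote-side checks in one place, and homogeneous transfers (is_hetero False) skip it entirely.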
if num_local_blocks < num_remote_blocks:
    remote_block_ids = remote_block_ids[-num_local_blocks:]
#if num_local_blocks < num_remote_blocks:
# remote_block_ids = remote_block_ids[-num_local_blocks:]
Add a check for the heterogeneous case.
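Instead of commenting the trimming out for every case, it could be gated on is_hetero, as this comment suggests. The sketch below assumes that the homogeneous path should keep the original behavior (read only the trailing remote blocks when fewer local blocks were allocated, presumably a partial prefix-cache hit). It also assumes that on the heterogeneous path the two counts are in different block units, so it leaves the list untouched there; a fuller version might compare counts after scaling by block_factor. The helper name select_remote_block_ids is hypothetical.

```python
from typing import List


def select_remote_block_ids(local_block_ids: List[int],
                            remote_block_ids: List[int],
                            is_hetero: bool) -> List[int]:
    """Choose which remote blocks to read for one request."""
    num_local_blocks = len(local_block_ids)
    num_remote_blocks = len(remote_block_ids)
    if num_local_blocks == 0:
        # Nothing to read; also avoids remote_block_ids[-0:], which is the whole list.
        return []
    if not is_hetero and num_local_blocks < num_remote_blocks:
        # Homogeneous path: keep only the trailing remote blocks, as the original code did.
        return remote_block_ids[-num_local_blocks:]
    # Heterogeneous path: block counts are not directly comparable, so keep all.
    return list(remote_block_ids)
```

The explicit zero check matters because the original slice `remote_block_ids[-num_local_blocks:]` silently returns every remote block when num_local_blocks is 0.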
No description provided.