hsubramony
No description provided.

@hsubramony changed the title from "Buke hetero pd2" to "hetero pd2" on Sep 25, 2025
@hsubramony changed the base branch from habana_main to libint/debug_ttft on September 25, 2025, 00:07
remote_engine_id, len(meta.local_block_ids),
len(meta.remote_block_ids))
if self.use_host_buffer:
is_hetero = True
Author

What's the plan for setting this variable?

We need the flag to enable _recving_metadata.

Author

Are you going to add it as an env flag?

Can we reuse the DECODE_TP_RATIO env variable?

That seems good.

Check whether this can be moved to hpu_model_runner.
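
A minimal sketch of how the flag discussed in this thread could be derived from the environment instead of being hardcoded to True. It assumes, purely for illustration, that a positive integer in DECODE_TP_RATIO means the heterogeneous path should be enabled; the thread does not define that mapping, and the helper name read_is_hetero is hypothetical.

```python
import os


def read_is_hetero(env=None):
    """Return True when DECODE_TP_RATIO is set to a positive integer.

    Assumption (not confirmed by the PR): the presence of DECODE_TP_RATIO
    signals a heterogeneous prefill/decode setup.
    """
    # Allow a plain dict to be passed in so the helper is easy to test.
    env = os.environ if env is None else env
    value = env.get("DECODE_TP_RATIO")
    if value is None or value.strip() == "":
        # Variable unset or empty: keep the homogeneous default.
        return False
    ratio = int(value)  # raises ValueError for non-integer input
    if ratio < 1:
        raise ValueError(f"DECODE_TP_RATIO must be >= 1, got {ratio}")
    return True
```

Reading the variable once at connector construction time would keep the flag consistent for the lifetime of the worker, wherever the check ends up living.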

self.vllm_config = vllm_config
self.block_size = vllm_config.cache_config.block_size

self.block_factor = 8 # A100.block_size/G2.block_size
Author

Is this going to be a hardcoded value?

It's hardcoded for now; that may be OK since this number won't change.

It's better to check the block size on both sides.

We can check whether remote.block_size matches the expected value. We don't know remote.block_size here because the handshake occurs afterwards.
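
A rough sketch of the check described above, meant to run once the handshake has delivered the remote block size. The function name compute_block_factor and its parameters are hypothetical, and the ratio is taken as larger over smaller because the thread does not say which side has the bigger block.

```python
def compute_block_factor(local_block_size, remote_block_size, expected_factor=8):
    """Validate the local/remote block sizes and return their integer ratio.

    Assumption: the ratio is larger // smaller, so the result does not depend
    on which side (local or remote) uses the bigger block.
    """
    if local_block_size <= 0 or remote_block_size <= 0:
        raise ValueError("block sizes must be positive")
    larger = max(local_block_size, remote_block_size)
    smaller = min(local_block_size, remote_block_size)
    if larger % smaller != 0:
        raise ValueError(
            f"block sizes {local_block_size} and {remote_block_size} "
            "are not integer multiples of each other")
    factor = larger // smaller
    if factor != expected_factor:
        # The hardcoded value no longer matches what the peer reported.
        raise ValueError(
            f"unexpected block factor {factor}, expected {expected_factor}")
    return factor
```

Failing loudly here would surface a mismatched deployment at handshake time rather than as corrupted KV transfers later.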

"Rank %s, get_finished: %s requests done sending "
"and %s requests done recving", self.tp_rank,
len(done_sending), len(done_recving))
#import remote_pdb; remote_pdb.set_trace()

Can you add a check that the remote is using GPU attention?

Yes, I can add this.

if num_local_blocks < num_remote_blocks:
remote_block_ids = remote_block_ids[-num_local_blocks:]
#if num_local_blocks < num_remote_blocks:
# remote_block_ids = remote_block_ids[-num_local_blocks:]

Add a check for hetero.
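
One possible shape for that check, as a sketch: keep the original trimming for the homogeneous case and skip it when the hetero flag is set, rather than commenting the lines out. The helper name select_remote_block_ids and the is_hetero parameter are hypothetical, and skipping the trim in the hetero case is an assumption about the intended behavior.

```python
def select_remote_block_ids(local_block_ids, remote_block_ids, is_hetero):
    """Return the remote block ids to read for one request.

    Assumption: in the hetero case local and remote block counts differ by
    design, so the remote list is passed through untrimmed.
    """
    if is_hetero:
        return list(remote_block_ids)
    num_local_blocks = len(local_block_ids)
    num_remote_blocks = len(remote_block_ids)
    if num_local_blocks < num_remote_blocks:
        # Keep only the trailing blocks, as the original code did. An explicit
        # start index is used because remote_block_ids[-0:] would return the
        # whole list when there are no local blocks.
        start = num_remote_blocks - num_local_blocks
        return list(remote_block_ids[start:])
    return list(remote_block_ids)
```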
