[Bugfix][P/D] Fix throughput stats in disaggregated setup #27569
base: main
Commits: 378e06a · 4c463b6 · 863ea67 · 9415154 · c3d6723
```diff
@@ -121,6 +121,8 @@ class EngineCoreOutput(
     trace_headers: Mapping[str, str] | None = None
     # The number of tokens with prefix cache hits.
     num_cached_tokens: int = 0
+    # The number of tokens that have been computed remotely.
+    num_external_computed_tokens: int = 0

     @property
     def finished(self) -> bool:
```

> **Review comment** (on the `# The number of tokens with prefix cache hits.` comment): Yeah, this comment looks incorrect ... assuming "prefix cache" refers to the local cache?
>
> **Reply:** I am not familiar with it, cc @chaunceyjiang

> **Review comment** (on the added fields): I'd be tempted to refactor these two into a …
>
> **Reply:** I don't have a strong opinion on this tbh; we can probably wait until we have a few more things to bundle before executing the suggestion.
```diff
@@ -113,7 +113,7 @@ def _reset(self, now):

     def _track_iteration_stats(self, iteration_stats: IterationStats):
         # Save tracked stats for token counters.
-        self.num_prompt_tokens += iteration_stats.num_prompt_tokens
+        self.num_prompt_tokens += iteration_stats.num_local_prompt_tokens
         self.num_generation_tokens += iteration_stats.num_generation_tokens

     def _get_throughput(self, tracked_stats: int, now: float) -> float:
```

> **Review comment (markmc):** Presumably you want to update the Prometheus metric too?
>
> **Reply:** @markmc which one? I intentionally left …
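For context, here is a minimal sketch (hypothetical names, not the actual vLLM classes) of how counters like these typically feed a tokens/sec throughput calculation. It illustrates why the change matters: if remotely computed prompt tokens were added to the counter, the reported prefill throughput of the local instance would be inflated.

```python
import time


class LoggingStatsSketch:
    """Toy model of an interval metrics logger: accumulates token
    counters between log calls and converts them to tokens/sec."""

    def __init__(self) -> None:
        self.num_prompt_tokens = 0
        self.num_generation_tokens = 0
        self.last_log_time = time.monotonic()

    def track_iteration(
        self, num_local_prompt_tokens: int, num_generation_tokens: int
    ) -> None:
        # Only locally computed prompt tokens count toward prefill
        # throughput; tokens whose KV cache arrived from a remote
        # prefill instance are excluded, mirroring the fix in this PR.
        self.num_prompt_tokens += num_local_prompt_tokens
        self.num_generation_tokens += num_generation_tokens

    def get_throughput(self, tracked_stats: int, now: float) -> float:
        # tokens accumulated since the last log, divided by elapsed time
        delta = now - self.last_log_time
        return tracked_stats / delta if delta > 0 else 0.0
```

This is only a sketch under the assumption that the real logger follows the same accumulate-then-divide pattern suggested by `_track_iteration_stats` and `_get_throughput` in the diff above.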
```diff
@@ -221,6 +221,8 @@ def __init__(self):
         self.num_generation_tokens = 0
         self.num_prompt_tokens = 0
         self.num_preempted_reqs = 0
+        # Num of prompt tokens that have been computed locally.
+        self.num_local_prompt_tokens = 0
         self.finished_requests: list[FinishedRequestStats] = []
         self.max_num_generation_tokens_iter: list[int] = []
         self.n_params_iter: list[int] = []
```

```diff
@@ -251,6 +253,9 @@ def update_from_output(
         self.num_generation_tokens += num_new_generation_tokens
         if is_prefilling:
             self.num_prompt_tokens += prompt_len
+            self.num_local_prompt_tokens += (
+                prompt_len - output.num_external_computed_tokens
+            )

             first_token_latency = self._time_since(req_stats.arrival_time)
             self.time_to_first_tokens_iter.append(first_token_latency)
```

> **Review comment** (on the "computed locally" comment): Is the naming here a bit confusing? By "computed locally" here we mean both computed and locally cached? If you just tracked …
>
> **Reply:** Yes, the behavior is unchanged: cached tokens would still result in higher throughput, even in a regular aggregated setup. Looking at the diff, I think it is pretty clear that I just want to rule out the remote tokens, i.e. I assume this semantic was the intended one from the beginning; "local" just used to be redundant.
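The arithmetic of the fix can be shown with a toy iteration-stats object (a hypothetical minimal class, not the real `IterationStats`): the total prompt-token counter is unchanged, while the new local counter subtracts the tokens whose KV cache was computed remotely.

```python
class IterationStatsSketch:
    """Toy per-iteration accounting of prompt tokens, split into the
    total seen and the subset computed on this instance."""

    def __init__(self) -> None:
        self.num_prompt_tokens = 0
        self.num_local_prompt_tokens = 0

    def update_from_output(
        self, prompt_len: int, num_external_computed_tokens: int
    ) -> None:
        # The total still counts every prompt token of the request...
        self.num_prompt_tokens += prompt_len
        # ...but the local counter excludes tokens prefilled remotely
        # (e.g. on the prefill instance in a P/D disaggregated setup).
        self.num_local_prompt_tokens += (
            prompt_len - num_external_computed_tokens
        )


# e.g. a 100-token prompt whose first 60 tokens were prefilled remotely:
stats = IterationStatsSketch()
stats.update_from_output(prompt_len=100, num_external_computed_tokens=60)
```

With these numbers only 40 tokens count toward local prefill throughput, which is exactly the behavior the `prompt_len - output.num_external_computed_tokens` line in the diff introduces.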