-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(native): Separate exchange get-data-size vs get-data counters #26353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(native): Separate exchange get-data-size vs get-data counters #26353
Conversation
Reviewer's GuideThis PR separates exchange protocol metrics by introducing distinct histograms for the initial get-data-size phase vs the data retrieval phase, refines existing histogram parameters, and updates PrestoExchangeSource to tag and record metrics based on request type. Sequence diagram for differentiated metric recording in exchange protocolsequenceDiagram
participant Client
participant PrestoExchangeSource
participant Metrics
Client->>PrestoExchangeSource: get-data-size request (long-poll)
PrestoExchangeSource->>Metrics: Record kCounterExchangeGetDataSizeDuration
PrestoExchangeSource->>Metrics: Record kCounterExchangeGetDataSizeNumTries
Client->>PrestoExchangeSource: get-data request (non-blocking)
PrestoExchangeSource->>Metrics: Record kCounterExchangeRequestDuration
PrestoExchangeSource->>Metrics: Record kCounterExchangeRequestNumTries
PrestoExchangeSource->>Metrics: Record kCounterExchangeRequestPageSize
Class diagram for updated PrestoExchangeSource and CountersclassDiagram
class PrestoExchangeSource {
+void processDataResponse(std::unique_ptr<http::HttpResponse> response, bool getDataSize)
}
class Counters {
+kCounterExchangeRequestDuration
+kCounterExchangeRequestNumTries
+kCounterExchangeRequestPageSize
+kCounterExchangeGetDataSizeDuration
+kCounterExchangeGetDataSizeNumTries
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Using a boolean flag in processDataResponse to differentiate request types can be error-prone; consider splitting into two explicit handler methods or using an enum for better clarity.
- Inferring get-data-size requests via maxBytes == 0 is implicit and may lead to confusion; introducing an explicit parameter or separate request path would make the intent clearer.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Using a boolean flag in processDataResponse to differentiate request types can be error-prone; consider splitting into two explicit handler methods or using an enum for better clarity.
- Inferring get-data-size requests via maxBytes == 0 is implicit and may lead to confusion; introducing an explicit parameter or separate request path would make the intent clearer.
## Individual Comments
### Comment 1
<location> `presto-native-execution/presto_cpp/main/PrestoExchangeSource.cpp:249` </location>
<code_context>
processDataError(httpRequestPath, maxBytes, maxWait, response->error());
} else {
- processDataResponse(std::move(response));
+ processDataResponse(std::move(response), maxBytes == 0);
}
} catch (const std::exception& e) {
</code_context>
<issue_to_address>
**suggestion:** The use of 'maxBytes == 0' to signal get-data-size requests could be error-prone.
Using 'maxBytes == 0' as a signal may cause confusion or bugs if the parameter's meaning changes or if zero becomes a valid value. An explicit flag or method would improve clarity and future maintainability.
Suggested implementation:
```cpp
} else {
const bool isGetDataSizeRequest = (maxBytes == 0);
processDataResponse(std::move(response), isGetDataSizeRequest);
}
```
```cpp
void PrestoExchangeSource::processDataResponse(
std::unique_ptr<http::HttpResponse> response, bool isGetDataSizeRequest) {
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-native-execution/presto_cpp/main/PrestoExchangeSource.cpp
Outdated
Show resolved
Hide resolved
be73073 to
7bee4cc
Compare
…r better tracking Exchange has two phase protocol 1. First request is a get-data-size request, it uses long-poll protocol and wait until exchange source has data or timeout 2. Second request is get-data request, it will be be non-blocking call to transmit data Differentiating those two would be helpful to differentiate the time waiting on server and time for transmitting data
7bee4cc to
e1344b1
Compare
| if (isGetDataSizeRequest) { | ||
| RECORD_HISTOGRAM_METRIC_VALUE( | ||
| kCounterExchangeGetDataSizeDuration, | ||
| dataRequestRetryState_.durationMs()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how useful is the overall duration. Should we add a metric excluding the duration of a long pool? (waiting for pages).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also debating with myself whether this is needed, but when I look at this counter, one useful insight it gives is a ballpark about how much time we have to wait for data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time thinking how we can exclude the duration but didn't find an easy way
- we need to include in the http response the "get-data-size server ready time"
- but since get-data-size is quite frequent call, so it would introduce non-trivial amount of overhead
Any ideas?
| dataRequestRetryState_.durationMs()); | ||
| RECORD_HISTOGRAM_METRIC_VALUE( | ||
| kCounterExchangeGetDataSizeNumTries, | ||
| dataRequestRetryState_.numTries()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the duration include all retries? Do we have (want to have?) metrics for a single request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it includes all retries. So far, in my observation, for get-data, we don't have any retry at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. We'll enable this in our clusters as well and share observations with you next native worker group meeting.
Exchange has two phase protocol (#21926)
exchange source has data or timeout
Differentiating those two would be helpful to differentiate the time waiting on server
and time for transmitting data