Consider doFrames when extending frame timeline#4949
Conversation
For blocking calls per frame duration metrics we extend the frame timeline to cover cases where blocking calls starting within a certain frame extend beyond it. - Problem: Until now frames were extended till the next available frame in the actual frametine. This however can be problematic when UI thread gets vsync updates and makes blocking calls for a period of time where no draws actually happen. This means the next actual frame might be much later. This leads to skewed metrics that collect blocking calls way into the future after a specific frame had already ended. - Solution: A better approach could be to consider both the next actual frame and the next Choreographer#doFrame slice, picking the earliest option. This updated definition of extended frame should make the metric more meaningful and stable. Test: tools/diff_test_trace_processor.py --name-filter ".*android.*" out/android/trace_processor_shell
🎨 Perfetto UI Builds |
| -- ends, a part of the blocking call can be missed while calculating the metric. To avoid this | ||
| -- issue, the frame boundary is extended, spanning from the start of the current actual frame, till | ||
| -- the start of the next actual frame. | ||
| -- the start of either next actual frame or the next doFrame slice - whichever is earliest. |
There was a problem hiding this comment.
could you add here in the comment more details about why we're doing this? (e.g. that for some frames we might not actually be drawing and the actual timeline will not show an entry there, but there can still be doFrames and we only want to consider the portion until that doFrame even if there is no drawing going on).
but another question that arises could be... why don't we only consider the doFrame instead of the actual timeline for these boundaries? are there cases not covered by only considering the doFrame? (this is a genuine question, I don't remember the reason we were considering the timeline here instead of just the doFrame, but there might have been one)
There was a problem hiding this comment.
Added a more extensive comment section.
Re: the second question - not sure if fully relying on doFrame slices would be enough myself either. My thinking was that, by default, "per frame" should just be defined in terms of actual frame timeline data alone, but I guess with these cases where animate() blocks can run without any draws we can have the metric heavily inflated by actual frame timeline gaps. So I guess it might also be about - how do we actually define the term "per frame" - and how stable of a definition would, say, actual frame timeline till next doFrame slice be if used all the time.
Just kinda thinking out loud on this one, so open to ideas/changes, now or down the road.
| min(ts) | ||
| FROM _android_jank_cuj_do_frames | ||
| WHERE | ||
| utid = fr.ui_thread_utid AND ts >= fr.ts_end |
There was a problem hiding this comment.
to make this a bit more efficient you could also add AND ts <= fr.next_frame_start (as we're doing the min anyway later on). but I expect very minimal perf gains, so optional.
| lead(frame_ts) OVER (PARTITION BY cuj_id, layer_id ORDER BY frame_id) AS next_frame_start | ||
| FROM _android_distinct_frames_in_cuj | ||
| ), | ||
| _frames_with_extension_options AS ( |
There was a problem hiding this comment.
could you double check the perfromance of adding this additional table on a complicated enough trace? just to make sure we're not regression performance of generating metrics too much
There was a problem hiding this comment.
As per discussion offline. Went ahead and compiled the trace processor tool with and without my change. The delay till metrics were displayed in both seemed to be quite similar.
For blocking calls per frame duration metrics
we extend the frame timeline to cover cases
where blocking calls starting within a certain frame extend beyond it.
Until now frames were extended till the next available frame in the actual frametine. This however can be problematic when UI thread gets vsync updates and makes blocking calls for a period of time where no draws actually happen.
This means the next actual frame might be much later. This leads to skewed metrics that collect blocking calls way into the future after a specific frame had already ended.
A better approach could be to consider both the next actual frame and the next Choreographer#doFrame slice, picking the earliest option. This updated definition of extended frame should make the metric more meaningful and stable.
Test: tools/diff_test_trace_processor.py --name-filter ".*android.*" out/android/trace_processor_shell
Bug: 481631038