Skip to content

Consider doFrames when extending frame timeline#4949

Open
IQ01660 wants to merge 2 commits intomainfrom
dev/gabiyev/extend-frame
Open

Consider doFrames when extending frame timeline#4949
IQ01660 wants to merge 2 commits intomainfrom
dev/gabiyev/extend-frame

Conversation

@IQ01660
Copy link
Contributor

@IQ01660 IQ01660 commented Feb 27, 2026

For blocking calls per frame duration metrics
we extend the frame timeline to cover cases
where blocking calls starting within a certain frame extend beyond it.

  • Problem:

Until now frames were extended till the next available frame in the actual frametine. This however can be problematic when UI thread gets vsync updates and makes blocking calls for a period of time where no draws actually happen.
This means the next actual frame might be much later. This leads to skewed metrics that collect blocking calls way into the future after a specific frame had already ended.

  • Solution:

A better approach could be to consider both the next actual frame and the next Choreographer#doFrame slice, picking the earliest option. This updated definition of extended frame should make the metric more meaningful and stable.

Test: tools/diff_test_trace_processor.py --name-filter ".*android.*" out/android/trace_processor_shell
Bug: 481631038

For blocking calls per frame duration metrics
we extend the frame timeline to cover cases
where blocking calls starting within a certain frame
extend beyond it.

- Problem:

Until now frames were extended till the next available frame
in the actual frametine. This however can be problematic when
UI thread gets vsync updates and makes blocking calls for
a period of time where no draws actually happen.
This means the next actual frame might be much later. This leads
to skewed metrics that collect blocking calls way into the future
after a specific frame had already ended.

- Solution:

A better approach could be to consider both the next actual frame
and the next Choreographer#doFrame slice, picking the earliest option.
This updated definition of extended frame should make the metric
more meaningful and stable.

Test: tools/diff_test_trace_processor.py --name-filter ".*android.*"
out/android/trace_processor_shell
@IQ01660 IQ01660 requested a review from a team as a code owner February 27, 2026 12:12
@IQ01660 IQ01660 requested a review from devianb February 27, 2026 12:13
@github-actions
Copy link

github-actions bot commented Feb 27, 2026

Copy link
Contributor

@devianb devianb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please get an approval from @nicomazz too, as I no longer look after this metric.

@devianb devianb requested a review from nicomazz March 2, 2026 07:44
-- ends, a part of the blocking call can be missed while calculating the metric. To avoid this
-- issue, the frame boundary is extended, spanning from the start of the current actual frame, till
-- the start of the next actual frame.
-- the start of either next actual frame or the next doFrame slice - whichever is earliest.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add here in the comment more details about why we're doing this? (e.g. that for some frames we might not actually be drawing and the actual timeline will not show an entry there, but there can still be doFrames and we only want to consider the portion until that doFrame even if there is no drawing going on).

but another question that arises could be... why don't we only consider the doFrame instead of the actual timeline for these boundaries? are there cases not covered by only considering the doFrame? (this is a genuine question, I don't remember the reason we were considering the timeline here instead of just the doFrame, but there might have been one)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a more extensive comment section.

Re: the second question - not sure if fully relying on doFrame slices would be enough myself either. My thinking was that, by default, "per frame" should just be defined in terms of actual frame timeline data alone, but I guess with these cases where animate() blocks can run without any draws we can have the metric heavily inflated by actual frame timeline gaps. So I guess it might also be about - how do we actually define the term "per frame" - and how stable of a definition would, say, actual frame timeline till next doFrame slice be if used all the time.

Just kinda thinking out loud on this one, so open to ideas/changes, now or down the road.

min(ts)
FROM _android_jank_cuj_do_frames
WHERE
utid = fr.ui_thread_utid AND ts >= fr.ts_end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make this a bit more efficient you could also add AND ts <= fr.next_frame_start (as we're doing the min anyway later on). but I expect very minimal perf gains, so optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

lead(frame_ts) OVER (PARTITION BY cuj_id, layer_id ORDER BY frame_id) AS next_frame_start
FROM _android_distinct_frames_in_cuj
),
_frames_with_extension_options AS (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you double check the perfromance of adding this additional table on a complicated enough trace? just to make sure we're not regression performance of generating metrics too much

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per discussion offline. Went ahead and compiled the trace processor tool with and without my change. The delay till metrics were displayed in both seemed to be quite similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants