Compute light probe matrix earlier and cache in LightProbeInfo #20738 #20782
Conversation
Welcome, new contributor! Please make sure you've read our contributing guide and we look forward to reviewing your pull request shortly ✨
This is a reasonable change, but I generally think we should be measuring this sort of perf-oriented change. Caching is great, but it takes up memory and can lead to bugs, so it's not free. I expect that this is valuable, since light probes are often static, but I've learned not to trust my intuitions very far on this stuff. Also, since I'm still a relative rendering noob: how will this cache get updated?
```diff
@@ -145,6 +145,9 @@ where
     // The transform from world space to light probe space.
     light_from_world: Affine3A,
+
+    // The transpose of the inverse of [`light_from_world`].
+    light_from_world_transposed: Mat4,
```
If we are to improve performance, this would need to replace `light_from_world` with the three axes used in `maybe_gather_light_probes`. `light_from_world` otherwise goes completely unused.
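For context on why three axes are enough: an affine 4x4 matrix always has `[0, 0, 0, 1]` as its bottom row, so its transpose is fully determined by three `Vec4`s and nothing is lost by omitting the fourth. A minimal sketch of the packing, using plain arrays in place of glam's `Mat4`/`Vec4` so it is self-contained; `pack_transposed` and `unpack` are illustrative names, not Bevy's actual API:

```rust
// Column-major 4x4, matching glam's layout: m[c][r] is element (row r, col c).
type Mat4 = [[f32; 4]; 4];

/// Keep only the first three columns of the transpose of `m`
/// (equivalently, the first three rows of `m`).
fn pack_transposed(m: &Mat4) -> [[f32; 4]; 3] {
    let row = |r: usize| [m[0][r], m[1][r], m[2][r], m[3][r]];
    [row(0), row(1), row(2)]
}

/// Rebuild the original affine matrix from the packed transpose.
fn unpack(rows: &[[f32; 4]; 3]) -> Mat4 {
    let mut m = [[0.0; 4]; 4];
    for c in 0..4 {
        for r in 0..3 {
            m[c][r] = rows[r][c];
        }
    }
    m[3][3] = 1.0; // the omitted bottom row of an affine matrix is always [0, 0, 0, 1]
    m
}
```

The round trip is exact for any affine matrix, which is why only the three packed rows need to be stored and uploaded.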
I did some testing and it seems that this basically only avoids one inverse per frame.
The cache gets updated every frame, since LightProbeInfo only persists for a single frame before being recreated.
The code and the goal of the PR look good to me. I am thinking of profiling this with Tracy and the Metal debugger to test whether there is a performance improvement, as Alice suggested.
This doesn't avoid an inverse per frame - the inverse is already avoided. This just avoids a transpose, to be clear. I'm curious how much time we even save by transposing it to avoid sending a vec4 to the GPU, tbh. We send full mat4s in a lot of places in Bevy that we could 'optimize' similarly by sending transposes and omitting the last row. Anyway, I'm not really for or against this PR one way or another; my main motivation for touching this code was precision-related w.r.t. the inverses. I'd need to see benches /shrug
Even if it doesn't save compute time, this does reduce the CPU size of `LightProbeInfo` by one `Vec4`.
I'm not convinced the readability loss is worth it. Not gonna block this, but I'm not giving it the second approve either; I'd have to see benches to feel that this matters enough to be fuddling around with `Vec4` arrays. If anyone else feels strongly enough to approve, go for it. We send full mat4s everywhere; if this is actually a worthwhile optimization, let's have evidence of it first and then do it elsewhere too, probably with a `TransposedAffine3A` component which is a thin wrapper around `[Vec4; 3]`, with a constructor that handles the transpose etc. and centralized docs explaining the reasoning behind it.
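A sketch of what such a helper could look like, with plain arrays standing in for glam's `Vec4`/`Mat4` so the example stands alone. The name `TransposedAffine3A` is taken from the comment above, but the field and constructor names are assumptions, not an agreed design:

```rust
/// Hypothetical thin wrapper storing the transpose of an affine 4x4 matrix
/// as three Vec4-sized rows. The fourth column of the transpose is always
/// [0, 0, 0, 1] for an affine matrix, so it is omitted, saving 16 bytes
/// per matrix (48 bytes instead of 64).
#[derive(Clone, Copy, Debug, PartialEq)]
struct TransposedAffine3A {
    cols: [[f32; 4]; 3],
}

impl TransposedAffine3A {
    /// Build from a column-major affine matrix (bottom row [0, 0, 0, 1]).
    /// The constructor centralizes the transpose so call sites don't repeat it.
    fn from_affine(m: &[[f32; 4]; 4]) -> Self {
        let mut cols = [[0.0; 4]; 3];
        for c in 0..3 {
            for r in 0..4 {
                // Column `c` of the transpose is row `c` of `m`.
                cols[c][r] = m[r][c];
            }
        }
        Self { cols }
    }
}
```

Since `[[f32; 4]; 3]` is 48 bytes versus 64 for a full 4x4 of f32s, this is also where the 25% size and bandwidth saving comes from.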
From the GPU side, I believe @superdump was the one who brought up that transposes on modern GPUs are basically free, which is where this optimization comes from. From the CPU side, data-oriented design generally means shaping code to match the target hardware. For modern CPUs, this often means minimizing padding and shrinking (or, in other cases, splitting) our memory accesses to make every RAM fetch worth it, so I think we should keep that in mind for all CPU-side operations, engine-wide. A bit of extra in-register computation tends to be rather cheap. I agree that a helper type would be useful here. There are definitely areas (e.g. skinned mesh bindposes) where we store and send many more `Mat4`s/`Affine3A`s to the GPU, and shaving 25% off of the memory usage and bandwidth there is nothing to scoff at.
Okay, I'm convinced. I presume a helper type is left as follow-up work? Or should we do it in this PR?
The theory here makes absolute sense to me, but I'll also echo @atlv24's more general concern about empiricism with respect to these changes. I don't want to blame this particular PR though; this is something we struggle with in general, and we should have more robust mechanisms for running benches automatically so we can answer these questions. In the meantime, I am also very convinced by your explanation here!
Objective
Solution
Testing
`cargo run -p ci`