@b0nes164 b0nes164 commented Nov 13, 2025

This PR adds generation of intersection data to make_tiles.

For our MSAA calculations, we need the points where a line intersects with a tile. Since the end goal is to perform the MSAA in parallel, we need these intersections to be watertight between different threads. By performing these calculations in make_tiles, we can create a source of ground truth, ensuring watertightness.

Although it is feasible to calculate the exact intersection points in make_tiles, we defer that work to the GPU. Instead, we create an intersection bitmask, which unambiguously defines which edges the line intersects. We say a line intersects the edge of a tile if it touches that edge AND continues into another tile. An endpoint that exactly touches an edge does NOT count as intersecting, though it may still contribute to winding (in the case of a top edge touch).

The bitmask is 5 bits, and will be consumed downstream by the rasterization stage:
P(erfect) | R(ight) | L(eft) | B(ottom) | T(op)
The lower 4 bits correspond to the intersection edges. The "Perfect" bit is necessary to resolve ambiguity in cases where a line perfectly intersects with a corner of a tile.
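As a rough illustration of the layout above, here is a minimal Rust sketch. The constant and function names are hypothetical (they are not taken from the PR), and the bit positions assume that `P | R | L | B | T` is written from most significant to least significant bit.

```rust
// Hypothetical bit positions, assuming `P | R | L | B | T` is listed from the
// most significant bit down to the least significant bit.
pub const TOP: u8 = 1 << 0;
pub const BOTTOM: u8 = 1 << 1;
pub const LEFT: u8 = 1 << 2;
pub const RIGHT: u8 = 1 << 3;
pub const PERFECT: u8 = 1 << 4;

/// Mask covering only the four edge bits.
pub const EDGE_MASK: u8 = TOP | BOTTOM | LEFT | RIGHT;

/// Number of edges flagged as intersected (0 to 4), ignoring the "Perfect" bit.
pub fn edge_count(mask: u8) -> u32 {
    (mask & EDGE_MASK).count_ones()
}

/// Whether the "Perfect" (corner) bit is set.
pub fn is_perfect(mask: u8) -> bool {
    (mask & PERFECT) != 0
}
```

With these assumed positions, `PERFECT | TOP | LEFT` is `0b10101`, and `edge_count` reports 2 for it because the "Perfect" bit is masked out before counting.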


Cases of 3 bit ambiguity can be resolved by always calculating intersections on the opposing edges of the tile.
Consider a tile with intersections T | L | B:

o--------+
|\       |
| \      |
|  \     | 
+---o----+

Calculate the intersections on T and B!


Cases of 2 bit ambiguity require the "Perfect" bit, which is set when there is exactly one unique edge intersection.
Consider a tile with intersections T | L:

This is valid:

o--------+
|\       |
| o      |
|        | 
+--------+

But so is this:

+--o-----+
| /      |
o        |
|        | 
+--------+

With the perfect bit, the first case would be P | T | L and the second case would be T | L
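The disambiguation rules above could be sketched on the consumer side roughly as follows. This is only an illustration under assumptions: the bit positions, the `Crossing` enum, and `resolve` are hypothetical names, the 4-bit case arbitrarily picks top/bottom, and the "Perfect" bit is only consulted in the 2-bit case because that is the only case described here.

```rust
// Hypothetical bit positions (P | R | L | B | T, high bit to low bit).
pub const TOP: u8 = 1 << 0;
pub const BOTTOM: u8 = 1 << 1;
pub const LEFT: u8 = 1 << 2;
pub const RIGHT: u8 = 1 << 3;
pub const PERFECT: u8 = 1 << 4;
pub const EDGE_MASK: u8 = TOP | BOTTOM | LEFT | RIGHT;
const VERTICAL_PAIR: u8 = TOP | BOTTOM;

/// Which intersection points a consumer would need to calculate.
#[derive(Debug, PartialEq, Eq)]
pub enum Crossing {
    /// No edge bit set: the line is completely enclosed in the tile.
    Inside,
    /// One unique intersection point lying on the given edge bit(s).
    /// Two edge bits here mean the point is the corner they share.
    Single(u8),
    /// Two intersection points, one on each of the given edges.
    Pair(u8, u8),
}

pub fn resolve(mask: u8) -> Crossing {
    let edges = mask & EDGE_MASK;
    let perfect = (mask & PERFECT) != 0;
    match edges.count_ones() {
        0 => Crossing::Inside,
        1 => Crossing::Single(edges),
        // 2 bits + P: one unique intersection exactly on the shared corner.
        2 if perfect => Crossing::Single(edges),
        // 2 bits without P: one intersection on each flagged edge.
        2 => {
            let low = 1u8 << edges.trailing_zeros();
            Crossing::Pair(low, edges ^ low)
        }
        // 3 or 4 bits: always use a pair of opposing edges.
        _ => {
            if (edges & VERTICAL_PAIR) == VERTICAL_PAIR {
                Crossing::Pair(TOP, BOTTOM)
            } else {
                Crossing::Pair(LEFT, RIGHT)
            }
        }
    }
}
```

For the examples above, `resolve(TOP | LEFT | BOTTOM)` yields `Pair(TOP, BOTTOM)`, `resolve(PERFECT | TOP | LEFT)` yields `Single(TOP | LEFT)`, and `resolve(TOP | LEFT)` yields `Pair(TOP, LEFT)`.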

@b0nes164 b0nes164 requested review from LaurenzV and tomcur November 13, 2025 17:04
@LaurenzV
Contributor

I ran this through my PDF test suite and there only seem to be small unnoticeable single-pixel differences, so no regressions. 👍 However, I won't be able to take a closer look at this before Monday.

But just one thing I noticed, to me, it does seem like this will add some processing time for a single tile. Perhaps we could add a const generic to the method indicating whether tile intersection data should be computed, and when false it just sets it to 0? This way, we can skip all of the calculations for vello_cpu, where it's not needed.

Also, this PR would make #1211 irrelevant, right?

@b0nes164
Author

> However, I won't be able to take a closer look at this before Monday.

No worries! Take your time.

> But just one thing I noticed, to me, it does seem like this will add some processing time for a single tile.

Yes there is a special case for a single tile.

> This way, we can skip all of the calculations for vello_cpu

Yes, I was thinking the same thing. I'm not super familiar with Rust, so I'm not sure what the rustiest way to do this is, but I think even two different make_tiles functions could work.

> Also, this PR would make #1211 irrelevant, right?

I think so. I was debating whether it would be worth it to case out more fast paths, but as mentioned above, I do have a "line completely enclosed in tile" case.

Member

@tomcur tomcur left a comment

Cool stuff. The packing is clever and looks good.

Your explanation of the packing is great, I think it would be worthwhile to add that as documentation to the packed_winding_line_idx.

> Cases of 3 bit ambiguity can be resolved by always calculating intersections on the opposing edges of the tile.

Perhaps if you add the documentation to the code, also mention the 4-bit case (where, if my understanding is correct, you just calculate for any two opposing sides).

To be able to properly review the core idea itself I probably need a bit more background on the needs of MSAA, would you have some pointers?

> > This way, we can skip all of the calculations for vello_cpu
>
> Yes, I was thinking the same thing.

Conditionally running this seems like a good idea. Here's an example of the const-generic pattern: https://github.com/linebender/parley/blob/4f887570bac98c6693e4a3c6937eebb3423dde72/parley/src/layout/alignment.rs#L84-L97, but two versions also make sense, especially if the two versions can be optimized differently.

@LaurenzV
Contributor

I think by default I would prefer a single version, and would only split it into two separate methods if that really turns out to be faster. Otherwise, whenever we change the logic, we need to remember to change it in both methods.

@b0nes164
Author

b0nes164 commented Nov 17, 2025

Re:All
I will work in your comments, ETA tomorrow.

If I understand it, a const generic is the equivalent of a C++ non-type template parameter? I.e. the templated/const-generic variable is visible at compile time, and if we use a bool as the variable we can conditionally compile what we need?

This would work, but I expect it may be a little messy, as the function cannot be split cleanly. I will include this in the version tomorrow, so we can get a picture of what it would look like.
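For reference, a minimal sketch of what the const-generic approach might look like. Everything here is a hypothetical stand-in (`TileOut`, `make_tile`, `GEN_INTERSECTIONS`, and the placeholder mask computation are not from the PR); it only illustrates that a `const bool` parameter behaves much like a C++ non-type template parameter, with each instantiation compiled separately.

```rust
/// Hypothetical stand-in for the per-tile output; not the real vello type.
#[derive(Debug, PartialEq, Eq)]
pub struct TileOut {
    pub x: u16,
    pub y: u16,
    pub intersection_mask: u8,
}

/// Placeholder for the real intersection computation. It returns a fixed
/// mask so that the example stays self-contained.
fn compute_intersection_mask(_x: u16, _y: u16) -> u8 {
    0b0_0101
}

/// `GEN_INTERSECTIONS` is known at compile time, so each instantiation is
/// monomorphized on its own and the unused branch can typically be removed
/// by the compiler.
pub fn make_tile<const GEN_INTERSECTIONS: bool>(x: u16, y: u16) -> TileOut {
    let intersection_mask = if GEN_INTERSECTIONS {
        compute_intersection_mask(x, y)
    } else {
        0
    };
    TileOut { x, y, intersection_mask }
}
```

A caller that does not need the data would use `make_tile::<false>(x, y)`, while one that does would use `make_tile::<true>(x, y)`. This keeps a single function body, at the cost of `if GEN_INTERSECTIONS` branches scattered through it.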


Re:Tom

> Perhaps if you add the documentation to the code, also mention the 4-bit case (where, if my understanding is correct, you just calculate for any two opposing sides).

Yes this is correct.

> To be able to properly review the core idea itself I probably need a bit more background on the needs of MSAA, would you have some pointers?

Yes apologies!

Conceptually, the MSAA version of sparse strips is almost identical to the analytic version, but instead of calculating the coverage mask through the area of the trapezoid formed by the line, we iterate through N subpixel sampling locations and determine whether each one is "inside" or "outside" the line. You then take the bitmask of these inside/outside results, count the set bits, and turn that count into coverage.
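As a small illustration of the last step (sample bitmask to coverage), here is a hedged sketch. The function name and the choice of `f32` output are assumptions for the example, not what the PR or the shader actually does.

```rust
/// Converts a per-pixel sample bitmask into a coverage value in [0, 1].
/// `sample_count` is N, the number of subpixel samples (1 to 32 here).
/// Bit i of `sample_mask` is set when sample i is "inside".
pub fn coverage(sample_mask: u32, sample_count: u32) -> f32 {
    assert!((1..=32).contains(&sample_count));
    // Ignore any bits above the sample count.
    let valid = if sample_count == 32 {
        u32::MAX
    } else {
        (1u32 << sample_count) - 1
    };
    (sample_mask & valid).count_ones() as f32 / sample_count as f32
}
```

For example, with 8 samples and a mask of `0b1111_0000`, four samples are inside, so the coverage is 0.5.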

For example (note there is a bug in the topmost-left tile):
image

Naively this can be done by maintaining a winding number per subpixel sample, but because we want to do this in parallel, communicating 8, 16, or 32 windings across threads becomes prohibitively costly.

Instead, we can minimize the communication required to a single winding number per tile (the same coarse winding used in the analytic version) by applying three rules. For simplicity we'll assume the even/odd fill rule, for a single non-overlapping tile:

For any given pixel in a tile, the winding is the XOR of the coarse winding number for the whole tile, with the left edge intersection, with the per-pixel calculations.
pixel[x][y] = coarse_winding ^ left_edge_intersection ^ per_pixel_calculations

  1. The coarse winding we get from the tiling.

  2. The left-edge-intersection rule is: if the line intersecting a tile intersects its left edge, then left_edge_intersection is true for all pixel rows below the pixel row the line intersected. If the line perfectly intersects the top-left corner, this requires tie-breaking logic, which is not implemented yet. (I realized today that the current logic is insufficient.)

  3. The per-pixel-calculations involve determining which pixels the line intersects, then getting the subpixel sample mask, but I am keeping this deliberately vague, as it doesn't relate to the tiling.
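The three rules can be sketched roughly like this for the even/odd case. This is a simplified illustration with hypothetical names. It assumes y grows downward, treats "below" as strictly below the intersected row, takes the per-pixel result as a precomputed boolean, and leaves out the top-left corner tie-breaking entirely.

```rust
/// Rule 2: true for all pixel rows strictly below the row where the line
/// crossed the tile's left edge (assuming y grows downward). `None` means
/// the line did not intersect the left edge.
pub fn left_edge_intersection(left_hit_row: Option<u32>, y: u32) -> bool {
    matches!(left_hit_row, Some(hit) if y > hit)
}

/// Even/odd inside test for one pixel:
/// coarse_winding ^ left_edge_intersection ^ per_pixel_calculations.
pub fn pixel_inside(
    coarse_winding: bool,    // rule 1: comes from the tiling
    left_hit_row: Option<u32>,
    y: u32,
    per_pixel: bool,         // rule 3: result of the per-pixel calculations
) -> bool {
    coarse_winding ^ left_edge_intersection(left_hit_row, y) ^ per_pixel
}
```

Because the left-edge term flips every row below the hit row, a hit row that differs between two threads would flip the result for entire rows, which is why that row has to come from a single source of truth.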

Example, the left-edge-intersection in action:
image

Example of the current bug with top left intersection:
image

Because the left-edge-intersection seeds an entire row of pixels, any discrepancy in the left-edge pixel intersection between tiles (due to floating point errors) is potentially catastrophic. So the idea is to take advantage of the fact that we fully traverse the line during the tiling to create a source of ground-truth for the compute-shader to use.


Instead of going straight into the code, I think the best way to review this is to go through the test cases and see if the results match what you would expect. Or add your own test case and see what you get.

@b0nes164
Author

Added changes as discussed, sans the additional comments. I deferred adding comments since the top-left intersection logic needs to be reworked to address the issue mentioned above. Previously, I had planned on doing all top-left corner tie-breaking logic downstream. However, I need to reconsider this decision because of the bug...

@LaurenzV
Contributor

Sorry for the delay, I hope that I will be able to take a look in the next few days.

@b0nes164
Author

@tomcur

I'm currently working on an overhaul to the intersections which will change the intersection data mask. Laurenz is working on a patch to the blender2d benchmarking suite to better integrate it with vello.

We would like to get both done, and then benchmark the performance before moving forward, so for now there is no need to review.
