-
Notifications
You must be signed in to change notification settings - Fork 178
Introduce a new clipping algorithm for non-layer based clipping #1203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cc @sagudev maybe interesting for you as well :) |
I think this should be ready for review now. |
@taj-p Lemme know if that would be useful for you in vello_hybrid, but since this hasn't really been talked about before I presume that the current clipping algo is good enough for you, so I haven't implemented it there. |
We actually make quite a lot of use of clip paths, so I'm wondering whether this would be faster than the layer implementation. For significantly clipped images, I imagine this would be faster (because the layer implementation requires drawing the entire image to a layer for the next to sample from). But, for milder clips, I'm not so sure since it's a question of whether it's faster to either:
My feeling is that I saw the Are you able to confirm whether I properly migrated this to work as you'd expect in hybrid? ![]() Cc @grebmeg for your thoughts |
Thanks! The result also surprises me a bit, though it's possible that pushing/popping on the GPU is so fast that it doesn't really matter, but I think it would matter for the CPU. I will try your experiment in vello_cpu tomorrow and see how it performs. To be fair, I didn't mean to imply that this version is faster than the current layer-based clipping! Just that this version of clipping should be much faster than using masks for simulating non-isolated clips. 😄 Based on intuition, on the CPU I would at least expect that this version is faster when dealing with many nested clip paths where only few paths are drawn, while the layer-based approach is faster when few clips are used but many paths are drawn with those clips active. But I haven't benchmarked it and it's possible that things are different on the GPU. |
It’s really nice that you can use sparse strips of two paths and get their intersection, and it’s SIMD-accelerated! That said, I’d like to understand the main goal of this PR more clearly: is it primarily about performance, about enabling clipping of intermediate results (as in issue #1117), or about separating two concepts that don’t quite fit together, or all together? From a performance perspective, That said, your change brings some nice improvements on CPU, especially avoiding extra clip buffer allocations per tile, and making it possible to immediately see clipping result when a clip is active. I’m also wondering whether your intersection algorithm could be integrated into current layer clipping in I also think that for |
It's basically as outlined in the
Looking at it a bit more closely, I believe this should be possible and it could simplify the code logic a bit, but the core difficulty in the current algo seems to be in other places so I'm not sure how much simpler it would make the current code. |
I can reproduce the slowdown in |
This way of clipping does have the perhaps surprising effect that clips will not affect layer composition at all. When we pop a layer, the active non-layer clip will not restrict where the layer gets composited. |
Yeah, the intention is that you pass a clip path to |
It might make sense to expose the active non-layer clip path in that case, so that it can easily be set on a layer, too. I'm imagining a hypothetical UI building use case, where each child widget gets clipped to its dimensions before it is allowed to render itself. That's a prime example of nested clips that might benefit from using this new algorithm. However, every once in a while, a widget may have need for an isolated layer and fancy compositing, in which case it should be able to inherit the active non-layer clip. (There might also be some value in renaming layer clips to something else to avoid confusion. Maybe stencils or shapes or masks?) Interestingly, while Skia separates clips and layers, they share a save/restore stack. You can either save just the current clip (and transformation matrix), or save the current clip and create a new layer. In either case you use That model should also allow for the following optimization: when the current layer's clip is identical to the current non-layer clip, then we don't have to intersect paths with the non-layer clip, as the layer already takes care of it. Realistically, "identical" would be implemented as "neither clip has changed since the layer's creation". Of course at that point we're moving further away from the PDF model, where clips and layers (transparency groups) don't share the same stack, I think (the full PDF model has a lot of fun corner cases and full support for it should probably not be the primary goal...) |
124051e
to
60e3e34
Compare
@taj-p Feel free to retry now, when I try it now the new clip version is faster! Though we do need to consider that it's kind of an edge case because we are only drawing one path per clip path, I assume the difference is going to amortize and turn around as the number of paths per clip path increases (haven't tested it carefully, though). |
I'm seeing a roughly 20-25% improvement in performance 😮 ! It's likely to be more for users with worse GPUs too. I wouldn't consider this too much an edge case. It's common to, for example, use a clipping region across a single image. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m also wondering whether your intersection algorithm could be integrated into current layer clipping in pop_clip. Right now, that code works with clip bounding boxes and calculates all clip commands on pop. Would it be possible to use your intersect function to compute the intersection between a rectangular clip box and the path strips, and then run wide.generate on the resulting strips, assuming they are now all either clip alpha or clip fill?
Looking at it a bit more closely, I believe this should be possible and it could simplify the code logic a bit, but the core difficulty in the current algo seems to be in other places so I'm not sure how much simpler it would make the current code.
I suspect your intersection algorithm could also be applied to push_clip
/pop_clip
in coarse, which might remove the need for the current non-SIMD code while still replicating the existing push/pop clip buffer behavior and generating clip commands (I’m curious whether this approach would lead to better performance). It might also make sense to maintain a single clip stack.
if use_clip_path { | ||
ctx.push_clip_path(&clip_circle); | ||
} else { | ||
ctx.push_clip_layer(&clip_circle); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm that does look weird, will try to take a look when I find the time!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot reproduce this on my machine, both cases look like the push_clip_layer
one for me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would help to add a test showing a more complex option that uses both clipping APIs? Roughly, something like this:
#[vello_test]
fn clip_non_isolated_rectangle_with_star_evenodd(ctx: &mut impl Renderer) {
let rect = Rect::new(0.0, 0.0, 100.0, 100.0);
let rect2 = Rect::new(10.0, 10.0, 80.0, 80.0);
let rect3 = Rect::new(40.0, 40.0, 90.0, 90.0);
let rect4 = Rect::new(70.0, 70.0, 75.0, 75.0);
ctx.push_clip_path(&rect2.to_path(0.1));
ctx.push_layer(Some(&rect3.to_path(0.1)), None, None, None);
ctx.push_clip_path(&rect4.to_path(0.1));
ctx.set_paint(REBECCA_PURPLE);
ctx.fill_rect(&rect);
ctx.pop_clip_path();
ctx.pop_layer();
ctx.pop_clip_path();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm, is it expected behavior that pushed clip paths (push_clip_path
) still affect internal paths rendering within the layer (in push_layer
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is expected. All clip paths pushed via push_clip_path
apply to all path drawing operations, regardless of which layer they are drawn on.
1725cda
to
5c534c6
Compare
5c534c6
to
8bd3fca
Compare
Motivation
Currently, the only way to do efficient clipping in Vello Sparse is by pushing a new isolated layer and associating a clip path with that layer. This might be sufficient for cases like for example SVG, but the problem is that doing so forces the user to conflate layers and clipping, which is not what should be done in some imaging models. For example, clip paths in PDF have nothing to do with isolation, and it is therefore desirable to be able to separate the two. I believe the same is the case for HTML canvas, where the clip stack is not really related to layers, and the fact they are causes problems (see #1117). In practice, this mostly doesn't cause issues as (to my understanding) source-over compositing with premultiplied alpha is associative and it therefore mostly doesn't matter whether you clip with isolation or not. However, problems start to occur once you add masks or blending to the mix, as the behavior with and without isolation is not identical.
It should be noted that above can be simulated by creating a new mask for each clip path, but this is obviously expensive, especially for deeply-nested clips.
Solution
Because of this, I propose a second way to support clipping in Vello CPU (could be added to hybrid as well) that is based on an independent clip stack that can be pushed and popped to without actually having to create new layers. In particular, you do not need to clear the clip stack before rasterizing, which should adddress the issue in #1117.
I think how this works is very neat and demonstrates how useful sparse strips are! Fundamentally, this is based on a new algorithm that, given two paths in sparse strips representation, computes a new path in sparse strips representation that represents the intersection of the two. While the actual implementation is a bit finicky, conceptually it's very simple: We iterate over each stripped and fill region of the two paths in lock step and determine all of the overlaps between the two. In case we have two fill regions, the overlap region will also be filled. In case we have one strip and one fill region, the overlap will copy the alpha mask of the strip region. Finally, if we have two strip regions, we combine the alpha masks of both. That's it, this is enough to compute a new sparse strips path that when filled is equivalent to intersecting the two strips. The beauty of this is that this is incredibly fast because of the usual sparseness of paths and the result will also still be as sparse as possible as we preserve sparse fill regions whenever possible.
Given this, we introduce two new
push_clip_path
andpop_clip_path
methods that operate on their own clip stack and completely independently from the push/pop layer commands. In case a clip path is currently active, once we draw a path we simply calculate the intersection after converting it into sparse strips, and then use that new path as the basis in coarse rasterization instead. In case we callpush_clip_path
multiple times, we can simply intersect the new clip path with the one in the previous stack position, to end up with one single new clip path that represents the intersection of all previous ones. Therefore, even for deeply nested clips we only need to perform one intersection per path drawn.The disadvantage is that in multi-threading, it requires us to process new clip paths on the main thread, but the advantage is that (unlike the current clip algorithm) causes no additional overhead during rasterization (which is currently the only serial part), and the clipping calculation is done by each worker separately for the path they are currently processing, which means it scales very well!
Testing
I will add more tests, but I confirmed the current version to cause no regressions in my PDF test suite.