Conversation

alamb
Contributor

@alamb alamb commented Aug 22, 2025

📰 See rendered preview here: https://datafusion.staged.apache.org/blog/2025/09/10/dynamic-filters/ 📰

Notes:
This is based on @adriangb's PR in #102, but is hosted in the apache/datafusion-site fork so that:

  1. Other people can push commits to it
  2. It benefits from the staging site, as described here

@alamb alamb mentioned this pull request Aug 22, 2025
@alamb
Contributor Author

alamb commented Aug 22, 2025

@adriangb an update

  1. I added several diagrams to describe TopK and the background
  2. I rewrote some of the intro and restructured the flow so it builds up the explanation rather than following the implementation timeline
  3. I moved some of the content around

This has resulted in a bit of a Frankenstein 👹 at the moment -- I will fix it up shortly

You can see the current setup here: https://datafusion.staged.apache.org/blog/2025/09/01/dynamic-filters/

Question: Do you mind if I add myself as the second author?

@alamb
Contributor Author

alamb commented Aug 22, 2025

[Screenshot 2025-08-22 at 11:46:27 AM]

@adriangb
Contributor

Question: Do you mind if I add myself as the second author?

You're making extensive edits, you should add yourself 😄

@adriangb
Contributor

FWIW I like the new figures, but I also like the idea of annotating a plan that looks ~like the output from running `explain` on a query.

@alamb
Contributor Author

alamb commented Aug 22, 2025

FWIW I like the new figures, but I also like the idea of annotating a plan that looks ~like the output from running `explain` on a query.

Yes, absolutely -- I think there is room for both. Basically, I want to write an "everyman's explanation" of TopK/dynamic filters and then show how that translates into DataFusion.

@adriangb
Contributor

Sounds like a good plan to me!

@alamb
Contributor Author

alamb commented Aug 24, 2025

I absolutely have reviewing / working on this article on my list this week, but it will likely take me a few days

@alamb
Contributor Author

alamb commented Sep 4, 2025

I am very sorry for the delay -- I have been away, but plan to keep working on this post now that I have returned.

Thank you for your patience

@XiangpengHao
Contributor

Love this blog! Thank you @adriangb and @alamb for the hard work and putting it together -- easily my favorite read of the year!

@alamb
Contributor Author

alamb commented Sep 9, 2025

I plan to publish this blog post tomorrow unless there are further comments


@djanderson djanderson left a comment


Love this post, really cool.

Contributor

@nuno-faria nuno-faria left a comment


Really great read!

@comphead
Contributor

comphead commented Sep 9, 2025

@adriangb @alamb what is still unclear to me is how the dynamic filter gets the min value. Is it, under the hood, by scanning stats, let's say Parquet file footers or histograms?

@adriangb
Contributor

adriangb commented Sep 9, 2025

@adriangb @alamb what is still unclear to me is how the dynamic filter gets the min value. Is it, under the hood, by scanning stats, let's say Parquet file footers or histograms?

We can probably improve the wording, but:

  1. For the TopK operator, it comes from the TopK heap.
  2. For joins, we accumulate min/max values of the join keys for each partition as we build the build side.
    So the values come from the data itself.
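To make point 2 concrete, here is a minimal sketch in Rust of accumulating min/max bounds of a join key over build-side batches. This is illustrative code, not DataFusion's implementation; the function name and the use of plain `Vec<i64>` batches are assumptions for the example.

```rust
// Hypothetical sketch: accumulate min/max bounds of a join key as
// build-side batches arrive (not the actual DataFusion code).
fn accumulate_min_max(batches: &[Vec<i64>]) -> Option<(i64, i64)> {
    let mut bounds: Option<(i64, i64)> = None;
    for batch in batches {
        for &v in batch {
            bounds = Some(match bounds {
                None => (v, v),                        // first value seen
                Some((lo, hi)) => (lo.min(v), hi.max(v)),
            });
        }
    }
    bounds
}
```

Once the build side is complete, bounds like these could seed a dynamic filter of the form `key >= min AND key <= max` on the probe-side scan.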

@comphead
Contributor

comphead commented Sep 9, 2025

  1. For the TopK operator, it comes from the TopK heap.
  2. For joins, we accumulate min/max values of the join keys for each partition as we build the build side.
    So the values come from the data itself.

Maybe it requires slightly more detail for the reader. I'm still trying to grasp the idea. 🤔
First of all, having the filter makes a lot of sense, as we do not scan what's unnecessary.

However, to get the filter value (it doesn't have to be super accurate, just close enough to reduce the reading scope), it would be possible to run select min(ts) from t1 first; this refers to a single column, which might be cheap (even cheaper if min/max can be derived from the footer), and then apply the value to the TopK filter.

For the heap, though, the algorithm is still not clear to me. How does it make sure we don't need to scan 100M rows as before? Is it for any scenario, or only when the underlying files' data is sorted? If the heap stores the TopK, does it still need to read all the rows? Is the benefit that we don't pay for a full sort, just for rebuilding a heap?

@alamb
Contributor Author

alamb commented Sep 10, 2025

However, to get the filter value (it doesn't have to be super accurate, just close enough to reduce the reading scope), it would be possible to run select min(ts) from t1 first; this refers to a single column, which might be cheap (even cheaper if min/max can be derived from the footer), and then apply the value to the TopK filter.

I think one major idea is to reuse state / information that is already present in the operators -- so for example the TopK operator already has a topK heap, and the dynamic filter concept allows this information to be passed down to the scan.

How does it make sure we don't need to scan 100M rows as before? Is it for any scenario, or only when the underlying files' data is sorted?

I don't think the dynamic filter has any guarantee that it will filter rows -- for example, in the pathological case where the data is scanned in exactly the reverse order, it will not filter any rows.

However, the idea is that updating the dynamic filter is cheap and it does help in many real-world settings, so overall it is a good optimization.

@adriangb
Contributor

Besides, scanning the data in precisely the reverse order of the query is bad, dynamic filters or not, and we should fix that

@alamb
Contributor Author

alamb commented Sep 10, 2025

I am going to incorporate the feedback from @comphead and @nuno-faria in the next few hours

@alamb
Contributor Author

alamb commented Sep 10, 2025

Besides, scanning the data in precisely the reverse order of the query is bad, dynamic filters or not, and we should fix that

If it is known that the data is in the wrong order, for sure. I am not sure DataFusion always knows how data is distributed across files.

Another potentially pathological case is when the data is randomly distributed throughout the files (so no files or row groups can be pruned).

@alamb
Contributor Author

alamb commented Sep 10, 2025

@adriangb @alamb what is still unclear to me is how the dynamic filter gets the min value. Is it, under the hood, by scanning stats, let's say Parquet file footers or histograms?

@comphead -- the min value is the minimum value of what has already been seen during query execution.

  • When the query starts, there is no min value.
  • As data flows through and the TopK operator starts updating the heap, the min value is set.
  • As more data flows through and the heap is refined (new values potentially replace existing values), the min value can be updated as well.

The TopK operator heap is here:
https://github.com/apache/datafusion/blob/ab108a50d75e4e12fb6ebbfac0d0bffa24c265ea/datafusion/physical-plan/src/topk/mod.rs#L119
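As a rough illustration of those bullet points (a hypothetical sketch, not the actual DataFusion implementation linked above): for a query like `ORDER BY ts DESC LIMIT k`, the TopK operator can keep a min-heap of the k largest values seen so far; the heap's minimum is exactly the dynamic filter's min value, and it only tightens as more data flows through.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical sketch of a TopK heap producing a dynamic filter value.
struct TopK {
    k: usize,
    heap: BinaryHeap<Reverse<i64>>, // min-heap of the k largest values seen
}

impl TopK {
    fn new(k: usize) -> Self {
        TopK { k, heap: BinaryHeap::new() }
    }

    fn insert(&mut self, v: i64) {
        if self.heap.len() < self.k {
            self.heap.push(Reverse(v));
        } else if v > self.heap.peek().unwrap().0 {
            // v beats the current k-th largest value: replace it.
            self.heap.pop();
            self.heap.push(Reverse(v));
        }
    }

    // The dynamic filter's min value: None until the heap is full,
    // after which the scan can skip any row with ts <= min_value.
    fn min_value(&self) -> Option<i64> {
        if self.heap.len() == self.k {
            self.heap.peek().map(|r| r.0)
        } else {
            None
        }
    }
}
```

For example, with k = 3, after seeing 10, 5, 8, 1, 20, 2 the heap holds {8, 10, 20} and the filter becomes `ts > 8`.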

@alamb
Contributor Author

alamb commented Sep 10, 2025

I am happy to keep making clarifications etc. on tickets and this blog in follow-on PRs. But for now, let's publish it, as I have already delayed it by several weeks and I would like it to be available before the NYC meetup next week

Onwards 🚀

@alamb alamb merged commit f69c5cc into main Sep 10, 2025
1 check passed
@comphead
Contributor

@adriangb @alamb what is still unclear to me is how the dynamic filter gets the min value. Is it, under the hood, by scanning stats, let's say Parquet file footers or histograms?

@comphead -- the min value is the minimum value of what has already been seen during query execution.

  • When the query starts, there is no min value.
  • As data flows through and the TopK operator starts updating the heap, the min value is set.
  • As more data flows through and the heap is refined (new values potentially replace existing values), the min value can be updated as well.

The TopK operator heap is here: https://github.com/apache/datafusion/blob/ab108a50d75e4e12fb6ebbfac0d0bffa24c265ea/datafusion/physical-plan/src/topk/mod.rs#L119

Thanks @alamb, I was referring to the statement

Figure 3 is better, but it still reads and decodes all 100M rows of the hits table, which is often unnecessary once we have found the top 10 rows

which implies the optimization doesn't have to read and decode 100M rows to get the top 10, and I cannot see how exactly this is happening 🤔 It makes sense with the filter, but to get the min value for the filter we still need a full scan; that is something I'm still missing. Let's go ahead, yes, thanks for the explanations

@alamb alamb deleted the site/dynamic-filters branch September 10, 2025 16:15
@alamb
Contributor Author

alamb commented Sep 10, 2025

It makes sense with the filter, but to get the min value for the filter we still need a full scan; that is something I'm still missing. Let's go ahead, yes, thanks for the explanations

Let's take the best case, which is:

  • after reading the first batch from the first file, DataFusion has read the actual minimum value

While it is true DataFusion still needs to check all remaining files to ensure this is actually the minimum value, it may not have to actually open, read, and decode the rows in each file -- for example, it could potentially prune (skip) all remaining files using statistics. And even if it can't prune out an entire file, it may be able to prune row groups, or ranges of rows (if pushdown_filters is turned on)
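Here is a minimal sketch of that statistics-based pruning step (illustrative names, not DataFusion's actual pruning APIs): given a dynamic filter `ts > cutoff`, any file whose `max_ts` statistic is at or below the cutoff cannot contribute a top-k row and can be skipped without being opened or decoded.

```rust
// Hypothetical per-file statistics, e.g. from a Parquet footer.
struct FileStats {
    min_ts: i64, // unused here, but typically stored alongside max_ts
    max_ts: i64,
}

// Return the indices of files that must still be scanned under the
// dynamic filter `ts > cutoff` (all files when no filter exists yet).
fn files_to_scan(files: &[FileStats], cutoff: Option<i64>) -> Vec<usize> {
    files
        .iter()
        .enumerate()
        .filter(|(_, f)| match cutoff {
            None => true,            // no filter yet: scan everything
            Some(c) => f.max_ts > c, // keep only files that may match
        })
        .map(|(i, _)| i)
        .collect()
}
```

The same check can be repeated at finer granularities (row groups, then ranges of rows) as the dynamic filter tightens during execution.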

@comphead
Contributor

It makes sense with the filter, but to get the min value for the filter we still need a full scan; that is something I'm still missing. Let's go ahead, yes, thanks for the explanations

Let's take the best case, which is:

  • after reading the first batch from the first file, DataFusion has read the actual minimum value

While it is true DataFusion still needs to check all remaining files to ensure this is actually the minimum value, it may not have to actually open, read, and decode the rows in each file -- for example, it could potentially prune (skip) all remaining files using statistics. And even if it can't prune out an entire file, it may be able to prune row groups, or ranges of rows (if pushdown_filters is turned on)

Oh, I think I'm getting the picture now. So it is not only derived from the data itself (as I was told); it is a hybrid, data + Parquet stats. That makes sense now: we accept that some values in the heap are approximate, just to remove unnecessary reads, because it is still better than a full scan. Best case, we get the min value from the first batch; the worst case should still be cheaper than a full scan.

@alamb
Contributor Author

alamb commented Sep 10, 2025

Oh, I think I'm getting the picture now. So it is not only derived from the data itself (as I was told); it is a hybrid, data + Parquet stats.

Yeah, I think the predicate (min value) itself is derived only from the data, but actually using it to make the query faster relies on statistics (and all the other parts of multi-level pruning)

@adriangb
Contributor

@nuno-faria what computer (or at least how many cores) did you run #103 (comment) on?

@nuno-faria
Contributor

@nuno-faria what computer (or at least how many cores) did you run #103 (comment) on?

@adriangb I used a 6-core CPU (12 execution threads).

Development

Successfully merging this pull request may close these issues.

Blog post about TopK filter pushdown
7 participants