Conversation

@123789456ye commented Sep 10, 2025

Which issue does this PR close?

Rationale for this change

We have long considered introducing a page-level cache, but previously we could only read whole row groups.
Now we can implement it: the predicate part has already been done, and the output part is left for this PR.

What changes are included in this PR?

The core change introduces a page-level cache in decode_page in impl RowGroupReader for SerializedRowGroupReader.
It is only effective for async readers, with nearly zero overhead for sync readers.
The cache mechanism uses the moka crate; this part is pluggable if we need to change it (the sketch below illustrates the idea).
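
For illustration, here is a minimal sketch of what a moka-backed page cache along these lines could look like. The PageKey layout, capacity, and the get_or_decode helper are assumptions for the example, not the PR's actual code:

```rust
use bytes::Bytes;
use moka::sync::Cache;

/// Illustrative cache key: one entry per decompressed page, identified by
/// its position in the file (the exact fields are an assumption).
#[derive(Clone, Hash, PartialEq, Eq)]
struct PageKey {
    row_group_idx: usize,
    column_idx: usize,
    page_offset: u64,
}

/// Build the global cache; `Bytes` clones are cheap reference-counted
/// handles, so cache hits avoid both I/O and decompression.
fn new_page_cache() -> Cache<PageKey, Bytes> {
    // 100 entries matches the default capacity discussed below
    // (~100 pages, slightly over 100 MB).
    Cache::builder().max_capacity(100).build()
}

/// Hypothetical decode path: return the cached decompressed page if present,
/// otherwise decompress once and cache the result for later readers.
fn get_or_decode(
    cache: &Cache<PageKey, Bytes>,
    key: PageKey,
    decompress: impl FnOnce() -> Bytes,
) -> Bytes {
    cache.get_with(key, decompress)
}
```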

Are these changes tested?

I ran cargo test and cargo test --features=arrow,async.
All tests pass.

Are there any user-facing changes?

No.

@github-actions bot added the parquet (Changes to the parquet crate) label Sep 10, 2025
@123789456ye (Author) commented Sep 10, 2025

I set the default cache capacity to 100 entries, which means we can cache 100 pages, using slightly more than 100 MB of memory.

I ran cargo bench --bench arrow_reader_clickbench --features "arrow async" "async" -- --nocapture --measurement-time 10 --save-baseline baseline to record the baseline,
and cargo bench --bench arrow_reader_clickbench --features "arrow async" "async" -- --nocapture --measurement-time 10 --baseline baseline for the comparison run.

Benchmarks were run locally on Ubuntu 22.04 LTS under WSL2.

Results are as follows; all times in the table are medians.

We think this is a very encouraging result.

| Query | Baseline Time | Current Time | Change |
|-------|---------------|--------------|--------|
| Q1    | 1.8 ms        | 1.9 ms       | +1.0%  |
| Q10   | 17.5 ms       | 10.5 ms      | -40.1% |
| Q11   | 20.0 ms       | 13.6 ms      | -31.9% |
| Q12   | 27.8 ms       | 16.8 ms      | -39.4% |
| Q13   | 39.4 ms       | 28.0 ms      | -28.9% |
| Q14   | 36.3 ms       | 23.8 ms      | -34.3% |
| Q19   | 5.1 ms        | 4.7 ms       | -7.6%  |
| Q20   | 95.8 ms       | 43.4 ms      | -54.7% |
| Q21   | 112.8 ms      | 52.8 ms      | -53.2% |
| Q22   | 191.3 ms      | 127.2 ms     | -33.5% |
| Q23   | 327.1 ms      | 259.7 ms     | -20.6% |
| Q24   | 34.7 ms       | 28.3 ms      | -18.4% |
| Q27   | 73.5 ms       | 35.6 ms      | -51.5% |
| Q28   | 77.5 ms       | 34.4 ms      | -55.6% |
| Q30   | 50.9 ms       | 40.3 ms      | -20.9% |
| Q36   | 96.6 ms       | 50.2 ms      | -48.0% |
| Q37   | 75.8 ms       | 46.0 ms      | -39.3% |
| Q38   | 30.8 ms       | 24.5 ms      | -20.7% |
| Q39   | 41.2 ms       | 26.2 ms      | -36.4% |
| Q40   | 45.2 ms       | 36.2 ms      | -20.1% |
| Q41   | 33.2 ms       | 29.7 ms      | -10.5% |
| Q42   | 11.6 ms       | 11.4 ms      | -2.2%  |

@123789456ye (Author) commented Sep 10, 2025

Basically, we are trading memory for skipping decompression and decoding.

The large improvement partly comes from the concurrency of the benchmark, which reads one file across multiple readers.

I also tested reader-level caching, but unfortunately performance regressed across the board, so we can only maintain a global cache.

@alamb (Contributor) commented Sep 10, 2025

Thank you for this @123789456ye -- I have started the CI checks on this PR

Perhaps @XiangpengHao also has some time to review this

@@ -52,6 +52,7 @@ parquet-variant-compute = { workspace = true, optional = true }
object_store = { version = "0.12.0", default-features = false, optional = true }

bytes = { version = "1.1", default-features = false, features = ["std"] }
moka = { version = "0.12", default-features = false, features = ["sync"] }
Contributor commented on the diff:
In general, we have tried to keep the dependency tree relatively small for parquet -- this one seems to be a significant addition: https://crates.io/crates/moka/0.12.10/dependencies

@alamb (Contributor) commented:
Rather than implement the cache directly in the parquet crate, I wonder if we could add a trait in the parquet crate and then users would provide implementations 🤔

@123789456ye (Author) replied:
I totally agree that we should not introduce it, and this part is easy to change.
But I hadn't considered not implementing a cache at all. If we don't provide a default implementation, isn't it tedious to write a custom implementation every time we want to use it?

@123789456ye (Author) replied:
> Rather than implement the cache directly in the parquet crate, I wonder if we could add a trait in the parquet crate and then users would provide implementations

For this part, you may review the trait PageCacheStrategy in page_cache.rs and see if it meets your needs.
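
For readers without the diff handy, the trait presumably looks something like the following; the key type, method names, and signatures here are guesses for illustration, not the actual contents of page_cache.rs:

```rust
use bytes::Bytes;

/// Illustrative key type (same assumption as the earlier moka sketch).
#[derive(Clone, Hash, PartialEq, Eq)]
pub struct PageKey {
    pub row_group_idx: usize,
    pub column_idx: usize,
    pub page_offset: u64,
}

/// A guess at the shape of the trait: implementors own the storage and
/// eviction policy; the reader only asks for a page, or hands one over
/// after decompressing it.
pub trait PageCacheStrategy: Send + Sync {
    /// Return the cached decompressed page, if any.
    fn get(&self, key: &PageKey) -> Option<Bytes>;

    /// Offer a freshly decompressed page to the cache.
    fn put(&self, key: PageKey, page: Bytes);
}
```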


@XiangpengHao (Contributor) commented:
Hi @123789456ye -- just to clarify, is the idea that this cache is mainly a predicate cache, or more of a general-purpose Parquet page cache?

If it's for the predicate cache, we already have a fully decoded Arrow data cache in #7850, which should take care of avoiding extra IO and decoding.

If it's a cache across different queries, my sense is that the OS page cache usually handles that pretty well. Storing decompressed Parquet pages feels like a pretty specific design, so it might be worth discussing the trade-offs, e.g., overhead, complexity, whether users can do it themselves, etc.

@123789456ye (Author) commented:
Thank you @XiangpengHao for your review.
The original motivation for this is reading from remote sources (e.g., object storage), where we need some cache.

> If it's for the predicate cache, we already have a fully decoded Arrow data cache in #7850, which should take care of avoiding extra IO and decoding.

This is designed for general-purpose reading. And yes, we have noticed that work. I should have remembered to split the output phase out from the pushdown phase, but I somehow missed it among the levels of readers.
When writing tests, I found that the current implementation also affects the pushdown phase. I am thinking about how to split them. (Or maybe there is no need to split?)

> If it's a cache across different queries, my sense is that the OS page cache usually handles that pretty well. Storing decompressed Parquet pages feels like a pretty specific design, so it might be worth discussing the trade-offs, e.g., overhead, complexity, whether users can do it themselves, etc.

Of course these trade-offs should be carefully discussed, though I think the OS page cache serves a different layer. IMO, the OS page cache caches raw bytes (i.e., compressed pages), while this cache caches decompressed pages.

@XiangpengHao (Contributor) commented:
Got it, thank you for clarifying, @123789456ye!

In a Parquet → Arrow pipeline, I usually think of it in four steps:

device -> raw parquet bytes in memory  -> uncompressed bytes in memory -> Arrow

Each of these steps can take a significant amount of time, and may warrant a cache. Personally, I'd lean towards keeping things flexible so that users can plug in the caching they need, rather than baking a specific policy directly into the parquet crate.

For example, @alamb is working on a push decoder, which will make step 1 very easy -- any end user can decide how/where to feed the required bytes.

Step 3 is a bit tricky because Parquet to Arrow is very non-trivial (but probably still doable).

Step 2 is what this PR tackles.

So my hope is that we can evolve the API in a direction where downstream users have the hooks (maybe traits?) to implement their own cache strategies, instead of locking in a particular approach inside the crate.

Hope this helps!

@ethe (Contributor) commented Sep 13, 2025

> Each of these steps can take a significant amount of time, and may warrant a cache.

This work is sponsored by Tonbo. Given the immutability of Parquet/Arrow, it would be very helpful in real-world projects if users could use caching to avoid as much computation (decompression, deserialization, etc.) and I/O as possible. That’s why we are looking for this feature. Unfortunately, for external users, the only cache level currently available is the raw bytes of a Parquet file.

I agree that the current implementation of arrow-rs lacks APIs for users to hook into caching. Do you think it would be meaningful to push forward a discussion or draft proposal for such an API?

@XiangpengHao (Contributor) commented:
> it would be very helpful in real-world projects if users could use caching to avoid as much computation (decompression, deserialization, etc.) and I/O as possible.

💯, I totally agree; almost everyone wants a cache for these computations.

> Do you think it would be meaningful to push forward a discussion or draft proposal for such an API?

Yea, that's my hope. While everyone wants a cache, they demand different policies depending on their data and query patterns; I think it's valuable if we can have a set of APIs that easily allow users to plug in their own policies/caching mechanisms.

maybe @alamb also has some opinions on this

@123789456ye (Author) commented Sep 16, 2025

I removed the concrete implementations and added some tests based on the I/O visualization.
The split between the predicate and output phases is still undecided, so the tests show the results of both.

Currently, the page-level cache is used as follows (see the sketch after this list):

  • First, have a page cache strategy that implements PageCacheStrategy.
  • Then, before any reader is built, call ParquetContext::set_cache(page_cache: Option<Arc<dyn PageCacheStrategy>>) to set up the global cache.
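
A minimal sketch of that flow, assuming the trait shape guessed at earlier; ParquetContext::set_cache is the API named in this PR, while MokaPageCache and the PageKey fields are illustrative:

```rust
use std::sync::Arc;

use bytes::Bytes;
use moka::sync::Cache;

// PageKey and PageCacheStrategy as sketched earlier in this thread.

/// One possible strategy: a global, bounded moka cache.
struct MokaPageCache {
    inner: Cache<PageKey, Bytes>,
}

impl PageCacheStrategy for MokaPageCache {
    fn get(&self, key: &PageKey) -> Option<Bytes> {
        self.inner.get(key)
    }

    fn put(&self, key: PageKey, page: Bytes) {
        self.inner.insert(key, page);
    }
}

fn install_global_cache() {
    let strategy: Arc<dyn PageCacheStrategy> = Arc::new(MokaPageCache {
        inner: Cache::builder().max_capacity(100).build(),
    });
    // Per the steps above, this must happen before any reader is built.
    // ParquetContext::set_cache is the API introduced in this PR.
    ParquetContext::set_cache(Some(strategy));
}
```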

Maybe we can expand this cache to other stages and levels, but I think we can push this forward first.

@123789456ye requested a review from alamb September 16, 2025 11:22
@123789456ye (Author) commented:
Re-requesting review. What do you think about the API design, or anything else?

Successfully merging this pull request may close these issues.

[Parquet] Support page level cache for reading