[Parquet] Support page level cache for reading #8306
Conversation
I set the default cache capacity to 100, which means we can cache 100 pages while using slightly more than 100 MB of memory. I ran the benchmarks in a local environment, Ubuntu 22.04 LTS in WSL2. Results are as follows; all times in the table are median times. We think this is a very encouraging result.

Basically, we are trading memory for skipping decompression and decoding. The strong result partly comes from the concurrency of the benchmark, which reads one file across multiple readers. I have also tested reader-level caching, but unfortunately performance regressed across the board. Therefore we can only maintain a global cache.
Thank you for this @123789456ye -- I have started the CI checks on this PR. Perhaps @XiangpengHao also has some time to review this.
parquet/Cargo.toml
Outdated
@@ -52,6 +52,7 @@ parquet-variant-compute = { workspace = true, optional = true }
object_store = { version = "0.12.0", default-features = false, optional = true }

bytes = { version = "1.1", default-features = false, features = ["std"] }
+moka = { version = "0.12", default-features = false, features = ["sync"] }
In general, we have tried to keep the dependency tree relatively small for parquet -- this one seems to be a significant addition: https://crates.io/crates/moka/0.12.10/dependencies
Rather than implement the cache directly in the parquet crate, I wonder if we could add a trait in the parquet crate and then users would provide implementations 🤔
I totally agree that we should not introduce it, and this part is easy to change.
But I hadn't considered not implementing a cache at all. If we don't provide a default implementation, isn't it cumbersome to write a custom implementation every time we want to use it?
> Rather than implement the cache directly in the parquet crate, I wonder if we could add a trait in the parquet crate and then users would provide implementations
For this part, you may review the trait `PageCacheStrategy` in `page_cache.rs` and see if it meets your needs.
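The trait definition itself does not appear in this thread, so here is a minimal sketch of what such a page-cache strategy trait could look like; the names and the key layout are assumptions for illustration, not the PR's actual API:

```rust
use bytes::Bytes;

/// Hypothetical key identifying a page within a file.
/// The actual PR may key pages differently.
#[derive(Clone, Hash, PartialEq, Eq)]
pub struct PageKey {
    /// Identifier for the underlying file (e.g. a hash of its path).
    pub file_id: u64,
    /// Byte offset of the page within the file.
    pub offset: u64,
}

/// Hypothetical strategy trait: implementors decide how (and whether)
/// decompressed pages are cached.
pub trait PageCacheStrategy: Send + Sync {
    /// Return the cached decompressed page bytes, if present.
    fn get(&self, key: &PageKey) -> Option<Bytes>;

    /// Store decompressed page bytes under the given key.
    fn put(&self, key: PageKey, page: Bytes);
}
```

A trait like this keeps the parquet crate free of any particular eviction policy: a no-op implementation disables caching entirely, while users can back it with whatever store fits their workload.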
Could you please add some tests for this code?
I think you should be able to follow the model here:
https://github.com/apache/arrow-rs/blob/main/parquet/tests/arrow_reader/io/async_reader.rs
https://github.com/apache/arrow-rs/blob/main/parquet/tests/arrow_reader/io/sync_reader.rs
Hi @123789456ye -- just to clarify, is this cache mainly for the predicate cache, or more of a general-purpose Parquet page cache? If it's for the predicate cache, we already have a fully decoded Arrow data cache in #7850, which should take care of avoiding extra IO and decoding. If it's for caching across different queries, my sense is that the OS page cache usually handles that pretty well. Storing decompressed Parquet pages feels like a pretty specific design, so it might be worth discussing the trade-offs, e.g., overhead, complexity, whether users can do it themselves, etc.
Thank you @XiangpengHao for your review.
This is designed for general-purpose reading. And yes, we have noticed that work; I should have remembered to split the output phase from the pushdown phase, but I somehow missed it in the levels of readers.
Of course these trade-offs should be carefully discussed, though I think the OS page cache serves a different part. IMO, the OS page cache should cache raw bytes (i.e., compressed pages), while this cache shall cache decompressed pages.
Got it, thank you for clarifying @123789456ye! In a Parquet → Arrow pipeline, I usually think of it in four steps: (1) fetch the required bytes (IO), (2) decompress the pages, (3) decode Parquet to Arrow, and (4) operate on the Arrow data.
Each of these steps can take a significant amount of time, and each may warrant a cache. Personally, I'd lean towards keeping things flexible so that users can plug in the caching they need, rather than baking a specific policy directly into the parquet crate. For example, @alamb is working on a push decoder, which will make step 1 very easy -- any end user can decide how/where to feed the required bytes. Step 3 is a bit tricky because Parquet to Arrow is very non-trivial (but probably still doable). Step 2 is what this PR tackles. So my hope is that we can evolve the API in a direction where downstream users have the hooks (maybe traits?) to implement their own cache strategies, instead of locking in a particular approach inside the crate. Hope this helps!
This work is sponsored by Tonbo. Given the immutability of Parquet/Arrow, it would be very helpful in real-world projects if users could use caching to avoid as much computation (decompression, deserialization, etc.) and I/O as possible. That’s why we are looking for this feature. Unfortunately, for external users, the only cache level currently available is the raw bytes of a Parquet file. I agree that the current implementation of arrow-rs lacks APIs for users to hook into caching. Do you think it would be meaningful to push forward a discussion or draft proposal for such an API?
💯, I totally agree, almost everyone wants a cache for these computations.
Yea, that's my hope. While everyone wants a cache, they demand different policies depending on their data and query patterns; I think it's valuable if we can have a set of APIs that easily allow users to plug in their own policies/caching mechanisms. Maybe @alamb also has some opinions on this.
Removed the concrete implementations and added some tests modeled on the I/O visualization tests. Currently, the way to use the page-level cache is as follows:
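A hypothetical sketch of such usage, assuming a builder hook named `with_page_cache_strategy` (an assumed method name, not confirmed API) for supplying an implementation of the `PageCacheStrategy` trait sketched earlier:

```rust
use std::sync::Arc;
use parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder;

// `MyPageCache` stands in for any user-provided implementation of the
// hypothetical trait; `with_page_cache_strategy` is an assumed builder
// method, not confirmed API.
async fn read_with_cache(file: tokio::fs::File) -> parquet::errors::Result<()> {
    let cache: Arc<dyn PageCacheStrategy> = Arc::new(MyPageCache::default());
    let stream = ParquetRecordBatchStreamBuilder::new(file)
        .await?
        .with_page_cache_strategy(Arc::clone(&cache))
        .build()?;
    // ... consume `stream` as usual ...
    Ok(())
}
```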
Maybe we can expand this cache to other stages and levels later, but I think we can push this forward first.
Re-requesting review. What do you think about the API design, or anything else?
Which issue does this PR close?
Rationale for this change
We have wanted to introduce a page-level cache for a long time, though previously we could only read a whole row group at a time.
Now we can implement it: the predicate part has already been implemented, and the output part is left for this PR.
What changes are included in this PR?
The core change is to introduce a page-level cache mechanism in `decode_page` in `impl RowGroupReader for SerializedRowGroupReader`. It is only effective for async readers, and adds nearly zero overhead for sync readers.
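A simplified sketch of the lookup-or-decode pattern this describes, using the hypothetical `PageCacheStrategy` trait from above (the real `decode_page` signature and types differ):

```rust
use bytes::Bytes;

// Stub for illustration: the real code would run the configured codec here.
fn decompress(compressed: Bytes) -> Bytes {
    compressed
}

// Hypothetical shape of the cached decode path: consult the cache first,
// otherwise decompress the page and populate the cache.
fn decode_page_cached(
    cache: &dyn PageCacheStrategy,
    key: PageKey,
    compressed: Bytes,
) -> Bytes {
    if let Some(page) = cache.get(&key) {
        return page; // cache hit: skip decompression entirely
    }
    let page = decompress(compressed);
    cache.put(key, page.clone());
    page
}
```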
The cache mechanism uses the moka crate. This part is pluggable if we need to change it.
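For illustration, a moka-backed implementation of the hypothetical trait sketched earlier might look like the following, with capacity counted in pages (matching the default of 100 mentioned above):

```rust
use bytes::Bytes;
use moka::sync::Cache;

/// Hypothetical default implementation backed by moka's sync cache.
pub struct MokaPageCache {
    inner: Cache<PageKey, Bytes>,
}

impl MokaPageCache {
    /// `capacity` is the maximum number of cached pages.
    pub fn new(capacity: u64) -> Self {
        Self { inner: Cache::new(capacity) }
    }
}

impl PageCacheStrategy for MokaPageCache {
    fn get(&self, key: &PageKey) -> Option<Bytes> {
        self.inner.get(key)
    }

    fn put(&self, key: PageKey, page: Bytes) {
        self.inner.insert(key, page);
    }
}
```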
Are these changes tested?
I ran `cargo test` and `cargo test --features=arrow,async`. All tests pass.
Are there any user-facing changes?
No.