-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Upgrade arrow/parquet to 56.0.0 #16690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6698376
to
1747605
Compare
1747605
to
fc5bd79
Compare
I see some failures in the row_group_pruning tests
Here is an example failure
|
1c82d05
to
76bc1b2
Compare
ok, I now have a clean run! |
Thank you @alamb , I am curious about the benchmark result comparing the main branch, because we will include the apache/arrow-rs#7850 for this PR. And some improvement part of the improvement we have ported to datafusion, but we will also benefit from the dependency changes from the arrow side, such as the sort phase(the merge compare we have ported)/compare, etc. Could we trigger the benchmark for this PR, thanks! |
🤖 |
Done! |
🤖: Benchmark completed Details
|
Thank you @alamb, it seems we have some improvement for clickbench. Not too much because we gain for sort string view mostly which is not in clickbench but in sort_tpch. |
I will start those as well |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Thank you @alamb , it seems no improvement for sort dependencies(So it means the merge will occupied most time), so we gain most for the ported PRs to merge phase from: |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Thank you @alamb @Dandandan , we may also try sort_tpch10 benchmark , but it may also not too much improvement, the ported PR already has 1.4x faster for sort_tpch Q11(inlined string view sort). |
🤖 |
🤖: Benchmark completed Details
|
🤖 |
🤖: Benchmark completed Details
|
Thank you @alamb the result is very good, we have further improvement for sort also besides the ported code in datafusion, even Q1 is benefit which is not StringView sort from the new partition null validation, and Q3 is benefit from the new GC for large string size. And the sort-tpch10 seems broken. 🤔 |
🤖 |
🤖: Benchmark completed Details
|
0a0c7d7
to
60b8577
Compare
This one is now ready for review |
/// default parquet writer setting | ||
/// max_statistics_size is deprecated, currently it is not being used | ||
// TODO: remove once deprecated | ||
#[deprecated(since = "45.0.0", note = "Setting does not do anything")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these were removed from the underlying parquet library as well
@@ -2192,6 +2191,16 @@ impl ScalarValue { | |||
} | |||
|
|||
let array: ArrayRef = match &data_type { | |||
DataType::Decimal32(_precision, _scale) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decimal323 and Decimal64 types were added to arrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Thank you @alamb ! The failed CI seems not related to this PR.
I merged in |
Thanks @adriangb and @zhuqi-lucas ! |
🚀 |
This seems to have caused some minor failures in |
There were some timeouts reported that may be related:
I restarted the test -- hopefully it will pass on the second time |
Which issue does this PR close?
56.0.0
(July 2025) arrow-rs#7395Rationale for this change
There are several non trivial changes in arrow 56 so I want to start testing soon
Also, I would like a stable base to test new parquet pushdown code from @XiangpengHao
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?