Wire up ConfigOptions #34
Conversation
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.45.1 to 1.46.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.45.1...tokio-1.46.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.46.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…pt limit pushdown (apache#16641) Co-authored-by: Adrian Garcia Badaracco <[email protected]>
…16615)

* Convert Option<Vec<sort expression>> to Vec<sort expression>
* clippy
* fix comment
* fix doc
* change back to Expr
* remove redundant check
* Add an example of embedding indexes inside a parquet file
* Add page image
* Add prune file example
* Fix clippy
* polish code
* Fmt
* address comments
* Add debug
* Add new example, but it will fail with page index
* add debug
* add debug
* polish
* debug
* Using low level API to support
* polish
* fix
* merge
* fix
* complete solution
* polish comments
* adjust image
* add comments part 1
* pin to new arrow-rs
* pin to new arrow-rs
* add comments part 2
* merge upstream
* merge upstream
* polish code
* Rename example and add it to the list
* Work on comments
* More documentation
* Documentation obsession, encapsulate example
* Update datafusion-examples/examples/parquet_embedded_index.rs

Co-authored-by: Sherin Jacob <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Sherin Jacob <[email protected]>
* Implementation for regex_instr
* linting and typo addressed in bench
* prettier formatting
* scalar_functions_formatting
* linting format macros
* formatting
* address comments to PR
* formatting
* clippy
* fmt
* address docs typo
* remove unnecessary struct and comment
* delete redundant lines, add tests for subexp, correct function signature for benches
* refactor get_index
* comments addressed
* update doc
* clippy upgrade

---------

Co-authored-by: Nirnay Roy <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Dmitrii Blaginin <[email protected]>
…nts (apache#16672)

- Refactored the `DataFusionError` enum to use `Box<T>` for:
  - `ArrowError`
  - `ParquetError`
  - `AvroError`
  - `object_store::Error`
  - `ParserError`
  - `SchemaError`
  - `JoinError`
- Updated all relevant match arms and constructors to handle boxed errors.
- Refactored error-related macros (`arrow_datafusion_err!`, `sql_datafusion_err!`, etc.) to use `Box<T>`.
- Adjusted test cases and error assertions for boxed variants.
- Updated the upgrade guide to explain the required changes and rationale.
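The space argument for boxing is easy to see in a standalone sketch. The types below are illustrative stand-ins, not DataFusion's actual definitions; the point is that an enum is as large as its largest variant, so boxing the big upstream errors shrinks every `Result` that carries the error type:

```rust
use std::mem::size_of;

// Illustrative stand-in for a large third-party error type; the real
// DataFusionError boxes types such as ArrowError and ParquetError.
#[allow(dead_code)]
struct BigUpstreamError([u8; 128]);

// Without boxing, the enum is as large as its largest variant.
#[allow(dead_code)]
enum ErrorUnboxed {
    Big(BigUpstreamError),
    NotFound,
}

// Boxing the large variant keeps the enum roughly pointer-sized, which
// shrinks every Result<T, Error> moved around on the stack.
#[allow(dead_code)]
enum ErrorBoxed {
    Big(Box<BigUpstreamError>),
    NotFound,
}

fn main() {
    println!("unboxed: {} bytes", size_of::<ErrorUnboxed>()); // ~129 bytes
    println!("boxed:   {} bytes", size_of::<ErrorBoxed>());   // pointer-sized
}
```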
…on and Mapping (apache#16583)

- Introduced a new `schema_adapter_factory` field in `ListingTableConfig` and `ListingTable`
- Added `with_schema_adapter_factory` and `schema_adapter_factory()` methods to both structs
- Modified execution planning logic to apply schema adapters during scanning
- Updated statistics collection to use mapped schemas
- Implemented detailed documentation and example usage in doc comments
- Added new unit and integration tests validating schema adapter behavior and error cases
* Reuse Rows in RowCursorStream
* WIP
* Fmt
* Add comment, make it backwards compatible
* Add comment, make it backwards compatible
* Add comment, make it backwards compatible
* Clippy
* Clippy
* Return error on non-unique reference
* Comment
* Update datafusion/physical-plan/src/sorts/stream.rs

Co-authored-by: Oleks V <[email protected]>

* Fix
* Extract logic
* Doc fix

---------

Co-authored-by: Oleks V <[email protected]>
apache#16630)

* Perf: fast CursorValues compare for StringViewArray using inline_key_fast
* fix
* polish
* polish
* add test

---------

Co-authored-by: Daniël Heres <[email protected]>
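The general idea behind an inline comparison key (a sketch of the technique, not arrow-rs's exact `inline_key_fast` implementation) is to pack a string's prefix bytes and length into a single `u128`, so one integer comparison replaces a byte-by-byte comparison for short strings:

```rust
/// Pack up to 12 prefix bytes (big-endian, in the high bits) plus the string
/// length (low 32 bits) into a u128, so that comparing keys as plain integers
/// matches lexicographic string order for strings that fit in the prefix.
fn inline_key(s: &[u8]) -> u128 {
    let mut prefix = [0u8; 12];
    let n = s.len().min(12);
    prefix[..n].copy_from_slice(&s[..n]);
    let mut key = 0u128;
    for b in prefix {
        key = (key << 8) | b as u128; // prefix bytes, most significant first
    }
    (key << 32) | s.len() as u128 // length breaks ties between equal prefixes
}

fn main() {
    assert!(inline_key(b"apple") < inline_key(b"banana"));
    assert!(inline_key(b"app") < inline_key(b"apple")); // prefix tie, shorter first
}
```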
One step towards apache#16652. Co-authored-by: Oleks V <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Daniël Heres <[email protected]>
* Refactor StreamJoinMetrics to reuse BaselineMetrics
* use the record_poll method to update output rows

Signed-off-by: Alan Tang <[email protected]>
* Remove unused AggregateUDF struct
* Fix docs

---------

Co-authored-by: Andrew Lamb <[email protected]>
…che#16500)

* chore: refactor `BuildProbeJoinMetrics` to use `BaselineMetrics`

  Closes apache#16495

  Here's an example of an `explain analyze` of a hash join showing these metrics:

  ```
  [(WatchID@0, WatchID@0)], metrics=[output_rows=100, elapsed_compute=2.313624ms, build_input_batches=1, build_input_rows=100, input_batches=1, input_rows=100, output_batches=1, build_mem_used=3688, build_time=865.832µs, join_time=1.369875ms]
  ```

  Notice `output_rows=100, elapsed_compute=2.313624ms` in the above.

* test: add checks for join metrics in tests
* fix: add record_poll to ExhaustedProbeSide for nested_loop_join

  This was needed because the ExhaustedProbeSide state can also return output rows in certain types of joins. Without this, the output_rows metric for nested loop join was wrong!
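A stripped-down sketch of the `record_poll` pattern referenced above (all types here are stand-ins, not DataFusion's actual API): routing every polled batch through one method keeps the `output_rows` counter correct no matter which state-machine arm, including an exhausted-probe-side arm, produced the batch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::Poll;

struct Batch {
    rows: usize,
}

#[derive(Default)]
struct BaselineMetricsSketch {
    output_rows: AtomicUsize,
}

impl BaselineMetricsSketch {
    /// Pass a poll result through, counting rows on any batch it carries.
    fn record_poll(&self, poll: Poll<Option<Batch>>) -> Poll<Option<Batch>> {
        if let Poll::Ready(Some(batch)) = &poll {
            self.output_rows.fetch_add(batch.rows, Ordering::Relaxed);
        }
        poll
    }
}

fn main() {
    let m = BaselineMetricsSketch::default();
    // Every arm of a join's poll loop funnels its output through record_poll.
    let _ = m.record_poll(Poll::Ready(Some(Batch { rows: 100 })));
    assert_eq!(m.output_rows.load(Ordering::Relaxed), 100);
}
```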
* Use compression type in file suffixes

  - Add FileFormat::compression_type method
  - Specify meaningful values for CSV only
  - Use compression type as part of the extension for files

* Add CSV tests
* Add glob dep, use env logging
* Use a glob pattern with compression suffix for TableProviderFactory
* Conform to clippy standards

---------

Co-authored-by: Andrew Lamb <[email protected]>
* Refactor SortMergeJoinMetrics to reuse BaselineMetrics
* use record_poll method to update output_rows
* chore: Replace record_poll with record_output

Signed-off-by: Alan Tang <[email protected]>
* Add support for Arrow Dictionary type in Substrait

  This commit adds support for the Arrow Dictionary type in Substrait plans. Resolves apache#16273

* Add more specific type variation consts
* fix sqllogictest condition mismatch
* Update test file condition
* revert changes in sqllogictests

---------

Co-authored-by: Leon Lin <[email protected]>
…ring physical planning (apache#16454)

* Fix duplicates on Join creation during physical planning
* Add Substrait reproducer
* Better error message & more docs
* Handle case for right/left/full joins as well
---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.46.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add reproducer for tpch Q16 deserialization bug
* Add small Parquet file with 20 rows from part table for testing
* Apply suggestions from code review

Co-authored-by: Andrew Lamb <[email protected]>

* Make the test fail and ignore it until the bug is fixed per review comments
* fix clippy

---------

Co-authored-by: Andrew Lamb <[email protected]>
This commit refactors the filter pushdown infrastructure to improve flexibility, readability, and maintainability.

### Major Changes:

- **Eliminated `PredicateSupports`** wrapper in favor of directly using `Vec<PredicateSupport>`, simplifying APIs.
- Introduced **`ChildFilterDescription::from_child`** to encapsulate logic for determining filter pushdown eligibility per child.
- Added **`FilterDescription::from_children`** to generate pushdown plans based on column references across all children.
- Replaced legacy methods (`all_parent_filters_supported`, etc.) with more flexible, composable APIs using builder-style chaining.
- Updated all relevant nodes (`FilterExec`, `SortExec`, `RepartitionExec`, etc.) to use the new pushdown planning structure.

### Functional Adjustments:

- Ensured filter column indices are reassigned properly when filters are pushed to projected inputs (e.g., in `FilterExec`).
- Standardized handling of supported vs. unsupported filters throughout the propagation pipeline.
- Improved handling of self-filters in nodes such as `FilterExec` and `SortExec`.

### Optimizer Improvements:

- Clarified pushdown phases (`Pre`, `Post`) and respected them across execution plans.
- Documented the full pushdown lifecycle within `filter_pushdown.rs`, improving discoverability for future contributors.

These changes lay the groundwork for more precise and flexible filter pushdown optimizations and improve the robustness of the optimizer infrastructure.
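As a rough illustration of the first bullet under Major Changes, here is what working with a plain `Vec<PredicateSupport>` looks like once the wrapper type is gone. `PredicateSupport` is modeled after the name in the commit; the generic `Expr` parameter and the `supported` helper are hypothetical stand-ins:

```rust
// Each parent filter is tagged as supported (pushable into this child) or
// unsupported, and ordinary Vec/iterator methods replace the former wrapper.
#[derive(Debug)]
enum PredicateSupport<Expr> {
    Supported(Expr),
    Unsupported(Expr),
}

/// Keep only the filters that can be pushed down.
fn supported<Expr>(filters: Vec<PredicateSupport<Expr>>) -> Vec<Expr> {
    filters
        .into_iter()
        .filter_map(|f| match f {
            PredicateSupport::Supported(e) => Some(e),
            PredicateSupport::Unsupported(_) => None,
        })
        .collect()
}

fn main() {
    let filters = vec![
        PredicateSupport::Supported("a > 1"),
        PredicateSupport::Unsupported("rand() < 0.5"), // volatile, not pushable
    ];
    println!("pushable: {:?}", supported(filters));
}
```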
…6848)

* feat(spark): Implement Spark luhn_check function
* test(spark): add more tests
* feat(spark): set the signature to be Utf8 type
* chore: add more types for luhn_check function
* test: add test for array input

Signed-off-by: Alan Tang <[email protected]>
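For reference, the Luhn checksum that a `luhn_check` function validates fits in a few lines. This is a self-contained sketch of the algorithm itself, not the Spark/DataFusion implementation:

```rust
/// Validate a digit string with the Luhn checksum: walking from the right,
/// double every second digit (subtracting 9 if the result exceeds 9) and
/// require the total to be divisible by 10.
fn luhn_check(s: &str) -> bool {
    let digits: Option<Vec<u32>> = s.chars().map(|c| c.to_digit(10)).collect();
    let Some(digits) = digits else { return false }; // non-digits fail the check
    if digits.is_empty() {
        return false;
    }
    let sum: u32 = digits
        .iter()
        .rev()
        .enumerate()
        .map(|(i, &d)| {
            if i % 2 == 1 {
                let d = d * 2;
                if d > 9 { d - 9 } else { d }
            } else {
                d
            }
        })
        .sum();
    sum % 10 == 0
}

fn main() {
    assert!(luhn_check("79927398713")); // classic valid example
    assert!(!luhn_check("79927398714"));
}
```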
Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.8.2 to 1.8.3.
- [Release notes](https://github.com/smithy-lang/smithy-rs/releases)
- [Changelog](https://github.com/smithy-lang/smithy-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/smithy-lang/smithy-rs/commits)

---
updated-dependencies:
- dependency-name: aws-config
  dependency-version: 1.8.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Derive UDF equality from PartialEq, Hash

  Reduce boilerplate in cases where the implementation of `{ScalarUDFImpl,AggregateUDFImpl,WindowUDFImpl}::{equals,hash_code}` can be derived using the standard `PartialEq` and `Hash` traits. This is a code-complexity reduction. While valuable on its own, it also prepares for more automatic derivation of UDF equals/hash and/or removal of the default implementations (which are currently error-prone).

* udf_equals_hash example
* test udf_equals_hash
* empty: roll the dice 🎲
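The boilerplate being reduced to a derive looks roughly like this. The `Udf` trait below is a hypothetical stand-in for the `*UDFImpl` traits, which compare UDFs through `&dyn` references; the point is that `equals` can downcast and defer to a derived `PartialEq`, and `hash_code` can defer to a derived `Hash`:

```rust
use std::any::Any;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

trait Udf: Any {
    fn as_any(&self) -> &dyn Any;
    fn equals(&self, other: &dyn Udf) -> bool;
    fn hash_code(&self) -> u64;
}

#[derive(PartialEq, Hash)]
struct MyUdf {
    name: &'static str,
    arity: usize,
}

impl Udf for MyUdf {
    fn as_any(&self) -> &dyn Any {
        self
    }
    // Downcast the other side and reuse the derived PartialEq.
    fn equals(&self, other: &dyn Udf) -> bool {
        other
            .as_any()
            .downcast_ref::<MyUdf>()
            .is_some_and(|o| self == o)
    }
    // Reuse the derived Hash to produce the u64 the trait expects.
    fn hash_code(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.hash(&mut h);
        h.finish()
    }
}

fn main() {
    let a = MyUdf { name: "f", arity: 1 };
    let b = MyUdf { name: "f", arity: 1 };
    assert!(a.equals(&b));
    assert_eq!(a.hash_code(), b.hash_code());
}
```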
…che#16857)

* Handle expression and value elements in Substrait VirtualTable
* Added test
* Modified test plan, changed conditional check for clarity
* Consume expressions with empty input schema
* Update common.rs * Update common.rs
* feat(spark): implement Spark datetime function last_day
* chore: fix the export function name
* chore: Fix Cargo.toml formatting
* test: add more tests for spark function last_day
* feat(spark): set the signature to take exactly one Date32 argument
* test(spark): add more bad cases
* chore: clean up redundant package

Signed-off-by: Alan Tang <[email protected]>
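The core computation behind a `last_day`-style function is just the month-length rule with leap years. A self-contained sketch (the real UDF operates on Date32 values, which this does not model):

```rust
/// Number of days in a month, i.e. the day-of-month of the last day.
/// Leap years: divisible by 4, except centuries unless divisible by 400.
fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 => 29,
        2 => 28,
        _ => panic!("invalid month: {month}"),
    }
}

fn main() {
    assert_eq!(days_in_month(2024, 2), 29); // leap year
    assert_eq!(days_in_month(1900, 2), 28); // century, not a leap year
    assert_eq!(days_in_month(2025, 7), 31);
}
```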
* def-min-max * Update mod.rs
* Update interval_arithmetic.rs * Update interval_arithmetic.rs * Update interval_arithmetic.rs
…he#16897) * minor * Update aggregate.rs
…for `Decimal128` and `Decimal256` (apache#16831)

* Add missing ScalarValue impls for large decimals

  Add methods distance, new_zero, new_one, new_ten for Decimal128, Decimal256

* Support expr simplification for Decimal256
* Replace lookup table with i128::pow
* Support different scales for Decimal constructors

  - Allow constructing one and ten with different scales
  - Add tests for new_one, new_ten
  - Add test for distance

* Revert "Replace lookup table with i128::pow"

  This reverts commit ba23e8c.

* Use Arrow error directly
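The arithmetic behind scale-aware `new_one` / `new_ten` constructors is small enough to show directly. A sketch under the assumption that a `Decimal128` with scale `s` stores the unscaled integer `value * 10^s`; the function names here are illustrative, not ScalarValue's actual API:

```rust
/// Unscaled representation of 1 for a Decimal128 with the given scale.
fn decimal128_one(scale: u32) -> i128 {
    10i128.pow(scale)
}

/// Unscaled representation of 10 for a Decimal128 with the given scale.
fn decimal128_ten(scale: u32) -> i128 {
    10i128.pow(scale + 1)
}

fn main() {
    // With scale = 2: 1.00 is stored as 100, 10.00 as 1000.
    assert_eq!(decimal128_one(2), 100);
    assert_eq!(decimal128_ten(2), 1_000);
}
```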
* Update value.rs
* Update value.rs
* Update value.rs
* Update datafusion/physical-plan/src/metrics/value.rs
* Update datafusion/physical-plan/src/metrics/value.rs
* Update datafusion/physical-plan/src/metrics/value.rs

---------

Co-authored-by: Andrew Lamb <[email protected]>
* Update partial_sort.rs * Update partial_sort.rs * Update partial_sort.rs * add sql test
* fix: skip predicates on struct unnest in FilterPushdown
* doc comments
* fix
Signed-off-by: Ruihang Xia <[email protected]>
…ke_async_with_args to remove ConfigOptions arg.
* checkpoint: Address PR feedback in https://github.com/apach...
* add FilteredVec to consolidate handling of filter / remap pattern
I am not sure why GitHub decided that merging in main should be in the commit list, but the changes I made were in 547bcac. Note that I realized today that there are no tests covering the actual new functionality, which I'll add this week.
(Note: this appears to be a PR onto my fork -- we should probably rebase / retarget to main in the apache repo.)
Yup. I don't know how to add to an existing PR in GitHub (since I am not a committer, I doubt I can anyway). The best I know how to do is either submit a PR with both of our commits in it, have another PR that requires yours to be approved first, or you rebase your branch and pull in my commit. If there is another option that you would like to do, let me know.
If you are a committer to the repo in question, you can just push commits to the remote fork (if the PR author has checked the box "allow maintainers to edit PR").
I personally suggest you make a new PR on the apache repo that has whatever commits are needed, and we'll close up these POCs.
Submitted PR to datafusion - apache#16970
* disallow pushdown of volatile PhysicalExprs
* fix
* add FilteredVec helper to handle filter / remap pattern (#34)
  * checkpoint: Address PR feedback in https://github.com/apach...
  * add FilteredVec to consolidate handling of filter / remap pattern
* lint
* Add slt test for pushing volatile predicates down (apache#35)

---------

Co-authored-by: Andrew Lamb <[email protected]>
Wire up config options using your POC. Tests pass locally.
Two main points: