fix(parquet): write single file if option is set #17009
Open
hknlof wants to merge 21 commits into apache:main from hknlof:fix/write-single-parquet
1c05032 to f1d58c5
I restarted the checks.
…ec of the correct size (apache#16995)
* apache#16994 Ensure CooperativeExec#maintains_input_order returns a Vec of the correct size
* apache#16994 Extend default ExecutionPlan invariant checks
  Add checks that verify the length of the vectors returned by methods that need to return a value per child.

* fix error result in execute&pre_selection
* fix clippy
* Optimize implementation
* more efficiency impl
* fix CI

)
* Docs: Update the crate configuration / build settings page
* Update docs/source/user-guide/crate-configuration.md
  Co-authored-by: Oleks V <[email protected]>
---------
Co-authored-by: Oleks V <[email protected]>

`ScalarValue` can be created from a `&str`, `Option<&str>`, and `String`. `Option<String>` was a missing alternative.

…onfigOptions` on each query (apache#16970)
* Add `ConfigOptions` to ExecutionProps when execution is started
* Add ConfigOptions to ScalarFunctionArgs, refactor AsyncScalarUDF.invoke_async_with_args to remove ConfigOptions arg.
* Updated OptimizerConfig.options() -> Arc<ConfigOptions> to eliminate clone() calls. Fixed an issue with SimplifyExpressions.rewrite(..) not adding config options to execution_props. Added test to verify it works
* Test update.
* Add note in upgrade guide
* Use Arc all the way down
* start_execution -> mark_start_execution
* Update datafusion/expr/src/execution_props.rs
  Co-authored-by: Andrew Lamb <[email protected]>
* Update comments
* Avoid API breakage via #deprecated
* Update upgrade guide for Arc<ConfigOptions> change
* Apply suggestions from code review
* fmt
---------
Co-authored-by: Andrew Lamb <[email protected]>

Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.15 to 0.7.16.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-util-0.7.15...tokio-util-0.7.16)
---
updated-dependencies:
- dependency-name: tokio-util
  dependency-version: 0.7.16
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…alarUDFImpl::invoke_with_args` (apache#16902)
* Change AsyncScalarUDFImpl::invoke_async_with_args return type to ColumnarValue
* fix docs
* cargo fmt
* cargo clippy
* Add a note in the upgrade guide
* Fix merge blunder
---------
Co-authored-by: Andrew Lamb <[email protected]>

* feat(spark): implement spark array function array
  Signed-off-by: Alan Tang <[email protected]>
* chore: add license header
  Signed-off-by: Alan Tang <[email protected]>
* chore: fix clippy error
  Signed-off-by: Alan Tang <[email protected]>
* feat: add with_list_field_name method and more tests
  Signed-off-by: Alan Tang <[email protected]>
* feat: add name field to SparkArray structure
  Signed-off-by: Alan Tang <[email protected]>
* chore: hardcode field name
  Signed-off-by: Alan Tang <[email protected]>
* chore: fix clippy error
  Signed-off-by: Alan Tang <[email protected]>
---------
Signed-off-by: Alan Tang <[email protected]>

* Use get_slice_memory_size() instead of get_array_memory_size() for measuring array_agg accumulator size
* Add comment explaining the rationale for using `.get_slice_memory_size()`

…pache#17003)
* Support centroids config for `approx_percentile_cont_with_weight`
* Match two functions' signature
* Update docs
* Address comments and unify centroids config
Sorry for the delay -- this PR seems to have quite a few conflicts.
Labels
common: Related to common crate
core: Core DataFusion crate
datasource: Changes to the datasource crate
development-process: Related to development process of DataFusion
documentation: Improvements or additions to documentation
execution: Related to the execution crate
ffi: Changes to the ffi crate
functions: Changes to functions implementation
logical-expr: Logical plan and expressions
optimizer: Optimizer rules
physical-expr: Changes to the physical-expr crates
physical-plan: Changes to the physical-plan crate
proto: Related to proto crate
spark
sql: SQL Planner
sqllogictest: SQL Logic Tests (.slt)
substrait: Changes to the substrait crate
Which issue does this PR close?
`DataFrameWriteOptions::with_single_file_output` produces a directory #13323

Rationale for this change
`DF.write_parquet` writes multiple files / one directory even if `options.single_file_output` is set.

What changes are included in this PR?
Introduce an internal `.single` extension.

Are these changes tested?
Yes, tests are part of this PR.
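
For readers unfamiliar with the problem in the rationale, here is a minimal, self-contained sketch of the intended behavior. It is not DataFusion code and not the approach taken in this PR: `OutputLayout`, `layout_from_path`, and `resolve_layout` are hypothetical names, and the path heuristic (trailing slash or missing extension means directory) is an assumption made only for illustration. The point is that an explicit `single_file_output` flag should take precedence over whatever is inferred from the path.

```rust
use std::path::Path;

/// Hypothetical output layout for a write; not a DataFusion type.
#[derive(Debug, PartialEq)]
enum OutputLayout {
    SingleFile,
    Directory,
}

/// Infer the layout from the path alone (assumed heuristic for illustration):
/// a trailing slash or a missing file extension is treated as a directory.
fn layout_from_path(path: &str) -> OutputLayout {
    if path.ends_with('/') {
        return OutputLayout::Directory;
    }
    match Path::new(path).extension() {
        Some(_) => OutputLayout::SingleFile,
        None => OutputLayout::Directory,
    }
}

/// Let an explicit `single_file_output` flag override the path heuristic.
fn resolve_layout(path: &str, single_file_output: bool) -> OutputLayout {
    if single_file_output {
        OutputLayout::SingleFile
    } else {
        layout_from_path(path)
    }
}
```

Under these assumptions, `resolve_layout("out/data", true)` yields a single file even though the path has no extension, while the same path with the flag unset falls back to a directory.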

Are there any user-facing changes?
Not in this implementation. There might be, if we decide to move to a `FileSinkConfig`-based solution.

Quoting: #13323 (comment)