Skip to content

Conversation

timsaucer
Copy link
Contributor

@timsaucer timsaucer commented Aug 26, 2025

Which issue does this PR close?

Rationale for this change

For some users, adding in sqlparser greatly increases build times for their projects even though they are not using sql. This feature makes sql an optional dependency to improve build times and reduce the number of dependencies.

What changes are included in this PR?

Adds in a sql feature in the core crate and makes sqlparser an optional dependency. The rest of the changes are to account for the fact that some of the structs in sqlparser are used throughout the repository.

Are these changes tested?

Ideally this is effectively a no-op change. By enabling sql (or simply using the default features) there is no change and all tests should run as before. I have also run all tests with the --no-default-features flag.

Are there any user-facing changes?

There is one change that may be impactful to some users. By adding in the sql feature, if users are not using default features of the datafusion core crate, they will need to add sql to their Cargo.toml.

Additional context

When building on a mac m4, running a cargo clean and then checking time and build size these are my local results for just building the core datafusion library:

cargo build --release -p datafusion --lib --no-default-features  374.93s user 11.45s system 800% cpu 48.276 total
-rw-r--r--@ 1 tsaucer  staff    10M Aug 27 08:15 target/release/libdatafusion.rlib

cargo build --release -p datafusion --lib --no-default-features --features sql  453.58s user 12.40s system 761% cpu 1:01.21 total
-rw-r--r--@ 1 tsaucer  staff    17M Aug 27 08:17 target/release/libdatafusion.rlib

cargo build -p datafusion --lib --no-default-features  172.85s user 13.35s system 637% cpu 29.189 total
-rw-r--r--@ 1 tsaucer  staff    49M Aug 27 08:20 target/debug/libdatafusion.rlib

cargo build -p datafusion --lib --no-default-features --features sql  210.16s user 14.26s system 471% cpu 47.625 total
-rw-r--r--@ 1 tsaucer  staff    91M Aug 27 08:19 target/debug/libdatafusion.rlib

Net results:

  • Release mode without SQL: 48.276 seconds, 10M library
  • Release mode with SQL: 61.21 seconds, 17M library
  • Debug mode without SQL: 29.189 seconds, 49M library
  • Debug mode with SQL: 47.625 seconds, 91M library

The absolute time numbers aren't so bad when you build just this library by itself, but as part of larger projects it has a compounding effect.

@timsaucer timsaucer self-assigned this Aug 26, 2025
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate physical-plan Changes to the physical-plan crate labels Aug 26, 2025
@github-actions github-actions bot added development-process Related to development process of DataFusion substrait Changes to the substrait crate labels Aug 26, 2025
@timsaucer timsaucer marked this pull request as ready for review August 27, 2025 11:18
@mbutrovich mbutrovich self-requested a review August 27, 2025 15:05
@crepererum
Copy link
Contributor

I've adjusted the description because I think that also closes #17258 🥳

@@ -579,7 +579,7 @@ jobs:
with:
rust-version: stable
- name: Run datafusion-common tests
run: cargo test --profile ci -p datafusion-common --features=pyarrow
run: cargo test --profile ci -p datafusion-common --features=pyarrow,sql
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we extend the CI to also verify that ALL crates at least cargo check with and without sql support? This YAML file has a few examples/listings of feature tests with cargo check.

@mbutrovich
Copy link
Contributor

I was playing with this PR this morning to see if Comet could use it, and it looks like datafusion-sql is brought in with the nested_expressions feature (which Comet uses). I wonder if we can disentangle that dependency.

@findepi
Copy link
Member

findepi commented Sep 4, 2025

❤️ ❤️ ❤️ ❤️ ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
catalog Related to the catalog crate common Related to common crate core Core DataFusion crate development-process Related to development process of DataFusion execution Related to the execution crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate proto Related to proto crate sql SQL Planner substrait Changes to the substrait crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make sqlparser dependency optional RFC: Add features to reduce dependencies on core crate
4 participants