Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions dev/update_config_docs.sh
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,32 @@ SET datafusion.execution.target_partitions = '1';

[`ListingTable`]: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html

## Memory-limited Queries

When executing a memory-consuming query under a tight memory limit, DataFusion
will spill intermediate results to disk.

When the [`FairSpillPool`] is used, memory is divided evenly among partitions.
The higher the value of `datafusion.execution.target_partitions`, the less memory
is allocated to each partition, and the out-of-core execution path may trigger
more frequently, possibly slowing down execution.

Additionally, while spilling, data is read back in `datafusion.execution.batch_size` size batches.
The larger this value, the fewer spilled sorted runs can be merged. Decreasing this setting
can help reduce the number of subsequent spills required.

In conclusion, for queries under a very tight memory limit, it's recommended to
set `target_partitions` and `batch_size` to smaller values.

```sql
-- Query still gets paralleized, but each partition will have more memory to use
SET datafusion.execution.target_partitions = 4;
-- Smaller than the default '8192', while still keep the benefit of vectorized execution
SET datafusion.execution.batch_size = 1024;
```

[`FairSpillPool`]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.FairSpillPool.html

EOF


Expand Down
26 changes: 26 additions & 0 deletions docs/source/user-guide/configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -221,3 +221,29 @@ SET datafusion.execution.target_partitions = '1';
```

[`listingtable`]: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html

## Memory-limited Queries

When executing a memory-consuming query under a tight memory limit, DataFusion
will spill intermediate results to disk.

When the [`FairSpillPool`] is used, memory is divided evenly among partitions.
The higher the value of `datafusion.execution.target_partitions`, the less memory
is allocated to each partition, and the out-of-core execution path may trigger
more frequently, possibly slowing down execution.

Additionally, while spilling, data is read back in `datafusion.execution.batch_size` size batches.
The larger this value, the fewer spilled sorted runs can be merged. Decreasing this setting
can help reduce the number of subsequent spills required.

In conclusion, for queries under a very tight memory limit, it's recommended to
set `target_partitions` and `batch_size` to smaller values.

```sql
-- Query still gets paralleized, but each partition will have more memory to use
SET datafusion.execution.target_partitions = 4;
-- Smaller than the default '8192', while still keep the benefit of vectorized execution
SET datafusion.execution.batch_size = 1024;
```

[`fairspillpool`]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.FairSpillPool.html