@@ -149,6 +149,32 @@ SET datafusion.execution.target_partitions = '1';
[`ListingTable`]: https://docs.rs/datafusion/latest/datafusion/datasource/listing/struct.ListingTable.html
+ ## Memory-limited Queries
+
+ When executing a memory-consuming query under a tight memory limit, DataFusion
+ will spill intermediate results to disk.
+
+ When the [`FairSpillPool`] is used, memory is divided evenly among partitions.
+ The higher the value of `datafusion.execution.target_partitions`, the less memory
+ is allocated to each partition, so the out-of-core execution path may trigger
+ more often and slow down execution.
+
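The even split can be illustrated with a small calculation. This is a minimal sketch, assuming the pool shares whatever memory is left after unspillable reservations equally among the spilling consumers; `fair_share_bytes` is a hypothetical helper written for this illustration and is not part of the DataFusion API.

```rust
/// Simplified model of an even split: the memory left after unspillable
/// reservations is shared equally by the spillable consumers.
/// Hypothetical helper for illustration only; not a DataFusion API.
fn fair_share_bytes(pool_size: usize, unspillable: usize, spillable_consumers: usize) -> usize {
    // With no spillable consumers there is nothing to divide.
    if spillable_consumers == 0 {
        return 0;
    }
    // saturating_sub avoids underflow if unspillable exceeds the pool size.
    pool_size.saturating_sub(unspillable) / spillable_consumers
}
```

Under these assumptions, a 64 MiB pool with no unspillable reservations leaves 16 MiB per consumer with 4 consumers, but only 1 MiB each with 64.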
+ Additionally, while spilling, data is read back in batches of `datafusion.execution.batch_size` rows.
+ The larger this value, the fewer spilled sorted runs can be merged at once. Decreasing this setting
+ can help reduce the number of subsequent spills required.
+
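The effect of the batch size on merging can be estimated roughly. The sketch below assumes that each spilled run keeps one batch of `batch_size` rows in memory while it is being merged, and that rows have a known average width; both are simplifying assumptions, and `max_merge_fan_in` is a hypothetical helper, not a DataFusion function.

```rust
/// Rough upper bound on how many spilled runs can be merged at once,
/// assuming each run holds one `batch_size_rows`-row batch in memory.
/// Hypothetical helper for illustration only; not a DataFusion API.
fn max_merge_fan_in(
    partition_budget_bytes: usize,
    batch_size_rows: usize,
    avg_row_bytes: usize,
) -> usize {
    // Estimated memory needed to hold a single batch of one run.
    let batch_bytes = batch_size_rows.saturating_mul(avg_row_bytes);
    if batch_bytes == 0 {
        return 0;
    }
    partition_budget_bytes / batch_bytes
}
```

With an assumed 16 MiB budget and 128-byte rows, a batch size of 8192 allows about 16 runs per merge pass, while a batch size of 1024 allows about 128, so fewer passes (and fewer re-spills) are needed.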
+ In conclusion, for queries under a very tight memory limit, it's recommended to
+ set `target_partitions` and `batch_size` to smaller values.
+
+ ```sql
+ -- Query still gets parallelized, but each partition will have more memory to use
+ SET datafusion.execution.target_partitions = 4;
+ -- Smaller than the default '8192', while still keeping the benefit of vectorized execution
+ SET datafusion.execution.batch_size = 1024;
+ ```
+
+ [`FairSpillPool`]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.FairSpillPool.html
+
EOF