
Commit 2b3a5e9

doc: update memory tuning guide (#1394)
## Which issue does this PR close?

Closes #1388

## Rationale for this change

Following up on #1369 and #1386

## What changes are included in this PR?

Updated the doc

## How are these changes tested?
1 parent 527cb57 commit 2b3a5e9

1 file changed (+23, -1 lines)


docs/source/user-guide/tuning.md

Lines changed: 23 additions & 1 deletion
@@ -28,6 +28,21 @@ Comet provides some tuning options to help you get the best performance from you
The recommended way to share memory between Spark and Comet is to set `spark.memory.offHeap.enabled=true`. This allows
Comet to share an off-heap memory pool with Spark. The size of the pool is specified by `spark.memory.offHeap.size`. For more details about Spark off-heap memory mode, please refer to the Spark documentation: https://spark.apache.org/docs/latest/configuration.html.
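As a minimal sketch, assuming you pass settings to `spark-submit` with `--conf`, the hypothetical helpers below (`off_heap_conf` and `to_submit_args` are not part of Comet or Spark) collect the off-heap settings named in this section. The `16g` size is a placeholder, and the other settings needed to enable Comet itself are omitted.

```python
# Hypothetical helpers (not part of Comet or Spark): build spark-submit --conf
# flags for sharing an off-heap memory pool between Spark and Comet.

# Pool types this guide lists for off-heap mode.
OFF_HEAP_POOL_TYPES = ("unified", "fair_unified")


def off_heap_conf(size: str, pool_type: str = "unified") -> dict[str, str]:
    """Return Spark settings for off-heap mode; `size` uses Spark's size syntax, e.g. "16g"."""
    if pool_type not in OFF_HEAP_POOL_TYPES:
        raise ValueError(f"unsupported off-heap pool type: {pool_type!r}")
    return {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": size,
        "spark.comet.exec.memoryPool": pool_type,
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Flatten a settings dict into ["--conf", "key=value", ...] for spark-submit."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # 16g is a placeholder; size the pool for your own executors.
    print(" ".join(to_submit_args(off_heap_conf("16g"))))
```

Keeping the settings in a dict makes it easy to reuse the same values with `SparkSession.builder.config(...)` instead of the command line.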

The type of pool can be specified with `spark.comet.exec.memoryPool`.

The valid pool types are:

- `unified` (default when `spark.memory.offHeap.enabled=true` is set)
- `fair_unified`

The `unified` pool type implements a greedy, first-come, first-served limit. This pool works well for queries that do not
need to spill or have a single spillable operator.

The `fair_unified` pool type prevents operators from using more than an even fraction of the available memory
(i.e. `pool_size / num_reservations`). This pool works best when you know beforehand that
the query has multiple operators that will likely all need to spill. It may sometimes cause spills even
when there is sufficient memory, because it holds back a share of memory for other operators.
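To make the difference between the two off-heap pool types concrete, here is a toy model. It is not Comet's or DataFusion's implementation: sizes are plain integers, spilling is not modeled, and `num_reservations` is fixed up front, whereas a real pool tracks reservations as they are registered. The class and method names are hypothetical.

```python
# Toy model (not Comet's or DataFusion's code) of the two off-heap pool policies.


class GreedyPool:
    """`unified`-style: first come, first served until the pool is exhausted."""

    def __init__(self, pool_size: int) -> None:
        self.pool_size = pool_size
        self.used: dict[str, int] = {}

    def try_grow(self, consumer: str, nbytes: int) -> bool:
        if sum(self.used.values()) + nbytes > self.pool_size:
            return False  # pool exhausted: the caller would have to spill (or fail)
        self.used[consumer] = self.used.get(consumer, 0) + nbytes
        return True


class FairPool:
    """`fair_unified`-style: each reservation is capped at pool_size / num_reservations."""

    def __init__(self, pool_size: int, num_reservations: int) -> None:
        self.pool_size = pool_size
        self.num_reservations = num_reservations
        self.used: dict[str, int] = {}

    def try_grow(self, consumer: str, nbytes: int) -> bool:
        cap = self.pool_size // self.num_reservations
        new_total = self.used.get(consumer, 0) + nbytes
        if new_total > cap:
            return False  # over its fair share, even if the pool still has free memory
        self.used[consumer] = new_total
        return True


if __name__ == "__main__":
    greedy, fair = GreedyPool(100), FairPool(100, num_reservations=2)
    for pool in (greedy, fair):
        # Two operators ask for memory in the same order under each policy.
        print(type(pool).__name__, pool.try_grow("sort", 80), pool.try_grow("join", 40))
```

With a 100-unit pool and two operators, the greedy pool lets `sort` take 80 units and then refuses `join`; the fair pool refuses `sort` (its share is 50) even though the pool is empty, and admits `join`. That is the trade-off described above: the greedy pool favors whoever asks first, while the fair pool accepts possibly unnecessary spills in exchange for a guaranteed share per operator.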
### Dedicated Comet Memory Pools

Spark uses on-heap memory mode by default, i.e., the `spark.memory.offHeap.enabled` setting is not enabled. If Spark is in on-heap memory mode, Comet uses its own dedicated memory pools that
@@ -48,6 +63,7 @@ The valid pool types are:
- `fair_spill`
- `fair_spill_global`
- `fair_spill_task_shared`
- `unbounded`

Pool types ending with `_global` use a single global memory pool shared by all tasks on the same executor.

@@ -61,13 +77,19 @@ pool works well for queries that do not need to spill or have a single spillable

The `fair_spill*` pool types use DataFusion's [FairSpillPool], which prevents spillable reservations from using more
than an even fraction of the available memory, excluding any unspillable reservations
(i.e. `(pool_size - unspillable_memory) / num_spillable_reservations`). This pool works best when you know beforehand
that the query has multiple spillable operators that will likely all need to spill. It may sometimes cause spills even
when there was sufficient memory (reserved for other operators) to avoid doing so. Unspillable memory is allocated in
a first-come, first-served fashion.
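The share formula for the `fair_spill*` pools can be worked through with a small sketch. The function names are hypothetical and this is a simplified reading of the formula, not DataFusion's code; it also ignores the separate first-come, first-served accounting for unspillable memory.

```python
# Worked sketch of the fair_spill* share formula (simplified; not DataFusion's code).


def spillable_limit(pool_size: int, unspillable_memory: int,
                    num_spillable_reservations: int) -> int:
    """Largest size one spillable reservation may reach:
    (pool_size - unspillable_memory) / num_spillable_reservations, floored at zero."""
    if num_spillable_reservations <= 0:
        raise ValueError("need at least one spillable reservation")
    available = max(pool_size - unspillable_memory, 0)
    return available // num_spillable_reservations


def must_spill(current: int, request: int, pool_size: int,
               unspillable_memory: int, num_spillable_reservations: int) -> bool:
    """True if growing a spillable reservation from `current` by `request` exceeds its share."""
    limit = spillable_limit(pool_size, unspillable_memory, num_spillable_reservations)
    return current + request > limit


if __name__ == "__main__":
    MiB = 1024 * 1024
    print(spillable_limit(1000 * MiB, 200 * MiB, 4) // MiB)  # 200
    print(must_spill(150 * MiB, 60 * MiB, 1000 * MiB, 200 * MiB, 4))  # True
```

For example, a 1000 MiB pool with 200 MiB of unspillable reservations and four spillable operators gives each spillable operator at most 200 MiB, so an operator already holding 150 MiB must spill if it asks for 60 MiB more, regardless of how much the other three operators are actually using.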

The `unbounded` pool type uses DataFusion's [UnboundedMemoryPool], which enforces no limit. This option is useful for
development and testing, where you would rather have the job fail than spill. Because spilling significantly slows
down a job, this option is one way to measure best-case performance without tuning how much memory to allocate.

[GreedyMemoryPool]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.GreedyMemoryPool.html
[FairSpillPool]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.FairSpillPool.html
[UnboundedMemoryPool]: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/struct.UnboundedMemoryPool.html
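A toy sketch of what the `unbounded` pool type means in practice, plus the settings for such a test run. The `UnboundedPool` class is an illustrative model, not DataFusion's `UnboundedMemoryPool` (its peak counter is an addition for the example, not a Comet metric), and `unbounded_conf` assumes that the dedicated pool type is selected with the same `spark.comet.exec.memoryPool` key used for off-heap mode.

```python
# Toy sketch (not DataFusion's UnboundedMemoryPool): a pool that never refuses
# memory, so nothing spills. The peak counter is illustrative only.


class UnboundedPool:
    def __init__(self) -> None:
        self.used = 0
        self.peak = 0

    def try_grow(self, nbytes: int) -> bool:
        self.used += nbytes
        self.peak = max(self.peak, self.used)
        return True  # never asks the caller to spill

    def shrink(self, nbytes: int) -> None:
        self.used -= nbytes


def unbounded_conf() -> dict[str, str]:
    """Hypothetical helper: settings for a no-spill test run.

    Assumes the dedicated pool is chosen via spark.comet.exec.memoryPool."""
    return {
        "spark.memory.offHeap.enabled": "false",  # on-heap mode: Comet uses a dedicated pool
        "spark.comet.exec.memoryPool": "unbounded",
    }


if __name__ == "__main__":
    pool = UnboundedPool()
    pool.try_grow(600)
    pool.try_grow(500)
    pool.shrink(600)
    print(pool.used, pool.peak)  # 500 1100
```

Because every request succeeds, the only way such a run fails is by exhausting real memory, which is why this pool type suits benchmarking rather than production.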

### Determining How Much Memory to Allocate
