Commit 8dc2391

Add comment explaining the rationale for using .get_slice_memory_size()
1 parent ac407a1 commit 8dc2391

1 file changed

+11
-0
lines changed

datafusion/functions-aggregate/src/array_agg.rs

Lines changed: 11 additions & 0 deletions
@@ -372,6 +372,17 @@ impl Accumulator for ArrayAggAccumulator {
             + self
                 .values
                 .iter()
+                // Each ArrayRef might be just a reference to a bigger array, and many
+                // ArrayRefs here might be referencing exactly the same array, so if we
+                // were to call `arr.get_array_memory_size()`, we would be double-counting
+                // the same underlying data many times.
+                //
+                // Instead, we do an approximation by estimating how much memory each
+                // ArrayRef would occupy if its underlying data was fully owned by this
+                // accumulator.
+                //
+                // Note that this is just an estimation, but the reality is that this
+                // accumulator might not own any data.
                 .map(|arr| arr.to_data().get_slice_memory_size().unwrap_or_default())
                 .sum::<usize>()
             + self.datatype.size()
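
The double-counting described in the comment can be illustrated with a minimal, standard-library-only sketch. It does not use Arrow: `SliceRef`, `whole_buffer_size`, `slice_size`, and `demo` are hypothetical names invented for this example, and they only loosely model the contrast between counting the whole backing array and counting just the sliced region.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for an array reference: a view into a buffer
/// that may be shared with many other views. Not an Arrow type.
struct SliceRef {
    buffer: Rc<Vec<u8>>, // backing allocation, possibly shared
    offset: usize,       // start of this view within the buffer
    len: usize,          // number of bytes this view covers
}

impl SliceRef {
    /// The bytes this view actually covers.
    fn bytes(&self) -> &[u8] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Counts the entire backing buffer, regardless of how small the view is.
    /// Loosely models the "count the whole underlying array" approach.
    fn whole_buffer_size(&self) -> usize {
        self.buffer.len()
    }

    /// Counts only the bytes covered by this view.
    /// Loosely models the "count only the slice" approach.
    fn slice_size(&self) -> usize {
        self.bytes().len()
    }
}

/// Returns (actual buffer bytes, naive sum, per-slice estimate).
fn demo() -> (usize, usize, usize) {
    let buffer = Rc::new(vec![0u8; 1024]);

    // Four non-overlapping 256-byte views over the same 1024-byte buffer.
    let slices: Vec<SliceRef> = (0..4)
        .map(|i| SliceRef {
            buffer: Rc::clone(&buffer),
            offset: i * 256,
            len: 256,
        })
        .collect();

    // Summing the whole buffer once per view counts the same bytes four times.
    let naive = slices.iter().map(SliceRef::whole_buffer_size).sum::<usize>();
    // Summing only each view's own region counts every byte once here.
    let estimate = slices.iter().map(SliceRef::slice_size).sum::<usize>();

    (buffer.len(), naive, estimate)
}

fn main() {
    let (actual, naive, estimate) = demo();
    println!("actual = {actual}, naive = {naive}, estimate = {estimate}");
    // prints "actual = 1024, naive = 4096, estimate = 1024"
}
```

In this toy setup the per-slice sum matches the real allocation only because the views do not overlap and together cover the whole buffer. As the comment in the diff notes, the per-slice figure remains an estimate: the views may be owned elsewhere, in which case the accumulator holds none of that memory.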
