Commit b39fc94

Add comment explaining the rationale for using .get_slice_memory_size()
1 parent 83a20c5 commit b39fc94

1 file changed: +11 -0 lines changed

datafusion/functions-aggregate/src/array_agg.rs

Lines changed: 11 additions & 0 deletions
@@ -393,6 +393,17 @@ impl Accumulator for ArrayAggAccumulator {
             + self
                 .values
                 .iter()
+                // Each ArrayRef might be just a reference to a bigger array, and many
+                // ArrayRefs here might be referencing exactly the same array, so if we
+                // were to call `arr.get_array_memory_size()`, we would be double-counting
+                // the same underlying data many times.
+                //
+                // Instead, we do an approximation by estimating how much memory each
+                // ArrayRef would occupy if its underlying data was fully owned by this
+                // accumulator.
+                //
+                // Note that this is just an estimation, but the reality is that this
+                // accumulator might not own any data.
                 .map(|arr| arr.to_data().get_slice_memory_size().unwrap_or_default())
                 .sum::<usize>()
             + self.datatype.size()
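
The double-counting described in the added comment can be illustrated with a small standard-library model. This is only a sketch under stated assumptions: `SliceRef`, `make_slices`, `sum_full`, and `sum_slices` are hypothetical names invented for illustration, and the `Arc<Vec<i64>>` stands in for a shared Arrow buffer. It does not use the Arrow API and is not the accumulator's actual code.

```rust
use std::mem::size_of_val;
use std::sync::Arc;

/// Hypothetical stand-in for an array slice: a shared buffer plus a window into it.
struct SliceRef {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl SliceRef {
    /// The elements visible through this slice.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Size of the whole backing buffer, ignoring the window
    /// (similar in spirit to asking for the full array's memory size).
    fn full_buffer_size(&self) -> usize {
        size_of_val(self.buffer.as_slice())
    }

    /// Size of only the visible elements
    /// (similar in spirit to asking for the slice's memory size).
    fn slice_size(&self) -> usize {
        size_of_val(self.values())
    }
}

/// Builds `parts` equal-width slices that all share one buffer of `total` elements.
fn make_slices(total: usize, parts: usize) -> Vec<SliceRef> {
    let buffer = Arc::new(vec![0i64; total]);
    let width = total / parts;
    (0..parts)
        .map(|i| SliceRef {
            buffer: Arc::clone(&buffer),
            offset: i * width,
            len: width,
        })
        .collect()
}

/// Counts the shared buffer once per slice, so it is counted `parts` times.
fn sum_full(slices: &[SliceRef]) -> usize {
    slices.iter().map(SliceRef::full_buffer_size).sum()
}

/// Counts each slice as if it owned only its own window of data.
fn sum_slices(slices: &[SliceRef]) -> usize {
    slices.iter().map(SliceRef::slice_size).sum()
}

fn main() {
    let slices = make_slices(1000, 10);
    println!("summing whole buffers: {} bytes", sum_full(&slices));
    println!("summing slice sizes:   {} bytes", sum_slices(&slices));
}
```

In this model the buffer is allocated once, yet summing whole-buffer sizes reports ten times its real footprint, while summing per-slice sizes reports the footprint exactly. With real data the per-slice sum is still an estimate, as the comment notes, because the accumulator may not own any of the underlying memory.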
