Description
Describe the bug
While working on #8470 I noticed that the API for reporting memory usage undercounts the actual memory used when encryption is enabled.
ParquetMetaData::memory_size is used for memory accounting in in-memory Parquet caches, and thus should be accurate.
To Reproduce
Specifically, this function:
arrow-rs/parquet/src/file/metadata/mod.rs
Lines 281 to 286 in b8ae8e0
    pub fn memory_size(&self) -> usize {
        std::mem::size_of::<Self>()
            + self.file_metadata.heap_size()
            + self.row_groups.heap_size()
            + self.column_index.heap_size()
            + self.offset_index.heap_size()
It does not account for the heap allocations in the file_decryptor field:
arrow-rs/parquet/src/file/metadata/mod.rs
Line 191 in b8ae8e0
    file_decryptor: Option<FileDecryptor>,
Expected behavior
ParquetMetaData::memory_size should report its actual heap allocation size (by implementing the HeapSize trait for FileDecryptor and all its subfields).
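A minimal sketch of what this could look like. It is not the crate's actual code: the `HeapSize` trait is redefined locally with the shape implied by the `heap_size()` calls above, and `DemoDecryptor`, `DemoMetadata`, and their fields (`footer_key`, `file_aad`) are hypothetical stand-ins, since the real layout of `FileDecryptor` is not shown in this issue.

```rust
use std::mem::size_of;

/// Local stand-in for the crate's `HeapSize` trait (assumed shape):
/// bytes owned on the heap, excluding the size of `Self` itself.
trait HeapSize {
    fn heap_size(&self) -> usize;
}

impl HeapSize for Vec<u8> {
    fn heap_size(&self) -> usize {
        // Count the allocated capacity, not just the bytes in use.
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Option<T> {
    fn heap_size(&self) -> usize {
        // `None` owns no heap memory; `Some` defers to the inner value.
        self.as_ref().map_or(0, |inner| inner.heap_size())
    }
}

/// Hypothetical stand-in for `FileDecryptor`; the field names are assumptions.
struct DemoDecryptor {
    footer_key: Vec<u8>,
    file_aad: Vec<u8>,
}

impl HeapSize for DemoDecryptor {
    fn heap_size(&self) -> usize {
        // Every heap-owning subfield must be included.
        self.footer_key.heap_size() + self.file_aad.heap_size()
    }
}

/// Hypothetical stand-in for `ParquetMetaData`, reduced to the one field at issue.
struct DemoMetadata {
    file_decryptor: Option<DemoDecryptor>,
}

impl DemoMetadata {
    fn memory_size(&self) -> usize {
        // Inline size of the struct plus the decryptor's heap allocations.
        size_of::<Self>() + self.file_decryptor.heap_size()
    }
}
```

With this in place, a `DemoMetadata` holding a decryptor reports a larger `memory_size()` than one with `file_decryptor: None`, which is the behavior the issue asks for. Fields behind an `Arc` or a trait object would need their own accounting policy, which this sketch does not cover.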
Additional context