@rfratto rfratto commented Jul 31, 2025

Previously, readerDownloader would retrieve page information for projected columns sequentially. This can result in a very high number of roundtrips to object storage: for a streams section with 160 columns, 160 roundtrips at 10ms each would add 1.6s of latency to a scan.

This updates readerDownloader to download page headers lazily and in a single batch: the first time any page header is requested, the page headers for all projected columns are downloaded at once.

@rfratto rfratto requested a review from a team as a code owner July 31, 2025 13:43
@rfratto rfratto merged commit a5c722c into main Jul 31, 2025
66 checks passed
@rfratto rfratto deleted the dataobj-dataset-batched-page-metadata-downloads branch July 31, 2025 14:10