chore(dataobj): list pages across all columns in one request #18678

rfratto · 2025-07-31T13:43:54Z

Previously, readerDownloader retrieved page information for projected columns sequentially, one column at a time. This can result in a very high number of roundtrips to object storage: for a streams section with 160 columns, that is 160 sequential roundtrips, which at 10ms each would add 1.6s of latency to a scan.

This updates readerDownloader to lazily download the page headers for all projected columns in one request, triggered the first time any page header is requested.

rfratto requested a review from a team as a code owner July 31, 2025 13:43

pull-request-size bot added the size/M label Jul 31, 2025

benclive approved these changes Jul 31, 2025

rfratto merged commit a5c722c into main Jul 31, 2025
66 checks passed

rfratto deleted the dataobj-dataset-batched-page-metadata-downloads branch July 31, 2025 14:10
