WIP: [Parquet] Add tests for IO/CPU access in parquet reader #7971

alamb · 2025-07-21T15:46:33Z

Which issue does this PR close?

This is part of the work to add a cache to the Parquet reader in Parquet filter pushdown v4 #7850.

Rationale for this change

There is quite a bit of cleverness in the Parquet reader related to IO patterns. To ensure we don't introduce regressions in the existing code, I would like to add tests that cover the IO patterns of the Parquet reader.

I would eventually like to revisit the "minimize IO at all costs" design of the Parquet reader (for use cases where the file is local, for example), but to do that I think we first need to better understand what the current reader does.

What changes are included in this PR?

Add a new test that:

1. Creates a temporary Parquet file with a known row group structure
2. Reads data from that file using the Arrow Parquet Reader, recording the IO operations
3. Asserts the expected IO patterns based on the read operations

This is done for both the sync and async readers.
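To illustrate the "record the IO operations, then assert on them" idea, here is a minimal std-only sketch. It is not the code in this PR: `IoOp`, `IoLog`, `RecordingReader`, and `record_demo_reads` are hypothetical names, and the 64-byte in-memory buffer only stands in for a real Parquet file. Wiring such a wrapper into the parquet crate's readers is not shown.

```rust
use std::cell::RefCell;
use std::io::{self, Cursor, Read, Seek, SeekFrom};
use std::rc::Rc;

/// One recorded IO operation (hypothetical type, not part of the parquet crate).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IoOp {
    /// A seek that landed at this absolute offset.
    Seek { offset: u64 },
    /// A read that started at `offset` and returned `len` bytes.
    Read { offset: u64, len: usize },
}

/// Shared log so a test can inspect the operations after the reads finish.
pub type IoLog = Rc<RefCell<Vec<IoOp>>>;

/// Wraps any `Read + Seek` source and appends every operation to `log`.
pub struct RecordingReader<R> {
    inner: R,
    pos: u64,
    log: IoLog,
}

impl<R: Read + Seek> RecordingReader<R> {
    pub fn new(mut inner: R, log: IoLog) -> io::Result<Self> {
        // Start tracking from wherever the wrapped source currently is.
        let pos = inner.stream_position()?;
        Ok(Self { inner, pos, log })
    }
}

impl<R: Read> Read for RecordingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let len = self.inner.read(buf)?;
        self.log.borrow_mut().push(IoOp::Read { offset: self.pos, len });
        self.pos += len as u64;
        Ok(len)
    }
}

impl<R: Seek> Seek for RecordingReader<R> {
    fn seek(&mut self, to: SeekFrom) -> io::Result<u64> {
        self.pos = self.inner.seek(to)?;
        self.log.borrow_mut().push(IoOp::Seek { offset: self.pos });
        Ok(self.pos)
    }
}

/// Performs two reads against a fake 64-byte "file" and returns the IO log.
pub fn record_demo_reads() -> io::Result<Vec<IoOp>> {
    let log: IoLog = Rc::new(RefCell::new(Vec::new()));
    let file = Cursor::new(vec![0u8; 64]);
    let mut reader = RecordingReader::new(file, Rc::clone(&log))?;

    // Read the last 8 bytes, where a Parquet file keeps its footer length and magic.
    let mut tail = [0u8; 8];
    reader.seek(SeekFrom::End(-8))?;
    reader.read_exact(&mut tail)?;

    // Read a 16-byte range near the start, standing in for a column chunk.
    let mut chunk = [0u8; 16];
    reader.seek(SeekFrom::Start(4))?;
    reader.read_exact(&mut chunk)?;

    let ops = log.borrow().to_vec();
    Ok(ops)
}
```

A test built this way would compare the returned `Vec<IoOp>` against the expected sequence of seeks and reads, so any change in the reader's IO pattern shows up as a failing assertion.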

Are these changes tested?

This PR contains only tests.

Are there any user-facing changes?

alamb · 2025-07-22T21:04:30Z

Update: I am quite pleased with how the sync reader looks. Now I am working out how to test the async reader.

github-actions bot added the parquet label (Changes to the parquet crate) Jul 21, 2025

This was referenced Jul 21, 2025

Parquet filter pushdown v4 #7850

Open

[Epic] Accurate performance tracking XiangpengHao/liquid-cache#302

Open

alamb force-pushed the alamb/parquet_io_test branch from 2c8b561 to c2535a3 July 22, 2025 16:28

Add Parquet IO test

741c0d2

alamb force-pushed the alamb/parquet_io_test branch from ba073f0 to 741c0d2 July 22, 2025 20:54

alamb added 2 commits July 22, 2025 16:57

do it once

e89b8b3

do it once

cf271a1

alamb mentioned this pull request Jul 23, 2025

[DISCUSS] Decouple IO and CPU operations in the Parquet Reader (push decoder?) #7983

Open
