Add benchmark utility to profile peak memory usage #16814

Open. ding-young wants to merge 5 commits into main.

Conversation

ding-young
Contributor

@ding-young ding-young commented Jul 18, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  1. Each benchmark (clickbench, tpch, sort-tpch, imdb, h2o) now calls `print_memory_stats()` to print memory usage statistics collected via mimalloc. This only works when compiled with `--features mimalloc_extended`.

  2. A new utility, `mem_profile`, is added. It builds `dfbench` with the `mimalloc_extended` feature enabled, then runs each benchmark query in a separate subprocess to collect memory stats. It captures each subprocess's stdout and prints a summary of the results for all queries.
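The subprocess-per-query flow described in item 2 could look roughly like the sketch below. It is only an illustration, not the code in this PR: the `Peak RSS: <bytes>` line format, the `QueryMemStats` struct, and the helper names `parse_peak_rss`, `run_query`, and `summarize` are all assumptions made for the example. Only the `--query` flag is taken from the commands in this description.

```rust
use std::io;
use std::process::Command;

/// Hypothetical per-query result; the field names are illustrative only.
#[derive(Debug, PartialEq)]
struct QueryMemStats {
    query: usize,
    peak_rss_bytes: u64,
}

/// Extract an assumed `Peak RSS: <bytes>` line from a child's captured stdout.
/// The real output format of `print_memory_stats()` may differ.
fn parse_peak_rss(stdout: &str) -> Option<u64> {
    stdout.lines().find_map(|line| {
        line.trim()
            .strip_prefix("Peak RSS:")
            .and_then(|rest| rest.trim().parse::<u64>().ok())
    })
}

/// Run a single query in its own child process so that its peak memory
/// is not affected by queries that ran earlier.
#[allow(dead_code)]
fn run_query(
    program: &str,
    base_args: &[&str],
    query: usize,
) -> io::Result<Option<QueryMemStats>> {
    let output = Command::new(program)
        .args(base_args)
        .arg("--query")
        .arg(query.to_string())
        .output()?; // waits for the child and captures stdout/stderr
    let stdout = String::from_utf8_lossy(&output.stdout);
    Ok(parse_peak_rss(&stdout).map(|peak_rss_bytes| QueryMemStats {
        query,
        peak_rss_bytes,
    }))
}

/// Render one summary row per query, converting bytes to MiB.
fn summarize(results: &[QueryMemStats]) -> String {
    let mut out = String::from("Query  Peak RSS (MiB)\n");
    for r in results {
        let mib = r.peak_rss_bytes as f64 / (1024.0 * 1024.0);
        out.push_str(&format!("{:<6} {:.1}\n", r.query, mib));
    }
    out
}

fn main() {
    // Stand-in for stdout captured from a child process.
    let sample = "Query 1 iteration 0 finished\nPeak RSS: 104857600\n";
    let stats = QueryMemStats {
        query: 1,
        peak_rss_bytes: parse_peak_rss(sample).expect("no stats line found"),
    };
    print!("{}", summarize(&[stats]));
}
```

Keeping the parsing separate from process spawning means the summary logic can be checked against sample output without building or running `dfbench`.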

Are these changes tested?

Yes, by running the following commands:

# Run all queries
cargo run --profile release-nonlto --bin mem_profile -- tpch --path benchmarks/data/tpch_sf1 --partitions 4 --format parquet
cargo run --profile release-nonlto --bin mem_profile -- h2o
cargo run --profile release-nonlto --bin mem_profile -- clickbench
cargo run --profile release-nonlto --bin mem_profile -- imdb --path benchmarks/data/imdb/
cargo run --profile release-nonlto --bin mem_profile -- sort-tpch --path benchmarks/data/tpch_sf1 --partitions 4

and

# Run a specific query
cargo run --profile release-nonlto --bin mem_profile -- tpch --path benchmarks/data/tpch_sf1 --partitions 4 --format parquet --query 1

Are there any user-facing changes?

Comment on lines +325 to +335
# Profiling Memory Stats for each benchmark query
The `mem_profile` program wraps benchmark execution to measure memory usage statistics, such as peak RSS. It runs each benchmark query in a separate subprocess, capturing the child process’s stdout to print structured output.

The subcommands supported by `mem_profile` are a subset of those in `dfbench`.
Currently supported benchmarks: Clickbench, H2o, Imdb, SortTpch, Tpch.

Before running benchmarks, `mem_profile` automatically compiles the benchmark binary (`dfbench`) using `cargo build` with the same cargo profile (e.g., `--release`) as `mem_profile` itself. Prebuilding the binary and running each query in a separate process keeps compilation and earlier queries from inflating a query's reported memory statistics.

Currently, `mem_profile` only supports `mimalloc` as the memory allocator, since it relies on `mimalloc`'s API to collect memory statistics.

Because it runs the compiled binary directly from the target directory, make sure your working directory is the top-level `datafusion/` directory, where `target/` is located.
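To make the working-directory requirement concrete, here is a minimal sketch of how the build arguments and the binary path could be derived from a cargo profile name. The helper names (`profile_dir`, `build_args`, `dfbench_path`) and the exact argument list are assumptions for illustration and may not match what `mem_profile` does. The directory mapping follows cargo's documented behavior: `dev` and `test` builds go to `debug/`, `release` and `bench` go to `release/`, and a custom profile uses a directory with its own name.

```rust
use std::path::{Path, PathBuf};

/// Map a cargo profile name to its output directory under `target/`.
fn profile_dir(profile: &str) -> &str {
    match profile {
        "dev" | "test" => "debug",
        "bench" => "release",
        other => other, // "release" and custom profiles use their own name
    }
}

/// Assumed argument list for prebuilding `dfbench`; the flags themselves are
/// standard cargo flags, but the real invocation may differ.
fn build_args(profile: &str) -> Vec<String> {
    [
        "build",
        "--profile",
        profile,
        "--features",
        "mimalloc_extended",
        "--bin",
        "dfbench",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// Location of the compiled binary relative to the workspace root.
/// This is why the working directory has to be the directory containing `target/`.
fn dfbench_path(workspace_root: &Path, profile: &str) -> PathBuf {
    workspace_root
        .join("target")
        .join(profile_dir(profile))
        .join(format!("dfbench{}", std::env::consts::EXE_SUFFIX))
}

fn main() {
    let profile = "release-nonlto";
    println!("cargo {}", build_args(profile).join(" "));
    println!("{}", dfbench_path(Path::new("."), profile).display());
}
```

If the path is resolved relative to the current directory, as in this sketch, running from any other directory makes the lookup fail even though the build itself succeeded.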
Here's more description about this utility and supported metrics.

@ding-young ding-young marked this pull request as ready for review July 25, 2025 03:19
@ding-young

@2010YOUY01 This is ready for review :) I would love to hear your feedback.

Development

Successfully merging this pull request may close these issues.

Support memory profiling in benchmarks