Skip to content

Conversation

@BlakeOrth
Copy link
Contributor

Which issue does this PR close?

This does not fully close, but is an incremental building block component for:

The full context of how this code is likely to progress can be seen in the POC for this effort:

Rationale for this change

Further fills out method instrumentation

What changes are included in this PR?

  • Adds instrumentation to head requests in the instrumented object store
  • Adds instrumentatin to delete requests in the instrumented object store
  • Adds tests for new code

Are these changes tested?

Yes. New unit tests have been added.

Are there any user-facing changes?

No-ish

cc @alamb

@BlakeOrth BlakeOrth force-pushed the feature/cli_instrument_other branch 2 times, most recently from 74976f8 to dae5002 Compare October 21, 2025 18:45
 - Adds instrumentation to head requests in the instrumented object
   store
 - Adds instrumentatin to delete requests in the instrumented object
   store
 - Adds tests for new code and updates existing tests
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @BlakeOrth -- love the code 🚋 that is currently going on

| Get | duration | ...NORMALIZED...| 1 |
| Get | size | 1006 B | 1006 B | 1006 B | 1006 B | 1 |
| Head | duration | ...NORMALIZED...| 1 |
| Head | size | | | | | 1 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is interesting that there are (even more!) requests going on to read files

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb Yes! This feature continues to provide interesting data. That being said, I'm 99% sure the head requests on read in this case is due to the parquet metadata cache! Refer to my comment on its initial PR here here: #16971 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I am saying is that if we tally up all the object store operations that are (currently) required to read a parquet file there is a lot of room for improvement. I think the next release of DataFusion is going to "magically get faster" for a bunch of people because we are optimizing these calls systematically.

I for one am very excited

@alamb alamb added this pull request to the merge queue Oct 21, 2025
Merged via the queue into apache:main with commit d5ea5e9 Oct 21, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants