ESQL: Buffer reuse in ParquetStorageObjectAdapter and StorageObject#143700
Open
costin wants to merge 3 commits intoelastic:mainfrom
Open
ESQL: Buffer reuse in ParquetStorageObjectAdapter and StorageObject#143700costin wants to merge 3 commits intoelastic:mainfrom
costin wants to merge 3 commits intoelastic:mainfrom
Conversation
Eliminate unnecessary allocations in two hot paths: ParquetStorageObjectAdapter's read(ByteBuffer) and readFully(ByteBuffer) now read directly into the backing array for heap ByteBuffers instead of allocating a temporary byte[] and double-copying. Direct ByteBuffers fall back to the previous temp-array approach. StorageObject gains a readBytesAsync overload that accepts a caller- provided ByteBuffer, reading directly into its backing array for heap buffers. This avoids per-call byte[] allocation for callers that can reuse buffers across reads.
Collaborator
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Collaborator
|
Hi @costin, I've created a changelog YAML for you. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two allocation hot spots in the datasource I/O path cause unnecessary garbage and
memory copies on every Parquet read call:
ParquetStorageObjectAdapter'sread(ByteBuffer)andreadFully(ByteBuffer)allocatea temporary
byte[]the size of the request, read into it, then copy into the caller'sByteBuffer. For heap-backed ByteBuffers (the common case with Parquet's column readers),
this is a pure waste — we can read directly into the backing array.
StorageObject.readBytesAsyncalways allocates a freshbyte[]viastream.readAllBytes().Callers that can reuse buffers across reads (e.g., columnar format readers doing sequential
chunk reads) have no way to avoid this allocation.
This PR fixes both:
ParquetStorageObjectAdapternow reads directly into heap ByteBuffer backing arrays,falling back to temp arrays only for direct ByteBuffers.
StorageObjectgains areadBytesAsync(long, ByteBuffer, Executor, ActionListener<Integer>)overload that fills a caller-provided buffer, enabling buffer reuse across async reads.
Developed using AI-assisted tooling