ESQL: Buffer reuse in ParquetStorageObjectAdapter and StorageObject #143700

Open
costin wants to merge 3 commits into elastic:main from costin:esql/buffer-reuse-storage-parquet

Conversation

@costin
Member

@costin costin commented Mar 5, 2026

Two allocation hot spots in the datasource I/O path cause unnecessary garbage and
memory copies on every Parquet read call:

  1. ParquetStorageObjectAdapter's read(ByteBuffer) and readFully(ByteBuffer) allocate
    a temporary byte[] the size of the request, read into it, then copy into the caller's
    ByteBuffer. For heap-backed ByteBuffers (the common case with Parquet's column readers),
    the extra allocation and copy are pure waste: we can read directly into the backing array.

  2. StorageObject.readBytesAsync always allocates a fresh byte[] via stream.readAllBytes().
    Callers that can reuse buffers across reads (e.g., columnar format readers doing sequential
    chunk reads) have no way to avoid this allocation.
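
To illustrate item 1, here is a minimal sketch of the two read paths. It is not the code in this PR: the class name HeapBufferRead and the plain InputStream source are assumptions made to keep the example self-contained. The heap path uses ByteBuffer.hasArray(), array() and arrayOffset() to write into the backing array; the direct path keeps the temp-array copy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not the adapter from this PR.
final class HeapBufferRead {

    /** Reads up to dst.remaining() bytes from in; returns the byte count, or -1 at end of stream. */
    static int read(InputStream in, ByteBuffer dst) throws IOException {
        int len = dst.remaining();
        if (len == 0) {
            return 0;
        }
        if (dst.hasArray()) {
            // Heap buffer: write straight into the backing array, no temp byte[].
            // arrayOffset() is non-zero for sliced buffers, so it must be added.
            int n = in.read(dst.array(), dst.arrayOffset() + dst.position(), len);
            if (n > 0) {
                dst.position(dst.position() + n);
            }
            return n;
        }
        // Direct (or read-only) buffer: no accessible backing array, so copy via a temp array.
        byte[] tmp = new byte[len];
        int n = in.read(tmp, 0, len);
        if (n > 0) {
            dst.put(tmp, 0, n);
        }
        return n;
    }
}
```

Note that hasArray() also returns false for read-only heap buffers, so those take the fallback path as well.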

This PR fixes both:

  • ParquetStorageObjectAdapter now reads directly into heap ByteBuffer backing arrays,
    falling back to temp arrays only for direct ByteBuffers.
  • StorageObject gains a readBytesAsync(long, ByteBuffer, Executor, ActionListener<Integer>)
    overload that fills a caller-provided buffer, enabling buffer reuse across async reads.
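
As a rough sketch of how a caller might use the new overload to reuse one buffer across sequential chunk reads: everything below is a stand-in, not the PR's implementation. BufferReuseSketch is a hypothetical in-memory object, Listener is a minimal substitute for ActionListener, and the long argument is assumed to be the start position of the read.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Executor;

// Hypothetical in-memory stand-in for a StorageObject; for illustration only.
final class BufferReuseSketch {

    /** Minimal substitute for Elasticsearch's ActionListener, to keep the sketch self-contained. */
    interface Listener<T> {
        void onResponse(T value);
        void onFailure(Exception e);
    }

    private final byte[] content;

    BufferReuseSketch(byte[] content) {
        this.content = content;
    }

    /** Fills the caller-provided buffer starting at position (assumed meaning) and reports the byte count. */
    void readBytesAsync(long position, ByteBuffer target, Executor executor, Listener<Integer> listener) {
        executor.execute(() -> {
            try {
                if (position < 0 || position > content.length) {
                    throw new IllegalArgumentException("position out of range: " + position);
                }
                int n = (int) Math.min(target.remaining(), content.length - position);
                target.put(content, (int) position, n); // no per-call byte[] allocation
                listener.onResponse(n);
            } catch (Exception e) {
                listener.onFailure(e);
            }
        });
    }

    /** Sums all bytes using sequential chunk reads into a single reused buffer. */
    static long sumWithReusedBuffer(BufferReuseSketch object, int chunkSize) {
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize); // allocated once, reused for every chunk
        long[] sum = {0};
        int[] lastRead = {0};
        long position = 0;
        do {
            buffer.clear(); // reset position and limit before each read
            // Runnable::run executes on the calling thread, so the callback has
            // completed by the time readBytesAsync returns (for determinism only).
            object.readBytesAsync(position, buffer, Runnable::run, new Listener<Integer>() {
                @Override
                public void onResponse(Integer n) {
                    buffer.flip();
                    while (buffer.hasRemaining()) {
                        sum[0] += buffer.get();
                    }
                    lastRead[0] = n;
                }

                @Override
                public void onFailure(Exception e) {
                    throw new RuntimeException(e);
                }
            });
            position += lastRead[0];
        } while (lastRead[0] > 0);
        return sum[0];
    }
}
```

With a real thread-pool executor the caller must not touch or reuse the buffer until the listener has fired for the previous read.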

Developed using AI-assisted tooling

Eliminate unnecessary allocations in two hot paths:

ParquetStorageObjectAdapter's read(ByteBuffer) and readFully(ByteBuffer)
now read directly into the backing array for heap ByteBuffers instead of
allocating a temporary byte[] and double-copying. Direct ByteBuffers
fall back to the previous temp-array approach.

StorageObject gains a readBytesAsync overload that accepts a caller-
provided ByteBuffer, reading directly into its backing array for heap
buffers. This avoids per-call byte[] allocation for callers that can
reuse buffers across reads.
@costin costin added >enhancement ES|QL|DS ES|QL datasources labels Mar 5, 2026
@costin costin requested a review from bpintea March 5, 2026 17:13
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.4.0 labels Mar 5, 2026
@costin costin removed the needs:triage Requires assignment of a team area label label Mar 5, 2026
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Mar 5, 2026
@costin costin added :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Mar 5, 2026
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 5, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @costin, I've created a changelog YAML for you.

@costin costin requested a review from swallez March 5, 2026 19:52
@costin costin enabled auto-merge (squash) March 5, 2026 19:52