ESQL: Buffer reuse in ParquetStorageObjectAdapter and StorageObject by costin · Pull Request #143700 · elastic/elasticsearch

costin · 2026-03-05T17:13:28Z

Two allocation hot spots in the datasource I/O path cause unnecessary garbage and
memory copies on every Parquet read call:

1. ParquetStorageObjectAdapter's read(ByteBuffer) and readFully(ByteBuffer) allocate
a temporary byte[] the size of the request, read into it, then copy into the caller's
ByteBuffer. For heap-backed ByteBuffers (the common case with Parquet's column readers),
this is pure waste: we can read directly into the backing array.
2. StorageObject.readBytesAsync always allocates a fresh byte[] via stream.readAllBytes().
Callers that can reuse buffers across reads (e.g., columnar format readers doing sequential
chunk reads) have no way to avoid this allocation.

This PR fixes both:

1. ParquetStorageObjectAdapter now reads directly into heap ByteBuffer backing arrays,
falling back to temp arrays only for direct ByteBuffers.
2. StorageObject gains a readBytesAsync(long, ByteBuffer, Executor, ActionListener<Integer>)
overload that fills a caller-provided buffer, enabling buffer reuse across async reads.
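
For readers unfamiliar with the heap/direct distinction, here is a minimal sketch of the first change. It is not the code from this PR: the class name HeapBufferReadSketch and the helper read(InputStream, ByteBuffer) are hypothetical, and the real adapter wraps a StorageObject stream rather than a plain InputStream. It only illustrates the technique, using the standard ByteBuffer.hasArray(), array(), and arrayOffset() calls.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration; names are not taken from the PR.
final class HeapBufferReadSketch {

    /** Reads up to dst.remaining() bytes; returns the count, or -1 at end of stream. */
    static int read(InputStream in, ByteBuffer dst) throws IOException {
        int len = dst.remaining();
        if (len == 0) {
            return 0;
        }
        if (dst.hasArray()) {
            // Heap buffer: read straight into the backing array, no temporary byte[].
            int n = in.read(dst.array(), dst.arrayOffset() + dst.position(), len);
            if (n > 0) {
                dst.position(dst.position() + n);
            }
            return n;
        }
        // Direct (or read-only) buffer: no accessible array, so use a temporary one.
        byte[] tmp = new byte[len];
        int n = in.read(tmp, 0, len);
        if (n > 0) {
            dst.put(tmp, 0, n);
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});
        ByteBuffer heap = ByteBuffer.allocate(4);
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        System.out.println(read(in, heap));   // prints 4 (direct-to-array path)
        System.out.println(read(in, direct)); // prints 4 (temp-array fallback)
    }
}
```

Adding arrayOffset() to the position matters for sliced heap buffers, whose first element does not sit at index 0 of the backing array.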

Developed using AI-assisted tooling

Eliminate unnecessary allocations in two hot paths: ParquetStorageObjectAdapter's read(ByteBuffer) and readFully(ByteBuffer) now read directly into the backing array for heap ByteBuffers instead of allocating a temporary byte[] and double-copying. Direct ByteBuffers fall back to the previous temp-array approach. StorageObject gains a readBytesAsync overload that accepts a caller-provided ByteBuffer, reading directly into its backing array for heap buffers. This avoids per-call byte[] allocation for callers that can reuse buffers across reads.
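
To make the buffer-reuse benefit concrete, the sketch below shows how a caller might drive an overload shaped like the new one. Everything in it is an assumption for illustration: AsyncBufferReuseSketch is an in-memory stand-in for a StorageObject, the nested Listener interface stands in for ActionListener<Integer>, the long argument is assumed to be a read position, and returning -1 past the end is an assumed convention rather than the documented contract.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.Executor;

// Hypothetical illustration; not the StorageObject API from the PR.
final class AsyncBufferReuseSketch {

    /** Hypothetical stand-in for ActionListener<Integer>. */
    interface Listener {
        void onResponse(int bytesRead);
        void onFailure(Exception e);
    }

    private final byte[] content; // in-memory stand-in for the stored object

    AsyncBufferReuseSketch(byte[] content) {
        this.content = content;
    }

    /** Fills target starting at position; reports bytes read, or -1 past the end (assumed). */
    void readBytesAsync(long position, ByteBuffer target, Executor executor, Listener listener) {
        executor.execute(() -> {
            try {
                if (position >= content.length) {
                    listener.onResponse(-1);
                    return;
                }
                int n = (int) Math.min(target.remaining(), content.length - position);
                target.put(content, (int) position, n); // copies into the caller's buffer
                listener.onResponse(n);
            } catch (Exception e) {
                listener.onFailure(e);
            }
        });
    }

    /** Sums every byte of the object using sequential chunk reads through ONE buffer. */
    static long sumWithReusedBuffer(AsyncBufferReuseSketch object, int chunkSize) {
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize); // allocated once, reused per chunk
        Executor sameThread = Runnable::run; // runs callbacks inline to keep the sketch simple
        int[] last = {0};
        long sum = 0;
        long position = 0;
        do {
            buffer.clear();
            object.readBytesAsync(position, buffer, sameThread, new Listener() {
                @Override public void onResponse(int bytesRead) { last[0] = bytesRead; }
                @Override public void onFailure(Exception e) { throw new IllegalStateException(e); }
            });
            if (last[0] > 0) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    sum += buffer.get();
                }
                position += last[0];
            }
        } while (last[0] > 0);
        return sum;
    }

    public static void main(String[] args) {
        AsyncBufferReuseSketch object = new AsyncBufferReuseSketch(new byte[] {1, 2, 3, 4, 5});
        System.out.println(sumWithReusedBuffer(object, 2)); // prints 15
    }
}
```

With the pre-existing byte[]-returning variant, each of the three chunk reads above would allocate a new array; here the only allocation is the single ByteBuffer.allocate call.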

elasticsearchmachine · 2026-03-05T17:28:12Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

elasticsearchmachine · 2026-03-05T17:28:12Z

Hi @costin, I've created a changelog YAML for you.

costin added >enhancement ES|QL|DS ES|QL datasources labels Mar 5, 2026

costin requested a review from bpintea March 5, 2026 17:13

elasticsearchmachine added needs:triage Requires assignment of a team area label v9.4.0 labels Mar 5, 2026

costin removed the needs:triage Requires assignment of a team area label label Mar 5, 2026

elasticsearchmachine added the needs:triage Requires assignment of a team area label label Mar 5, 2026

costin added :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Mar 5, 2026

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 5, 2026

Update docs/changelog/143700.yaml

4ee6b12

Merge branch 'main' into esql/buffer-reuse-storage-parquet

7136cca

costin requested a review from swallez March 5, 2026 19:52

costin enabled auto-merge (squash) March 5, 2026 19:52

costin wants to merge 3 commits into elastic:main from costin:esql/buffer-reuse-storage-parquet

2 participants
