
JSON_EXTRACT: zero-copy byte slicing for object, array, and number extraction #143702

Open
quackaplop wants to merge 6 commits into elastic:main from quackaplop:feature/json-extract-byte-slicing

Conversation

@quackaplop
Contributor

Summary

Optimizes JSON_EXTRACT to use zero-copy byte slicing instead of copyCurrentStructure()
re-serialization when extracting objects, arrays, and numbers from JSON input. This builds on
the byte offset API exposed in #143501.

What changed

Object/array extraction — Previously, extracting a nested object or array walked every token
in the subtree and rebuilt JSON from scratch via XContentBuilder.copyCurrentStructure(). Now it
slices bytes directly from the input buffer using getTokenLocation().byteOffset() →
skipChildren() → getCurrentLocation().byteOffset(). Zero allocation, zero re-parsing.
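
The offset-based slicing can be illustrated without the parser. The following is a minimal sketch, not the code in this PR: ByteSliceSketch, Slice, and sliceStructure are hypothetical names, Slice stands in for Lucene's BytesRef, and a hand-rolled bracket scanner plays the role that skipChildren() and the two byteOffset() calls play in the actual change. It assumes the input is already valid UTF-8 JSON.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: start/end offsets into the input replace re-serialization. */
final class ByteSliceSketch {

    /** Hypothetical stand-in for BytesRef: a view over an existing array, no copy. */
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }

        String utf8() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /**
     * Given the offset of an opening '{' or '[', finds the matching close and
     * returns a view over the whole value. Brackets inside string literals
     * are ignored, and a backslash skips the byte that follows it.
     */
    static Slice sliceStructure(byte[] json, int start) {
        if (json[start] != '{' && json[start] != '[') {
            throw new IllegalArgumentException("no object or array at offset " + start);
        }
        int depth = 0;
        boolean inString = false;
        for (int i = start; i < json.length; i++) {
            byte b = json[i];
            if (inString) {
                if (b == '\\') {
                    i++; // skip the escaped byte
                } else if (b == '"') {
                    inString = false;
                }
            } else if (b == '"') {
                inString = true;
            } else if (b == '{' || b == '[') {
                depth++;
            } else if (b == '}' || b == ']') {
                if (--depth == 0) {
                    return new Slice(json, start, i - start + 1);
                }
            }
        }
        throw new IllegalArgumentException("unterminated structure at offset " + start);
    }

    public static void main(String[] args) {
        byte[] doc = "{\"a\":{\"b\":[1,\"]\"]},\"c\":2}".getBytes(StandardCharsets.UTF_8);
        // Offset 5 is the '{' that starts the value of "a".
        System.out.println(sliceStructure(doc, 5).utf8());
    }
}
```

The returned Slice shares the caller's array, so the only work is the scan to the closing bracket; nothing in the subtree is rebuilt.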

Number extraction — Previously this called parser.text(), which makes Jackson convert the number
to a Java String, and then wrapped the result in a BytesRef. Now it byte-slices the number literal
directly from the input array, avoiding the String allocation entirely.
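
As a rough illustration of slicing a number literal in place, the sketch below finds where the literal ends without building a String. NumberSliceSketch and numberEnd are hypothetical names, and the scan assumes the parser has already validated the number, so it only looks for the first byte that cannot belong to one.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: locate a JSON number literal by offsets. */
final class NumberSliceSketch {

    /** Returns the exclusive end offset of the number literal that begins at start. */
    static int numberEnd(byte[] json, int start) {
        int i = start;
        while (i < json.length) {
            byte b = json[i];
            boolean partOfNumber = (b >= '0' && b <= '9')
                || b == '-' || b == '+' || b == '.' || b == 'e' || b == 'E';
            if (partOfNumber == false) {
                break;
            }
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] doc = "{\"n\":-12.5e3,\"m\":1}".getBytes(StandardCharsets.UTF_8);
        int start = 5; // offset of '-' in the value of "n"
        int end = numberEnd(doc, start);
        // The caller would wrap (doc, start, end - start) in a view; the String
        // below exists only to print the demo result.
        System.out.println(new String(doc, start, end - start, StandardCharsets.US_ASCII));
    }
}
```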

Boolean extraction — Reuses static TRUE_BYTES / FALSE_BYTES constants instead of
allocating a new BytesRef("true") / BytesRef("false") per call.
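
The constant reuse can be pictured as follows. This is a sketch with hypothetical names (BooleanBytesSketch, literalFor) that uses plain byte arrays in place of BytesRef; callers must treat the shared arrays as read-only.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: hand out shared literals instead of allocating per call. */
final class BooleanBytesSketch {
    private static final byte[] TRUE_BYTES = "true".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] FALSE_BYTES = "false".getBytes(StandardCharsets.US_ASCII);

    /** Returns the same array instance on every call for a given value. */
    static byte[] literalFor(boolean value) {
        return value ? TRUE_BYTES : FALSE_BYTES;
    }
}
```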

Navigation refactoring — Replaced recursive descent (extractValue → navigateObject →
extractValue → ...) with an iterative loop. Navigation methods are now pure parser-positioning
helpers that don't need the byte-slicing context, keeping raw byte access confined to the
extraction point.
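
The shape of the recursive-to-iterative change can be sketched on an already-parsed tree. This is an assumption-laden illustration, not the PR's code: the real navigation positions a streaming parser, whereas IterativeNavigationSketch and navigate (both hypothetical names) walk nested Map and List values, with String path segments for keys and Integer segments for array indices.

```java
import java.util.List;
import java.util.Map;

/** Illustration only: one loop over path segments instead of mutual recursion. */
final class IterativeNavigationSketch {

    /** Walks the path one segment at a time; returns null if any segment is missing. */
    static Object navigate(Object root, List<Object> path) {
        Object current = root;
        for (Object segment : path) {
            if (segment instanceof String && current instanceof Map) {
                current = ((Map<?, ?>) current).get(segment);
            } else if (segment instanceof Integer && current instanceof List) {
                List<?> list = (List<?>) current;
                int index = (Integer) segment;
                current = index >= 0 && index < list.size() ? list.get(index) : null;
            } else {
                return null; // segment type does not match the current value
            }
            if (current == null) {
                return null;
            }
        }
        // Only here, at the extraction point, would the caller touch raw bytes.
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("a", Map.of("b", List.of(10, Map.of("c", "hit"))));
        System.out.println(navigate(doc, List.of("a", "b", 1, "c")));
    }
}
```

The loop only positions; it never needs to know how the final value will be materialized, which is the separation the refactoring describes.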

Non-JSON _source formats (SMILE/CBOR/YAML) fall back to copyCurrentStructure().

Benchmarks

Also adds json_extract and json_extract_object scenarios to EvalBenchmark, and a
dedicated JsonExtractBenchmark with 10 scenarios through the full eval pipeline (EvalMapper →
Layout → Page → Evaluator).

Environment: Apple M3 Max, JDK 25.0.1, JMH 1.37, warmup 3×2s, measurement 5×2s.

| Scenario | Before (ns/op) | After (ns/op) | Change |
|---|---|---|---|
| small_object (30B) | 222.0 ± 2.8 | 115.9 ± 3.1 | -47.8% |
| medium_object (500B) | 1,275.9 ± 27.0 | 662.2 ± 15.7 | -48.1% |
| large_object (4KB) | 24,531.3 ± 1,641 | 15,938.0 ± 721 | -35.0% |
| large_nested_extract (10KB doc) | 12,323.1 ± 458 | 6,664.0 ± 180 | -45.9% |
| array_of_objects ([25] of 50) | 4,253.0 ± 76 | 3,853.5 ± 68 | -9.4% |
| nested_scalar (5 levels) | 206.2 ± 4.9 | 178.9 ± 3.1 | -13.2% |
| deep_nesting (10 levels) | 478.7 ± 54 | 324.9 ± 10.7 | -32.1% |
| number | 160.0 ± 2.4 | 133.1 ± 3.2 | -16.8% |
| boolean | 106.0 ± 2.1 | 100.6 ± 2.5 | -5.1% |
| string | 107.6 ± 2.8 | 103.1 ± 2.0 | -4.2% |

The largest wins are on the object extraction scenarios (small_object, medium_object, large_object, large_nested_extract: 35–48%), where copyCurrentStructure was the hot path.

Relates #142873

…traction

Replace copyCurrentStructure() re-serialization with zero-copy byte
slicing for JSON input. When the extracted value is an object, array,
or number, slice bytes directly from the input buffer using
XContentLocation.byteOffset() offsets (exposed in elastic#143501).

Also refactors navigation from recursive descent to iterative loop,
confining raw byte access to the extraction point. Adds JMH benchmarks
for JSON_EXTRACT through the full eval pipeline.

elasticsearchmachine added the v9.4.0 and Team:Analytics labels on Mar 5, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

quackaplop requested a review from nik9000 on March 5, 2026 23:53

Labels

:Analytics/ES|QL, >enhancement, Team:Analytics, v9.4.0
