JSON_EXTRACT: zero-copy byte slicing for object, array, and number extraction #143702
quackaplop wants to merge 6 commits into elastic:main
…traction Replace copyCurrentStructure() re-serialization with zero-copy byte slicing for JSON input. When the extracted value is an object, array, or number, slice bytes directly from the input buffer using XContentLocation.byteOffset() offsets (exposed in elastic#143501). Also refactors navigation from recursive descent to an iterative loop, confining raw byte access to the extraction point. Adds JMH benchmarks for JSON_EXTRACT through the full eval pipeline.
Pinging @elastic/es-analytical-engine (Team:Analytics)
Navigation methods now only position the parser — they no longer carry builder, segments, depth, rawBytes, or rawOffset.
Summary
Optimizes JSON_EXTRACT to use zero-copy byte slicing instead of copyCurrentStructure() re-serialization when extracting objects, arrays, and numbers from JSON input. This builds on the byte offset API exposed in #143501.
What changed
Object/array extraction: Previously, extracting a nested object or array walked every token in the subtree and rebuilt the JSON from scratch via XContentBuilder.copyCurrentStructure(). Now it slices bytes directly from the input buffer using getTokenLocation().byteOffset() → skipChildren() → getCurrentLocation().byteOffset(). Zero allocation, zero re-parsing.
Number extraction: Previously called parser.text(), which makes Jackson convert the number to a Java String that is then wrapped in a BytesRef. Now byte-slices the number literal directly from the input array, avoiding the String allocation entirely.
Boolean extraction: Reuses static TRUE_BYTES/FALSE_BYTES constants instead of allocating a new BytesRef("true")/BytesRef("false") per call.
Navigation refactoring: Replaced recursive descent (extractValue → navigateObject → extractValue → ...) with an iterative loop. Navigation methods are now pure parser-positioning helpers that don't need the byte-slicing context, keeping raw byte access confined to the extraction point.
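For readers unfamiliar with the slicing idea, here is a minimal, self-contained sketch of it. It is not the code in this PR: the PR takes the start and end offsets from the parser's token locations, whereas this sketch uses a hand-rolled scanner as a stand-in for skipChildren(). The names ZeroCopySliceSketch, Slice, and sliceStructure are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of zero-copy slicing; not the Elasticsearch implementation. */
final class ZeroCopySliceSketch {

    /** A view over a region of a shared buffer (similar in spirit to a BytesRef). */
    record Slice(byte[] bytes, int offset, int length) {
        String utf8() {
            // Decoding copies; it is here only so the result can be printed.
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /**
     * Stand-in for skipChildren(): returns the exclusive end offset of the
     * object or array that starts at {@code start}. Assumes json[start] is '{' or '['.
     */
    static int skipChildren(byte[] json, int start) {
        int depth = 0;
        boolean inString = false;
        for (int i = start; i < json.length; i++) {
            byte b = json[i];
            if (inString) {
                if (b == '\\') {
                    i++; // skip the escaped character
                } else if (b == '"') {
                    inString = false;
                }
            } else if (b == '"') {
                inString = true;
            } else if (b == '{' || b == '[') {
                depth++;
            } else if (b == '}' || b == ']') {
                if (--depth == 0) {
                    return i + 1;
                }
            }
        }
        throw new IllegalArgumentException("unterminated JSON value at offset " + start);
    }

    /** Records start and end offsets and returns a view; the subtree is never rebuilt. */
    static Slice sliceStructure(byte[] json, int start) {
        int end = skipChildren(json, start);
        return new Slice(json, start, end - start);
    }

    public static void main(String[] args) {
        byte[] doc = "{\"a\":{\"b\":[1,\"x]\"]},\"c\":2}".getBytes(StandardCharsets.UTF_8);
        Slice value = sliceStructure(doc, 5); // offset 5 is the '{' that opens the value of "a"
        System.out.println(value.utf8());
    }
}
```

The returned Slice shares the input array, so extraction only records two offsets. The trade-off is that the slice keeps the whole input buffer reachable for as long as the slice is alive.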
Non-JSON _source formats (SMILE/CBOR/YAML) fall back to copyCurrentStructure().
Benchmarks
Also adds json_extract and json_extract_object scenarios to EvalBenchmark, and a dedicated JsonExtractBenchmark with 10 scenarios through the full eval pipeline (EvalMapper → Layout → Page → Evaluator).
Environment: Apple M3 Max, JDK 25.0.1, JMH 1.37, warmup 3×2s, measurement 5×2s.
Largest wins on object/array extraction (35–48%) where copyCurrentStructure was the hot path.
Relates #142873