Use sketches-ddsketch fork with Java-compatible binary encoding #2842
Merged
trinity-1686a merged 12 commits into main (Feb 20, 2026)
Fork sketches-ddsketch as a workspace member to add native Java binary serialization (to_java_bytes/from_java_bytes) for DDSketch. This enables pomsky to return raw DDSketch bytes that event-query can deserialize via DDSketchWithExactSummaryStatistics.decode().

Key changes:
- Vendor sketches-ddsketch crate with encoding.rs implementing VarEncoding, flag bytes, and INDEX_DELTAS_AND_COUNTS store format
- Align Config::key() to floor-based indexing matching Java's LogarithmicMapping
- Add PercentilesCollector::to_sketch_bytes() for pomsky integration
- Cross-language golden byte tests verified byte-identical with Java output

Co-authored-by: Cursor <cursoragent@cursor.com>
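The VarEncoding mentioned above can be illustrated with a minimal sketch of unsigned and zigzag-signed varints. This is not the code from encoding.rs, and the function names are hypothetical. The 9-byte cap is our assumption about the sketches-java wire format and should be checked against the Java implementation before relying on it. Under that assumption, eight 7-bit groups are followed by one full 8-bit byte. VarDouble encoding is not covered here.

```rust
// Hypothetical varint helpers; not the code from encoding.rs.
// Assumed layout: up to eight bytes each carry 7 payload bits plus a
// continuation bit (0x80); a ninth byte, if reached, carries the last 8 bits.

/// Appends `value` to `out` as an unsigned varint (1 to 9 bytes).
fn encode_unsigned_var_long(mut value: u64, out: &mut Vec<u8>) {
    for _ in 0..8 {
        if value < 0x80 {
            out.push(value as u8);
            return;
        }
        // Low 7 bits with the continuation bit set.
        out.push((value as u8) | 0x80);
        value >>= 7;
    }
    // 8 * 7 = 56 bits consumed; at most 8 bits remain.
    out.push(value as u8);
}

/// Reads an unsigned varint from the front of `input` and advances it.
/// Returns None if the input ends mid-value.
fn decode_unsigned_var_long(input: &mut &[u8]) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let (&byte, rest) = input.split_first()?;
        *input = rest;
        if shift == 56 {
            // Ninth byte: all 8 bits are payload.
            return Some(value | (u64::from(byte) << 56));
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Zigzag-maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so that small
/// negative numbers also encode in few bytes.
fn encode_signed_var_long(value: i64, out: &mut Vec<u8>) {
    encode_unsigned_var_long(((value << 1) ^ (value >> 63)) as u64, out);
}

/// Inverse of `encode_signed_var_long`.
fn decode_signed_var_long(input: &mut &[u8]) -> Option<i64> {
    let zz = decode_unsigned_var_long(input)?;
    Some(((zz >> 1) as i64) ^ -((zz & 1) as i64))
}
```

For values below 2^56 this is ordinary LEB128. The 9-byte cap only changes how the top 8 bits of very large values are written.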
Reference the exact Java source files in DataDog/sketches-java for Config::new(), Config::key(), Config::value(), Config::from_gamma(), and Store::add_count() so readers can verify the alignment.

Co-authored-by: Cursor <cursoragent@cursor.com>
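The key convention matters, and a hedged sketch of a purely logarithmic index mapping shows why. It uses relative accuracy alpha, where gamma = (1 + alpha) / (1 - alpha). It assumes no index offset or other correction. LogMapping, key_floor and key_ceil are illustrative names, not the crate's Config API. A ceil-based key, as in the original DDSketch paper, and a floor-based key differ by one for every value that is not an exact power of gamma. Bins written under one convention would therefore land one slot off when read under the other.

```rust
// Hypothetical, simplified logarithmic index mapping (no index offset).
struct LogMapping {
    gamma: f64,
    inv_ln_gamma: f64,
}

impl LogMapping {
    /// `alpha` is the relative accuracy, assumed to satisfy 0 < alpha < 1.
    fn new(alpha: f64) -> Self {
        let gamma = (1.0 + alpha) / (1.0 - alpha);
        LogMapping { gamma, inv_ln_gamma: 1.0 / gamma.ln() }
    }

    /// Floor-based key: x > 0 falls in bucket k with gamma^k <= x < gamma^(k+1).
    fn key_floor(&self, x: f64) -> i32 {
        (x.ln() * self.inv_ln_gamma).floor() as i32
    }

    /// Ceil-based key (DDSketch paper convention): gamma^(k-1) < x <= gamma^k.
    fn key_ceil(&self, x: f64) -> i32 {
        (x.ln() * self.inv_ln_gamma).ceil() as i32
    }
}
```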
- manual_range_contains: use !(0.0..=1.0).contains(&q)
- identity_op: simplify (0 << 2) | FLAG_TYPE to just FLAG_TYPE
- manual_clamp: use .clamp(0, 8) instead of .max(0).min(8)
- manual_repeat_n: use repeat_n() instead of repeat().take()
- cast_abs_to_unsigned: use .unsigned_abs() instead of .abs() as usize

Co-authored-by: Cursor <cursoragent@cursor.com>
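The clippy rewrites listed in that commit look like this in isolation. The helper functions are hypothetical and not taken from the PR. iter::repeat_n requires Rust 1.82 or later.

```rust
use std::iter;

/// True when `q` is a valid quantile in [0, 1]; negate it to reject input.
fn is_valid_quantile(q: f64) -> bool {
    // manual_range_contains: instead of `q >= 0.0 && q <= 1.0`.
    // Also false for NaN, so `!is_valid_quantile(q)` rejects NaN too.
    (0.0..=1.0).contains(&q)
}

/// Clamps a length into 0..=8.
fn clamp_len(n: i32) -> i32 {
    // manual_clamp: instead of `n.max(0).min(8)`.
    n.clamp(0, 8)
}

/// Builds `n` bytes of zero padding.
fn zero_padding(n: usize) -> Vec<u8> {
    // manual_repeat_n: instead of `iter::repeat(0u8).take(n)`.
    iter::repeat_n(0u8, n).collect()
}

/// Distance of a signed key from zero, as an unsigned index.
fn key_magnitude(key: i32) -> usize {
    // cast_abs_to_unsigned: `key.abs() as usize` overflows for i32::MIN
    // (a panic in debug builds); unsigned_abs() cannot overflow.
    key.unsigned_abs() as usize
}
```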
- Replace bare constants with FlagType and BinEncodingMode enums
- Use const fn for flag byte construction instead of raw bit ops
- Replace if-else chain with nested match in decode_from_java_bytes
- Use split_first() in read_byte for idiomatic slice consumption
- Use split_at in read_f64_le to avoid TryInto on edition 2018
- Use u64::from(next) instead of `next as u64` casts
- Extract assert_golden, assert_quantiles_match, bytes_to_hex helpers to reduce duplication across golden byte tests
- Fix edition-2018 assert! format string compatibility
- Clean up is_valid_flag_byte with let-else and match

Co-authored-by: Cursor <cursoragent@cursor.com>
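Here is a minimal sketch of the enum-plus-const-fn flag construction and the slice readers described in that commit. The discriminant values are our assumption, based on the sketches-java layout, and should be verified against the Java source. We assume the flag type sits in the low two bits, the sub-flag is shifted left by two, and INDEX_DELTAS_AND_COUNTS = 1. parse_flag is a hypothetical helper, not the PR's is_valid_flag_byte.

```rust
/// Assumed flag types, stored in the low two bits of a flag byte.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FlagType {
    SketchFeatures = 0b00,
    PositiveStore = 0b01,
    IndexMapping = 0b10,
    NegativeStore = 0b11,
}

/// Assumed bin-encoding sub-flags used with store flag types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BinEncodingMode {
    IndexDeltasAndCounts = 1,
    IndexDeltas = 2,
    ContiguousCounts = 3,
}

/// Packs a sub-flag above the 2-bit type: (sub_flag << 2) | type.
const fn flag_byte(flag_type: FlagType, sub_flag: u8) -> u8 {
    (sub_flag << 2) | flag_type as u8
}

/// Evaluated at compile time thanks to `const fn`.
const POSITIVE_DELTAS_AND_COUNTS: u8 =
    flag_byte(FlagType::PositiveStore, BinEncodingMode::IndexDeltasAndCounts as u8);

/// Splits a flag byte back into its type and sub-flag.
fn parse_flag(byte: u8) -> (FlagType, u8) {
    let flag_type = match byte & 0b11 {
        0b00 => FlagType::SketchFeatures,
        0b01 => FlagType::PositiveStore,
        0b10 => FlagType::IndexMapping,
        _ => FlagType::NegativeStore,
    };
    (flag_type, byte >> 2)
}

/// Pops one byte off the front of `input`.
fn read_byte(input: &mut &[u8]) -> Option<u8> {
    let (&first, rest) = input.split_first()?;
    *input = rest;
    Some(first)
}

/// Reads a little-endian f64 without TryInto (not in the 2018 prelude).
fn read_f64_le(input: &mut &[u8]) -> Option<f64> {
    if input.len() < 8 {
        return None;
    }
    let (head, rest) = input.split_at(8);
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(head);
    *input = rest;
    Some(f64::from_le_bytes(bytes))
}
```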
- Replace values approximating PI/E in a test with an arbitrary, non-special value
- Fix reversed empty range (2048..0) → (0..2048).rev() in store test

Co-authored-by: Cursor <cursoragent@cursor.com>
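The reversed range needed fixing because a Rust Range is empty whenever start >= end, so a loop over it silently never runs. The small illustration below uses descending_indices and iterations, which are hypothetical helpers.

```rust
/// Returns the indices 0..n in descending order.
fn descending_indices(n: u32) -> Vec<u32> {
    // `n..0` would be an empty range (start >= end) and yield nothing;
    // build the ascending range and reverse it instead.
    (0..n).rev().collect()
}

/// Counts how many iterations a `start..end` loop actually performs.
fn iterations(start: u32, end: u32) -> usize {
    (start..end).count()
}
```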
congx4 commented on Feb 19, 2026
sketches-ddsketch/src/encoding.rs (Outdated)
    use crate::ddsketch::DDSketch;
    use crate::store::Store;

    // ---------------------------------------------------------------------------
Collaborator, Author
This file was generated by Opus, which also generated tests to verify it. See the PR description for more details.
Collaborator
I don't think this should be in the tantivy repo. The changes should be in your …
Move the vendored sketches-ddsketch crate (with Java-compatible binary encoding) to its own repo at quickwit-oss/rust-sketches-ddsketch and reference it via git+rev in Cargo.toml.

Co-authored-by: Cursor <cursoragent@cursor.com>
fulmicoton-dd approved these changes on Feb 19, 2026
fulmicoton-dd (Collaborator) left a comment:
Approved, but please have a look at the comments and see if they make sense.
Address review feedback: replace assert_eq! with assert_nearly_equals! for float values that go through JSON serialization roundtrips, which can introduce minor precision differences.

Co-authored-by: Cursor <cursoragent@cursor.com>
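Floats can drift slightly through a serde_json roundtrip. serde_json's default float parsing is best-effort and does not guarantee exact round-trips, while its float_roundtrip feature does. One way to compare such values is with a relative tolerance. The helper below is hypothetical. It is not tantivy's assert_nearly_equals! macro, whose exact tolerance is not shown here.

```rust
/// Hypothetical relative-tolerance comparison: true when `actual` is within
/// `rel_eps` (relative to the larger magnitude) of `expected`.
fn nearly_equal(expected: f64, actual: f64, rel_eps: f64) -> bool {
    if expected == actual {
        // Covers exact matches, including both values being zero.
        return true;
    }
    let scale = expected.abs().max(actual.abs());
    (expected - actual).abs() <= rel_eps * scale
}
```

A relative rather than absolute tolerance keeps the check meaningful for percentile values of very different magnitudes.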
…ctor

Replace the derived Serialize/Deserialize on PercentilesCollector with custom impls that use DDSketch's Java-compatible binary encoding (encode_to_java_bytes / decode_from_java_bytes). This removes the need for the use_serde feature on sketches-ddsketch entirely.

Also restore original float test values and use assert_nearly_equals! for all float comparisons in percentile tests, since DDSketch quantile estimates can have minor precision differences across platforms.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep use_serde on sketches-ddsketch so DDSketch derives Serialize/Deserialize, removing the need for custom impls on PercentilesCollector.

Co-authored-by: Cursor <cursoragent@cursor.com>
fulmicoton-dd approved these changes on Feb 19, 2026
Branch updated from 9f764cb to 18fedd9.
Description
Reference the quickwit-oss/rust-sketches-ddsketch fork (via git+rev in Cargo.toml), which adds native Java-compatible binary serialization for DDSketch. This enables Rust applications to produce DDSketch bytes that Java consumers can directly deserialize and merge via DDSketchWithExactSummaryStatistics.decode() from the sketches-java library.

Why?
The upstream sketches-ddsketch Rust crate only supports serde-based serialization (e.g., JSON), while Java's sketches-java library uses a custom binary wire format. For distributed aggregation pipelines where Rust search nodes produce intermediate DDSketch results consumed by Java query orchestrators, binary compatibility is required.

What changed
- Removed the sketches-ddsketch/ directory from the tantivy workspace
- Changed the sketches-ddsketch dependency from path = "./sketches-ddsketch" to git + rev pointing to quickwit-oss/rust-sketches-ddsketch@555caf1
- encoding.rs: Java-compatible binary encode/decode using signed/unsigned varint and VarDouble encoding
- encode()/decode() methods

Testing