Skip to content

Commit fb7d02e

Browse files
scovichcarpecodeumalamb
authored
[Variant] Support Shredded Objects in variant_get (take 2) (#8280)
# Which issue does this PR close? - Closes #8150 - closes #8166 # Rationale for this change Add support for extracting fields from both shredded and non-shredded variant arrays at any depth (like "x", "a.x", "a.b.x") and casting them to Int32 with proper NULL handling for type mismatches. NOTE: This is a second attempt at * #8166 which suffered strong logical conflicts with * #8179 and which itself grew out of * #8122 See the other two PR for the vast majority of review commentary relating to this change. I started from the original PR commits (first three), performed the merge, and fixed up a bunch of issues. Manually diffing the before ([76b75ee..882aa4d](https://github.com/apache/arrow-rs/compare/76b75eebc..882aa4d69)) and after ([0ba91ae..f6fd915](https://github.com/apache/arrow-rs/compare/0ba91aed9..f6fd91583)) diffs gives the following non-trivial differences vs. the original PR * Ran `cargo fmt` * `typed_value_to_variant` now supports all primitive numeric types (previously only int16) * cast options plumbed through and respected * Fix a null buffer bug in `shredded_get_path` -- the original code was wrongly unioning in the null buffer from `typed_value` column: ```patch // Path exhausted! Create a new `VariantArray` for the location we landed on. - // Also union nulls from the final typed_value field we landed on - if let Some(typed_value) = shredding_state.typed_value_field() { - accumulated_nulls = arrow::buffer::NullBuffer::union( - accumulated_nulls.as_ref(), - typed_value.nulls(), - ); - } let target = make_target_variant( shredding_state.value_field().cloned(), shredding_state.typed_value_field().cloned(), accumulated_nulls, ); ``` * Remove the `get_variant_perfectly_shredded_int32_as_variant` test case, because #8179 introduced a battery of unit tests that cover the same functionality. * Remove now-unnecessary `.unwrap()` calls from object builder `finish` calls in unit tests * Fixed broken test code in `create_depth_1_shredded_test_data_working`, which captured the return value of a nested builder's `finish` (`()`) instead of the return value of the top-level builder. I'm not quite sure what this code was trying to do, but I changed it to just create a nested builder instead of a second top-level builder: ```patch fn create_depth_1_shredded_test_data_working() -> ArrayRef { // Create metadata following the working pattern from shredded_object_with_x_field_variant_array let (metadata, _) = { - let a_variant = { - let mut a_builder = parquet_variant::VariantBuilder::new(); - let mut a_obj = a_builder.new_object(); - a_obj.insert("x", Variant::Int32(55)); // "a.x" field (shredded when possible) - a_obj.finish().unwrap() - }; let mut builder = parquet_variant::VariantBuilder::new(); let mut obj = builder.new_object(); - obj.insert("a", a_variant); + + // Create the nested "a" object + let mut a_obj = obj.new_object("a"); + a_obj.insert("x", Variant::Int32(55)); + a_obj.finish(); + obj.finish().unwrap(); builder.finish() }; ``` * Similar fix (twice, `a_variant` and `b_variant`) for `create_depth_2_shredded_test_data_working` * `make_shredding_row_builder` now supports signed int and float types (unsigned int not supported yet) * A new `get_type_name` helper in row_builder.rs that gives human-readable data type names. I'm not convinced it's necessary (and the code is in the wrong spot, jammed in the middle of `VariantAsPrimitive` code. * `impl VariantAsPrimitive` for all signed int and float types * `PrimitiveVariantShreddingRowBuilder` now has a lifetime param because it takes a reference to cast options (it now respects unsafe vs. safe casting) # What changes are included in this PR? Everything in the original PR, plus merge in the main branch, fix logical conflicts and fix various broken tests. # Are these changes tested? All unit tests now pass. # Are there any user-facing changes? No (variant is not public yet) --------- Co-authored-by: carpecodeum <[email protected]> Co-authored-by: Andrew Lamb <[email protected]>
1 parent 4b8cbe2 commit fb7d02e

File tree

5 files changed

+1911
-146
lines changed

5 files changed

+1911
-146
lines changed

0 commit comments

Comments
 (0)