-
Notifications
You must be signed in to change notification settings - Fork 1k
Commit fb7d02e
[Variant] Support Shredded Objects in variant_get (take 2) (#8280)
# Which issue does this PR close?
- Closes #8150
- closes #8166
# Rationale for this change
Add support for extracting fields from both shredded and non-shredded
variant arrays at any depth (like "x", "a.x", "a.b.x") and casting them
to Int32 with proper NULL handling for type mismatches.
NOTE: This is a second attempt at
* #8166
which suffered strong logical conflicts with
* #8179
and which itself grew out of
* #8122
See the other two PR for the vast majority of review commentary relating
to this change.
I started from the original PR commits (first three), performed the
merge, and fixed up a bunch of issues.
Manually diffing the before
([76b75ee..882aa4d](https://github.com/apache/arrow-rs/compare/76b75eebc..882aa4d69))
and after
([0ba91ae..f6fd915](https://github.com/apache/arrow-rs/compare/0ba91aed9..f6fd91583))
diffs gives the following non-trivial differences vs. the original PR
* Ran `cargo fmt`
* `typed_value_to_variant` now supports all primitive numeric types
(previously only int16)
* cast options plumbed through and respected
* Fix a null buffer bug in `shredded_get_path` -- the original code was
wrongly unioning in the null buffer from `typed_value` column:
```patch
// Path exhausted! Create a new `VariantArray` for the location we
landed on.
- // Also union nulls from the final typed_value field we landed on
- if let Some(typed_value) = shredding_state.typed_value_field() {
- accumulated_nulls = arrow::buffer::NullBuffer::union(
- accumulated_nulls.as_ref(),
- typed_value.nulls(),
- );
- }
let target = make_target_variant(
shredding_state.value_field().cloned(),
shredding_state.typed_value_field().cloned(),
accumulated_nulls,
);
```
* Remove the `get_variant_perfectly_shredded_int32_as_variant` test
case, because #8179 introduced a
battery of unit tests that cover the same functionality.
* Remove now-unnecessary `.unwrap()` calls from object builder `finish`
calls in unit tests
* Fixed broken test code in `create_depth_1_shredded_test_data_working`,
which captured the return value of a nested builder's `finish` (`()`)
instead of the return value of the top-level builder. I'm not quite sure
what this code was trying to do, but I changed it to just create a
nested builder instead of a second top-level builder:
```patch
fn create_depth_1_shredded_test_data_working() -> ArrayRef {
// Create metadata following the working pattern from
shredded_object_with_x_field_variant_array
let (metadata, _) = {
- let a_variant = {
- let mut a_builder = parquet_variant::VariantBuilder::new();
- let mut a_obj = a_builder.new_object();
- a_obj.insert("x", Variant::Int32(55)); // "a.x" field (shredded when
possible)
- a_obj.finish().unwrap()
- };
let mut builder = parquet_variant::VariantBuilder::new();
let mut obj = builder.new_object();
- obj.insert("a", a_variant);
+
+ // Create the nested "a" object
+ let mut a_obj = obj.new_object("a");
+ a_obj.insert("x", Variant::Int32(55));
+ a_obj.finish();
+
obj.finish().unwrap();
builder.finish()
};
```
* Similar fix (twice, `a_variant` and `b_variant`) for
`create_depth_2_shredded_test_data_working`
* `make_shredding_row_builder` now supports signed int and float types
(unsigned int not supported yet)
* A new `get_type_name` helper in row_builder.rs that gives
human-readable data type names. I'm not convinced it's necessary (and
the code is in the wrong spot, jammed in the middle of
`VariantAsPrimitive` code.
* `impl VariantAsPrimitive` for all signed int and float types
* `PrimitiveVariantShreddingRowBuilder` now has a lifetime param because
it takes a reference to cast options (it now respects unsafe vs. safe
casting)
# What changes are included in this PR?
Everything in the original PR, plus merge in the main branch, fix
logical conflicts and fix various broken tests.
# Are these changes tested?
All unit tests now pass.
# Are there any user-facing changes?
No (variant is not public yet)
---------
Co-authored-by: carpecodeum <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>1 parent 4b8cbe2 commit fb7d02eCopy full SHA for fb7d02e
File tree
Expand file treeCollapse file tree
5 files changed
+1911
-146
lines changedFilter options
- parquet-variant-compute/src
- variant_get
- output
- parquet-variant/src
Expand file treeCollapse file tree
5 files changed
+1911
-146
lines changed
0 commit comments