
Conversation

@klion26 (Member) commented Jul 16, 2025

Which issue does this PR close?

Closes #7899. This PR avoids the extra allocation for the object builder and the subsequent buffer copy.

Rationale for this change

Avoid the extra allocation in the object builder, as described in the issue.

What changes are included in this PR?

This removes the internal buffer in ObjectBuilder. All data insertion is done directly into the parent buffer wrapped in parent_state.

The corresponding new fields are added to ObjectBuilder.

  • Add object_start_offset to ObjectBuilder, which records the start offset of the current object in the parent buffer.
  • Add has_been_finished to ObjectBuilder, which records whether the current object has been finished; it is used in the Drop implementation.

This patch modifies the logic of the new, finish, parent_state, and drop functions accordingly.

In particular, data is written directly into the parent buffer when a field is added to the object (i.e., when insert/try_insert is called). When the object is finalized (finish is called), the header and field ids must be placed before the data in the buffer, so the builder shifts the written data bytes to make room for them and then writes the header and field ids.

In drop, if the builder has not been finalized before being dropped, it truncates the written bytes to roll back the parent buffer to its previous state.
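
As a rough illustration of the two mechanisms above, here is a minimal sketch using a plain Vec<u8> as a stand-in for the crate's value buffer (the helper names are illustrative, not the actual API):

fn finish_sketch(buffer: &mut Vec<u8>, starting_offset: usize, header: &[u8]) {
    // Open a gap in front of the field data that was written directly into
    // the parent buffer...
    buffer.splice(starting_offset..starting_offset, vec![0u8; header.len()]);
    // ...then overwrite the reserved zeros with the real header and field ids.
    buffer[starting_offset..starting_offset + header.len()].copy_from_slice(header);
}

fn drop_sketch(buffer: &mut Vec<u8>, object_start_offset: usize, has_been_finished: bool) {
    // Roll back an unfinished object by truncating the parent buffer to the
    // offset recorded when the object builder was created.
    if !has_been_finished {
        buffer.truncate(object_start_offset);
    }
}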

Are these changes tested?

The logic has been covered by the exist logic.

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet (Changes to the parquet crate) label Jul 16, 2025
@klion26 (Member Author) commented Jul 16, 2025

@alamb Please help review this when you have time, thanks.

(state, self.validate_unique_fields)
let validate_unique_fields = self.validate_unique_fields;

match &mut self.parent_state {
@klion26 (Member Author) Jul 16, 2025:

Did not find a better solution for this. I can change it if there is a better one.

Contributor:

I think we can use the pattern in this PR: klion26#1

Member Author:

Fixed, it becomes cleaner now.

@klion26 (Member Author) commented Jul 16, 2025

The tests in builder.rs completed successfully; will investigate why CI fails.
Pushed a fixup to fix the failing CI.

@klion26 klion26 force-pushed the 7899-avoid-extra-allocation-in-object-builder branch from 91cfb73 to 1f2bcc3 on July 16, 2025 10:29
@alamb (Contributor) left a comment:

Thank you for this PR @klion26 -- I plan to review it carefully tomorrow

klion26 added 2 commits July 17, 2025 09:59
This commit reuses the parent buffer for the object builder.
It avoids the extra allocation for the object and the subsequent buffer copy.
@klion26 klion26 force-pushed the 7899-avoid-extra-allocation-in-object-builder branch from 1f2bcc3 to 6096566 on July 17, 2025 03:48
@@ -1064,20 +1084,58 @@ impl<'a> ObjectBuilder<'a> {
key: &str,
value: T,
) -> Result<(), ArrowError> {
// Get metadata_builder from parent state
let metadata_builder = self.parent_state.metadata_builder();
match &mut self.parent_state {
Member Author:

same as below

Contributor:

Here is a proposal of how to avoid the duplication:

Member Author:

Fixed

@klion26 (Member Author) commented Jul 17, 2025

@alamb thank you! I've rebased on the main branch and hardened the nested object test.

@alamb (Contributor) left a comment:

Thank you @klion26 -- this looks quite cool.

I think we should try to avoid the duplication when possible -- I left a suggestion on how to do that and keep the compiler happy.

Likewise, I think it would be nice to add a few more tests. Let me know if that makes sense.

assert_eq!(inner_inner_object_d.len(), 1);
assert_eq!(inner_inner_object_d.field_name(0).unwrap(), "cc");
assert_eq!(inner_inner_object_d.field(0).unwrap(), Variant::from("dd"));

assert_eq!(outer_object.field_name(1).unwrap(), "b");
assert_eq!(outer_object.field(1).unwrap(), Variant::from(true));
}
Contributor:

I think we should also add tests for the rollback behavior (as in starting an ObjectBuilder but not calling finish).

Similarly, we should test a list builder rollback too.

@klion26 (Member Author) Jul 18, 2025:

Okay, I will add more tests for this. Currently, the list inside the object depends on the from_json test; adding a unit test for it in builder.rs is better, so I'll add that.

Not sure if the tests like test_xx_no_finish() (such as test_object_builder_to_list_builder_inner_no_finish()) are enough to cover the rollback logic?

We've called drop in the test_xx_no_finish() tests; do we need to call drop if we add tests to cover the rollback logic?

Contributor:

After reviewing the tests, I agree that the existing tests are sufficient

fn test_object_builder_to_list_builder_outer_no_finish() {
    let mut builder = VariantBuilder::new();
    let mut object_builder = builder.new_object();
    object_builder.insert("first", 1i8);

    // Create a nested list builder and finish it
    let mut nested_list_builder = object_builder.new_list("nested");
    nested_list_builder.append_value("hi");
    nested_list_builder.finish();

    // Drop the outer object builder without finishing it
    drop(object_builder);

    builder.append_value(2i8);

    // Only the second attempt should appear in the final variant
    let (metadata, value) = builder.finish();
    let metadata = VariantMetadata::try_new(&metadata).unwrap();
    assert_eq!(metadata.len(), 2);
    assert_eq!(&metadata[0], "first");
    assert_eq!(&metadata[1], "nested"); // not rolled back

    let variant = Variant::try_new_with_metadata(metadata, &value).unwrap();
    assert_eq!(variant, Variant::Int8(2));
}

#[test]
fn test_object_builder_to_object_builder_inner_no_finish() {
    let mut builder = VariantBuilder::new();
    let mut object_builder = builder.new_object();
    object_builder.insert("first", 1i8);

    // Create a nested object builder but never finish it
    let mut nested_object_builder = object_builder.new_object("nested");
    nested_object_builder.insert("name", "unknown");
    drop(nested_object_builder);

    object_builder.insert("second", 2i8);

    // The parent object should only contain the original fields
    object_builder.finish().unwrap();
    let (metadata, value) = builder.finish();
    let metadata = VariantMetadata::try_new(&metadata).unwrap();
    assert_eq!(metadata.len(), 3);
    assert_eq!(&metadata[0], "first");
    assert_eq!(&metadata[1], "name"); // not rolled back
    assert_eq!(&metadata[2], "second");

    let variant = Variant::try_new_with_metadata(metadata, &value).unwrap();
    let obj = variant.as_object().unwrap();
    assert_eq!(obj.len(), 2);
    assert_eq!(obj.get("first"), Some(Variant::Int8(1)));
    assert_eq!(obj.get("second"), Some(Variant::Int8(2)));
}

#[test]
fn test_object_builder_to_object_builder_outer_no_finish() {
    let mut builder = VariantBuilder::new();
    let mut object_builder = builder.new_object();
    object_builder.insert("first", 1i8);

    // Create a nested object builder and finish it
    let mut nested_object_builder = object_builder.new_object("nested");
    nested_object_builder.insert("name", "unknown");
    nested_object_builder.finish().unwrap();

    // Drop the outer object builder without finishing it
    drop(object_builder);

    builder.append_value(2i8);

    // Only the second attempt should appear in the final variant
    let (metadata, value) = builder.finish();
    let metadata = VariantMetadata::try_new(&metadata).unwrap();
    assert_eq!(metadata.len(), 3);
    assert_eq!(&metadata[0], "first"); // not rolled back
    assert_eq!(&metadata[1], "name"); // not rolled back
    assert_eq!(&metadata[2], "nested"); // not rolled back

    let variant = Variant::try_new_with_metadata(metadata, &value).unwrap();
    assert_eq!(variant, Variant::Int8(2));
}

I also verified test coverage with

cargo llvm-cov --html test -p parquet-variant

And it is indeed covered:
[screenshot: coverage report]

@@ -999,8 +1000,17 @@ impl<'a> ListBuilder<'a> {
let offset_size = int_size(data_size);

// Get parent's buffer
let offset_shift = match &self.parent_state {
Contributor:

It might be nice if this was a function in ParentState, something like

Suggested change
let offset_shift = match &self.parent_state {
let offset_shift = self.parent_state.object_start_offset();

Member Author:

Fixed
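
For context, a minimal self-contained sketch of what such an accessor could look like, with simplified stand-in types (the real ParentState carries more fields per variant; per the doc comment quoted further down, the offset is 0 unless the parent is an object builder):

enum ParentState {
    Variant,
    List,
    Object { object_start_offset: usize },
}

impl ParentState {
    // Start offset of the current object in the parent's buffer; zero when
    // the parent does not share its buffer with the child builder.
    fn object_start_offset(&self) -> usize {
        match self {
            ParentState::Object { object_start_offset } => *object_start_offset,
            ParentState::Variant | ParentState::List => 0,
        }
    }
}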


Comment on lines 1053 to 1057
let start_offset = match &parent_state {
ParentState::Variant { buffer, .. } => buffer.offset(),
ParentState::List { buffer, .. } => buffer.offset(),
ParentState::Object { buffer, .. } => buffer.offset(),
};
Contributor:

Suggested change
let start_offset = match &parent_state {
ParentState::Variant { buffer, .. } => buffer.offset(),
ParentState::List { buffer, .. } => buffer.offset(),
ParentState::Object { buffer, .. } => buffer.offset(),
};
let start_offset = parent_state.buffer().offset();

Member Author:

Tried this, but it requires changing parent_state to be mutable. Added a function to retrieve the current offset of the buffer.


let data_size = self.buffer.offset();
let num_fields = self.fields.len();
let is_large = num_fields > u8::MAX as usize;
let metadata_builder = match &self.parent_state {
Contributor:

It would be nice to put this into a method as well, rather than an inline match statement.

Member Author:

fixed


let starting_offset = self.object_start_offset;

// Shift existing data to make room for the header
Contributor:

As a follow-on PR we can consider avoiding this extra splice somehow (by preallocating the size or something). Future work, though.

Member Author:

Filed an issue (#7960) to track this.

buffer[header_pos..header_pos + offset_size as usize]
.copy_from_slice(&data_size_bytes[..offset_size as usize]);

let start_offset_shift = match &self.parent_state {
Contributor:

Here is another place we could use the method and avoid an inline match.

Member Author:

fixed

@klion26 (Member Author) commented Jul 18, 2025

@alamb Thanks for the detailed review, will address them soon.

@klion26 (Member Author) left a comment:

@alamb Thanks for the detailed review and the suggestions, the code is much cleaner now. I've addressed most of the comments (in commit f5b0465); will add the tests to cover the rollback logic after confirmation.


@klion26 klion26 force-pushed the 7899-avoid-extra-allocation-in-object-builder branch from deb0782 to f5b0465 on July 18, 2025 05:20
@alamb (Contributor) commented Jul 18, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing 7899-avoid-extra-allocation-in-object-builder (0fecaa0) to 99eb1bc diff
BENCH_NAME=variant_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=7899-avoid-extra-allocation-in-object-builder
Results will be posted here when complete

@alamb (Contributor) commented Jul 18, 2025

🤖: Benchmark completed

Details

group                                                                7899-avoid-extra-allocation-in-object-builder    main
-----                                                                ---------------------------------------------    ----
batch_json_string_to_variant json_list 8k string                     1.00     27.9±0.10ms        ? ?/sec              1.03     28.7±0.11ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00    364.7±8.44ms        ? ?/sec              1.08    392.4±6.39ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.01      8.6±0.02ms        ? ?/sec              1.00      8.6±0.03ms        ? ?/sec
variant_get_primitive                                                1.00  1368.4±16.34µs        ? ?/sec              1.01   1375.4±8.13µs        ? ?/sec

@alamb (Contributor) left a comment:

This is really nice @klion26 -- thank you very much for the contribution.

It is neat to see that it already makes conversion from JSON 8% faster in some cases (see the benchmark in #7935 (comment)):

batch_json_string_to_variant random_json(2633 bytes per document)    1.00    364.7±8.44ms        ? ?/sec              1.08    392.4±6.39ms        ? ?/sec

FYI @scovich @viirya and @friendlymatthew


// Shift existing data to make room for the header
let buffer = parent_buffer.inner_mut();
buffer.splice(starting_offset..starting_offset, vec![0u8; header_size]);

Contributor:

Can avoid even this allocation:

Suggested change
buffer.splice(starting_offset..starting_offset, vec![0u8; header_size]);
buffer.splice(starting_offset..starting_offset, std::iter::repeat_n(0u8, header_size));
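
(std::iter::repeat_n yields exactly header_size zeros and reports an exact size hint, so splice can open the gap without first materializing a temporary Vec.)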

Contributor:

Alternatively -- that small allocation apparently isn't hurting performance. What if we create the temp vec with initial capacity of header_size and then populate it with the actual header info, field ids, and offsets before splicing it in. That way, an incorrect header_size calculation would not impact correctness:

let mut bytes_to_splice = Vec::with_capacity(header_size);
bytes_to_splice.push(header_byte);
if is_large {
    bytes_to_splice.extend((num_fields as u32).to_le_bytes());
} else {
    bytes_to_splice.push(num_fields as u8);
}
for &field_id in self.fields.keys() {
    bytes_to_splice.extend(field_id.to_le_bytes().into_iter().take(id_size));
}
for &offset in self.fields.values() {
    bytes_to_splice.extend((offset as u32).to_le_bytes().into_iter().take(offset_size));
}
bytes_to_splice.extend((data_size as u32).to_le_bytes().into_iter().take(offset_size));
buffer.splice(starting_offset..starting_offset, bytes_to_splice);

Contributor:

aside: I'd be very impressed if the compiler is smart enough to optimize away the allocation in vec![0u8; header_size].into_iter(), which would potentially cause the above to run slower than the current code.

@scovich (Contributor) Jul 19, 2025:

FYI, I think we can improve the performance of appending packed u32 values to a Vec<u8>. The following code:

pub fn append_bytes_brute2(dest: &mut Vec<u8>, src: u32, nbytes: NumBytes) {
    let n = dest.len() + nbytes as usize;
    dest.extend(src.to_le_bytes());
    dest.truncate(n);
}

... unconditionally adds all four bytes and then truncates the vector to the desired length. This works because (a) it's a single machine instruction to copy the four LE bytes of a u32 to a memory location; (b) truncate drops the higher order bytes, when working with LE bytes; and (c) truncate is dirt cheap (conditional move that doesn't break processor pipelines the way a branch would).

Result: Code that should perform the same regardless of the number of bytes we encode the u32 as, with only a single branch to check for vector capacity.

The above compiles to the following assembly code:

playground::append_packed_u32:
	;; set up stack frame (not relevant in practice due to inlining)
	  ...
	movq	(%rdi), %rcx      ; dest capacity
	movq	16(%rdi), %rbx    ; dest len
	subq	%rbx, %rcx        ; calculate available capacity
	movq	%rbx, %rax
	cmpq	$3, %rcx          ; if insufficient capacity...
	jbe	.LBB2_1           ; ... then `reserve` more

.LBB2_2:
	addq	%rdx, %rbx        ; add nbytes to len (truncate arg)
	movq	8(%rdi), %rcx     ; dest data pointer
	movl	%esi, (%rcx,%rax) ; append all four bytes of src to the vector
	addq	$4, %rax          ; increase len by 4 (u32 size)
	cmpq	%rax, %rbx        ; if new len is bigger than truncate arg...
	cmovbq	%rbx, %rax        ; ... then truncate len
	movq	%rax, 16(%rdi)    ; write back the updated dest len
	;; tear down stack frame
	  ...
	retq

.LBB2_1:
	;; call prologue
	  ... 
	callq	alloc::raw_vec::RawVecInner<A>::reserve::do_reserve_and_handle
	;; call epilogue
	  ...
	jmp	.LBB2_2

The one bummer is, when adding a reserve followed by a loop over a slice of source bytes:

pub fn append_packed_u32(dest: &mut Vec<u8>, src: &[u32], nbytes: usize) {
    dest.reserve((src.len() + 1) * nbytes);
    for val in src {
        let n = dest.len() + nbytes;
        dest.extend(val.to_le_bytes());
        dest.truncate(n);
    }
}

... the compiler isn't smart enough to eliminate the redundant capacity check inside the loop. Oh, well.
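
For illustration, a tiny self-contained check of the append-then-truncate trick described above (with nbytes widened to a plain usize here for simplicity):

fn append_packed_u32(dest: &mut Vec<u8>, src: u32, nbytes: usize) {
    let n = dest.len() + nbytes;
    dest.extend(src.to_le_bytes());
    dest.truncate(n);
}

fn main() {
    let mut buf = Vec::new();
    append_packed_u32(&mut buf, 0x0403_0201, 3);
    // Only the three low-order LE bytes survive the truncate.
    assert_eq!(buf, [0x01, 0x02, 0x03]);
}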

@@ -1317,7 +1414,15 @@ impl<'a> ObjectBuilder<'a> {
/// This is to ensure that the object is always finalized before its parent builder
/// is finalized.
impl Drop for ObjectBuilder<'_> {
fn drop(&mut self) {}
fn drop(&mut self) {
Contributor:

❤️

@scovich (Contributor) left a comment:

Initial review, a bit scattered but hopefully helpful.

Impressive that it's already so much faster than the baseline, potentially with room to improve even further!

Comment on lines 603 to 606
// returns the beginning offset of buffer for the parent if it is object builder, else 0.
// for object builder will reuse the buffer from the parent, this is needed for `finish`
// which needs the relative offset from the current variant.
fn object_start_offset(&self) -> usize {
Contributor:

Doesn't variant array also have a starting offset?

Member Author:

This feature of reusing the parent buffer hasn't been implemented for ListBuilder yet. Will do it in a follow-up PR.

Comment on lines 620 to 575
ParentState::Variant {
buffer,
metadata_builder,
} => (buffer, metadata_builder),
ParentState::List {
buffer,
metadata_builder,
..
} => (buffer, metadata_builder),
ParentState::Object {
buffer,
metadata_builder,
..
} => (buffer, metadata_builder),
Contributor:

I think Rust allows this:

Suggested change
ParentState::Variant {
buffer,
metadata_builder,
} => (buffer, metadata_builder),
ParentState::List {
buffer,
metadata_builder,
..
} => (buffer, metadata_builder),
ParentState::Object {
buffer,
metadata_builder,
..
} => (buffer, metadata_builder),
ParentState::Variant {
buffer,
metadata_builder,
} |
ParentState::List {
buffer,
metadata_builder,
..
} |
ParentState::Object {
buffer,
metadata_builder,
..
} => (buffer, metadata_builder),

Not clear whether that's better or worse than the current code.

I'm also not sure how it would fmt.

Member Author:

addressed

Comment on lines 639 to 600
match self {
ParentState::Variant { buffer, .. } => buffer.offset(),
ParentState::Object { buffer, .. } => buffer.offset(),
ParentState::List { buffer, .. } => buffer.offset(),
}
Contributor:

As above, we can reduce the redundancy:

Suggested change
match self {
ParentState::Variant { buffer, .. } => buffer.offset(),
ParentState::Object { buffer, .. } => buffer.offset(),
ParentState::List { buffer, .. } => buffer.offset(),
}
match self {
ParentState::Variant { buffer, .. }
| ParentState::Object { buffer, .. }
| ParentState::List { buffer, .. } => buffer.offset(),
}

Contributor:

Actually... we can just invoke ParentState::buffer?

Suggested change
match self {
ParentState::Variant { buffer, .. } => buffer.offset(),
ParentState::Object { buffer, .. } => buffer.offset(),
ParentState::List { buffer, .. } => buffer.offset(),
}
self.buffer().offset()

Member Author:

This needs the signature changed to buffer_current_offset(&mut self); not sure if that's OK?

Changed to the first version.



Comment on lines +1352 to +1226
let header_size = 1 + // header byte
(if is_large { 4 } else { 1 }) + // num_fields
(num_fields * id_size as usize) + // field IDs
((num_fields + 1) * offset_size as usize); // field offsets + data_size
Contributor:

One thing I don't love about this approach is the independent calculation of the splice size vs. the bytes that actually get inserted. If they ever disagreed... badness, as the insertions underflow or overflow the splice.

Ideally, we could produce an iterator that emits the desired bytes, and the splice itself could guarantee correct behavior. But for that to work the iterator would need to provide an accurate lower bound, which rules out std::iter::from_fn. Even if we did craft a custom iterator, computing its lower bound would basically be this same calculation all over again. We could also chain together a bunch of iterators, which preserves the lower bound, but somehow I doubt that would be efficient.

Contributor:

Something like this might almost work?

let field_ids = self.fields.keys().flat_map(|field_id| {
    (*field_id as usize).to_le_bytes().into_iter().take(id_size)
});

let offsets = self.fields.values().flat_map(|offset| {
    offset.to_le_bytes().into_iter().take(offset_size)
});

let num_fields = num_fields.to_le_bytes().into_iter().take(if is_large { 4 } else { 1 });

let header_and_num_fields = std::iter::once(header_byte).chain(num_fields);
let field_ids_and_offsets = field_ids.chain(offsets);
let bytes_to_splice = header_and_num_fields.chain(field_ids_and_offsets);
buffer.splice(starting_offset..starting_offset, bytes_to_splice);

... but unfortunately Iterator::flat_map does not (honestly, cannot) compute an accurate lower bound size hint. So the splice call would end up having to allocate an internal temp buffer.

Contributor:

I just realized -- a custom iterator that computes its size hint is still safer than the current code, because an incorrect size hint won't corrupt any data. It will just require extra allocations and/or byte shifting. There's still the question of performance, though. If it's drastically slower, the extra safety probably wouldn't be worth it.

Another possibility is to take a hybrid approach -- rely on Iterator::chain for most of the heavy lifting, but define a custom iterator for emitting offset/field_id arrays:

let num_fields = num_fields.to_le_bytes().into_iter().take(if is_large { 4 } else { 1 });
let header_and_num_fields = std::iter::once(header_byte).chain(num_fields);

let field_ids = PackedU32Iterator::new(id_size, self.fields.keys().map(|id| id.to_le_bytes()));
let offsets = PackedU32Iterator::new(offset_size, self.fields.values().map(|offset| (*offset as u32).to_le_bytes()));
let field_ids_and_offsets = field_ids.chain(offsets);

let bytes_to_splice = header_and_num_fields.chain(field_ids_and_offsets);
buffer.splice(starting_offset..starting_offset, bytes_to_splice);
PackedU32Iterator
struct PackedU32Iterator<T: Iterator<Item = [u8; 4]>> {
    packed_bytes: usize,
    iterator: T,
    current_item: [u8; 4],
    current_byte: usize, // 0..3
}

impl<T: Iterator<Item = [u8; 4]>> PackedU32Iterator<T> {
    fn new(packed_bytes: usize, iterator: T) -> Self {
        // eliminate corner cases in `next` by initializing with a fake already-consumed "first" item
        Self {
            packed_bytes,
            iterator,
            current_item: [0; 4],
            current_byte: packed_bytes,
        }
    }
}

impl<T: Iterator<Item = [u8; 4]>> Iterator for PackedU32Iterator<T> {
    type Item = u8;

    fn size_hint(&self) -> (usize, Option<usize>) {
        let lower = (self.packed_bytes - self.current_byte)
            + self.packed_bytes * self.iterator.size_hint().0;
        (lower, None)
    }

    fn next(&mut self) -> Option<u8> {
        if self.current_byte >= self.packed_bytes {
            let Some(next_item) = self.iterator.next() else {
                return None;
            };
            self.current_item = next_item;
            self.current_byte = 0;
        }
        let rval = self.current_item[self.current_byte];
        self.current_byte += 1;
        Some(rval)
    }
}

Contributor:

... but again, I worry that wrapping up all those for-loops inside an iterator for direct splicing will turn out to be a lot slower than the splice-then-overwrite approach the code currently takes.

Member Author:

Do you think it's OK to make this optimization a follow-up PR and add some benchmarks for it?

Contributor:

Or try the other approach that populates the tmp vec with non-zero bytes before splicing it into the main buffer? That's a lot simpler than this iterator-based suggestion, and likely performs better too.

Member Author:

OK, will do it in a follow-up.

Contributor:

I also think we can mitigate the danger of independent size calculation with testing.

I like the idea of going with the approach in this PR (which is already pretty large)

let ids = self.fields.keys().map(|id| *id as usize);
parent_buffer.append_offset_array(ids, None, id_size);
// Write field IDs
for (&field_id, _) in &self.fields {
Contributor:

Suggested change
for (&field_id, _) in &self.fields {
for field_id in self.fields.keys() {



Comment on lines 1401 to 1403
let start_offset_shift = self.parent_state.object_start_offset();
self.parent_state
.finish(starting_offset - start_offset_shift);
Contributor:

Why make the caller of ParentState::finish extract the start offset? Seems like parent state can do that more easily and reliably on its own?

Member Author:

Yes, moving this logic to ParentState::finish is clearer. fixed.

Comment on lines 1419 to 1301
if !self.has_been_finished {
self.parent_state
.buffer()
.inner_mut()
.truncate(self.object_start_offset);
}
Contributor:

While we're at it, we should truncate the MetadataBuilder back to the size it had when we started, to clean up any new field ids the failed builder might have created.

@klion26 (Member Author) Jul 20, 2025:

It seems that in the previous PR #7865, some tests (like test_list_builder_to_object_builder_inner_no_finish) expect that the metadata has not been rolled back. Do we need to roll back the MetadataBuilder here?

        // The parent list should only contain the original values
        list_builder.finish();
        let (metadata, value) = builder.finish();
        let metadata = VariantMetadata::try_new(&metadata).unwrap();
        assert_eq!(metadata.len(), 1);
        assert_eq!(&metadata[0], "name"); // not rolled back

I put this change into a separate commit, so we can keep it or revert it easily.

IIUC, the metadata can be rolled back because the object has not been written successfully, but I'm not sure if there are cases I'm missing here.

Contributor:

Yes, existing tests expect the fields to persist because the original code was unable to roll back the changes. With this PR, we should be able to roll back the changes cleanly, and update the unit tests accordingly.
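
A self-contained sketch of the combined rollback being discussed, with plain Vec stand-ins for the two buffers (the real Drop impl goes through ParentState, and the method names there differ):

fn rollback_unfinished_object(
    value_buffer: &mut Vec<u8>,
    field_names: &mut Vec<String>,
    object_start_offset: usize,
    object_meta_start_offset: usize,
) {
    // Discard the unfinished object's bytes from the value buffer...
    value_buffer.truncate(object_start_offset);
    // ...and drop any field names it registered in the metadata dictionary.
    field_names.truncate(object_meta_start_offset);
}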



@klion26 klion26 force-pushed the 7899-avoid-extra-allocation-in-object-builder branch from 0fecaa0 to c4d26db on July 20, 2025 16:22
@klion26 (Member Author) left a comment:

@scovich Thank you very much for the detailed review and suggestions. I've addressed most of them (and added comments for the rest); please take another look when you're free.


});

let max_id = self.fields.iter().map(|(i, _)| *i).max().unwrap_or(0);
Member Author:

Using the length of the metadata builder doesn't cause a correctness problem, but it can waste some space in the header. Added in a separate commit.


@klion26 (Member Author) commented Jul 20, 2025

@alamb thank you very much for the review and help, happy to see these improvements.

@klion26 klion26 force-pushed the 7899-avoid-extra-allocation-in-object-builder branch from c4d26db to bdf1f2d on July 20, 2025 16:32
Comment on lines 1051 to 1052
// as object builder has been reused the parent buffer,
// we need to shift the offset by the starting offset of the parent object
Member:

I don't get why the comment is put here; it seems ListBuilder::finish hasn't been updated.

Member Author:

Good catch. If we create a ListBuilder inside an ObjectBuilder, we need to do this shift, but after the last change the comment needs to be moved to ParentState::finish().

@@ -1028,18 +1078,29 @@ impl Drop for ListBuilder<'_> {
pub struct ObjectBuilder<'a> {
parent_state: ParentState<'a>,
fields: IndexMap<u32, usize>, // (field_id, offset)
buffer: ValueBuffer,
/// the starting offset in the parent's buffer where this object starts
Member:

Style suggestion: it'd be better to just follow the existing doc style.

Suggested change
/// the starting offset in the parent's buffer where this object starts
/// The starting offset in the parent's buffer where this object starts

Member:

Similar style issues exist in other comments/docs.

Member Author:

Fixed

if self.validate_unique_fields && !self.duplicate_fields.is_empty() {
let metadata_builder = self.parent_state.metadata_builder();
Member:

Hm? Why move metadata_builder inside?

@klion26 (Member Author) Jul 21, 2025:

This was intended to avoid a double mutable borrow problem, but it's no longer an issue after the implementation changed. I can revert this if needed.

Contributor:

I reverted this change in 19bb544 to keep the diff cleaner

@@ -506,6 +506,7 @@ enum ParentState<'a> {
metadata_builder: &'a mut MetadataBuilder,
fields: &'a mut IndexMap<u32, usize>,
field_name: &'a str,
object_start_offset: usize,
Member:

Suggested change
object_start_offset: usize,
parent_offset_base: usize,

Member:

Suggested a more understandable name (at least to me).

Member Author:

Fixed. object_start_offset was meant to indicate that this is the offset where the current object starts.

@@ -1028,18 +1078,29 @@ impl Drop for ListBuilder<'_> {
pub struct ObjectBuilder<'a> {
parent_state: ParentState<'a>,
fields: IndexMap<u32, usize>, // (field_id, offset)
buffer: ValueBuffer,
/// the starting offset in the parent's buffer where this object starts
object_start_offset: usize,
Member:

Suggested change
object_start_offset: usize,
parent_offset_base: usize,

Member Author:

Fixed

object_start_offset: usize,
/// the starting offset in the parent's metadata buffer where this object starts
/// used to truncate the written fields in `drop` if the current object has not been finished
object_meta_start_offset: usize,
Member:

Suggested change
object_meta_start_offset: usize,
parent_metadata_offset_base: usize,

Member Author:

Fixed

Comment on lines 1216 to 1217
// current object starts from `object_start_offset`
let data_size = current_offset - self.object_start_offset;
Member:

Hmm, so we assume that no other object is appended at the same time? What if we create two object builders and they insert into the parent buffer in an interleaved manner? Wouldn't the base offset (object start offset) be the same for both?

Member:

Oh, I forgot that the parent state has exclusive access to the parent buffer.

Comment on lines 1208 to 1211
// the length of the metadata's field names is a very cheap way to compute the upper bound.
// it will almost always be a tight upper bound as well -- it would take a pretty
// carefully crafted object to use only the early field ids of a large dictionary.
let max_id = metadata_builder.field_names.len();
Member:

Hmm, honestly I didn't get what this comment is trying to say. And is max_id exactly the same as the length of field_names?

@klion26 (Member Author) Jul 21, 2025:

IIUC, metadata_builder.field_names.len() returns the size of the underlying map, which may be bigger than the actual max field_id; using it here avoids one pass over the field_ids to calculate max_id.

But this change can make tests more difficult (it is harder to calculate the header size) -- such as the failed case from_json::test::test_json_to_variant_object_very_large (test code here / failed CI).

@alamb @viirya @scovich Maybe we can revert this to the original code (one pass over the field_ids)? What do you think? Thanks. -- added a separate commit which reverts this logic, to see if any other tests fail.
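
To make the trade-off concrete, here is a small self-contained example (int_size here is a stand-in for the crate's helper that picks the minimal byte width):

fn int_size(max_value: usize) -> u8 {
    match max_value {
        0..=0xFF => 1,
        0x100..=0xFFFF => 2,
        0x1_0000..=0xFF_FFFF => 3,
        _ => 4,
    }
}

fn main() {
    // 300 field names registered in the MetadataBuilder overall...
    let dictionary_len = 300;
    // ...but this particular object only uses ids 0..=5.
    let max_id_in_object = 5;
    assert_eq!(int_size(dictionary_len), 2); // upper bound: 2-byte field ids
    assert_eq!(int_size(max_id_in_object), 1); // tight bound: 1-byte field ids
}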

Comment on lines 1240 to 1252
// Write header byte
let header = object_header(is_large, id_size, offset_size);
parent_buffer.append_header(header, is_large, num_fields);
buffer[header_pos] = header;
header_pos += 1;

// Write field IDs (sorted order)
let ids = self.fields.keys().map(|id| *id as usize);
parent_buffer.append_offset_array(ids, None, id_size);
// Write number of fields
if is_large {
buffer[header_pos..header_pos + 4].copy_from_slice(&(num_fields as u32).to_le_bytes());
header_pos += 4;
} else {
buffer[header_pos] = num_fields as u8;
header_pos += 1;
}
@viirya (Member) Jul 20, 2025:

This is basically what append_header did before, right? Maybe we can extract it into a function instead of inlining it here.

Member Author:

Fixed

assert_eq!(metadata.len(), 1);
assert_eq!(&metadata[0], "name"); // not rolled back
assert_eq!(metadata.len(), 1); // rolled back
assert_eq!(&metadata[0], "name");
Member:

Hmm, why update this to "rolled back"? The metadata is the same as before.

Member Author:

Fixed, my bad, this was unintentionally modified.

@viirya (Member) left a comment:

Thanks for this contribution. It looks reasonable and the benchmark looks good. I left a few comments. I also updated the PR description to describe the changes more clearly, to help make it understandable to others interested in this.

@klion26 klion26 force-pushed the 7899-avoid-extra-allocation-in-object-builder branch from d11441d to 5991cc7 on July 21, 2025 06:39
@klion26 (Member Author) left a comment:

@viirya Thanks for the detailed review, I've updated the code; please take another look when you're free, thanks.

@@ -506,6 +506,7 @@ enum ParentState<'a> {
metadata_builder: &'a mut MetadataBuilder,
fields: &'a mut IndexMap<u32, usize>,
field_name: &'a str,
object_start_offset: usize,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, object_start_offset wants to indicate that this is the offset of the current object start.

if self.validate_unique_fields && !self.duplicate_fields.is_empty() {
let metadata_builder = self.parent_state.metadata_builder();
Copy link
Member Author

@klion26 klion26 Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intended to avoid the double mutable reference problem, but it's not a problem after the implementation has changed. I can revert this if needed.

assert_eq!(metadata.len(), 1);
assert_eq!(&metadata[0], "name"); // not rolled back
assert_eq!(metadata.len(), 1); // rolled back
assert_eq!(&metadata[0], "name");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, my bad, this was unintentionally modified.

@@ -1028,18 +1078,29 @@ impl Drop for ListBuilder<'_> {
pub struct ObjectBuilder<'a> {
parent_state: ParentState<'a>,
fields: IndexMap<u32, usize>, // (field_id, offset)
buffer: ValueBuffer,
/// the starting offset in the parent's buffer where this object starts
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -1028,18 +1078,29 @@ impl Drop for ListBuilder<'_> {
pub struct ObjectBuilder<'a> {
parent_state: ParentState<'a>,
fields: IndexMap<u32, usize>, // (field_id, offset)
buffer: ValueBuffer,
/// the starting offset in the parent's buffer where this object starts
object_start_offset: usize,
@klion26 (Member Author) replied:

Fixed

object_start_offset: usize,
/// the starting offset in the parent's metadata buffer where this object starts
/// used to truncate the written fields in `drop` if the current object has not been finished
object_meta_start_offset: usize,
@klion26 (Member Author) replied:

Fixed

Comment on lines 1208 to 1211
// the length of the metadata's field names is a very cheap-to-compute upper bound.
// it will almost always be a tight upper bound as well -- it would take a pretty
// carefully crafted object to use only the early field ids of a large dictionary.
let max_id = metadata_builder.field_names.len();
@klion26 (Member Author) commented Jul 21, 2025:

IIUC, metadata_builder.field_names.len() returns the size of the underlying map, which may be bigger than the actual max field_id; the change here avoids one extra pass over the field_ids to calculate the max_id.

But this change can make tests more difficult (it becomes harder to calculate the header size) -- see the failed case from_json::test::test_json_to_variant_object_very_large (test code here / failed CI).

@alamb @viirya @scovich Maybe we can revert this to the original code (one pass over the field_ids); what do you think? Thanks. -- I added a separate commit that reverts this logic to see whether any other tests fail.
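
For illustration, the two bounds under discussion, side by side (a sketch using names from the quoted diffs; the mapping from a maximum value to a 1-4 byte width is assumed to live in a helper elsewhere in the crate):

// exact: one extra pass over the ids actually used by this object
let max_id = self.fields.iter().map(|(id, _)| *id).max().unwrap_or(0);

// upper bound: the dictionary size, cheap to compute but possibly larger
// than any id this particular object uses
let max_id = metadata_builder.field_names.len();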

@klion26 klion26 force-pushed the 7899-avoid-extra-allocation-in-object-builder branch from b6af58d to f0d35de on July 21, 2025 at 11:28
@scovich (Contributor) commented Jul 21, 2025:

@scovich Thank you very much for the detailed review and suggestions. I've addressed most of them (and added comments for the rest); please take another look when you're free.

I'm excited to take a look ASAP, but I'll be on the road a lot this week. Please don't take my absenteeism as a lack of interest!

@scovich (Contributor) left a review comment:

This looks correct and is already faster than the baseline.

I'd love to see a follow-up that tries populating that temp vec with the actual header bytes instead of zeros -- hopefully it would give another nice speed boost on top of this one.

});

let max_id = self.fields.iter().map(|(i, _)| *i).max().unwrap_or(0);
@scovich (Contributor) commented:

Why would it waste space? With high probability, the largest field id in the object will take the same number of bytes to encode as self.fields.len() - 1 (the highest field id in the dictionary), so using the latter shouldn't change anything?
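
(As a concrete example: if the dictionary holds 300 field names, its highest id is 299, which encodes in 2 bytes; any field id an object draws from that dictionary is at most 299 and so also fits in 2 bytes. The two bounds only diverge for an object that happens to use exclusively the early, 1-byte-encodable ids of a larger dictionary.)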

Comment on lines +1352 to +1226
let header_size = 1 + // header byte
(if is_large { 4 } else { 1 }) + // num_fields
(num_fields * id_size as usize) + // field IDs
((num_fields + 1) * offset_size as usize); // field offsets + data_size
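
As a concrete instance of this formula: a small object with num_fields = 3, is_large = false, id_size = 1, and offset_size = 1 needs 1 + 1 + 3 + 4 = 9 header bytes (the final 4 covering three field offsets plus data_size).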
@scovich (Contributor) commented:

Or try the other approach that populates the tmp vec with non-zero bytes before splicing it into the main buffer? That's a lot simpler than this iterator-based suggestion, and likely performs better too.
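
A minimal sketch of that idea, assuming the parent buffer is a Vec<u8> and object_start_offset marks where this object's field data begins (header_byte and num_fields_bytes are placeholders, not the crate's actual API):

let mut header = Vec::with_capacity(header_size);
header.push(header_byte);                    // 1-byte object header
header.extend_from_slice(&num_fields_bytes); // 1 or 4 bytes
// ... append the field ids and offsets in id_size / offset_size bytes ...

// splice the finished header in front of the already-written field data in
// one move, instead of pre-shifting zeros and overwriting them afterwards
let _ = buffer.splice(object_start_offset..object_start_offset, header);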

self.parent_state
.buffer()
.inner_mut()
.truncate(self.parent_offset_base);
@scovich (Contributor) commented:

Now that we have two offset bases, should we rename this one to parent_value_offset_base for clarity?

@klion26 (Member Author) replied:

Fixed

Comment on lines -2256 to +2623
-assert_eq!(metadata.len(), 1);
-assert_eq!(&metadata[0], "name"); // not rolled back
+assert!(metadata.is_empty()); // rolled back
@scovich (Contributor) commented:

yay! 🎉
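
For context, a hedged usage sketch of the rollback behavior this test asserts (method names follow the builder API referenced elsewhere in this conversation):

let mut builder = VariantBuilder::new();
{
    let mut obj = builder.new_object();
    obj.insert("name", "Alice"); // registers "name" in the metadata dictionary
    // `obj` goes out of scope without finish(): Drop now rolls back both the
    // value bytes and the metadata entry written for this object
}
// builder is back to its pre-object state here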

@alamb (Contributor) commented Jul 21, 2025:

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing 7899-avoid-extra-allocation-in-object-builder (f0d35de) to 03a837e diff
BENCH_NAME=variant_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=7899-avoid-extra-allocation-in-object-builder
Results will be posted here when complete

@klion26 (Member Author) left a comment:

@scovich Thanks for the review. parent_offset_base has been renamed to parent_value_offset_base, and the "fill the tmp vec with the header before splicing" improvement will be done in a follow-up PR.

});

let max_id = self.fields.iter().map(|(i, _)| *i).max().unwrap_or(0);
@klion26 (Member Author) commented Jul 22, 2025:

There will be no waste here; this is where I made a mistake before.

Comment on lines +1352 to +1226
let header_size = 1 + // header byte
(if is_large { 4 } else { 1 }) + // num_fields
(num_fields * id_size as usize) + // field IDs
((num_fields + 1) * offset_size as usize); // field offsets + data_size
@klion26 (Member Author) replied:

OK, will do it in a follow-up.


@alamb (Contributor) left a comment:

Thanks again everyone -- this change is looking really nice now.

I went through this PR again and it looks to me like all the comments have been addressed, but I am not 100% sure.

@scovich has approved the PR so I assume he is happy with this approach.
@viirya are you happy with this PR now, or shall we wait for another round of review?

@alamb (Contributor) commented Jul 22, 2025:

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing 7899-avoid-extra-allocation-in-object-builder (19bb544) to 291e6e5 diff
BENCH_NAME=variant_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench variant_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=7899-avoid-extra-allocation-in-object-builder
Results will be posted here when complete

@alamb (Contributor) commented Jul 22, 2025:

🤖: Benchmark completed

Details

group                                                                7899-avoid-extra-allocation-in-object-builder    main
-----                                                                ---------------------------------------------    ----
batch_json_string_to_variant json_list 8k string                     1.00     28.4±0.11ms        ? ?/sec              1.00     28.6±0.15ms        ? ?/sec
batch_json_string_to_variant random_json(2633 bytes per document)    1.00    368.2±4.36ms        ? ?/sec              1.06    389.3±6.43ms        ? ?/sec
batch_json_string_to_variant repeated_struct 8k string               1.00      8.4±0.02ms        ? ?/sec              1.02      8.5±0.02ms        ? ?/sec
variant_get_primitive                                                1.00   1381.5±3.14µs        ? ?/sec              1.00   1380.5±3.31µs        ? ?/sec

@alamb (Contributor) commented Jul 22, 2025:

OK, looks good to me. Thanks again. Let's address any additional comments in a follow-on PR.

@alamb alamb merged commit 6874ffa into apache:main Jul 22, 2025
12 checks passed
@klion26 (Member Author) commented Jul 23, 2025:

@alamb @scovich @viirya Thank you for the review; I learned a lot from this PR. I've created issues #7977 and #7978 to track the follow-ups.

Labels
parquet Changes to the parquet crate
Development

Successfully merging this pull request may close these issues.

[Variant] Avoid extra allocation in ObjectBuilder
4 participants