You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Implement Improved arrow-avro Reader Zero-Byte Record Handling (#7966)
# Which issue does this PR close?
- Part of #4886
- Follow up to #7834
# Rationale for this change
The initial Avro reader implementation contained an under-developed and
temporary safeguard to prevent infinite loops when processing records
that consumed zero bytes from the input buffer.
When the `Decoder` reported that zero bytes were consumed, the `Reader`
would advance it's cursor to the end of the current data block. While
this successfully prevented an infinite loop, it had the critical side
effect of silently discarding any remaining data in that block, leading
to potential data loss.
This change enhances the decoding logic to handle these zero-byte values
correctly, ensuring that the `Reader` makes proper progress without
dropping data and without risking an infinite loop.
# What changes are included in this PR?
- **Refined Decoder Logic**: The `Decoder` has been updated to
accurately track and report the number of bytes consumed for all values,
including valid zero-length records like `null` or empty `bytes`. This
ensures the decoder always makes forward progress.
- **Removal of Data-Skipping Safeguard**: The logic in the `Reader` that
previously advanced to the end of a block on a zero-byte read has been
removed. The reader now relies on the decoder to report accurate
consumption and advances its cursor incrementally and safely.
- * New integration test using a temporary `zero_byte.avro` file created
via this python script:
https://gist.github.com/jecsand838/e57647d0d12853f3cf07c350a6a40395
# Are these changes tested?
Yes, a new `test_read_zero_byte_avro_file` test was added that reads the
new `zero_byte.avro` file and confirms the update.
# Are there any user-facing changes?
N/A
# Follow-Up PRs
1. PR to update `test_read_zero_byte_avro_file` once
apache/arrow-testing#109 is merged in.
0 commit comments