fix: Add support for unsigned Arrow datatypes in schema conversion #1617

Open
wants to merge 2 commits into
base: main

GeetKrishna

Which issue does this PR close?

What changes are included in this PR?

Bug Fixes

  • Fixed crash when ArrowSchemaConverter encounters unsigned datatypes
  • Resolved "Unsupported Arrow data type" errors for UInt8/16/32/64

Features

  • Added round-trip conversion support for all unsigned Arrow types
  • Implemented metadata-based type preservation using field documentation
  • Added byte-to-byte conversion: UInt8/16/32 → Int32, UInt64 → Int64
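
To illustrate the mapping listed above, here is a minimal, std-only sketch. It is not the code in this PR: `UnsignedArrowType`, `IcebergPrimitive`, `to_iceberg`, and the bit-cast helpers are hypothetical stand-ins, and it assumes that "byte-to-byte" means the same-width cases (UInt32 to Int32, UInt64 to Int64) keep the value's bit pattern unchanged.

```rust
/// Hypothetical stand-in for the unsigned Arrow types named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnsignedArrowType {
    UInt8,
    UInt16,
    UInt32,
    UInt64,
}

/// Hypothetical stand-in for the signed target types (Int32 / Int64).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IcebergPrimitive {
    Int,  // 32-bit signed
    Long, // 64-bit signed
}

/// Mapping as listed in the PR description:
/// UInt8/16/32 -> Int32, UInt64 -> Int64.
pub fn to_iceberg(ty: UnsignedArrowType) -> IcebergPrimitive {
    match ty {
        UnsignedArrowType::UInt8
        | UnsignedArrowType::UInt16
        | UnsignedArrowType::UInt32 => IcebergPrimitive::Int,
        UnsignedArrowType::UInt64 => IcebergPrimitive::Long,
    }
}

// Assumption: same-width values are stored by reinterpreting their bits.
// An `as` cast between integers of equal width keeps every bit as-is.
pub fn u32_to_i32_bits(v: u32) -> i32 {
    v as i32
}

pub fn i32_bits_to_u32(v: i32) -> u32 {
    v as u32
}

pub fn u64_to_i64_bits(v: u64) -> i64 {
    v as i64
}

pub fn i64_bits_to_u64(v: i64) -> u64 {
    v as u64
}
```

Under this assumption no bits are lost, but a value above `i32::MAX` (or `i64::MAX`) reads as negative until it is cast back, so a reader has to know that the column was originally unsigned. That is the role of the metadata marker described in this PR.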

Code Changes

  • Added constants: UNSIGNED_TYPE_PREFIX, ARROW_FIELD_UNSIGNED_KEY
  • Added helper functions: get_unsigned_type_name(), restore_unsigned_type()
  • Enhanced ArrowSchemaConverter: the primitive() method now handles unsigned types
  • Enhanced ToArrowSchemaConverter: the field() method now restores the original unsigned type
  • Added comprehensive tests: test_unsigned_type_conversion() for all unsigned variants
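
The helper pair above could work roughly as sketched below. This is a std-only illustration under assumptions, not the PR's code: the value of `UNSIGNED_TYPE_PREFIX`, the `ArrowType` enum, the type-name strings, the `encode_doc` helper, and both function signatures are hypothetical, since the description only names the constants and helpers.

```rust
// Hypothetical value: the PR names this constant but does not show its contents.
pub const UNSIGNED_TYPE_PREFIX: &str = "arrow.unsigned:";

/// Hypothetical stand-in for the subset of Arrow data types involved.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArrowType {
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
}

/// Returns a stable name for unsigned types and `None` for everything else.
/// (Assumed signature.)
pub fn get_unsigned_type_name(ty: ArrowType) -> Option<&'static str> {
    match ty {
        ArrowType::UInt8 => Some("uint8"),
        ArrowType::UInt16 => Some("uint16"),
        ArrowType::UInt32 => Some("uint32"),
        ArrowType::UInt64 => Some("uint64"),
        ArrowType::Int32 | ArrowType::Int64 => None,
    }
}

/// Builds the marker that would be written into the field documentation
/// when converting Arrow -> Iceberg. (Hypothetical helper.)
pub fn encode_doc(ty: ArrowType) -> Option<String> {
    get_unsigned_type_name(ty).map(|name| format!("{}{}", UNSIGNED_TYPE_PREFIX, name))
}

/// Recovers the original unsigned type from a field's documentation when
/// converting Iceberg -> Arrow. Returns `None` if the doc is absent or
/// carries no recognised marker. (Assumed signature.)
pub fn restore_unsigned_type(doc: Option<&str>) -> Option<ArrowType> {
    let name = doc?.strip_prefix(UNSIGNED_TYPE_PREFIX)?;
    match name {
        "uint8" => Some(ArrowType::UInt8),
        "uint16" => Some(ArrowType::UInt16),
        "uint32" => Some(ArrowType::UInt32),
        "uint64" => Some(ArrowType::UInt64),
        _ => None,
    }
}
```

For every unsigned type `ty`, `restore_unsigned_type(encode_doc(ty).as_deref())` returns `Some(ty)`, which is the round-trip property the new test is described as checking. A design point worth reviewing in the real implementation is how the marker coexists with a user-supplied doc string on the same field; this sketch does not address that.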

Files Modified

  • crates/iceberg/src/arrow/schema.rs

Impact

  • ✅ No breaking changes - existing functionality preserved
  • ✅ No data loss - all unsigned values preserved through conversion
  • ✅ Backward compatible with existing Iceberg schemas
  • ✅ Minimal performance overhead, incurred only for unsigned types

Are these changes tested?

  • All existing schema tests pass
  • New comprehensive test covers UInt8, UInt16, UInt32, UInt64 round-trip conversion

Successfully merging this pull request may close these issues.

bug: ArrowSchemaConverter can't handle unsigned datatypes from arrow