

ethan-tyler commented Dec 3, 2025

Which issue does this PR close?

Part of #1339 ([EPIC] Transaction Support)
Contributes to #735 (delete support) and #1104 (RowDeltaAction)

What changes are included in this PR?

Adds PositionDeleteFileWriter for writing position delete Parquet files per the Iceberg spec:

Writer layer:

  • PositionDeleteFileWriter writes position delete files using the reserved Iceberg field IDs
  • Schema: file_path (string, field ID 2147483546), pos (i64, field ID 2147483545)
  • Tracks referenced_data_file so that a delete file targeting a single data file can record that path
  • Schema validation: column count, column types, and a non-nullable file_path
  • Runtime rejection of negative position values
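To make the schema rules above concrete, here is a minimal std-only sketch. The types Column and ColumnType and the function validate_schema are hypothetical stand-ins, not the iceberg-rust API (the actual writer works with Arrow and Iceberg schema types). The two field IDs are the reserved ones listed above; they equal i32::MAX - 101 and i32::MAX - 102.

```rust
// Hypothetical, std-only model of the position delete schema checks.
// Not the iceberg-rust API: names and types here are illustrative only.

/// Reserved Iceberg field ID for `file_path` (i32::MAX - 101).
pub const FILE_PATH_FIELD_ID: i32 = 2147483546;
/// Reserved Iceberg field ID for `pos` (i32::MAX - 102).
pub const POS_FIELD_ID: i32 = 2147483545;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnType {
    String,
    Long, // 64-bit signed integer (i64)
    Int,  // 32-bit integer, present only to model a wrong type
}

#[derive(Debug, Clone)]
pub struct Column {
    pub name: &'static str,
    pub field_id: i32,
    pub ty: ColumnType,
    pub nullable: bool,
}

/// Builds the expected schema: (file_path: string, pos: long), both required.
pub fn position_delete_schema() -> Vec<Column> {
    vec![
        Column { name: "file_path", field_id: FILE_PATH_FIELD_ID, ty: ColumnType::String, nullable: false },
        Column { name: "pos", field_id: POS_FIELD_ID, ty: ColumnType::Long, nullable: false },
    ]
}

/// Checks column count, column types, and that file_path is non-nullable,
/// returning a descriptive error for the first violation found.
pub fn validate_schema(columns: &[Column]) -> Result<(), String> {
    if columns.len() != 2 {
        return Err(format!("expected 2 columns (file_path, pos), got {}", columns.len()));
    }
    let (path, pos) = (&columns[0], &columns[1]);
    if path.ty != ColumnType::String {
        return Err(format!("column 0 ({}) must be string, got {:?}", path.name, path.ty));
    }
    if pos.ty != ColumnType::Long {
        return Err(format!("column 1 ({}) must be long (i64), got {:?}", pos.name, pos.ty));
    }
    if path.nullable {
        return Err("file_path must be non-nullable".to_string());
    }
    Ok(())
}
```

The sketch checks columns by position (file_path first, pos second), which is an assumption; a real implementation could instead match columns by field ID.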

Are these changes tested?

13 unit tests for PositionDeleteFileWriter:

  • Schema validation (wrong column count, wrong types, nullable file_path)
  • Runtime validation (negative positions)
  • Edge cases (multiple files, multiple batches, Unicode paths, i64::MAX positions)
  • Parquet field ID verification against Iceberg spec
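The runtime check exercised by the negative-position and i64::MAX tests can be sketched as follows. DeleteBatch and validate_positions are hypothetical names standing in for an Arrow record batch and the writer's internal check; this is an assumption about the approach, not the PR's code.

```rust
// Hypothetical, std-only sketch of the runtime position check; not the PR's code.

/// Stand-in for one Arrow record batch of position deletes
/// (column 0 = file_path, column 1 = pos).
pub struct DeleteBatch<'a> {
    pub file_paths: Vec<&'a str>,
    pub positions: Vec<i64>,
}

/// Rejects batches whose columns differ in length or that contain a negative
/// position. Every value in 0..=i64::MAX is accepted.
pub fn validate_positions(batch: &DeleteBatch<'_>) -> Result<(), String> {
    if batch.file_paths.len() != batch.positions.len() {
        return Err(format!(
            "column length mismatch: {} file paths vs {} positions",
            batch.file_paths.len(),
            batch.positions.len()
        ));
    }
    // Report the first offending row so the error points at concrete data.
    if let Some((row, pos)) = batch
        .positions
        .iter()
        .enumerate()
        .find(|&(_, &p)| p < 0)
    {
        return Err(format!("row {row}: position {pos} is negative"));
    }
    Ok(())
}
```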

ethan-tyler changed the title from "feat(transaction): Add DeleteAction for committing delete files" to "feat(writer): Add PositionDeleteFileWriter for position delete files" on Dec 4, 2025
ethan-tyler (author) commented:

This PR is part of a 3-PR series to add delete file support:

  • This PR - PositionDeleteFileWriter (writer layer)
  • PR 2 - DeleteAction + SnapshotProducer changes (transaction layer)
  • PR 3 - Integration test with REST catalog

Add PositionDeleteFileWriter for writing position delete Parquet files
per the Iceberg spec. Position deletes use a fixed two-column schema with
file_path (field ID 2147483546) and pos (field ID 2147483545) columns.

- Add PositionDeleteWriterConfig with Iceberg spec field IDs
- Implement PositionDeleteFileWriterBuilder with IcebergWriterBuilder trait
- Implement PositionDeleteFileWriter with IcebergWriter trait
- Track referenced_data_file for single-file delete optimization
- Add schema validation (column count, types, nullability) with descriptive error messages
- Add 13 tests covering edge cases, validation, and Parquet field IDs
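The referenced_data_file bookkeeping can be modeled as a three-state tracker: no deletes seen yet, exactly one distinct data file seen, or more than one. The sketch below uses hypothetical names (ReferencedFileTracker, observe) and reports a path only in the single-file case; it illustrates one plausible approach and is not the PR's implementation.

```rust
// Hypothetical sketch of single-file tracking for referenced_data_file.
use std::mem;

#[derive(Debug, Default)]
enum State {
    #[default]
    Empty,          // no delete rows observed yet
    Single(String), // every row so far targets this one data file
    Multiple,       // at least two distinct data files were targeted
}

#[derive(Debug, Default)]
pub struct ReferencedFileTracker {
    state: State,
}

impl ReferencedFileTracker {
    /// Records the data file path of one delete row.
    pub fn observe(&mut self, path: &str) {
        self.state = match mem::take(&mut self.state) {
            State::Empty => State::Single(path.to_string()),
            State::Single(p) if p == path => State::Single(p),
            State::Single(_) | State::Multiple => State::Multiple,
        };
    }

    /// Returns the path only if all observed rows referenced a single file.
    pub fn referenced_data_file(&self) -> Option<&str> {
        match &self.state {
            State::Single(p) => Some(p.as_str()),
            _ => None,
        }
    }
}
```

Once a second distinct path appears, the tracker stays in the Multiple state, so later rows that return to the first path cannot re-enable the optimization.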
ethan-tyler closed this on Dec 4, 2025
ethan-tyler deleted the feat/delete-action branch on Dec 4, 2025, 20:14