Skip to content

Conversation

brancz
Copy link
Contributor

@brancz brancz commented Jul 14, 2025

Which issue does this PR close?

Does not yet close, but contributes towards:

Rationale for this change

See the above issues. And this is a follow up to

This was also split out from: #7467

What changes are included in this PR?

The required changes, so that nothing depends on the canonical Schema's Field containing the dict_id field anymore, so that as a follow up, the dict_id field can actually be removed from the deprecated function signatures and remove the field itself.

Are these changes tested?

All tests continue to pass.

Are there any user-facing changes?

The function signatures of these publicly facing APIs changed to provide the appropriate access to the dict ID as it is represented in the respective IPC message(s):

  • flight_data_to_arrow_batch (arrow-flight)
  • arrow_data_from_flight_data (arrow-flight; sql)
  • read_record_batch (arrow-ipc)
  • read_dictionary (arrow-ipc)
  • FileDecoder::new (arrow-ipc)

@tustvold @alamb @thinkharderdev @adriangb

@github-actions github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels Jul 14, 2025
@brancz brancz force-pushed the dict-id-ipc-only branch 3 times, most recently from d185110 to 6042909 Compare July 15, 2025 07:51
@brancz brancz force-pushed the dict-id-ipc-only branch from 6042909 to 8b4c8b6 Compare July 16, 2025 09:32
@brancz
Copy link
Contributor Author

brancz commented Jul 16, 2025

Actually while debugging the test failure, I realized I can split up this PR some more, by just opening the removal of the preserving dict id code paths.

@brancz
Copy link
Contributor Author

brancz commented Jul 17, 2025

Rebasing this on #7940 now.

alamb pushed a commit that referenced this pull request Jul 18, 2025
# Which issue does this PR close?

Does not yet close, but contributes towards:

- #6356 
- #5981 
- #1206

# Rationale for this change

See the above issues. And this is a follow up to

* #6711
* #6873

This was also split out from:
#7929

# What changes are included in this PR?

This removes the API to allow preserving `dict_id` set in the `Schema`'s
`Field` within arrow-ipc and arrow-flight. This is in an effort to
remove the `dict_id` field entirely and make it an IPC/flight-only
concern.

# Are these changes tested?

Yes, all existing tests continue to pass.

# Are there any user-facing changes?

Yes, these previously (in 54.0.0) deprecated functions/fields are
removed:

* `arrow_ipc::DictionaryTracker.set_dict_id`
* `arrow_ipc::DictionaryTracker::new_with_preserve_dict_id`
* `arrow_ipc::IpcWriteOptions.with_preserve_dict_id`
* `arrow_ipc::IpcWriteOptions.preserve_dict_id` (function and field)
* `arrow_ipc::schema_to_fb`
* `arrow_ipc::schema_to_bytes`
@alamb
Copy link
Contributor

alamb commented Jul 21, 2025

Close / reopen to rerun CI (I think github actions was having problems previously)

@alamb alamb closed this Jul 21, 2025
@alamb alamb reopened this Jul 21, 2025
@brancz
Copy link
Contributor Author

brancz commented Jul 22, 2025

One of the tests is actually failing correctly (one of the integration tests), and I've realized what I need to do, but I'll need a few days to get to it. I'll make sure to ping you when I've got it fixed!

@alamb alamb marked this pull request as draft July 23, 2025 11:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants