-
Notifications
You must be signed in to change notification settings - Fork 290
Fix TableIdentifier in TaskFileIOSupplier #2304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
XN137
wants to merge
1
commit into
apache:main
Choose a base branch
from
XN137:Fix-TableIdentifier-in-TaskFileIOSupplier
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the tableId just be a part of the TaskEntity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depends on whether there will ever be tasks that dont operate on a single iceberg table... currently the
TaskEntity
design seems to leave this openThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about this again, since the
TableIdentifier
is downstream only getting used for logging, it would seem an disproportionate amount of effort to change theTaskEntity
to avoid passing in the tableidentifier here (that is already available in each task handler)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would the existence of a task that involves multiple tables mean that a given task can't include a table name in its properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry i might not be fully understanding what you had been asking for.
could you please provide a more detailed design of how we would be making the tableId part of the TaskEntity?
and then also why that would be a better solution that simply passing in the tableId that is already available in the task handlers? (which is taken from
TaskEntity.readData
so its already part of theTaskEntity
in a way)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I completely missed that. Anyway, here is a one-line fix that doesn't change any APIs. From my debugger:

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This correctly extracts the TableIdentifier (via the TableLikeEntity) from the TASK_DATA like I mentioned above.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaict your "one-line fix" only works for table cleanup tasks, where we happen to store a
IcebergTableLikeEntity
into theTASK_DATA
:polaris/polaris-core/src/main/java/org/apache/polaris/core/persistence/transactional/TransactionalMetaStoreManagerImpl.java
Lines 1433 to 1437 in 1277eff
for other tasks, it would be throwing an error as we store something else, for example:
polaris/runtime/service/src/main/java/org/apache/polaris/service/task/ManifestFileCleanupTaskHandler.java
Lines 64 to 66 in 20febda
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I was trying to say above -- if there is a TableIdentifier in the TASK_DATA we should be extracting and using that. The current "cast" of a TaskEntity into a TableLikeEntity is definitely wrong, but if we can fix this the easy way we should.
If there is a task that doesn't have a TableIdentifier in TASK_DATA we should either add it there or remove the need for it (i.e. pass in
null
toloadFileIO
and mark itNullable
) if that's easier.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like we have circled back to a previous point in our discussion, so i am posting my reply again:
i still dont have an answer as to why changing
TaskFileIOSupplier
is a no-go when all callers have theTableIdentifier
already and the underlyingFileIOFactory
(in its current form) requires it.