-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Rest: Allow refresh() to load refs only metadata #14166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR, @gabeiglio !
For me this makes much sense in general. I added some comments, mainly focusing on extending this to other use-cases too. Let me know what you think!
any()); | ||
|
||
Table refsTable = catalog.loadTable(TABLE); | ||
refsTable.refresh(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current implementation of lazy snapshot loading is inconsistent, probably not within the scope of this PR, but for the record:
Table table = catalog.createTable(TABLE, SCHEMA);
table.refresh();
The above code loads the table twice as this test, however both of them loads the full snapshot list not respecting the REFS
snapshot loading mode.
This PR might want to solve this for the second load with passing the snapshotMode
to RESTTableOperations
all across RESTSessionCatalog
.
Adjusting the first load (for createTable() and all) would require a REST spec change. See my question about this on Slack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partially scratch what I wrote: there is not much sense to do partial snapshot loading for createTable() since there won't be a list of snapshots anyway. But for subsequent refreshes() or commits() we could still do the partial loading. For this the adjustment for the RESTTableOperations is needed.
In that thread I linked above, @nastra said that partial snapshot loading could make sense for RESTTableOperations.refresh() and commit() but only in case there were no snapshots() call since the creation of the ops, or in other works in case current.snapshotsLoaded is false. This flag is not exposed now, but in case we exposed it, then there might be no need to pass SnapshotMode to the RESTTableOps, because snapshptsLoaded can only be false in case REFS were used for loading the table so would be somewhat redundant.
WDYT @nastra ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or in other works in case current.snapshotsLoaded is false
I actually didn't mean to refer to this flag and we don't want to expose this, since this is internal information. What I meant is that it's not helpful to have ref loading in a place when the calling code potentially triggers an implicit call of snapshots()
, which would do another request to load the remaining snapshots. That also means if there are cases where snapshots()
isn't triggered and the table has lots and lots of snapshots, then this would be a good use case to think about adding ref loading mode there.
I don't want us to add this feature to all places just for the sake of adding it but rather we want to add it where it's actually helpful and makes a difference in terms of loading times of a table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gaborkaszab @nastra thank you both for putting your thoughts into this! I also concur that we should add this where makes the most sense (mainly refresh, load, and commit). I can add the changes for commit in a separate PR since this will require rest spec changes.
06237d6
to
a023c81
Compare
table.refresh()
shouldn't load all the snapshots if we have refs only configured in the catalog. So we should propagate the queryParams to the table ops