Skip to content

Conversation

gabeiglio
Copy link
Contributor

@gabeiglio gabeiglio commented Sep 23, 2025

table.refresh() shouldn't load all the snapshots if we have refs only configured in the catalog. So we should propagate the queryParams to the table ops

@github-actions github-actions bot added the core label Sep 23, 2025
Copy link
Collaborator

@gaborkaszab gaborkaszab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, @gabeiglio !
For me this makes much sense in general. I added some comments, mainly focusing on extending this to other use-cases too. Let me know what you think!

any());

Table refsTable = catalog.loadTable(TABLE);
refsTable.refresh();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current implementation of lazy snapshot loading is inconsistent, probably not within the scope of this PR, but for the record:

Table table = catalog.createTable(TABLE, SCHEMA);
table.refresh();

The above code loads the table twice as this test, however both of them loads the full snapshot list not respecting the REFS snapshot loading mode.
This PR might want to solve this for the second load with passing the snapshotMode to RESTTableOperations all across RESTSessionCatalog.
Adjusting the first load (for createTable() and all) would require a REST spec change. See my question about this on Slack.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially scratch what I wrote: there is not much sense to do partial snapshot loading for createTable() since there won't be a list of snapshots anyway. But for subsequent refreshes() or commits() we could still do the partial loading. For this the adjustment for the RESTTableOperations is needed.

In that thread I linked above, @nastra said that partial snapshot loading could make sense for RESTTableOperations.refresh() and commit() but only in case there were no snapshots() call since the creation of the ops, or in other works in case current.snapshotsLoaded is false. This flag is not exposed now, but in case we exposed it, then there might be no need to pass SnapshotMode to the RESTTableOps, because snapshptsLoaded can only be false in case REFS were used for loading the table so would be somewhat redundant.
WDYT @nastra ?

Copy link
Contributor

@nastra nastra Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or in other works in case current.snapshotsLoaded is false

I actually didn't mean to refer to this flag and we don't want to expose this, since this is internal information. What I meant is that it's not helpful to have ref loading in a place when the calling code potentially triggers an implicit call of snapshots(), which would do another request to load the remaining snapshots. That also means if there are cases where snapshots() isn't triggered and the table has lots and lots of snapshots, then this would be a good use case to think about adding ref loading mode there.
I don't want us to add this feature to all places just for the sake of adding it but rather we want to add it where it's actually helpful and makes a difference in terms of loading times of a table

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gaborkaszab @nastra thank you both for putting your thoughts into this! I also concur that we should add this where makes the most sense (mainly refresh, load, and commit). I can add the changes for commit in a separate PR since this will require rest spec changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants