Skip to content

fix(databricks): lazily create memtable volume at first use#11962

Open
NickCrews wants to merge 1 commit intoibis-project:mainfrom
NickCrews:databricks-lazy-memtables
Open

fix(databricks): lazily create memtable volume at first use#11962
NickCrews wants to merge 1 commit intoibis-project:mainfrom
NickCrews:databricks-lazy-memtables

Conversation

@NickCrews
Copy link
Copy Markdown
Contributor

Fixes #11598

Previously, the memtable volume was created eagerly in _post_connect() immediately after connecting, which caused read-only users to fail with a PermissionDenied error even when they never intended to use memtables.

This change defers volume creation until the first memtable is actually registered via _register_in_memory_table(), allowing read-only users to connect and query existing data without requiring CREATE VOLUME privilege.

This is a redo of #11956,
where I take a more complete approach:

  • move all logic into a MemtablesManager class. I didn't like how the state was spread throughout the rest of the Backend, it was leaking everywhere. Now I think it is a lot more clear who is responsible for what
  • This fixed a bug where in .do_connect() we set _memtable_catalog and _memtable_database, but in .from_connection() we did not. I'm pretty sure and backend instance created with .from_connection would be in an invalid state and wouldn't have been able to use memtables at all. By centralizing the memtable management into a class, this also reduces the chance for this sort of bug where backends can get created in multiple ways.
  • This changes how the memtable_volume param is interpreted. Before, it was interpreted as just the suffix to a larger path that was generated, eg if you passed foo then the memtables would actually be created at /Volumes/the_current_catalog_at_init_time/the_current_database_at_init_time/foo. After this change, you just get the plain foo, so the called has much more control over what they want the volume to be.

I didn't add tests because I couldn't find a way to easily create a readonly connection. Perhaps we could create a new role in our databricks account that we use for testing? But mostly I treat this as a refactor, so if existing tests don't begin failing, that's good enough for me.

@github-actions github-actions bot added the databricks The Databricks backend label Mar 5, 2026
@NickCrews NickCrews force-pushed the databricks-lazy-memtables branch from 42b386f to 4429f4c Compare March 5, 2026 20:54
@NickCrews NickCrews changed the title fix(databricks): Lazily create memtable volume at first use fix(databricks): lazily create memtable volume at first use Mar 5, 2026
@NickCrews
Copy link
Copy Markdown
Contributor Author

@VENKATASAI1994 what do you think of this alternate formulation?

@VENKATASAI1994
Copy link
Copy Markdown

VENKATASAI1994 commented Mar 15, 2026

@VENKATASAI1994 what do you think of this alternate formulation?

Looks good now and works, i have tested this with read only.
One of the check failing here, could you look into that and merge. We need this for our company. @NickCrews

Fixes ibis-project#11598

Previously, the memtable volume was created eagerly in _post_connect()
immediately after connecting, which caused read-only users to fail with
a PermissionDenied error even when they never intended to use memtables.

This change defers volume creation until the first memtable is actually
registered via _register_in_memory_table(), allowing read-only users to
connect and query existing data without requiring CREATE VOLUME privilege.
@NickCrews NickCrews force-pushed the databricks-lazy-memtables branch from 4429f4c to 2823bf9 Compare March 25, 2026 19:49
@NickCrews
Copy link
Copy Markdown
Contributor Author

NickCrews commented Mar 25, 2026

The failing check is for codecov and I'm fine merging with it failing, as I said I don't see an easy way to test.

@deepyaman or @gforsyth perhaps you have the bandwidth for a review of this to help unblock @VENKATASAI1994 (and several others who have chimed in on #11598). I think the new implementation is pretty easy to understand, and I think I described the rationale and the changes pretty well in the top comment in this PR. Thanks in case you do have time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

databricks The Databricks backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Unable to connect to Databrick with Read only roles

2 participants