Spanner Persistence Backend for Polaris #2328

byronellis · 2025-08-12T08:32:28Z

This is the first commit in a series that implements a Spanner persistence backend for Polaris. It is largely modeled on the JDBC implementation, though it takes advantage of Spanner-specific features such as nested tables.

In this design the parent table is Realm with all other tables nested within that parent table.

This initial PR commits the basic classes needed to manage the Spanner connection lifecycle, along with the Realm model, because Realm bootstrapping is part of lifecycle management. This was cleaner than making it part of the persistence implementation, and it has the added benefit of allowing for a cleaner initial PR.

…M. Storage is included as the larger libraries BOM and the version matches the desired Storage version.

…mmit defines the overall Spanner schema with Realm as the parent table for all entities. This is the minimum needed to bootstrap a realm.

snazy

Overall the code looks good to me.
Hard to judge on the concrete implementation (yet, without knowing the follow-ups).

Appreciate having emulator-configuration for testing purposes!

Couple of thoughts and suggestions, nothing serious.

.../java/org/apache/polaris/persistence/relational/spanner/GoogleCloudSpannerConfiguration.java

snazy · 2025-08-12T09:55:26Z

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

+  }
+
+  @Produces
+  public Consumer<SchemaOptions> getSchemaInitializer() {


Nit: I'd personally introduce a separate interface type that extends Consumer<X> (ran into issues with CDI + generics in the past - might no longer be an issue though).

Converted this to an explicit SchemaInitializer consumer.

For completeness converted the rest of the consumers and suppliers as well.
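
For readers following the thread, here is a minimal sketch of the pattern suggested above: a named interface that extends Consumer<SchemaOptions>, so an injection point can refer to a plain type rather than a parameterized one. The SchemaOptions class below is a stand-in with an invented field, and recordingInitializer is a hypothetical helper; neither is taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the PR's SchemaOptions; the real class is not shown in this thread.
final class SchemaOptions {
  final List<String> appliedBy = new ArrayList<>();
}

// Named, non-generic type. It inherits accept(SchemaOptions) from Consumer,
// so it is still a functional interface and can be written as a lambda.
@FunctionalInterface
interface SchemaInitializer extends Consumer<SchemaOptions> {}

final class SchemaInitializerExample {
  // Hypothetical helper: returns an initializer that records its name on the options.
  static SchemaInitializer recordingInitializer(String name) {
    return options -> options.appliedBy.add(name);
  }
}
```

Because SchemaInitializer is still a Consumer<SchemaOptions>, existing callers that accept a Consumer keep working; only the declared type at the producer and injection point changes.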

persistence/google-cloud-spanner/build.gradle.kts

.../java/org/apache/polaris/persistence/relational/spanner/GoogleCloudSpannerConfiguration.java

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

…eplace the Consumer<T> and Supplier<T> classes to avoid potential problems with resolution.

byronellis

Thanks for the review, updated the PR to address your comments.

eric-maynard

Thanks @byronellis for the contribution! Overall this looks promising, I left a few initial comments. How close is this to being regtest-able?

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

eric-maynard · 2025-08-12T18:42:44Z

.../java/org/apache/polaris/persistence/relational/spanner/GoogleCloudSpannerConfiguration.java

+@ConfigMapping(prefix = "polaris.persistence.spanner")
+public interface GoogleCloudSpannerConfiguration {


nit: Since we're calling the config spanner, I wonder if we need the GoogleCloud- prefix everywhere? It's making things quite long. For similar cloud-specific types like S3StorageLocation or GcpStorageConfigInfo we are not quite as verbose

No real preference since my editor autocompletes everything for me... It's already implemented as GoogleCloudSpanner* so it would be a fair amount of toil to rename for what amounts to an aesthetic decision though.

Sorry where is it already implemented? Autocompletion / writing long type names is not an issue, but reading them and dealing with line-size constraints can be.

It is an aesthetic decision, but GoogleCloudSpannerDatabaseClientLifecycleManager.java would set a new record for the longest .java filename in the project :)

The full implementation is here: https://github.com/byronellis/polaris/tree/spanner-persistence just didn't want to drop all of that on y'all in one go.

I'm just going to leave the name as-is for right now, if someone feels really strongly about it they can file a PR to change all the callsites later if that works for you.

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

eric-maynard · 2025-08-12T18:44:44Z

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

+        if (s.startsWith("--") || s.length() == 0) {
+          continue;
+        }
+        lines.add(s);


Suggested change

-        if (s.startsWith("--") || s.length() == 0) {
-          continue;
-        }
-        lines.add(s);
+        if (!s.startsWith("--") && s.length() > 0) {
+          lines.add(s);
+        }

I think I'm OK? I wanted to remove lines that are comments or just blank... Added a check for only containing ';' in case some monster does that... that said it's not like we're sending arbitrary SQL through this thing so we don't need to be super careful.

eric-maynard · 2025-08-12T18:45:17Z

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

+      List<String> lines = new ArrayList<>();
+      for (String s : schema.split("\n")) {
+        s = s.trim();
+        if (s.startsWith("--") || s.length() == 0) {


Do we also need to check if the line ends with ;? Later we split on that

Added a check for lines only containing ';'
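
A minimal sketch of the line filtering discussed in this thread, assuming the behavior described in the comments (drop blank lines, "--" comment lines, and lines consisting only of ';'). The class and method names are hypothetical, and this is not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.List;

final class DdlLines {
  /** Keeps only lines that carry DDL text; names here are illustrative. */
  static List<String> filter(String schema) {
    List<String> lines = new ArrayList<>();
    for (String raw : schema.split("\n")) {
      String s = raw.trim();
      // Skip blank lines, SQL line comments, and lines that are just a terminator.
      if (s.isEmpty() || s.startsWith("--") || s.equals(";")) {
        continue;
      }
      lines.add(s);
    }
    return lines;
  }
}
```

One thing to keep in mind with this shape: if a statement's only terminator sits alone on its own line, dropping that line also drops the terminator, so a later split on ';' would see that statement joined to the next one.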

eric-maynard · 2025-08-12T18:46:26Z

...main/java/org/apache/polaris/persistence/relational/spanner/DatabaseAdminClientSupplier.java

Should the package really be org.apache.polaris.persistence.relational.spanner;? Or org.apache.polaris.persistence.spanner? I thought relational was meant for the JDBC metastore or RDBMS more generally

I don't have any preference (personally I think relational vs. not is just an implementation detail), though if you have to pick, Spanner in 2025 really fits better in the relational model than the nosql model.

Yeah so this is what I'm saying, I think the package name is a bit misleading. This is really for relational-jdbc, it's not saying everything in this package uses a relational model (and implicitly everything outside of it does not)

I think what mostly happened is that when I started this y'all had a relational package for eclipselink and jdbc but somewhere along the way you moved things up a level in the project structure... Probably the best way to resolve that would be to merge the PRs and then do a package name refactor since the refactoring tool could do it all in one go. Also makes it clear what's happening in the commit chain.

eh, changed my mind and just started moving things now. A little more work building the follow on PRs but maybe clearer for folks

...oud-spanner/src/main/java/org/apache/polaris/persistence/relational/spanner/model/Realm.java

...d-spanner/src/main/java/org/apache/polaris/persistence/relational/spanner/util/Modifier.java

...panner/src/main/java/org/apache/polaris/persistence/relational/spanner/util/SpannerUtil.java

flyrain · 2025-08-12T19:09:15Z

gradle/libs.versions.toml

 eclipselink = { module = "org.eclipse.persistence:eclipselink", version = "4.0.7" }
 errorprone = { module = "com.google.errorprone:error_prone_core", version = "2.41.0" }
-google-cloud-storage-bom = { module = "com.google.cloud:google-cloud-storage-bom", version = "2.55.0" }
+google-cloud-libraries-bom = { module = "com.google.cloud:libraries-bom", version = "26.64.0" }


We will need to change the LICENSE file for this change, somewhere like here:

polaris/runtime/server/distribution/LICENSE, line 556 in d7f15a2:

Group: com.google.api.grpc Name: proto-google-cloud-storage-v2 Version: 2.53.0

cc @jbonofre

Good point about LICENSE updates, however using a BOM does not necessarily require LICENSE changes... only real dependencies need to be mentioned... IMHO, that can be done later (we have to double check dependencies for every release anyway).

I'm OK with a followup PR.

byronellis

@eric-maynard The implementation is largely code complete. I just thought it would be rude to drop a 3000 line PR on folks, and since I ended up doing it in kind of a big marathon session, the commits didn't make a whole lot of sense as split points. So I opted for a set of clean commits that reflect well-defined pieces of the implementation instead.

Right this second I believe there are 2 integration tests of the ~200 or so that are failing (they seem like they may be related) but all of the happy path integration tests seem to be working through the emulator at least.

dimas-b · 2025-08-12T21:39:54Z

gradle/libs.versions.toml

 eclipselink = { module = "org.eclipse.persistence:eclipselink", version = "4.0.7" }
 errorprone = { module = "com.google.errorprone:error_prone_core", version = "2.41.0" }
-google-cloud-storage-bom = { module = "com.google.cloud:google-cloud-storage-bom", version = "2.55.0" }
+google-cloud-libraries-bom = { module = "com.google.cloud:libraries-bom", version = "26.64.0" }


Just wondering: does libraries-bom get updated as frequently as any of its upstream artifacts are published?

I believe most of the Cloud SDKs, libraries included, are on a two week cadence more or less. The advantage (coming from the Beam experience with the Cloud Java SDKs) of using the BOM is that it keeps the various support libraries synchronized across specific SDKs. What happens otherwise is you get version drift in shared components like Protobuf or gRPC core libraries, which can be really hard to spot.

dimas-b

Glad to see Spanner Persistence materializing 🙂 thanks for your work on this @byronellis !

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

dimas-b · 2025-08-12T21:47:59Z

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

+      try {
+        spanner
+            .getDatabaseClient(databaseId)
+            .write(ImmutableList.of(Realm.upsert(realmContext.getRealmIdentifier())));


I wonder if this should be done under the "bootstrap" call path as opposed to on observing new realm IDs in runtime. The difference would be delegating realm initialization to the "admin" user / admin tool. Cf. #2196

Also RealmContext CDI beans may come and go very frequently at runtime (once per request at least).

Cf. https://lists.apache.org/thread/9nl0dt6fhqx4t4q55kyzf1v9r2vhl2gg

Agreed with @dimas-b. The realm initialization doesn't happen very often. It only happens when we bootstrap a new realm, https://polaris.apache.org/in-dev/unreleased/admin-tool/#bootstrapping-realms-and-principal-credentials. Producing a bean here isn't necessary to me, as Polaris server will never use it for realm initialization. Here is the reference code path in JDBC impl.: https://github.com/polaris-catalog/polaris/blob/main/persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L142-L142

Fair enough, moving this to the bootstrap code.

dimas-b · 2025-08-12T21:58:16Z

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

+  protected DatabaseId databaseId;
+
+  @PostConstruct
+  protected void init() {


just wondering: why not do the init work in the constructor?

flyrain

Thanks @byronellis for working on it. LGTM generally. Left minor comments and I think some beans are not necessary.

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

flyrain · 2025-08-12T23:08:29Z

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

+            .getDatabaseClient(databaseId)
+            .write(ImmutableList.of(Realm.upsert(realmContext.getRealmIdentifier())));
+      } catch (SpannerException e) {
+        LOGGER.error("Unable to initialize realm " + realmContext.getRealmIdentifier(), e);


Throw a runtime exception instead of logging an error? So that the stack trace will show the complete call chain.
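
A minimal sketch of what this suggestion could look like: wrap the failure in an unchecked exception with the original as its cause, so the caller sees the full chain instead of a log line. BackendException and RealmWriter are hypothetical stand-ins (the PR code catches SpannerException), and IllegalStateException is just one possible choice of runtime exception.

```java
final class RealmInitExample {
  /** Hypothetical stand-in for the backend failure; the PR catches SpannerException. */
  static final class BackendException extends RuntimeException {
    BackendException(String message) {
      super(message);
    }
  }

  /** Hypothetical stand-in for the write that upserts the realm row. */
  interface RealmWriter {
    void upsert(String realmId);
  }

  static void initializeRealm(String realmId, RealmWriter writer) {
    try {
      writer.upsert(realmId);
    } catch (BackendException e) {
      // Rethrow with the original as the cause so the stack trace keeps the full call chain.
      throw new IllegalStateException("Unable to initialize realm " + realmId, e);
    }
  }
}
```

With this shape the caller decides whether to log, retry, or fail the bootstrap, and nothing is swallowed silently.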

flyrain · 2025-08-12T23:15:56Z

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

+      try {
+        spanner
+            .getDatabaseClient(databaseId)
+            .write(ImmutableList.of(Realm.upsert(realmContext.getRealmIdentifier())));


Agreed with @dimas-b. The realm initialization doesn't happen very often. It only happens when we bootstrap a new realm, https://polaris.apache.org/in-dev/unreleased/admin-tool/#bootstrapping-realms-and-principal-credentials. Producing a bean here isn't necessary to me, as Polaris server will never use it for realm initialization. Here is the reference code path in JDBC impl.: https://github.com/polaris-catalog/polaris/blob/main/persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L142-L142

flyrain · 2025-08-12T23:32:48Z

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

+  }
+
+  @Produces
+  public SchemaInitializer getSchemaInitializer() {


Same here. The schema options are only used while bootstrapping, and bootstrapping is done by the Admin tool. It doesn't seem necessary to me to have a SchemaInitializer bean for any dynamic options inside the Polaris server. A normal function with schema options should be good enough.

flyrain · 2025-08-12T23:33:49Z

...polaris/persistence/relational/spanner/GoogleCloudSpannerDatabaseClientLifecycleManager.java

+    databaseId = SpannerUtil.databaseFromConfiguration(spannerConfiguration);
+  }
+
+  protected List<String> getSpannerDatabaseDdl(SchemaOptions options) {


I think we could make this method static.

XN137 · 2025-08-13T08:21:13Z

...panner/src/main/java/org/apache/polaris/persistence/relational/spanner/util/SpannerUtil.java

+  public static Map<String, String> jsonMap(String properties) {
+    HashMap<String, String> map = new HashMap<>();
+    if (properties != null && !properties.isBlank()) {
+      JsonObject obj = GSON.fromJson(properties, JsonObject.class);


wondering: is GSON a strict requirement here?
the rest of the codebase uses jackson for stuff like this

I don't think it's a strict requirement. It comes along for the ride with Spanner so I used it more or less out of habit. Jackson would be fine too I think? Let me try it

…e also refactored

github-actions · 2025-09-15T02:08:02Z

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

jbonofre · 2025-09-22T13:29:08Z

Reviving this PR as it's a very interesting one. I will start the review.

byronellis · 2025-09-22T15:02:44Z

Thanks JB. I think there might be one thing that people wanted me to change, but life has been pretty busy this month (it probably also needs to be rebased, since I'm sure some breaking change has happened). There's more to the implementation, but I wanted to break things up so nobody had to review a 5k line PR :-)

dimas-b

My earlier comments got addressed, so I'm ok merging this PR (unless other reviewers object) and waiting for further Spanner contributions in other PRs 🙂

byronellis added 3 commits August 12, 2025 01:25

Add vscode settings to .gitignore

c9a69db

Use the Google Cloud Libraries BOM instead of the Storage specific BO…

309b5ca

…M. Storage is included as the larger libraries BOM and the version matches the desired Storage version.

Add the initial implementation of the Spanner DAO for review. This co…

09c4d42

…mmit defines the overall Spanner schema with Realm as the parent table for all entities. This is the minimum needed to bootstrap a realm.

github-project-automation bot added this to Basic Kanban Board Aug 12, 2025

github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Aug 12, 2025

snazy reviewed Aug 12, 2025

Updated in response to comments. In particular added new classes to r…

6819410

…eplace the Consumer<T> and Supplier<T> classes to avoid potential problems with resolution.

byronellis commented Aug 12, 2025

eric-maynard reviewed Aug 12, 2025

flyrain reviewed Aug 12, 2025

byronellis commented Aug 12, 2025

dimas-b reviewed Aug 12, 2025

flyrain reviewed Aug 12, 2025

byronellis added 2 commits August 12, 2025 17:45

Addressing more reviewer comments

5ee86ee

spotless fix

191e14f

XN137 reviewed Aug 13, 2025

byronellis added 2 commits August 14, 2025 18:14

Change package name to get rid of relational since other packages wer…

e04ca6a

…e also refactored

spotless

4c976cc

github-actions bot added the Stale label Sep 15, 2025

github-actions bot closed this Sep 21, 2025

github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Sep 21, 2025

jbonofre reopened this Sep 22, 2025

github-project-automation bot moved this from Done to PRs In Progress in Basic Kanban Board Sep 22, 2025

dimas-b approved these changes Sep 22, 2025

github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Sep 22, 2025

github-actions bot removed the Stale label Sep 23, 2025
