Skip to content

Spanner Persistence Backend for Polaris #2328

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

byronellis
Copy link

This is the first commit in a series that implements a Spanner Persistence backend for Polaris. It is largely modeled on the JDBC implementation though it takes advantage of Spanner-specific features such as nested tables.

In this design the parent table is Realm with all other tables nested within that parent table.

In this initial PR we commit the basic classes needed to manage the Spanner connection lifecycle as well as the Realm model as Realm bootstrapping is part of Lifecycle management. This was cleaner than making it part of the persistence implementation and has the added benefit of allowing for a cleaner initial PR.

…M. Storage is included as the larger libraries BOM and the version matches the desired Storage version.
…mmit defines the overall Spanner schema with Realm as the parent table for all entities. This is the minimum needed to bootstrap a realm.
Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the code looks good to me.
Hard to judge on the concrete implementation (yet, without knowing the follow-ups).

Appreciate having emulator-configuration for testing purposes!

Couple of thoughts and suggestions, nothing serious.

}

@Produces
public Consumer<SchemaOptions> getSchemaInitializer() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'd personally introduce a separate interface type that extends Consumer<X> (ran into issues with CDI + generics in the past - might no longer be an issue though).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converted this to an explicit SchemaInitializer consumer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completeness converted the rest of the consumers and suppliers as well.

…eplace the Consumer<T> and Supplier<T> classes to avoid potential problems with resolution.
Copy link
Author

@byronellis byronellis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, updated the PR to address your comments.

Copy link
Contributor

@eric-maynard eric-maynard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @byronellis for the contribution! Overall this looks promising, I left a few initial comments. How close is this to being regtest-able?

Comment on lines 25 to 26
@ConfigMapping(prefix = "polaris.persistence.spanner")
public interface GoogleCloudSpannerConfiguration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Since we're calling the config spanner, I wonder if we need the GoogleCloud- prefix everywhere? It's making things quite long. For similar cloud-specific types like S3StorageLocation or GcpStorageConfigInfo we are not quite as verbose

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real preference since my editor autocompletes everything for me... It's already implemented as GoogleCloudSpanner* so it would be a fair amount of toil to rename for what amounts to an aesthetic decision though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry where is it already implemented? Autocompletion / writing long type names is not an issue, but reading them and dealing with line-size constraints can be.

It is an aesthetic decision, but GoogleCloudSpannerDatabaseClientLifecycleManager.java would set a new record for the longest .java filename in the project :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full implementation is here: https://github.com/byronellis/polaris/tree/spanner-persistence just didn't want to drop all of that on y'all in one go.

I'm just going to leave the name as-is for right now, if someone feels really strongly about it they can file a PR to change all the callsites later if that works for you.

Comment on lines 90 to 93
if (s.startsWith("--") || s.length() == 0) {
continue;
}
lines.add(s);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (s.startsWith("--") || s.length() == 0) {
continue;
}
lines.add(s);
if (!s.startsWith("--") && s.length() > 0) {
lines.add(s);
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm OK? I wanted to remove lines that are comments or just blank... Added a check for only containing ';' in case some monster does that... that said it's not like we're sending arbitrary SQL through this thing so we don't need to be super careful.

List<String> lines = new ArrayList<>();
for (String s : schema.split("\n")) {
s = s.trim();
if (s.startsWith("--") || s.length() == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to check if the line ends with ;? Later we split on that

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check for lines only containing ';'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the package really be org.apache.polaris.persistence.relational.spanner;? Or org.apache.polaris.persistence.spanner? I thought relational was meant for the JDBC metastore or RDBMS more generally

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any preference (personally I think relational vs not is just implementation detail), though if you have the pick Spanner in 2025 really fits better in the relational model than the nosql model.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah so this is what I'm saying, I think the package name is a bit misleading. This is really for relational-jdbc, it's not saying everything in this package uses a relational model (and implicitly everything outside of it does not)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what mostly happened is that when I started this y'all had a relational package for eclipselink and jdbc but somewhere along the way you moved things up a level in the project structure... Probably the best way to resolve that would be to merge the PRs and then do a package name refactor since the refactoring tool could do it all in one go. Also makes it clear what's happening in the commit chain.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eh, changed my mind and just started moving things now. A little more work building the follow on PRs but maybe clearer for folks

@@ -49,7 +49,7 @@ commons-lang3 = { module = "org.apache.commons:commons-lang3", version = "3.18.0
commons-text = { module = "org.apache.commons:commons-text", version = "1.14.0" }
eclipselink = { module = "org.eclipse.persistence:eclipselink", version = "4.0.7" }
errorprone = { module = "com.google.errorprone:error_prone_core", version = "2.41.0" }
google-cloud-storage-bom = { module = "com.google.cloud:google-cloud-storage-bom", version = "2.55.0" }
google-cloud-libraries-bom = { module = "com.google.cloud:libraries-bom", version = "26.64.0" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need to change the License file for this change, somewhere like here

Group: com.google.api.grpc Name: proto-google-cloud-storage-v2 Version: 2.53.0
. cc @jbonofre

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about LICENSE updates, however using a BOM does not necessary require LICENSE changes... only real dependencies need to be mentioned... IMHO, that can be done later (we have to double check dependencies for every release anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with a followup PR.

Copy link
Author

@byronellis byronellis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eric-maynard The implementation is largely code complete, I just thought it would be rude to drop a 3000 line PR on folks and the commits didn't make a whole lot of sense as split points so I opted for a set of clean commits that reflect well-defined pieces of the implementation instead since I ended up doing it in kind of a big marathon session.

Right this second I believe there are 2 integration tests of the ~200 or so that are failing (they seem like they may be related) but all of the happy path integration tests seem to be working through the emulator at least.

@@ -49,7 +49,7 @@ commons-lang3 = { module = "org.apache.commons:commons-lang3", version = "3.18.0
commons-text = { module = "org.apache.commons:commons-text", version = "1.14.0" }
eclipselink = { module = "org.eclipse.persistence:eclipselink", version = "4.0.7" }
errorprone = { module = "com.google.errorprone:error_prone_core", version = "2.41.0" }
google-cloud-storage-bom = { module = "com.google.cloud:google-cloud-storage-bom", version = "2.55.0" }
google-cloud-libraries-bom = { module = "com.google.cloud:libraries-bom", version = "26.64.0" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering: does libraries-bom get updated as frequently as any of its upstream artifacts are published?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe most of the Cloud SDKs, libraries included, are on a two week cadence more or less. The advantage (coming from the Beam experience with the Cloud Java SDKs) of using the BOM is that it keeps the various support libraries synchronized across specific SDKS. What happens otherwise is you get version drift in shared components like Protobuf or gRPC core libraries which can be really hard to spot.

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see Spanner Persistence materializing 🙂 thanks for you work on this @byronellis !

this.spannerConfiguration = spannerConfiguration;
}

protected Spanner spanner;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having mutable protected fields in an @ApplicationScoped class looks a bit odd to me. What is the intent here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly because until a bit earlier today there wasn't a constructor... but now that is not the case so we can just make these final.

try {
spanner
.getDatabaseClient(databaseId)
.write(ImmutableList.of(Realm.upsert(realmContext.getRealmIdentifier())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be done under the "bootstrap" call path as opposed to on observing new realm IDs in runtime. The difference would be delegating realm initialization to the "admin" user / admin tool. Cf. #2196

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also RealmContext CDI beans may come and go very frequently in runtime (once per request at least).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @dimas-b. The realm initialization doesn't happen very often. It only happens when we bootstrap a new realm, https://polaris.apache.org/in-dev/unreleased/admin-tool/#bootstrapping-realms-and-principal-credentials. Producing a bean here isn't necessary to me, as Polaris server will never use it for realm initialization. Here is the reference code path in JDBC impl.: https://github.com/polaris-catalog/polaris/blob/main/persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L142-L142

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, moving this to the bootstrap code.

protected DatabaseId databaseId;

@PostConstruct
protected void init() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering: why not do the init work in the constructor?

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @byronellis for working on it. LGTM generally. Left minor comments and I think some beans are not necessary.

Database dbInfo =
client.newDatabaseBuilder(databaseId).setDialect(Dialect.GOOGLE_STANDARD_SQL).build();
try {
spanner.getDatabaseAdminClient().updateDatabaseDdl(dbInfo, ddlStatements, null).get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: reuse the client?

Suggested change
spanner.getDatabaseAdminClient().updateDatabaseDdl(dbInfo, ddlStatements, null).get();
client.updateDatabaseDdl(dbInfo, ddlStatements, null).get();

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

spanner.getDatabaseAdminClient().updateDatabaseDdl(dbInfo, ddlStatements, null).get();
LOGGER.info("Successfully applied DDL update.");
} catch (InterruptedException | ExecutionException e) {
LOGGER.error("Unable to update Spanner DDL.", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error message isn't necessary as we throw the runtime exception next line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

.getDatabaseClient(databaseId)
.write(ImmutableList.of(Realm.upsert(realmContext.getRealmIdentifier())));
} catch (SpannerException e) {
LOGGER.error("Unable to initialize realm " + realmContext.getRealmIdentifier(), e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throw a runtime exception instead of logging an error? So that the stack trace will show the complete call chain.

try {
spanner
.getDatabaseClient(databaseId)
.write(ImmutableList.of(Realm.upsert(realmContext.getRealmIdentifier())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @dimas-b. The realm initialization doesn't happen very often. It only happens when we bootstrap a new realm, https://polaris.apache.org/in-dev/unreleased/admin-tool/#bootstrapping-realms-and-principal-credentials. Producing a bean here isn't necessary to me, as Polaris server will never use it for realm initialization. Here is the reference code path in JDBC impl.: https://github.com/polaris-catalog/polaris/blob/main/persistence/relational-jdbc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcMetaStoreManagerFactory.java#L142-L142

}

@Produces
public SchemaInitializer getSchemaInitializer() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. The scheme options are only used while bootstrapping. Bootstrappping was done by the Admin tool. It doesn't seem necessary to me that we need a SchemaInitializer bean for any dynamic options inside Polaris server. A normal function with schema options should be good enough.

databaseId = SpannerUtil.databaseFromConfiguration(spannerConfiguration);
}

protected List<String> getSpannerDatabaseDdl(SchemaOptions options) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could make this method static.

public static Map<String, String> jsonMap(String properties) {
HashMap<String, String> map = new HashMap<>();
if (properties != null && !properties.isBlank()) {
JsonObject obj = GSON.fromJson(properties, JsonObject.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering: is GSON a strict requirement here?
the rest of the codebase uses jackson for stuff like this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a strict requirement. It comes along for the ride with Spanner so I used it more or less out of habit. Jackson would be fine too I think? Let me try it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants