Remove overloads of BasePersistence.listEntities #2262

XN137 · 2025-08-05T07:51:17Z

`BasePersistence.listEntities` and `TransactionalPersistence.listEntitiesInCurrentTxn` are, as far as I can tell, only overloaded to make the `entityFilter` and `transformer` parameters "optional" for some callers. The most central method still has to support both of them.

Using default methods in the interface would express this more clearly, but as it turns out, hardly any callers use those overloads. If we simply remove the overloads, we have less code and things become clearer across the board.

We also add an `EntityNameLookupRecord.fromEntity` factory method to replace the overloaded constructor.
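For illustration, here is a minimal sketch of the "default methods" alternative mentioned above. It uses hypothetical, simplified types (`Entity`, `NameRecord`, `SimplePersistence`, `InMemoryPersistence` are assumptions, not Polaris classes): implementations provide only the central three-argument method, and the convenience overloads live in the interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical, simplified stand-ins for PolarisBaseEntity and EntityNameLookupRecord.
record Entity(long id, long parentId, String name, int subTypeCode) {}

record NameRecord(long id, String name) {
  static NameRecord fromEntity(Entity e) {
    return new NameRecord(e.id(), e.name());
  }
}

interface SimplePersistence {
  // The single "central" method every implementation must provide.
  <T> List<T> listEntities(
      long parentId, Predicate<Entity> filter, Function<Entity, T> transformer);

  // Convenience overloads as default methods: implementations never repeat them.
  default List<NameRecord> listEntities(long parentId, Predicate<Entity> filter) {
    return listEntities(parentId, filter, NameRecord::fromEntity);
  }

  default List<NameRecord> listEntities(long parentId) {
    return listEntities(parentId, e -> true);
  }
}

// Hypothetical in-memory implementation of the central method only.
class InMemoryPersistence implements SimplePersistence {
  private final List<Entity> entities = new ArrayList<>();

  void add(Entity e) {
    entities.add(e);
  }

  @Override
  public <T> List<T> listEntities(
      long parentId, Predicate<Entity> filter, Function<Entity, T> transformer) {
    List<T> result = new ArrayList<>();
    for (Entity e : entities) {
      if (e.parentId() == parentId && filter.test(e)) {
        result.add(transformer.apply(e));
      }
    }
    return result;
  }
}
```

Note that routing the name-only overload through the full-entity method is exactly the delegation pattern that is questioned later in this review.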

snazy

Looks straightforward
+1

flyrain · 2025-08-05T21:58:55Z

polaris-core/src/main/java/org/apache/polaris/core/persistence/BasePersistence.java

-   * @return the list of entities for the specified list operation
-   */
-  @Nonnull
-  Page<EntityNameLookupRecord> listEntities(


I'd suggest a dev mailing list discussion or vote for interface changes in this class. cc @dennishuo cc @singhpk234

This particular type is actually not even used by any production code outside of a particular persistence implementation.

If a particular implementation needs more functions, those implementations are free to keep it.

Can you help me understand what particular concern one might have?

The innermost methods that support all the parameters are still there (so switching to them is not hard if backward compatibility is the concern).

Because BasePersistence is a core persistence interface, any change—no matter how small—deserves a note on the dev mailing list so everyone stays in the loop.

sfc-gh-yzou · 2025-08-06T19:20:07Z

polaris-core/src/main/java/org/apache/polaris/core/entity/EntityNameLookupRecord.java

-    this.typeCode = entity.getTypeCode();
-    this.name = entity.getName();
-    this.subTypeCode = entity.getSubTypeCode();
+  public static EntityNameLookupRecord fromEntity(PolarisBaseEntity entity) {


What is the motivation for this change here? If there is no particular reason, can we keep the original constructor?

I'm fine with adding a new static method like this. I'd suggest deprecating the constructor first and removing it later if we don't need it.
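A minimal sketch of the deprecate-first approach, assuming simplified hypothetical `BaseEntity` and `LookupRecord` classes in place of PolarisBaseEntity and EntityNameLookupRecord (the field set is reduced for illustration): the old constructor stays but is marked `@Deprecated`, and both paths build the same record.

```java
// Hypothetical stand-in for PolarisBaseEntity, reduced to the fields a lookup record needs.
class BaseEntity {
  final long id;
  final long parentId;
  final String name;
  final int typeCode;
  final int subTypeCode;

  BaseEntity(long id, long parentId, String name, int typeCode, int subTypeCode) {
    this.id = id;
    this.parentId = parentId;
    this.name = name;
    this.typeCode = typeCode;
    this.subTypeCode = subTypeCode;
  }
}

// Hypothetical, simplified lookup record illustrating "deprecate first, remove later".
class LookupRecord {
  private final long id;
  private final long parentId;
  private final String name;
  private final int typeCode;
  private final int subTypeCode;

  LookupRecord(long id, long parentId, String name, int typeCode, int subTypeCode) {
    this.id = id;
    this.parentId = parentId;
    this.name = name;
    this.typeCode = typeCode;
    this.subTypeCode = subTypeCode;
  }

  /**
   * Old convenience constructor, kept so existing callers still compile.
   *
   * @deprecated use {@link #fromEntity(BaseEntity)} instead.
   */
  @Deprecated
  LookupRecord(BaseEntity entity) {
    this(entity.id, entity.parentId, entity.name, entity.typeCode, entity.subTypeCode);
  }

  // New factory method; the deprecated constructor can be deleted once no callers remain.
  static LookupRecord fromEntity(BaseEntity entity) {
    return new LookupRecord(
        entity.id, entity.parentId, entity.name, entity.typeCode, entity.subTypeCode);
  }

  long getId() { return id; }
  long getParentId() { return parentId; }
  String getName() { return name; }
  int getTypeCode() { return typeCode; }
  int getSubTypeCode() { return subTypeCode; }
}
```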

dennishuo

Let's discuss this on the mailing list. At least one big service provider with a custom persistence backend would be hit very badly by this change, and if we had implemented JdbcBasePersistenceImpl more efficiently, this would have been a big regression. We should fix forward instead, by making the JDBC persistence efficient.

The TL;DR: The index used for listing by name isn't generally a fully-covering index for any known Polaris persistence impls, and even if it was, we mustn't load full entities from the database all the time just to discard most columns in the "pure name-listing" use cases like in Iceberg listTables or listNamespaces.

Some archaeology:

Historically, TreeMapStore was sloppy: it stored an entire PolarisBaseEntity in entitiesActive even though entitiesActive was meant to represent storing only an EntityNameLookupRecord. Originally this was "protected" from the outside world because all entitiesActive interactions only ever produced EntityNameLookupRecord or equivalent, but then the addition of the 6-arg version with entityFilter and the 7-arg version with transformer incorrectly assumed that this meant it was equally cheap to produce the entire entity.

This is analogous to whether we use the INCLUDE keyword in Postgres when creating an index to make it a COVERING INDEX (https://www.postgresql.org/docs/current/indexes-index-only-scans.html#INDEXES-INDEX-ONLY-SCANS) or use the STORING keyword in Google Spanner: https://cloud.google.com/blog/products/databases/how-to-use-a-storing-clause-in-cloud-spanner/

For Postgres, it looks like we let the UNIQUE CONSTRAINT automatically create a secondary index under the hood: https://github.com/singhpk234/polaris/blob/6cf26f8e5213802161fde6bd1e190d6fdc345822/scripts/postgres/schema-v1-postgresql.sql#L44

The Postgres docs say:

the uniqueness condition applies to just column x, not to the combination of x and y. (An INCLUDE clause can also be written in UNIQUE and PRIMARY KEY constraints, providing alternative syntax for setting up an index like this.)

This implies, by negation, that without an INCLUDE clause on the UNIQUE constraint, the UNIQUE index would not automatically be a covering index.

Ultimately we'll need at least 2 variations of the method (one for pure-name-listing, another for the current Filter and Transformer versions), but we probably actually want to do a better job of figuring out what fields we actually needed in either the transformer or filter versions.

Maybe we'd have a catch-all that indeed forces a base join from the secondary index to retrieve the full PolarisEntity, but it's unclear whether the main use cases today actually need all the fields.
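To make the two-variant idea concrete, here is a hedged sketch with hypothetical types (`NameRow`, `FullEntity`, `ListingPersistence`, `IndexedStore` are assumptions, not Polaris classes). The name index holds only lightweight rows, and the catch-all method has to perform a simulated "base join" per row, which a counter makes visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical lightweight row: only what a name index can cover.
record NameRow(long id, long parentId, String name, int subTypeCode) {}

// Hypothetical full entity, including a potentially large column.
record FullEntity(long id, long parentId, String name, int subTypeCode, String properties) {}

interface ListingPersistence {
  // Pure name listing: answerable from the name index alone.
  List<NameRow> listNames(long parentId);

  // Catch-all: filter/transform over full entities, costing a base lookup per row.
  <T> List<T> listFull(
      long parentId, Predicate<FullEntity> filter, Function<FullEntity, T> transformer);
}

class IndexedStore implements ListingPersistence {
  private final List<NameRow> nameIndex = new ArrayList<>();
  private final Map<Long, FullEntity> baseTable = new HashMap<>();
  private int baseLookups = 0; // counts simulated fetches of the full row

  void put(FullEntity e) {
    nameIndex.add(new NameRow(e.id(), e.parentId(), e.name(), e.subTypeCode()));
    baseTable.put(e.id(), e);
  }

  int baseLookups() {
    return baseLookups;
  }

  @Override
  public List<NameRow> listNames(long parentId) {
    List<NameRow> out = new ArrayList<>();
    for (NameRow row : nameIndex) {
      if (row.parentId() == parentId) {
        out.add(row); // index-only: no base lookup
      }
    }
    return out;
  }

  @Override
  public <T> List<T> listFull(
      long parentId, Predicate<FullEntity> filter, Function<FullEntity, T> transformer) {
    List<T> out = new ArrayList<>();
    for (NameRow row : nameIndex) {
      if (row.parentId() != parentId) {
        continue;
      }
      FullEntity e = baseTable.get(row.id());
      baseLookups++; // the "base join" the index cannot avoid
      if (filter.test(e)) {
        out.add(transformer.apply(e));
      }
    }
    return out;
  }
}
```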

dennishuo · 2025-08-08T05:10:36Z

...a/org/apache/polaris/core/persistence/transactional/TreeMapTransactionalPersistenceImpl.java

-        parentId,
-        entityType,
-        entityFilter,
-        entity ->


Okay, so we shouldn't have been lazy about properly expressing the behavioral distinctions between the different methods here in TreeMapTransactionalPersistenceImpl; doing so hid the real nuance, because in-memory operations are so cheap.

In general, entitiesActive is not a "covering index" for the entire PolarisBaseEntity, so operations that require the whole entity are not in the same category as those that only require an EntityNameLookupRecord.
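A sketch of what "entitiesActive as a name index, not a covering index" could look like in a TreeMap-backed store. All names here (`ActiveKey`, `ActiveEntry`, `StoredEntity`, `TreeMapNameIndexStore`) are hypothetical, and the key layout `(parentId, typeCode, name)` is an assumption chosen so that listing one parent/type is a prefix range scan with `TreeMap.subMap`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical name-index key, ordered so one (parentId, typeCode) prefix is contiguous.
record ActiveKey(long parentId, int typeCode, String name) {}

// Hypothetical name-index value: no large columns.
record ActiveEntry(long id, String name, int subTypeCode) {}

// Hypothetical full entity, kept in a separate id-keyed map.
record StoredEntity(
    long id, long parentId, int typeCode, String name, int subTypeCode, String properties) {}

class TreeMapNameIndexStore {
  private static final Comparator<ActiveKey> ORDER =
      Comparator.comparingLong(ActiveKey::parentId)
          .thenComparingInt(ActiveKey::typeCode)
          .thenComparing(ActiveKey::name);

  private final TreeMap<ActiveKey, ActiveEntry> entitiesActive = new TreeMap<>(ORDER);
  private final Map<Long, StoredEntity> entities = new HashMap<>();

  void put(StoredEntity e) {
    entitiesActive.put(
        new ActiveKey(e.parentId(), e.typeCode(), e.name()),
        new ActiveEntry(e.id(), e.name(), e.subTypeCode()));
    entities.put(e.id(), e);
  }

  // Range scan over one (parentId, typeCode) prefix; touches only the name index.
  List<ActiveEntry> listNames(long parentId, int typeCode) {
    return new ArrayList<>(
        entitiesActive
            .subMap(
                new ActiveKey(parentId, typeCode, ""), true,
                new ActiveKey(parentId, typeCode + 1, ""), false)
            .values());
  }

  // Needing the whole entity means an extra lookup per row: the index does not cover it.
  List<StoredEntity> listFullEntities(long parentId, int typeCode) {
    List<StoredEntity> out = new ArrayList<>();
    for (ActiveEntry entry : listNames(parentId, typeCode)) {
      out.add(entities.get(entry.id()));
    }
    return out;
  }
}
```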

dennishuo · 2025-08-08T05:16:21Z

...bc/src/main/java/org/apache/polaris/persistence/relational/jdbc/JdbcBasePersistenceImpl.java

-        parentId,
-        entityType,
-        entityFilter,
-        EntityNameLookupRecord::new,


This was an unfortunate inefficiency that accidentally slipped through probably because the TreeMap impl gets away with it due to everything being in-memory. The "basic" version should never have delegated to the "filter" or "transformer" version, and indeed even the filter version probably shouldn't have just delegated to the "transformer" version either.

In practice, at the very least the "basic" version (5-arg version) should have had its own underlying query which only retrieved the columns used in EntityNameLookupRecord.

It's a pretty huge inefficiency (borderline bug) right now for "pure name listing" to have to select all columns from the database only to throw away all the big columns.
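As a rough illustration of a dedicated query for the "basic" version, here is a hypothetical query builder. The table name `entities` and all column names are assumptions, not the actual Polaris JDBC schema; the point is only that the name-listing query projects the lookup-record columns and skips the large ones. For the database to answer it with an index-only scan, the index would also need to cover those columns (for example via INCLUDE in Postgres).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical query builder contrasting a name-only projection with a full-row select.
final class ListQueries {
  // Columns an EntityNameLookupRecord-style result needs (assumed names).
  static final List<String> NAME_COLUMNS =
      List.of("catalog_id", "id", "parent_id", "name", "type_code", "sub_type_code");

  // Potentially large columns only needed for filters/transformers (assumed names).
  static final List<String> LARGE_COLUMNS = List.of("properties", "internal_properties");

  private ListQueries() {}

  // "Basic" listing: only the lookup-record columns.
  static String nameListingQuery() {
    return select(NAME_COLUMNS);
  }

  // Filter/transformer listing: full rows, including the large columns.
  static String fullEntityQuery() {
    List<String> all = new ArrayList<>(NAME_COLUMNS);
    all.addAll(LARGE_COLUMNS);
    return select(all);
  }

  private static String select(List<String> columns) {
    return "SELECT " + String.join(", ", columns)
        + " FROM entities WHERE catalog_id = ? AND parent_id = ? AND type_code = ?";
  }
}
```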

XN137 · 2025-08-11T15:45:42Z

@dennishuo thanks a lot for the detailed reply!

BasePersistence.listEntities has 3 variants:

Page<EntityNameLookupRecord> listEntities(..., PageToken);

Page<EntityNameLookupRecord> listEntities(..., Predicate<PolarisBaseEntity>, PageToken);

<T> Page<T> listEntities(..., Predicate<PolarisBaseEntity>, Function<PolarisBaseEntity, T>, PageToken);

I understand your point that method 1 was supposed to be used by "name-listing" Polaris code to retrieve only the entity properties required for building an EntityNameLookupRecord (most prominently listTables or listNamespaces), while method 3 needs to load the full entity properties to support the predicate and transformer parameters.

So, depending on the persistence implementation, method 1 can be faster and cheaper.

As your comments also pointed out, method 2 is odd: it accepts a Predicate<PolarisBaseEntity>, which requires loading the full entity, but then throws most of the loaded entity properties away to return only an EntityNameLookupRecord. So any performance benefit of the latter is lost because of the former.

Because of this, in our current codebase method 2 is frequently implemented by forwarding to method 3.

Also, on current main, method 1 has no callers at all, since listTables, for example, needs to pass a PolarisEntitySubType subType filter and thus uses method 2:

polaris/runtime/service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalog.java

Lines 2574 to 2591 in 2f985ab

    
private Page<TableIdentifier> listTableLike(
    PolarisEntitySubType subType, Namespace namespace, PageToken pageToken) {
  PolarisResolvedPathWrapper resolvedEntities = resolvedEntityView.getResolvedPath(namespace);
  if (resolvedEntities == null) {
    // Illegal state because the namespace should've already been in the static resolution set.
    throw new IllegalStateException(
        String.format("Failed to fetch resolved namespace '%s'", namespace));
  }
  List<PolarisEntity> catalogPath = resolvedEntities.getRawFullPath();
  ListEntitiesResult listResult =
      getMetaStoreManager()
          .listEntities(
              getCurrentPolarisContext(),
              PolarisEntity.toCoreList(catalogPath),
              PolarisEntityType.TABLE_LIKE,
              subType,
              pageToken);

I am guessing this changed with the introduction of GenericTables and/or the rework of pagination.

This also means that even if a private PolarisMetaStoreManager implementation is using method 1, it's somewhat of a private implementation detail, since no general Polaris code calls method 1 directly.

The above explains how I arrived at this PR, which suggests removing method 1 (which is unused) and method 2 (which forwards to method 3).

As for a proper way forward that keeps the performance benefit in mind, I have put up 2 PRs:

#2317
allows the removal of method 2 by letting the other methods support a PolarisEntitySubType subType parameter that can even be pushed down into the queries.

#2290
removes a common anti-pattern in the codebase where callers do listEntities - stream - loadEntity, which also defeats any performance optimizations that listEntities might have.

I would appreciate it if you took the time to look at these PRs, as they should address many of the shortcomings identified in this discussion.
After receiving your feedback, we can close this PR (as it's mostly replaced by the first PR mentioned above).
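A minimal sketch of the listEntities - stream - loadEntity anti-pattern that #2290 targets, with a counter of simulated round trips. `RoundTripStore` and `ListingPatterns` are hypothetical, and the single-call alternative shown here (listing full entities directly, as method 3 does) may differ from what #2290 actually implements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store that counts simulated database round trips.
class RoundTripStore {
  private final Map<Long, String> entityById = new LinkedHashMap<>();
  private int roundTrips = 0;

  void put(long id, String entity) {
    entityById.put(id, entity);
  }

  int roundTrips() {
    return roundTrips;
  }

  // One round trip returning only ids (stands in for a name listing).
  List<Long> listIds() {
    roundTrips++;
    return new ArrayList<>(entityById.keySet());
  }

  // One round trip per entity loaded.
  String loadEntity(long id) {
    roundTrips++;
    return entityById.get(id);
  }

  // One round trip returning full entities directly (stands in for a full-entity listing).
  List<String> listFullEntities() {
    roundTrips++;
    return new ArrayList<>(entityById.values());
  }
}

final class ListingPatterns {
  private ListingPatterns() {}

  // Anti-pattern: list, then load each entity separately, costing 1 + N round trips.
  static List<String> listThenLoadEach(RoundTripStore store) {
    List<String> out = new ArrayList<>();
    for (long id : store.listIds()) {
      out.add(store.loadEntity(id));
    }
    return out;
  }

  // A single listing call that already returns what the caller needs: 1 round trip.
  static List<String> listFullInOneCall(RoundTripStore store) {
    return store.listFullEntities();
  }
}
```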

dennishuo · 2025-08-12T03:04:16Z

@XN137 Good digging! I didn't realize that method 1 callsites had regressed. I did a bit more archaeology and it looks like #1938 was actually what introduced the regression for the main listTables/listNamespaces behaviors:

Previously AtomicMetaStoreManager (and TransactionalMetaStoreManagerImpl):

// prune the returned list with only entities matching the entity subtype
if (entitySubType != PolarisEntitySubType.ANY_SUBTYPE) {
  resultPage =
      pageToken.buildNextPage(
          resultPage.items.stream()
              .filter(rec -> rec.getSubTypeCode() == entitySubType.getCode())
              .collect(Collectors.toList()));
}

After:

// prune the returned list with only entities matching the entity subtype
Predicate<PolarisBaseEntity> filter =
    entitySubType != PolarisEntitySubType.ANY_SUBTYPE
        ? e -> e.getSubTypeCode() == entitySubType.getCode()
        : entity -> true;

Page<EntityNameLookupRecord> resultPage =
    ms.listEntities(callCtx, catalogId, parentId, entityType, filter, pageToken);

Fortunately, it looks like this never made it into 1.0.x:

polaris/polaris-core/src/main/java/org/apache/polaris/core/persistence/AtomicOperationMetaStoreManager.java

Line 713 in d7e28e4

if (entitySubType != PolarisEntitySubType.ANY_SUBTYPE) {

That change would have been somewhat worse than the more noticeable removal of the overloads. Anyone using the standard AtomicMetaStoreManager or TransactionalMetaStoreManagerImpl with a private implementation of BasePersistence/TransactionalPersistence may have come to depend strongly on the performance differences between the overloads, and #1938 unfortunately breaks that behavioral expectation silently.

The very original introduction of both filter and transformer was honestly probably too rushed as a short-term crutch, and was exclusively for loadTasks, which required very open-ended post-processing filtering.
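A hedged sketch of the behavioral difference introduced by #1938, using hypothetical types (`LookupRow`, `Entity`, `SubtypeFilterDemo` are assumptions, not Polaris classes). Pruning by subtype code on lightweight rows needs no full-entity reads, while a `Predicate` over full entities forces every row to be fully loaded first. The in-memory "load" here is only a counter standing in for database work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical lightweight row (what a name index can return cheaply).
record LookupRow(long id, String name, int subTypeCode) {}

// Hypothetical full entity (what a Predicate<PolarisBaseEntity>-style filter needs).
record Entity(long id, String name, int subTypeCode, String properties) {}

class SubtypeFilterDemo {
  private final List<Entity> rows = new ArrayList<>();
  private int fullLoads = 0;

  void add(Entity e) {
    rows.add(e);
  }

  int fullLoads() {
    return fullLoads;
  }

  // Simulates reading only the indexed columns.
  private LookupRow readLookupRow(Entity e) {
    return new LookupRow(e.id(), e.name(), e.subTypeCode());
  }

  // Simulates reading the whole row, including large columns.
  private Entity readFullEntity(Entity e) {
    fullLoads++;
    return e;
  }

  // "Before" shape: list lookup rows, then prune by subtype code on the light rows.
  List<LookupRow> listBySubtypeOnLookupRows(int subTypeCode) {
    List<LookupRow> out = new ArrayList<>();
    for (Entity e : rows) {
      LookupRow r = readLookupRow(e);
      if (r.subTypeCode() == subTypeCode) {
        out.add(r);
      }
    }
    return out;
  }

  // "After" shape: the filter takes full entities, so every row is fully loaded first.
  List<LookupRow> listBySubtypeWithEntityPredicate(Predicate<Entity> filter) {
    List<LookupRow> out = new ArrayList<>();
    for (Entity e : rows) {
      Entity full = readFullEntity(e);
      if (filter.test(full)) {
        out.add(readLookupRow(full));
      }
    }
    return out;
  }
}
```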

Overall I think your other two PRs are the right direction. The various listEntities...map(loadEntity...) flows are indeed one of the big known pain points no one got around to fixing yet so that's actually great to see!

I'll comment on specifics within each PR.

XN137 · 2025-08-13T16:20:51Z

As discussed, closing this PR, since https://github.com/apache/polaris/pull/2317 mostly replaces it.

github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Aug 5, 2025

github-project-automation bot added this to Basic Kanban Board Aug 5, 2025

XN137 force-pushed the Simplify-listEntities branch from 5944cb1 to 314b3dd August 5, 2025 14:23

XN137 changed the title ~~Simplify listEntities~~ Remove overloads of BasePersistence.listEntities Aug 5, 2025

XN137 force-pushed the Simplify-listEntities branch from 314b3dd to b3a06bb August 5, 2025 14:26

XN137 marked this pull request as ready for review August 5, 2025 14:27

snazy approved these changes Aug 5, 2025

github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Aug 5, 2025

flyrain reviewed Aug 5, 2025

sfc-gh-yzou reviewed Aug 6, 2025

dennishuo requested changes Aug 8, 2025

github-project-automation bot moved this from Ready to merge to PRs In Progress in Basic Kanban Board Aug 8, 2025

XN137 closed this Aug 13, 2025

github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Aug 13, 2025

XN137 deleted the Simplify-listEntities branch August 13, 2025 16:20

XN137 mentioned this pull request Aug 14, 2025

Optimize JdbcBasePersistenceImpl.listEntities #2352

Open