Add PolarisMetaStoreManager.loadEntities #2290

Open: wants to merge 1 commit into base: main


@XN137 (Contributor) commented Aug 7, 2025

this is a followup to #2261

currently PolarisMetaStoreManager.listEntities only exposes a limited
subset of the underlying BasePersistence.listEntities functionality.

most of the callers have to post-process the EntityNameLookupRecord of
ListEntitiesResult and call PolarisMetaStoreManager.loadEntity
on the individual items sequentially to transform and filter them.

this is bad for the following reasons:

  • suboptimal performance as we run N+1 queries to basically load every
    entity twice from the persistence backend
  • suffering from race-conditions when entities get dropped between the
    listEntities and loadEntity call
  • a lot of repeated code in all the callers (of which only some are
    dealing with the race-condition by filtering out null values)
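
The list-then-load pattern described above can be sketched as follows. This is a minimal illustration against a hypothetical in-memory store, not the Polaris API: `Store`, `NameRecord`, `Entity`, `listThenLoad` and `loadInOnePass` are all invented names. It only shows where the extra queries and the null check come from.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a persistence backend; none of these names exist in Polaris. */
final class ListThenLoadSketch {
  record NameRecord(long id, String name) {}

  record Entity(long id, String name, String properties) {}

  static final class Store {
    private final Map<Long, Entity> rows = new LinkedHashMap<>();
    private int queries = 0; // counts simulated round trips to the backend

    void put(Entity e) { rows.put(e.id(), e); }

    void drop(long id) { rows.remove(id); }

    int queryCount() { return queries; }

    /** One query returning only names and ids. */
    List<NameRecord> listNames() {
      queries++;
      List<NameRecord> out = new ArrayList<>();
      for (Entity e : rows.values()) {
        out.add(new NameRecord(e.id(), e.name()));
      }
      return out;
    }

    /** One query per entity; returns null if the entity was dropped in the meantime. */
    Entity loadOne(long id) {
      queries++;
      return rows.get(id);
    }

    /** One query returning the full entities directly. */
    List<Entity> loadAll() {
      queries++;
      return new ArrayList<>(rows.values());
    }
  }

  /** Old pattern: one list query plus one load per item (N+1 queries). */
  static List<Entity> listThenLoad(Store store, Runnable betweenListAndLoad) {
    List<NameRecord> names = store.listNames();
    betweenListAndLoad.run(); // simulates a concurrent drop between the two calls
    List<Entity> out = new ArrayList<>();
    for (NameRecord r : names) {
      Entity e = store.loadOne(r.id());
      if (e != null) { // every caller has to remember this check
        out.add(e);
      }
    }
    return out;
  }

  /** New pattern: the full entities come back from a single query. */
  static List<Entity> loadInOnePass(Store store) {
    return store.loadAll();
  }
}
```

With three stored entities, `listThenLoad` issues four simulated queries and must tolerate a null result for any entity dropped in between, while `loadInOnePass` issues one query and has no window between list and load.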

as a solution we add PolarisMetaStoreManager.loadEntities that takes
advantage of the already existing BasePersistence methods.
we rename one of the listEntities methods to loadEntities for
consistency.

since many callers don't need paging and want the result as a list, we
add PolarisMetaStoreManager.loadEntitiesAll as a convenience wrapper.
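
As a rough illustration of what a "load everything as a list" wrapper over a paged call can look like, here is a sketch that drains pages until the token runs out. `Page`, `loadAll` and `pagedSource` are hypothetical names, and this page does not show how loadEntitiesAll is actually implemented; it may well request a single unbounded page instead of looping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of a convenience wrapper over a paged loader. */
final class LoadAllSketch {
  /** A page of items plus an opaque token; the token is null when nothing is left. */
  record Page<T>(List<T> items, String nextToken) {}

  /** Drains every page from the given loader into a single list. */
  static <T> List<T> loadAll(Function<String, Page<T>> loadPage) {
    List<T> all = new ArrayList<>();
    String token = null; // null requests the first page
    do {
      Page<T> page = loadPage.apply(token);
      all.addAll(page.items());
      token = page.nextToken();
    } while (token != null);
    return all;
  }

  /** Toy paged source over a fixed list; the token encodes the next start index. */
  static <T> Function<String, Page<T>> pagedSource(List<T> data, int pageSize) {
    return token -> {
      int start = token == null ? 0 : Integer.parseInt(token);
      int end = Math.min(start + pageSize, data.size());
      String next = end < data.size() ? Integer.toString(end) : null;
      return new Page<>(List.copyOf(data.subList(start, end)), next);
    };
  }
}
```

Callers that do not care about paging then see a plain list and never handle tokens themselves.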

we also remove the PolarisEntity.nameAndId method as callers who only
need name and id should not be loading the full entity to begin with.

note we rework testCatalogNotReturnedWhenDeletedAfterListBeforeGet
from ManagementServiceTest because the simulated race-condition
scenario can no longer happen.

@github-project-automation github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Aug 7, 2025
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch 3 times, most recently from b3363a7 to e10c856 Compare August 7, 2025 08:31
@XN137 XN137 marked this pull request as ready for review August 7, 2025 08:57
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch 3 times, most recently from c4acd84 to 4a62e67 Compare August 8, 2025 16:06
@XN137 XN137 marked this pull request as draft August 8, 2025 19:21
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from 4a62e67 to 9793869 Compare August 11, 2025 07:27
@XN137 XN137 changed the title Rework PolarisMetaStoreManager.listEntities Add PolarisMetaStoreManager.loadEntities Aug 11, 2025
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from 9793869 to f960edd Compare August 11, 2025 07:53
PolarisEntitySubType.NULL_SUBTYPE,
PolicyEntity::of)
.stream()
.filter(policyEntity -> policyType == null || policyEntity.getPolicyType() == policyType)
Contributor Author

side note:
if the requested policyType is null we could in theory use the optimized listEntities call, since we only need the name of the entity to build the PolicyIdentifier. for filtering by policyType, however, we need to load the full entity.

@dennishuo (Contributor) left a comment

This will be a big improvement, thanks for taking this on!

I can do a more detailed dive, but at a high level, I'd like to see if we can better sort out the responsibilities of the MetaStoreManager layer vs the BasePersistence layer here. Specifically, if we can pull up the entityFilter/transformer from BasePersistence to only live in the MetaStoreManager level, just as your #2317 is a push down of the entitySubType filter. Essentially:

  1. We want the [Base]Persistence layer to most closely represent what a lower-level database is directly capable of doing
  2. We put more "advanced" logic in the MetaStoreManager layer
  3. In the filter/transformer scenario, what we're really saying is "BasePersistence needs to give back complete PolarisBaseEntities instead of only EntityNameLookupRecords", and then something might run Polaris-side filtering/transforms -- logically this would be in MetaStoreManager or higher.

This also tells us why it made sense for loadTasks to use a Predicate<> as the filter, but why subTypeCode should not be an opaque Predicate<>: subTypeCode filtering is something the lower-level database is capable of understanding, so it is pushed down to the layer that represents the raw database.

Whereas advanced timestamp comparison, leasing, etc., of TaskEntities is a Polaris concept that the database isn't designed to understand directly (for now), so we fall back to the opaque Predicate.
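
The layering argument can be sketched like this, under invented names (`Persistence`, `Manager`, `Entity`, `leaseExpiresAtMillis` are assumptions for illustration and do not mirror the real BasePersistence or PolarisMetaStoreManager signatures). The lower layer accepts only a filter a database could evaluate itself, while the upper layer applies an opaque predicate to the fully loaded entities.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical two-layer sketch; not the Polaris interfaces. */
final class LayeringSketch {
  record Entity(String name, int subTypeCode, long leaseExpiresAtMillis) {}

  /** Lower layer: only filters that the database itself could express. */
  static final class Persistence {
    private final List<Entity> rows;

    Persistence(List<Entity> rows) {
      this.rows = List.copyOf(rows);
    }

    /** Stands in for a query along the lines of "WHERE sub_type_code = ?". */
    List<Entity> loadEntities(int subTypeCode) {
      return rows.stream()
          .filter(e -> e.subTypeCode() == subTypeCode)
          .collect(Collectors.toList());
    }
  }

  /** Upper layer: opaque, application-side logic on fully loaded entities. */
  static final class Manager {
    private final Persistence persistence;

    Manager(Persistence persistence) {
      this.persistence = persistence;
    }

    List<Entity> loadEntities(int subTypeCode, Predicate<Entity> filter) {
      return persistence.loadEntities(subTypeCode).stream()
          .filter(filter)
          .collect(Collectors.toList());
    }
  }
}
```

In this arrangement the sub-type comparison stays visible to the storage layer, and only logic the database cannot interpret, such as a lease-expiry check, is passed around as a predicate.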

Overall, I'd like to see if we can:

  1. Also rename the filter/transformer variations of listEntities within the BasePersistence, similar to how you introduced a totally different method name in the MetaStoreManager layer -- because now it's really not an overload of the same method, but really a different action entirely. I think loadEntities is reasonable to push down as the method name, but we just might want to consider the difference between "list and load full entities under a parent" and "load the full entities provided in this Collection" that we might need in the future
  2. Pull filter/transformer evaluation out of BasePersistence into MetaStoreManager
  3. Reconsider whether we actually need a Function<PolarisBaseEntity, T> generic type return value -- as far as I recall, originally we just abused that to "transform" a PolarisBaseEntity into a trimmed EntityNameLookupRecord, which was an antipattern as you identified, and then nearly every other case was just the "identity transformation". Now that we clarified the "EntityNameLookup" case, perhaps callsites never actually use the transformer anymore? I didn't have time to dig deeper yet into all the transformer use cases.

@XN137 (Contributor Author) commented Aug 13, 2025

thanks for the feedback, since the other PR is somewhat merge-ready i will wait for that and rebase this one afterwards.

> Also rename the filter/transformer variations of listEntities within the BasePersistence

yeah we can use the same loadEntities name in BasePersistence imo

> Pull filter/transformer evaluation out of BasePersistence into MetaStoreManager

i haven't investigated this in detail yet, but my current guess is that the filter needs to remain "pushed down" in order for pagination to work correctly. i will double check that later.

> Reconsider whether we actually need a Function<PolarisBaseEntity, T> generic type return value

afaict the transformer is used heavily by all callers of load(All)Entities in this PR, i.e. to turn base entities into their more specific type (for example via CatalogRoleEntity::of).
so i don't see that going away.

unless you mean that this transformation can also happen on the "outside"?
this might be possible. however, one idea i had was that we might not need both the filter and the transformer: we could let the transformer return null for entities the caller does not need (or that can't be converted to the more specific type).
but even then the transformer most likely needs to remain "pushed down" to work well with pagination.
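
The pagination concern can be illustrated with a small sketch. All names here (`pageThenFilter`, `filterThenPage`, `transformThenPage`) are hypothetical, and plain offset/limit paging over a list is assumed rather than Polaris page tokens. Note that the offset counts raw rows in the first method and already-filtered rows in the other two.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch of why filtering interacts with page boundaries. */
final class PaginationFilterSketch {
  /** Filter applied after the page was cut: pages can come back short or even empty. */
  static <E> List<E> pageThenFilter(List<E> rows, int offset, int pageSize, Predicate<E> filter) {
    return rows.stream()
        .skip(offset)
        .limit(pageSize)
        .filter(filter)
        .collect(Collectors.toList());
  }

  /** Filter "pushed down" before the page is cut: every page except the last is full. */
  static <E> List<E> filterThenPage(List<E> rows, int offset, int pageSize, Predicate<E> filter) {
    return rows.stream()
        .filter(filter)
        .skip(offset)
        .limit(pageSize)
        .collect(Collectors.toList());
  }

  /** Variant of the idea above: the transformer doubles as the filter by returning null. */
  static <E, T> List<T> transformThenPage(
      List<E> rows, int offset, int pageSize, Function<E, T> transformer) {
    return rows.stream()
        .map(transformer)
        .filter(Objects::nonNull) // null means "the caller does not need this entity"
        .skip(offset)
        .limit(pageSize)
        .collect(Collectors.toList());
  }
}
```

For the rows 1 to 6, a page size of 2 and an "even numbers only" filter, `pageThenFilter` returns a first page with a single element, whereas `filterThenPage` returns a full first page of two. That difference is the reason to evaluate the filter, or a null-returning transformer, before the page is cut.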

@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch 2 times, most recently from 18e3529 to 9c7d42d Compare August 14, 2025 09:52
@XN137 (Contributor Author) commented Aug 14, 2025

rebased on latest main due to a few merge conflicts. also included the rename of one of the BasePersistence.listEntities methods

* Load entities where some predicate returns true and transform the entities with a function
*
* @param callCtx call context
* @param catalogPath path inside a catalog. If null or empty, the entities to list are top-level,
Contributor

nit: a nullable List is not very useful, can we pass List.of() instead of null?

Contributor Author

it's possible, yes, but we are just "mirroring" the listEntities api, which has this param as nullable. for method params, isn't it "ok" to have nullable lists?

Contributor

It's unnecessary imho since there is no distinction between what a null list and an empty list represent, so basically accepting nulls just makes your code more error prone.

But since other methods exhibit the same behavior, and given that countless discussions already happened around nullability without us reaching any conclusion, let's table this topic for now.

adutra previously approved these changes Aug 14, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Aug 14, 2025
@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from 9c7d42d to 36657d7 Compare August 15, 2025 08:39
@XN137 (Contributor Author) commented Aug 15, 2025

rebased after trivial merge-conflict in PolarisAdminService.java

@XN137 XN137 force-pushed the Rework-PolarisMetaStoreManager.listEntities branch from 36657d7 to ab37780 Compare August 15, 2025 15:48
3 participants