Skip to content

Add memoization to resolve_available_items#1964

Merged
WilliamDee merged 3 commits intomainfrom
wtj/mfs-resolve-available-memoization
Jan 28, 2026
Merged

Add memoization to resolve_available_items#1964
WilliamDee merged 3 commits intomainfrom
wtj/mfs-resolve-available-memoization

Conversation

@wiggzz
Copy link
Contributor

@wiggzz wiggzz commented Jan 27, 2026

Why

When resolving group-by items for queries with many metrics (e.g., 260+ metrics in a PowerBI metadata discovery query), resolve_available_items() was being called repeatedly (23+ times in observed traces) with the same arguments. Each call performs a full DAG traversal taking 9-20 seconds, leading to query resolution times of 40+ minutes.

This was causing:

  • CPU pegging in metricflow-server (MFS)
  • Kubernetes liveness probe failures
  • Service outages

The root cause is in push_down_visitor.py - when a measure node has no matching items and suggestion_generator is set, it calls resolve_available_items() to generate error suggestions. With many metrics, this happens for each measure node, and without caching, each call does a full expensive DAG traversal.

What

Add a simple instance-level cache to GroupByItemResolver.resolve_available_items() that caches results by (resolution_node_id, pattern_ids). Since the same GroupByItemResolver instance is used throughout a single query resolution, and the same patterns are passed each time, subsequent calls return cached results instead of re-traversing the DAG.

This reduces query resolution from O(N × DAG_traversal_time) to O(1 × DAG_traversal_time) for repeated calls with the same arguments.

Notes

  • The cache uses id() for the resolution node and patterns since they're immutable during resolution
  • The cache is instance-scoped, so it naturally gets cleared when a new GroupByItemResolver is created
  • Existing tests pass without modification

Drafted by Claude Opus 4.5 under the direction of @wiggzz

…ted DAG traversals

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@wiggzz wiggzz requested a review from a team as a code owner January 27, 2026 19:52
@cla-bot cla-bot bot added the cla:yes label Jan 27, 2026
@github-actions
Copy link

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@wiggzz
Copy link
Contributor Author

wiggzz commented Jan 27, 2026

Tested this by pulling this into metricflow-server in devspace. By using a semantic manifest override with a lot of metrics, and generating a big query with a lot of metrics, I was able to reproduce a query that took 30s. After this change, that query went down to 2.1s. So, that's a fairly big improvement. I think this is worth merging and bringing in.

Copy link
Contributor

@plypaul plypaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! LGTM but see inline comment.

self._manifest_lookup = manifest_lookup
self._resolution_dag = resolution_dag
# Cache for resolve_available_items to avoid repeated expensive DAG traversals
self._available_items_cache: dict[Tuple[int, Tuple[int, ...]], AvailableGroupByItemsResolution] = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been trying out a wrapper for caches like this so that we can instrument cache hit rates / sizes at some point:

https://github.com/dbt-labs/metricflow/blob/main/metricflow-semantics/metricflow_semantics/semantic_graph/attribute_resolution/sg_linkable_spec_resolver.py#L74

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plypaul updated for this. Once this is merged, will a release be automatically made? Then we'll need to bump metricflow in MFS.

wiggzz and others added 2 commits January 28, 2026 09:56
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@WilliamDee WilliamDee merged commit 5150143 into main Jan 28, 2026
13 checks passed
@WilliamDee WilliamDee deleted the wtj/mfs-resolve-available-memoization branch January 28, 2026 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants