Add memoization to resolve_available_items by wiggzz · Pull Request #1964 · dbt-labs/metricflow

wiggzz · 2026-01-27T19:52:36Z

Why

When resolving group-by items for queries with many metrics (e.g., 260+ metrics in a PowerBI metadata discovery query), resolve_available_items() was being called repeatedly (23+ times in observed traces) with the same arguments. Each call performs a full DAG traversal taking 9-20 seconds, leading to query resolution times of 40+ minutes.

This was causing:

CPU pegging in metricflow-server (MFS)
Kubernetes liveness probe failures
Service outages

The root cause is in push_down_visitor.py - when a measure node has no matching items and suggestion_generator is set, it calls resolve_available_items() to generate error suggestions. With many metrics, this happens for each measure node, and without caching, each call does a full expensive DAG traversal.

What

Add a simple instance-level cache to GroupByItemResolver.resolve_available_items() that caches results by (resolution_node_id, pattern_ids). Since the same GroupByItemResolver instance is used throughout a single query resolution, and the same patterns are passed each time, subsequent calls return cached results instead of re-traversing the DAG.

This reduces query resolution from O(N × DAG_traversal_time) to O(1 × DAG_traversal_time) for repeated calls with the same arguments.

Notes

The cache uses id() for the resolution node and patterns since they're immutable during resolution
The cache is instance-scoped, so it naturally gets cleared when a new GroupByItemResolver is created
Existing tests pass without modification

Drafted by Claude Opus 4.5 under the direction of @wiggzz

…ted DAG traversals Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-01-27T19:52:47Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

wiggzz · 2026-01-27T23:11:22Z

Tested this by pulling this into metricflow-server in devspace. By using a semantic manifest override with a lot of metrics, and generating a big query with a lot of metrics, I was able to reproduce a query that took 30s. After this change, that query went down to 2.1s. So, that's a fairly big improvement. I think this is worth merging and bringing in.

plypaul

Thanks for the update! LGTM but see inline comment.

plypaul · 2026-01-27T23:53:04Z

metricflow-semantics/metricflow_semantics/query/group_by_item/group_by_item_resolver.py

        self._manifest_lookup = manifest_lookup
        self._resolution_dag = resolution_dag
+        # Cache for resolve_available_items to avoid repeated expensive DAG traversals
+        self._available_items_cache: dict[Tuple[int, Tuple[int, ...]], AvailableGroupByItemsResolution] = {}


We've been trying out a wrapper for caches like this so that we can instrument cache hit rates / sizes at some point:

https://github.com/dbt-labs/metricflow/blob/main/metricflow-semantics/metricflow_semantics/semantic_graph/attribute_resolution/sg_linkable_spec_resolver.py#L74

@plypaul updated for this. Once this is merged, will a release be automatically made? Then we'll need to bump metricflow in MFS.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add memoization to resolve_available_items to prevent expensive repea…

eb2f001

…ted DAG traversals Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

wiggzz requested a review from a team as a code owner January 27, 2026 19:52

cla-bot bot added the cla:yes label Jan 27, 2026

plypaul approved these changes Jan 27, 2026

View reviewed changes

wiggzz and others added 2 commits January 28, 2026 09:56

Use ResultCache wrapper for resolve_available_items memoization

8d23335

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add changelog entry

b045112

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

WilliamDee merged commit 5150143 into main Jan 28, 2026
13 checks passed

WilliamDee deleted the wtj/mfs-resolve-available-memoization branch January 28, 2026 16:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add memoization to resolve_available_items#1964

Add memoization to resolve_available_items#1964
WilliamDee merged 3 commits intomainfrom
wtj/mfs-resolve-available-memoization

wiggzz commented Jan 27, 2026

Uh oh!

github-actions bot commented Jan 27, 2026

Uh oh!

wiggzz commented Jan 27, 2026

Uh oh!

plypaul left a comment

Uh oh!

plypaul Jan 27, 2026

Uh oh!

wiggzz Jan 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

wiggzz commented Jan 27, 2026

Why

What

Notes

Uh oh!

github-actions bot commented Jan 27, 2026

Uh oh!

wiggzz commented Jan 27, 2026

Uh oh!

plypaul left a comment

Choose a reason for hiding this comment

Uh oh!

plypaul Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

wiggzz Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants