Add memoization to resolve_available_items #1964
Conversation
…ted DAG traversals Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
Tested this by pulling it into metricflow-server in devspace. Using a semantic manifest override with many metrics and generating a large query across them, I was able to reproduce a query that took 30s. After this change, the same query took 2.1s. That's a fairly big improvement, so I think this is worth merging and bringing in.
plypaul
left a comment
Thanks for the update! LGTM but see inline comment.
```python
self._manifest_lookup = manifest_lookup
self._resolution_dag = resolution_dag
# Cache for resolve_available_items to avoid repeated expensive DAG traversals
self._available_items_cache: dict[Tuple[int, Tuple[int, ...]], AvailableGroupByItemsResolution] = {}
```
We've been trying out a wrapper for caches like this so that we can instrument cache hit rates / sizes at some point.
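The wrapper referenced here isn't shown in the thread. A minimal sketch of what such an instrumented cache might look like (the name `InstrumentedCache` and its methods are hypothetical, not MetricFlow's actual API):

```python
from typing import Dict, Generic, Hashable, Optional, TypeVar

KeyT = TypeVar("KeyT", bound=Hashable)
ValueT = TypeVar("ValueT")


class InstrumentedCache(Generic[KeyT, ValueT]):
    """Hypothetical dict-backed cache that tracks hit / miss counts and size."""

    def __init__(self) -> None:
        self._items: Dict[KeyT, ValueT] = {}
        self.hit_count = 0
        self.miss_count = 0

    def get(self, key: KeyT) -> Optional[ValueT]:
        value = self._items.get(key)
        if value is None:
            self.miss_count += 1
        else:
            self.hit_count += 1
        return value

    def set(self, key: KeyT, value: ValueT) -> None:
        self._items[key] = value

    @property
    def size(self) -> int:
        return len(self._items)

    @property
    def hit_rate(self) -> float:
        total = self.hit_count + self.miss_count
        return self.hit_count / total if total else 0.0
```

Swapping the raw dict for a wrapper like this keeps the caching logic unchanged while making hit rates and sizes observable in production traces.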
@plypaul updated for this. Once this is merged, will a release be automatically made? Then we'll need to bump metricflow in MFS.
Why
When resolving group-by items for queries with many metrics (e.g., 260+ metrics in a PowerBI metadata discovery query), `resolve_available_items()` was being called repeatedly (23+ times in observed traces) with the same arguments. Each call performs a full DAG traversal taking 9-20 seconds, leading to query resolution times of 40+ minutes.
The root cause is in `push_down_visitor.py`: when a measure node has no matching items and `suggestion_generator` is set, it calls `resolve_available_items()` to generate error suggestions. With many metrics, this happens for each measure node, and without caching, each call does a full, expensive DAG traversal.

What
Add a simple instance-level cache to `GroupByItemResolver.resolve_available_items()` that caches results by `(resolution_node_id, pattern_ids)`. Since the same `GroupByItemResolver` instance is used throughout a single query resolution, and the same patterns are passed each time, subsequent calls return cached results instead of re-traversing the DAG.

This reduces query resolution from O(N × DAG_traversal_time) to O(1 × DAG_traversal_time) for repeated calls with the same arguments.
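The caching pattern described above can be sketched as follows. This is a simplified illustration, not MetricFlow's actual code: `_resolve_uncached` stands in for the expensive DAG traversal, the real method signature differs, and `traversal_count` is added here only to show the cache working.

```python
from typing import Dict, Sequence, Tuple


class GroupByItemResolver:
    """Sketch of instance-level memoization for resolve_available_items()."""

    def __init__(self) -> None:
        # Keyed by (id(resolution_node), tuple of id(pattern) for each pattern).
        self._available_items_cache: Dict[Tuple[int, Tuple[int, ...]], object] = {}
        self.traversal_count = 0  # illustration only: counts real traversals

    def resolve_available_items(self, resolution_node: object, patterns: Sequence[object]) -> object:
        cache_key = (id(resolution_node), tuple(id(p) for p in patterns))
        cached = self._available_items_cache.get(cache_key)
        if cached is not None:
            return cached  # cache hit: skip the DAG traversal entirely
        result = self._resolve_uncached(resolution_node, patterns)
        self._available_items_cache[cache_key] = result
        return result

    def _resolve_uncached(self, resolution_node: object, patterns: Sequence[object]) -> object:
        # Placeholder for the full DAG traversal (9-20s in the reported traces).
        self.traversal_count += 1
        return ("resolved", resolution_node, tuple(patterns))
```

Calling `resolve_available_items` twice with the same node and patterns performs the traversal once and returns the cached result on the second call.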
Notes

- Cache keys use `id()` for the resolution node and patterns, since they're immutable during resolution
- The cache is scoped to the resolver instance, so it is discarded whenever a new `GroupByItemResolver` is created
- Drafted by Claude Opus 4.5 under the direction of @wiggzz
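A general caution on `id()`-based keys (not raised in the PR, but worth noting): `id()` values are unique only among simultaneously-live objects, so this keying is safe exactly because the node and patterns outlive the cache here.

```python
# Two live objects are always distinct, so their ids never collide:
node_a = object()
node_b = object()
assert id(node_a) != id(node_b)

# However, once an object is garbage collected, CPython may hand its id to
# a newly created object. A cache keyed by id() that outlived `node_a`
# could then return node_a's stale result for an unrelated node. Scoping
# the cache to a single GroupByItemResolver instance, whose lifetime is
# one query resolution, avoids that hazard.
```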
id()for the resolution node and patterns since they're immutable during resolutionGroupByItemResolveris createdDrafted by Claude Opus 4.5 under the direction of @wiggzz