Skip to content

Conversation

@dstein-st
Copy link

[ai-assisted]

see: #1780

Fix MetricFlow Issue #1780: Enable PRIMARY/FOREIGN Entity Path Resolution

Description

This PR fixes Issue #1780 where MetricFlow fails to resolve queries that use the same dimension accessed through different entity paths when one metric uses a PRIMARY entity and another uses a FOREIGN entity.

Problem

Previously, MetricFlow would fail with "the given input does not match to a common item" when querying:

mf query --metrics job_count_primary,material_costs --group-by job__businessunit__businessunit_name

This occurred because:

  • job_count_primary metric (using PRIMARY entity) accesses the dimension as businessunit__businessunit_name
  • material_costs metric (using FOREIGN entity) accesses it as job__businessunit__businessunit_name
  • The system didn't recognize these as semantically equivalent paths to the same dimension

Solution

The fix implements semantic equivalence recognition between PRIMARY and FOREIGN entity paths while preserving the user's requested dimension naming in output columns. This is achieved through three key changes:

  1. Enhanced Pattern Matching (entity_link_pattern.py):

    • Added suffix matching for DimensionSpecs when the pattern has exactly one more entity link than the candidate
    • This allows job__businessunit__businessunit_name to match businessunit__businessunit_name
    • Applied only to dimensions (not entities) to maintain precise semantics
  2. Semantic Equivalence Recognition (linkable_element_set.py):

    • Added _path_keys_are_semantically_equivalent to recognize equivalence when paths differ by exactly one entity link
    • Enhanced intersection_by_path_key to properly group semantically equivalent elements during intersection operations
    • Selects the longer (FOREIGN) path as the representative to ensure universal accessibility
  3. Output Column Name Preservation (query_resolver.py):

    • Added logic in _resolve_group_by_item_input to preserve the original requested entity links
    • When resolution uses semantic equivalence, creates a spec with the user's requested entity links for output naming
    • Ensures users get column names matching their query (e.g., job__businessunit__businessunit_name)

Technical Details

  1. Conservative Approach: The suffix matching only activates when there's exactly a 1-level difference in entity links, preventing overly broad matching that could break existing behavior.

  2. Time Dimension Handling: Fixed _match_time_granularities to properly pass through non-time dimension specs, resolving issues with the new matching logic.

  3. Test Impact:

    • Updated test_non_resolvable_ambiguous_entity_path_due_to_mismatch which now succeeds (positive change)
    • Fixed SCD integration tests that were affected by column naming changes
    • Updated dimension listing test snapshot to reflect newly accessible dimensions
    • Added new test specifically for PRIMARY/FOREIGN suffix matching

Testing

  • ✅ Original failing query now succeeds
  • ✅ All 770 existing tests pass
  • ✅ Added new test for PRIMARY/FOREIGN suffix matching
  • ✅ Linting passes
  • ✅ Output column names match user's requested format

Breaking Changes

No breaking changes. Queries that previously failed due to PRIMARY/FOREIGN entity path differences will now succeed. All existing working queries continue to work as before.

Example

Before this fix:

# This would fail
mf query --metrics job_count_primary,material_costs --group-by job__businessunit__businessunit_name
# Error: "the given input does not match to a common item"

After this fix:

# This now succeeds and preserves the requested column name
mf query --metrics job_count_primary,material_costs --group-by job__businessunit__businessunit_name
# Successfully generates SQL with output column: job__businessunit__businessunit_name

Files Changed

  • metricflow-semantics/metricflow_semantics/specs/patterns/entity_link_pattern.py

    • Modified _match_entity_links to add PRIMARY/FOREIGN suffix matching for dimensions
    • Fixed _match_time_granularities to properly handle non-time dimension specs
  • metricflow-semantics/metricflow_semantics/model/semantics/linkable_element_set.py

    • Added _path_keys_are_semantically_equivalent for recognizing equivalent paths
    • Rewrote intersection_by_path_key to group semantically equivalent elements
  • metricflow-semantics/metricflow_semantics/query/query_resolver.py

    • Added logic to preserve original entity links for output column naming
  • Test updates:

    • Updated test_ambiguous_entity_path.py to reflect more flexible behavior
    • Added test_primary_foreign_suffix_matching.py for new functionality
    • Updated dimension listing snapshot with newly accessible dimensions

Fixes #1780

@dstein-st dstein-st requested a review from a team as a code owner July 25, 2025 18:54
@cla-bot cla-bot bot added the cla:yes label Jul 25, 2025
@plypaul
Copy link
Contributor

plypaul commented Jul 27, 2025

Thanks for contributing the pull request! There is a bit of a backlog, so we'll update when it has cleared.

…ionships (dbt-labs#1780)

This fixes an issue where MetricFlow incorrectly treated semantically
equivalent entity paths as incompatible when querying metrics that access
the same dimension through different paths (PRIMARY vs FOREIGN entities).

The solution recognizes when two paths differ by exactly one entity link
and the shorter path is a suffix of the longer path, indicating a
PRIMARY/FOREIGN relationship. These paths are now treated as semantically
equivalent for query resolution.

Importantly, while using semantic equivalence for internal resolution,
the fix preserves the exact entity path form requested by the user for
output column naming. This ensures that users get column names matching
their query even when the system internally resolves through a shorter equivalent path.

This change:
- Enables queries combining metrics with different paths to the same dimension
- Maintains the user's requested column names even when multiple valid path names exist
- Updates affected test snapshots to reflect newly accessible dimensions
@dstein-st dstein-st force-pushed the dstein-st/primary-foreign-entity-path branch from 6fd7e53 to 70046ea Compare July 28, 2025 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] MetricFlow fails to resolve equivalent dimension paths across PRIMARY and FOREIGN entity relationships

2 participants