Skip to content

[Feature][Unit Testing] Use contracted data types in unit test fixtures where available #12606

@davem-bis

Description

@davem-bis

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

When running a unit test against a model, the data types of any inputs are ultimately derived by the adapter.get_columns_in_relation macro called by the get_fixture_sql. This requires that the inputs to already exist on the target database to derive the column names and types. This potentially requiring a large number of upstream models to be built before a model can be unit tested, if it is towards the end of a dependency chain.

If the upstream models are contracted, they cannot be run unless the schemas match those defined in their YAML configuration files. Given they cannot exist without matching the contracts, the contracted data types may as well be used, removing the need for the upstream models to exist.

For a project that is fully contracted, this removes the need to run any models prior to unit testing, allowing for CI to run the unit tests first and fail faster if they are incorrect.

Describe alternatives you've considered

Leaving logic as is.

Who will this benefit?

Anyone unit testing models where any of the inputs are contracted, as fewer database calls will need to be made.

Are you interested in contributing this feature?

I have a working PoC (code examples below), but am unable to commit to the timeframes for getting a PR through.

Anything else?

Proposed changes:

  • dbt-adapters/src/dbt/include/global_project/macros/unit_test_sql/get_fixture_sql.sql - get_fixture_sql:

    Remove the column_name_to_quoted dictionary and replace with adapter.quote calls, as the macro currently fails if column_name_to_data_types is supplied as column_name_to_quoted is not instantiated.

      {% macro get_fixture_sql(rows, column_name_to_data_types) %}
      -- Fixture for {{ model.name }}
      {% set default_row = {} %}
    
      {%- if not column_name_to_data_types -%}
      {#-- Use defer_relation IFF it is available in the manifest and 'this' is missing from the database --#}
      {%-   set this_or_defer_relation = defer_relation if (defer_relation and not load_relation(this)) else this -%}
      {%-   set columns_in_relation = adapter.get_columns_in_relation(this_or_defer_relation) -%}
    
      {%-   set column_name_to_data_types = {} -%}
      {%-   for column in columns_in_relation -%}
    
      {#-- This needs to be a case-insensitive comparison --#}
      {%-     do column_name_to_data_types.update({column.name|lower: column.data_type}) -%}
      {%-   endfor -%}
      {%- endif -%}
    
      {%- if not column_name_to_data_types -%}
          {{ exceptions.raise_compiler_error("Not able to get columns for unit test '" ~ model.name ~ "' from relation " ~ this ~ " because the relation doesn't exist") }}
      {%- endif -%}
    
      {%- for column_name, column_type in column_name_to_data_types.items() -%}
          {%- do default_row.update({column_name: (safe_cast("null", column_type) | trim )}) -%}
      {%- endfor -%}
    
      {{ validate_fixture_rows(rows, row_number) }}
    
      {%- for row in rows -%}
      {%-   set formatted_row = format_row(row, column_name_to_data_types) -%}
      {%-   set default_row_copy = default_row.copy() -%}
      {%-   do default_row_copy.update(formatted_row) -%}
      select
      {%-   for column_name, column_value in default_row_copy.items() %} {{ column_value }} as {{ adapter.quote(column_name) }}{% if not loop.last -%}, {%- endif %}
      {%-   endfor %}
      {%-   if not loop.last %}
      union all
      {%    endif %}
      {%- endfor -%}
    
      {%- if (rows | length) == 0 -%}
          select
          {%- for column_name, column_value in default_row.items() %} {{ column_value }} as {{ adapter.quote(column_name) }}{% if not loop.last -%},{%- endif %}
          {%- endfor %}
          limit 0
      {%- endif -%}
      {% endmacro %}
    
  • core/dbt/parser/unit_tests.py - _build_fixture_raw_code

    Add in a section to identify the column_name_to_data_types from a supplied node if it has not been supplied.

    def _build_fixture_raw_code(self, rows, column_name_to_data_types, fixture_format, node):
    
        if fixture_format == UnitTestFormat.SQL:
          return rows
    
        if column_name_to_data_types is None and node:
            column_name_to_data_types = {
                col.name: col.data_type
                for col in node.columns.values()
                if col.data_type
            }
    
        return ("{{{{ get_fixture_sql({rows}, {column_name_to_data_types}) }}}}").format(
          rows=rows, column_name_to_data_types=column_name_to_data_types
        )
  • core/dbt/parser/unit_tests.py - parse_unit_test_case

    Pass the original input node into _build_fixture_raw_code when defining the common_fields dict.

    common_fields = {
        "resource_type": NodeType.Model,
        # root directory for input and output fixtures
        "original_file_path": unit_test_node.original_file_path,
        "config": ModelConfig(materialized="ephemeral"),
        "database": original_input_node.database,
        "alias": original_input_node.identifier,
        "schema": original_input_node.schema,
        "fqn": original_input_node.fqn,
        "checksum": FileHash.empty(),
        "raw_code": self._build_fixture_raw_code(given.rows, None, given.format, original_input_node),
        "package_name": original_input_node.package_name,
        "unique_id": f"model.{original_input_node.package_name}.{input_name}",
        "name": input_name,
        "path": f"{input_name}.sql",
    }

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions