fix: normalize_column_name does not account for output case settings on non-Snowflake DBs #320

@ota2000

What

normalize_column_name() does not perform case conversion on non-Snowflake databases, which breaks idempotency in inject_missing_columns, remove_columns_not_in_database, and synchronize_data_types when combined with output-to-upper or output-to-lower settings.

Reproduction scenario

PostgreSQL + output-to-upper: true:

  1. First run: DB returns zebra → get_columns() key is zebra (normalize is a no-op) → inject_missing_columns adds node.columns["ZEBRA"] (output-to-upper applied)
  2. Second run: current_columns = {normalize("ZEBRA", "postgres")} = {"ZEBRA"}, incoming_name = "zebra" → "zebra" not in {"ZEBRA"} → True → re-added (overwritten)
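
The two runs above can be simulated without a database. This is a minimal sketch, not the dbt-osmosis code: normalize_stub and inject_missing are hypothetical stand-ins that only model the behavior described in this issue (case-preserving normalization on non-Snowflake adapters, upper-casing of the output key).

```python
def normalize_stub(column: str, credentials_type: str) -> str:
    """Hypothetical stand-in: upper-cases on Snowflake, case-preserving elsewhere."""
    if credentials_type == "snowflake":
        return column.upper()
    return column


def inject_missing(node_columns, db_columns, credentials_type, output_to_upper):
    """Add DB columns missing from node_columns; return the keys that were written."""
    current = {normalize_stub(name, credentials_type) for name in node_columns}
    injected = []
    for incoming in db_columns:
        if normalize_stub(incoming, credentials_type) not in current:
            # The output case setting is applied only when the key is written.
            key = incoming.upper() if output_to_upper else incoming
            node_columns[key] = {"name": key}
            injected.append(key)
    return injected


node_columns = {}
first = inject_missing(node_columns, ["zebra"], "postgres", output_to_upper=True)
second = inject_missing(node_columns, ["zebra"], "postgres", output_to_upper=True)
print(first, second)  # the second run writes "ZEBRA" again instead of being a no-op
```

An idempotent implementation would return an empty list on the second run.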

Root cause

Both current_columns and incoming_columns are compared using normalize_column_name, but normalize_column_name is case-preserving on non-Snowflake databases. This means column keys transformed by output-to-upper/output-to-lower won't match the DB-derived normalized names.

Affected locations

  • src/dbt_osmosis/core/transforms.py: inject_missing_columns (L333-336)
  • src/dbt_osmosis/core/transforms.py: remove_columns_not_in_database (L379-382)
  • src/dbt_osmosis/core/transforms.py: synchronize_data_types (L535-538)

Notes

  • A naive case-insensitive comparison could break databases that distinguish quoted columns ("Foo" vs "foo")
  • This likely requires either rethinking normalize_column_name or introducing a separate comparison normalizer that accounts for output case settings
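
One possible shape for the second option is sketched below. It assumes a separate helper, here called comparison_key (a hypothetical name, not an existing dbt-osmosis function), that applies the same case transform to both sides of the comparison only when an output case setting is enabled. normalize_stub is again a simplified stand-in for normalize_column_name.

```python
def normalize_stub(column: str, credentials_type: str) -> str:
    """Hypothetical stand-in: upper-cases on Snowflake, case-preserving elsewhere."""
    if credentials_type == "snowflake":
        return column.upper()
    return column


def comparison_key(column, credentials_type, output_to_upper=False, output_to_lower=False):
    """Hypothetical helper: normalize, then mirror the configured output case."""
    name = normalize_stub(column, credentials_type)
    if output_to_upper:
        return name.upper()
    if output_to_lower:
        return name.lower()
    # No output case setting: stay case-sensitive so "Foo" and "foo" remain distinct.
    return name


# With output-to-upper, the YAML key and the DB name compare equal.
yaml_key = comparison_key("ZEBRA", "postgres", output_to_upper=True)
db_key = comparison_key("zebra", "postgres", output_to_upper=True)
print(yaml_key == db_key)
```

With no case setting enabled, the comparison stays case-sensitive, so quoted columns that differ only in case are still treated as distinct. When a case setting is enabled they collapse to one key, which may or may not be acceptable and would need to be decided as part of the fix.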
