
[Bug] safe_create_table_as skips partition handling for tmp tables, causing O(N) scan cost in batch incremental #1744

@dtaniwaki

Description


Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Which packages are affected?

  • dbt-adapters
  • dbt-tests-adapter
  • dbt-athena
  • dbt-athena-community
  • dbt-bigquery
  • dbt-postgres
  • dbt-redshift
  • dbt-snowflake
  • dbt-spark

Current Behavior

In safe_create_table_as, the temporary=True branch always creates the tmp table (__dbt_tmp) without partitioning by calling create_table_as(..., skip_partitioning=True) directly, bypassing run_query_with_partitions_limit_catching entirely.

-- Simplified before-state
{%- if temporary -%}
  {%- do run_query(create_table_as(temporary, relation, compiled_code, language, true)) -%}
  -- skip_partitioning=True: ignores partitioned_by config, no TOO_MANY_OPEN_PARTITIONS handling
{%- else -%}
  {%- set result = adapter.run_query_with_partitions_limit_catching(...) -%}
  ...
{%- endif -%}

For models with partitioned_by configured and using insert_overwrite incremental strategy, batch_incremental_insert reads from __dbt_tmp once per partition batch:

INSERT INTO {{ target }}
SELECT * FROM {{ tmp }}
WHERE partition_col IN (...)

Because __dbt_tmp has no partitions, every batch INSERT performs a full scan of the tmp table. With N batches, total Athena scan cost is (N + 1) × the full table scan: one scan of the source to build the tmp table, plus N full scans of the tmp table for the batch inserts.
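As a back-of-the-envelope illustration of the cost difference (a sketch assuming each batch covers an equal, disjoint partition slice and that the source scan equals the tmp table size; function and parameter names are made up for illustration):

```python
def total_scan_gb(table_size_gb: float, n_batches: int, tmp_partitioned: bool) -> float:
    """Approximate total Athena bytes scanned for one incremental run."""
    # The CTAS that builds __dbt_tmp scans the source once.
    create_scan = table_size_gb
    if tmp_partitioned:
        # With partition pruning, each batch reads only its own slice,
        # so all batches together read the tmp table roughly once.
        batch_scans = table_size_gb
    else:
        # Without partitions on __dbt_tmp, every batch INSERT
        # re-scans the entire tmp table.
        batch_scans = n_batches * table_size_gb
    return create_scan + batch_scans
```

For a 100 GB model split into 10 batches, this gives 1100 GB scanned without tmp-table partitioning versus 200 GB with it, matching the (N + 1)× versus ~2× figures above.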

Additionally, if the source query yields many distinct partition values, the unguarded CTAS fails immediately with TOO_MANY_OPEN_PARTITIONS instead of falling back to create_table_as_with_partitions.

Expected Behavior

The tmp table should be created with partitions when partitioned_by is configured, so that:

  • Each batch in batch_incremental_insert benefits from partition pruning, reducing total scan cost to approximately 2 × full scan regardless of N.
  • TOO_MANY_OPEN_PARTITIONS is handled consistently with the non-temporary path (fallback to batch creation).

Models without partitioned_by should be unaffected — they produce an unpartitioned CTAS as before.
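One possible shape of the fix (a sketch only, not the actual patch; it mirrors the argument order and helper names from the simplified before-state snippet above, which may differ from the real macro):

```jinja
{%- if temporary -%}
  {%- if partitioned_by is not none -%}
    {# Route the tmp-table CTAS through the same guarded helper the
       non-temporary path uses, so partitioned_by is honored and
       TOO_MANY_OPEN_PARTITIONS falls back to batched creation. #}
    {%- do adapter.run_query_with_partitions_limit_catching(
          create_table_as(temporary, relation, compiled_code, language)) -%}
  {%- else -%}
    {# No partitioning configured: keep the old unpartitioned CTAS. #}
    {%- do run_query(create_table_as(temporary, relation, compiled_code, language, true)) -%}
  {%- endif -%}
{%- else -%}
  {%- set result = adapter.run_query_with_partitions_limit_catching(...) -%}
  ...
{%- endif -%}
```

This keeps the unpartitioned fast path for models without partitioned_by while making the temporary and non-temporary branches behave consistently.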

Steps To Reproduce

  1. Configure an Athena model with:
    config:
      materialized: incremental
      incremental_strategy: insert_overwrite
      partitioned_by: [date_col]
  2. Run dbt run with a large dataset that yields many distinct partition values.
  3. Observe Athena query history — each batch INSERT scans the full __dbt_tmp table instead of a single partition slice.
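A minimal model that reproduces the setup in the steps above (file name, source, and column names are hypothetical):

```sql
-- models/repro_partition_scan.sql (hypothetical model)
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partitioned_by=['date_col'],
) }}

select date_col, payload
from {{ source('raw', 'events') }}  -- hypothetical source with many distinct date_col values
{% if is_incremental() %}
where date_col > (select max(date_col) from {{ this }})
{% endif %}
```

After `dbt run`, inspecting the data-scanned column in the Athena query history for each batch INSERT should show roughly the full tmp-table size per batch rather than one partition slice.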

Relevant log output

Environment

- OS: macOS
- Python: 3.12.8
- dbt-adapters: 1.22.8
- dbt-athena: 1.9.4

Additional Context

No response

Metadata

Assignees

No one assigned

Labels

triage:product (In Product's queue), type:bug (Something isn't working as documented)
