Description
Is this a new bug?
- I believe this is a new bug
- I have searched the existing issues, and I could not find an existing issue for this bug
Which packages are affected?
- dbt-adapters
- dbt-tests-adapter
- dbt-athena
- dbt-athena-community
- dbt-bigquery
- dbt-postgres
- dbt-redshift
- dbt-snowflake
- dbt-spark
Current Behavior
In `safe_create_table_as`, the `temporary=True` branch always creates the tmp table (`__dbt_tmp`) without partitioning by calling `create_table_as(..., skip_partitioning=True)` directly, bypassing `run_query_with_partitions_limit_catching` entirely.
```sql
-- Simplified before-state
{%- if temporary -%}
    {%- do run_query(create_table_as(temporary, relation, compiled_code, language, true)) -%}
    -- skip_partitioning=True: ignores partitioned_by config, no TOO_MANY_OPEN_PARTITIONS handling
{%- else -%}
    {%- set result = adapter.run_query_with_partitions_limit_catching(...) -%}
    ...
{%- endif -%}
```

For models with `partitioned_by` configured and the `insert_overwrite` incremental strategy, `batch_incremental_insert` reads from `__dbt_tmp` once per partition batch:
```sql
INSERT INTO {{ target }}
SELECT * FROM {{ tmp }}
WHERE partition_col IN (...)
```

Because `__dbt_tmp` has no partitions, every batch performs a full scan of the tmp table. With N batches, the total Athena scan cost is (N + 1) × a full scan.
Additionally, if the source query yields many distinct partition values, the unguarded CTAS fails immediately with TOO_MANY_OPEN_PARTITIONS instead of falling back to create_table_as_with_partitions.
Expected Behavior
The tmp table should be created with partitions when `partitioned_by` is configured, so that:
- Each batch in `batch_incremental_insert` benefits from partition pruning, reducing total scan cost to approximately 2 × a full scan regardless of N.
- `TOO_MANY_OPEN_PARTITIONS` is handled consistently with the non-temporary path (fallback to batch creation).

Models without `partitioned_by` should be unaffected: they produce an unpartitioned CTAS as before.
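A minimal sketch of one possible fix, using only the names from the before-state above (the exact helper signatures are assumptions, not the actual patch): route the `temporary` branch through the same partition-aware helper as the non-temporary branch, instead of calling `create_table_as` with `skip_partitioning=True`.

```sql
-- Sketch of a possible after-state (hypothetical, signatures assumed):
-- the temporary branch reuses the partition-aware helper, so partitioned_by
-- is honored and TOO_MANY_OPEN_PARTITIONS falls back to
-- create_table_as_with_partitions, as in the non-temporary path
{%- if temporary -%}
    {%- set result = adapter.run_query_with_partitions_limit_catching(
            create_table_as(temporary, relation, compiled_code, language)
        ) -%}
{%- else -%}
    {%- set result = adapter.run_query_with_partitions_limit_catching(...) -%}
{%- endif -%}
```

With this shape, an unpartitioned model still produces a plain CTAS, since the helper only changes behavior when `partitioned_by` is configured.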
Steps To Reproduce
- Configure an Athena model with:

  ```yaml
  config:
    materialized: incremental
    incremental_strategy: insert_overwrite
    partitioned_by: [date_col]
  ```

- Run `dbt run` with a large dataset that yields many distinct partition values.
- Observe the Athena query history: each batch INSERT scans the full `__dbt_tmp` table instead of a single partition slice.
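One way to confirm the missing partitioning while reproducing (the schema and model names `analytics` / `my_model` here are placeholders for your project; note the tmp table only exists while the run is in flight):

```sql
-- In the Athena console, inspect the DDL of the tmp table mid-run:
SHOW CREATE TABLE "analytics"."my_model__dbt_tmp";
-- Current behavior: the DDL has no PARTITIONED BY clause, so each
-- batch INSERT that filters on date_col must scan the whole table.
-- Expected behavior: the DDL would include PARTITIONED BY (date_col),
-- letting each batch read only its own partition slice.
```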
Relevant log output
Environment
- OS: macOS
- Python: 3.12.8
- dbt-adapters: 1.22.8
- dbt-athena: 1.9.4

Additional Context
No response