fix(athena): exclude ICEBERG_FILESYSTEM_ERROR from outer retry #1740

Open
dtaniwaki wants to merge 3 commits into dbt-labs:main from dtaniwaki:fix/athena-iceberg-filesystem-error-no-retry

Conversation

@dtaniwaki
Contributor

resolves #1739
docs N/A

Thank you for maintaining this project! I'd appreciate your review on this small bug fix for the Athena adapter.

Problem

The outer retry decorator in AthenaCursor.execute() does not exclude ICEBERG_FILESYSTEM_ERROR from retries. Iceberg CTAS is non-atomic, so when a transient error fails an Iceberg CTAS query, partial data is left at its hardcoded S3 UUID location. delete_from_s3 is invoked only during Jinja evaluation, which runs for the first attempt only, so a retry reuses the same non-empty location. All subsequent retries therefore hit ICEBERG_FILESYSTEM_ERROR: Cannot create a table on a non-empty location. Because this error is not excluded from the outer retry, it is retried up to num_retries times against the same S3 path and always fails.

The existing code comment already explains that TOO_MANY_OPEN_PARTITIONS is excluded for this same reason, but ICEBERG_FILESYSTEM_ERROR was never added to the exclusion list.

Note: this is distinct from the TOO_MANY_OPEN_PARTITIONS → batch fallback path handled by run_query_with_partitions_limit_catching. That path already works correctly. This issue surfaces when a different transient error triggers the outer retry on an Iceberg CTAS.
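
To illustrate the failure mode described above, here is a minimal, self-contained simulation. It is not the adapter's code: `run_ctas`, `execute_with_retry`, the `TRANSIENT_ERROR` message, and the set used as a stand-in for S3 are all hypothetical, and the sketch assumes one initial attempt followed by up to `num_retries` retries.

```python
class QueryError(Exception):
    """Hypothetical stand-in for a failed Athena query."""


def run_ctas(non_empty_locations, location, fail_transiently):
    """Simulate a non-atomic Iceberg CTAS writing to a fixed S3 location."""
    if location in non_empty_locations:
        raise QueryError(
            "ICEBERG_FILESYSTEM_ERROR: Cannot create a table on a non-empty location"
        )
    # Non-atomic: files land at the location before the query has succeeded.
    non_empty_locations.add(location)
    if fail_transiently:
        raise QueryError("TRANSIENT_ERROR: simulated transient failure")
    return "ok"


def execute_with_retry(non_empty_locations, location, num_retries):
    """Retry every error with the same location, with no cleanup in between."""
    errors = []
    for attempt in range(num_retries + 1):
        try:
            # Only the first attempt fails transiently in this simulation.
            result = run_ctas(
                non_empty_locations, location, fail_transiently=(attempt == 0)
            )
            return result, errors
        except QueryError as exc:
            errors.append(str(exc))
    return None, errors


result, errors = execute_with_retry(set(), "s3://bucket/tables/some-uuid/", num_retries=3)
for message in errors:
    print(message)
```

In this simulation the first attempt fails with the transient error and every retry fails with ICEBERG_FILESYSTEM_ERROR, because nothing clears the location between attempts.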

Solution

Add ICEBERG_FILESYSTEM_ERROR to the no-retry condition alongside TOO_MANY_OPEN_PARTITIONS. Since retrying the same SQL with the same S3 UUID location will always fail, failing fast avoids unnecessary retries and surfaces the original error more clearly.
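
A rough sketch of the idea, using only the standard library. The names `NON_RETRYABLE_MARKERS`, `should_retry`, and `call_with_retry` are hypothetical and chosen for illustration; the actual change lives in the retry condition of AthenaCursor.execute() and may be structured differently.

```python
# Error markers for which retrying the same SQL cannot succeed (illustrative list).
NON_RETRYABLE_MARKERS = (
    "TOO_MANY_OPEN_PARTITIONS",
    "ICEBERG_FILESYSTEM_ERROR",
)


def should_retry(exc):
    """Return False when the error message contains a non-retryable marker."""
    message = str(exc)
    return not any(marker in message for marker in NON_RETRYABLE_MARKERS)


def call_with_retry(fn, num_retries):
    """Call fn, retrying up to num_retries times unless the error is excluded."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return fn()
        except Exception as exc:
            # Fail fast on excluded errors, or once the retry budget is spent.
            if attempts > num_retries or not should_retry(exc):
                raise
```

With a predicate like this, a query that raises ICEBERG_FILESYSTEM_ERROR is attempted once and the error propagates immediately, while other errors are still retried up to `num_retries` times.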

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

@cla-bot added the cla:yes label (The PR author has signed the CLA) Mar 11, 2026
@dtaniwaki marked this pull request as ready for review March 11, 2026 03:07
@dtaniwaki requested a review from a team as a code owner March 11, 2026 03:07
Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com>
@dtaniwaki force-pushed the fix/athena-iceberg-filesystem-error-no-retry branch from 1f5c493 to 9b54cd0 March 11, 2026 03:13
@iconara
Contributor

iconara commented Mar 11, 2026

I believe this will be indirectly fixed by #1637 (but the change would still be good to apply to the legacy mode introduced by that PR). In the new retry logic, only ICEBERG_COMMIT_ERROR is retried; all other Iceberg errors are assumed to be non-retryable. Iceberg errors are also no longer retried by the throttling retry logic.

Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com>

Development

Successfully merging this pull request may close these issues.

[Bug] dbt-athena: ICEBERG_FILESYSTEM_ERROR is retried by outer retry, causing repeated failures with the same S3 location
