@ggiallo28 ggiallo28 commented Aug 28, 2025

Feature or Bugfix

  • Feature

Detail

  • Added start_query_executions to submit multiple Athena queries in one call.
  • Enabled parallel query submission and wait, significantly reducing end-to-end execution time.
  • Introduced configurable concurrency to adapt performance to available system resources.

Relates

  • Improves efficiency and responsiveness for workflows requiring multiple Athena queries.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Introduce `wr.athena.start_query_executions` as a parallelized variant of
`start_query_execution`. It allows submitting multiple queries in one call,
with support for:

- Sequential or threaded submission (`use_threads`)
- Lazy or eager consumption of results (`as_iterator`)
- Per-query `client_request_token` (string or list)
- Optional workgroup checks (`check_workgroup`, `enforce_workgroup`)
- Full Athena cache integration

This improves performance when dispatching batches of queries by reducing
workgroup lookups and enabling concurrent execution.
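The `use_threads` and `as_iterator` options described above can be illustrated with a minimal, self-contained sketch. This is not the PR's implementation: `submit_all`, `fake_submit`, and `max_workers` are hypothetical names, and `fake_submit` stands in for a real Athena submission so the example runs without AWS access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Union


def fake_submit(sql: str) -> str:
    """Hypothetical stand-in for one Athena submission; returns a fake query ID."""
    return "qid:" + sql


def submit_all(
    sqls: Iterable[str],
    submit_one: Callable[[str], str],
    use_threads: bool = True,
    as_iterator: bool = False,
    max_workers: int = 4,
) -> Union[List[str], Iterator[str]]:
    """Submit every query and yield query IDs in input order."""
    if use_threads:
        executor = ThreadPoolExecutor(max_workers=max_workers)
        # map() schedules all submissions immediately and yields results in input order.
        ids: Iterator[str] = executor.map(submit_one, sqls)
        # Do not block here: already-scheduled submissions keep running.
        executor.shutdown(wait=False)
    else:
        # Built-in map() is lazy: each query is submitted only when its ID is consumed.
        ids = map(submit_one, sqls)
    # Eager callers get a list; lazy callers consume IDs as they become available.
    return ids if as_iterator else list(ids)


queries = ["SELECT 1", "SELECT 2", "SELECT 3"]
print(submit_all(queries, fake_submit))
for query_id in submit_all(queries, fake_submit, use_threads=False, as_iterator=True):
    print(query_id)
```

In this sketch the threaded path starts all submissions up front even when `as_iterator=True`, while the sequential path submits one query per iteration step. Whether the PR makes the same trade-off is not stated in the description.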
…nd parallel wait

- Simplified client_request_token handling:
  - Removed manual padding/truncation.
  - Let Athena enforce length constraints.
  - Tokens generated as `<base_token>-<index>` or provided as list.
- Improved wait logic:
  - Added optional wait handling directly inside _submit.
  - Queries can now be waited on in parallel with submission (reduced overhead).
- Configurable default threads:
  - Replaced hardcoded defaults with os.cpu_count().
  - Added support for AWSWRANGLER_THREADS_DEFAULT env var override.
- Removed unused `reduce` import from Athena module.
- Applied ruff formatting to `start_query_executions`.
- Fixed static check issues to pass CI.
- Added ruff check on Athena tests file.
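
The token and thread-default rules listed above could look roughly like the following sketch. The helper names `resolve_tokens` and `default_threads` are hypothetical, and the zero-based index, the length check on list input, and the fallback to 1 when `os.cpu_count()` returns `None` are assumptions rather than details confirmed by the commit.

```python
import os
from typing import List, Optional, Sequence, Union


def resolve_tokens(
    n_queries: int,
    client_request_token: Union[str, Sequence[str], None],
) -> List[Optional[str]]:
    """Return one client request token per query (hypothetical helper)."""
    if client_request_token is None:
        return [None] * n_queries
    if isinstance(client_request_token, str):
        # Single base token: derive "<base_token>-<index>" per query.
        # No padding or truncation; length limits are left to Athena.
        return [f"{client_request_token}-{i}" for i in range(n_queries)]
    tokens = list(client_request_token)
    if len(tokens) != n_queries:
        # Assumption: a list must supply exactly one token per query.
        raise ValueError("expected one client_request_token per query")
    return tokens


def default_threads() -> int:
    """Default worker count: env var override first, then the CPU count."""
    override = os.environ.get("AWSWRANGLER_THREADS_DEFAULT")
    if override:
        return int(override)
    # os.cpu_count() can return None, so fall back to a single worker.
    return os.cpu_count() or 1


print(resolve_tokens(3, "nightly-batch"))
print(default_threads())
```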
Contributor

@kukushking kukushking left a comment

Hi, thank you for opening this PR!

What is the reason to build this into the SDK? Is submitting x queries in parallel a common use case? It feels specific to your application's logic.

@ggiallo28
Author

Hi @kukushking, thanks for the review!

I see that this pattern shows up in data workflows where many short Athena queries must run together. A few common examples:

  • Dashboard refresh/precompute: run the queries that feed multiple BI tiles concurrently. Running them one by one is slow and repeats the same per-query checks, such as the workgroup lookup, for every query, even when I only make small changes to the queries.
  • Data quality checks: run the same validation across dozens of tables/prefixes in parallel.

API symmetry & ergonomics:
Wrangler already provides athena.get_query_executions for fetching many query details in parallel. start_query_executions is the natural counterpart for submitting many queries at once.

Parallel submission and coordinated wait help improve performance while respecting quotas via a configurable concurrency. The implementation is opt-in, non-breaking, and uses the same guardrails already present for single-query flows.
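
As a rough illustration of submit-and-wait under a concurrency cap, the sketch below runs each query's submission and polling inside one worker of a bounded thread pool. It is an assumption-laden stand-in, not the PR's code: `FakeAthena`, `submit_and_wait`, and `run_batch` are hypothetical names, and the fake backend simply reports `RUNNING` for a fixed number of polls before `SUCCEEDED`. Only the terminal state names match Athena's documented query states.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


class FakeAthena:
    """Hypothetical in-memory backend so the sketch runs without AWS access."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self._lock = threading.Lock()
        self._polls: Dict[str, int] = {}
        self._polls_until_done = polls_until_done

    def submit(self, sql: str) -> str:
        with self._lock:
            query_id = f"q{len(self._polls)}"
            self._polls[query_id] = 0
        return query_id

    def state(self, query_id: str) -> str:
        with self._lock:
            self._polls[query_id] += 1
            done = self._polls[query_id] > self._polls_until_done
        return "SUCCEEDED" if done else "RUNNING"


def submit_and_wait(
    sql: str,
    submit_one: Callable[[str], str],
    get_state: Callable[[str], str],
    poll_seconds: float = 0.01,
) -> Dict[str, str]:
    """Submit one query, then poll until it reaches a terminal state."""
    query_id = submit_one(sql)
    while True:
        state = get_state(query_id)
        if state in TERMINAL_STATES:
            return {"query_id": query_id, "state": state}
        time.sleep(poll_seconds)


def run_batch(
    sqls: Sequence[str],
    submit_one: Callable[[str], str],
    get_state: Callable[[str], str],
    max_workers: int = 4,
) -> List[Dict[str, str]]:
    """Run submit-and-wait for each query with at most max_workers in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(lambda sql: submit_and_wait(sql, submit_one, get_state), sqls))


backend = FakeAthena()
print(run_batch(["SELECT 1", "SELECT 2", "SELECT 3"], backend.submit, backend.state, max_workers=2))
```

Because each worker holds its slot until its query finishes, `max_workers` caps how many queries are in flight at once, which is one way to stay within service quotas.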

Why I contributed this:
I took inspiration from awswrangler.athena.get_query_executions(...), found myself repeatedly re-implementing the batch submit and wait pattern for data use cases, and decided to contribute a reusable, documented version back to the community.

Happy to adjust naming, docs, or move it behind a helper/recipe if you prefer, but I believe the symmetry and the prevalence of these data workflows justify having this in the SDK. In the future, it would be great to support retrieving results in the same parallel manner.

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: b8a607f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 89449cb
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: f20d694
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: b8a607f
  • Result: FAILED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: 89449cb
  • Result: FAILED
  • Build Logs (available for 30 days)

@jaidisido
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubDistributedCodeBuild6-jWcl5DLmvupS
  • Commit ID: f20d694
  • Result: FAILED
  • Build Logs (available for 30 days)
