Fix N+1 query pattern in task instance states and count endpoints #60352

Open
steveahnahn wants to merge 3 commits into apache:main from steveahnahn:fix/n-plus-1-query-task-instances

Conversation

@steveahnahn
Contributor

Problem

The previous implementation fetched all task instances from the database and then filtered them by map_index in Python. For DAGs whose mapped tasks expand into many map indices, this caused unnecessary database load and memory usage.

Solution

Push the map_index filter to the SQL query, allowing the database to handle filtering efficiently:

  • Move map_index filtering from Python to SQL in get_task_instance_states and get_task_instance_count endpoints
  • Add map_index parameter to _get_group_tasks helper function to filter at the database level

@boring-cyborg boring-cyborg bot added the area:API Airflow's REST/HTTP API label Jan 9, 2026
@SameerMesiah97
Contributor

This looks good at first glance. However, since the filter based on map index is a new addition (and is likely not covered by existing tests), I think it would be worth adding a small test to lock in the expected behavior and guard against future regressions.

In particular, it would be a good idea to cover:

  1. A mapped task within a task group that produces multiple task instances (i.e. multiple map indices for the same task ID).
  2. The default behavior when map_index is not provided, ensuring all relevant task instances are returned (including unmapped tasks).

For (1), you could test both code paths by passing and omitting task_group_id. If feasible (and if it doesn't make the test too bulky), you could cover all scenarios in a single parametrized test.
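A single table-driven test along these lines might look like the sketch below. It is dependency-free and purely illustrative: ROWS, count_task_instances, and the expected counts are hypothetical stand-ins rather than Airflow's endpoint or fixtures, and unittest's subTest is used here in place of a pytest parametrization.

```python
import unittest

# Hypothetical rows: (task_id, task_group_id, map_index).
# In this sketch, map_index -1 marks an unmapped task.
ROWS = [
    ("standalone", None, -1),
    ("group.mapped", "group", 0),
    ("group.mapped", "group", 1),
    ("group.mapped", "group", 2),
]


def count_task_instances(rows, task_group_id=None, map_index=None):
    """Hypothetical stand-in for the count endpoint's filtering logic."""
    selected = rows
    if task_group_id is not None:
        selected = [r for r in selected if r[1] == task_group_id]
    if map_index is not None:
        selected = [r for r in selected if r[2] == map_index]
    return len(selected)


class CountFilterTest(unittest.TestCase):
    # (task_group_id, map_index, expected_count)
    CASES = [
        ("group", None, 3),  # mapped task in a group, all map indices
        ("group", 1, 1),     # task_group_id and map_index together
        (None, 1, 1),        # map_index without task_group_id
        (None, None, 4),     # default: everything, including unmapped
    ]

    def test_map_index_filter(self):
        for task_group_id, map_index, expected in self.CASES:
            with self.subTest(task_group_id=task_group_id, map_index=map_index):
                self.assertEqual(
                    count_task_instances(ROWS, task_group_id, map_index),
                    expected,
                )
```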

@steveahnahn
Contributor Author

Thanks for the review. The scenarios you mentioned are already covered in airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_task_instances.py:

  • test_get_count_mix_of_task_and_task_group_dynamic_task_mapping
  • test_get_task_states_mix_of_task_and_task_group_dynamic_task_mapping

The existing parametrized cases cover mapped tasks with multiple map indices, default behavior without map_index, and filtering with task_group_id. These tests will now exercise the new db-level filtering path.

I did end up adding one missing test combination: filtering by map_index without providing task_group_id.

@github-actions

github-actions bot commented Mar 1, 2026

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 1, 2026
@steveahnahn
Copy link
Contributor Author

@SameerMesiah97 It has been some time, so I'm tagging you for an approval. Thanks!

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 2, 2026