Description
Is this related to an existing feature request or issue?
No existing issue
Summary
This RFC proposes a new MCP server (awslabs.mwaa-mcp-server) for Amazon Managed Workflows for Apache Airflow (MWAA). The server exposes 20 tools (13 read-only, 7 write) that let AI coding assistants monitor, troubleshoot, and operate Airflow workflows and MWAA environments through the AWS-native APIs — without requiring direct Airflow UI/CLI access or web login tokens.
Use case
MWAA users today must context-switch between their AI coding assistant and the Airflow UI or AWS Console to investigate failed DAG runs, check import errors, review task logs, or trigger retries. This breaks flow and slows down incident response.
With this MCP server, an AI assistant can:
- Investigate failures end-to-end — list DAG runs, find the failed run, list task instances, pull task logs — all in a single conversational flow.
- Monitor environment health — check for DAG import errors, list environments, review DAG schedules and states.
- Operate workflows (with `--allow-write`) — trigger DAG runs, pause/unpause DAGs, clear failed task instances for retry.
- Manage MWAA environments (with `--allow-write`) — create, update, and delete MWAA environments.
- Inspect Airflow metadata — list connections (with passwords redacted) and variables (with sensitive values redacted).
This is particularly valuable for on-call engineers triaging pipeline failures and for data engineers iterating on DAG development.
Proposal
The server is a Python package (awslabs-mwaa-mcp-server) built on FastMCP. It communicates with MWAA through the AWS SDK (boto3) — specifically mwaa:ListEnvironments, mwaa:GetEnvironment, mwaa:CreateEnvironment, mwaa:UpdateEnvironment, mwaa:DeleteEnvironment, and mwaa:InvokeRestApi.
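As a rough sketch of the call path (the helper name and environment name below are hypothetical, and the exact response shape should be checked against the boto3 MWAA documentation), a read-only tool such as list-dags reduces to a single `invoke_rest_api` call:

```python
def list_dags(mwaa_client, environment_name: str, limit: int = 25) -> list[dict]:
    """List DAGs through mwaa:InvokeRestApi.

    mwaa_client is a boto3 MWAA client, e.g. boto3.client("mwaa").
    The path is bare (no /api/v1 prefix): MWAA routes it to the
    environment's Airflow version internally.
    """
    response = mwaa_client.invoke_rest_api(
        Name=environment_name,
        Method="GET",
        Path="/dags",
        QueryParameters={"limit": limit},
    )
    # RestApiResponse carries the parsed Airflow JSON body.
    return response["RestApiResponse"]["dags"]
```

Because authentication is plain SigV4 on the `mwaa:InvokeRestApi` action, no Airflow token handling is needed anywhere in this path.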
Architecture:
- `server.py` — Entry point, CLI argument parsing (`--allow-write`), FastMCP server creation.
- `environment_tools.py` — 5 tools for MWAA environment discovery and lifecycle management.
- `airflow_tools.py` — 15 tools wrapping the Airflow REST API via `invoke_rest_api`.
- `consts.py` — Path constants and environment variable names.
- `aws_client.py` — Shared boto3 client factory with custom user-agent.
Tools provided (20 total):
| Category | Tools | Mode |
|---|---|---|
| Environments | list-environments, get-environment | read |
| Environments | create-environment, update-environment, delete-environment | write |
| DAGs | list-dags, get-dag, get-dag-source | read |
| DAGs | pause-dag, unpause-dag | write |
| DAG Runs | list-dag-runs, get-dag-run | read |
| DAG Runs | trigger-dag-run | write |
| Task Instances | list-task-instances, get-task-instance | read |
| Task Instances | clear-task-instances | write |
| Task Logs | get-task-logs | read |
| Import Errors | get-import-errors | read |
| Connections | list-connections | read |
| Variables | list-variables | read |
Key design decisions:
- Read-only by default. All 7 write tools (`create-environment`, `update-environment`, `delete-environment`, `trigger-dag-run`, `pause-dag`, `unpause-dag`, `clear-task-instances`) are gated behind `--allow-write`. Without it, these tools return an error explaining that the flag is required.
- No version prefix in API paths. The `invoke_rest_api` API handles Airflow version routing internally, so paths are bare (e.g., `/dags`, not `/api/v1/dags`). This ensures compatibility with both Airflow 2.x and 3.x environments.
- Automatic environment resolution. When `environment_name` is omitted: (1) check the `MWAA_ENVIRONMENT` env var, (2) auto-select if only one environment exists in the region, (3) otherwise return an error listing the available environments. This minimizes configuration while staying explicit.
- Security redaction. Connection passwords/extras and variable values are automatically redacted in responses to prevent credential leakage through the AI assistant.
- Enriched error messages. `RestApiClientException` errors are enriched with the HTTP status code and response body for faster debugging.
- Safe defaults for clear-task-instances. Defaults to `dry_run=true` and `only_failed=true`, so the LLM previews the impact before actually clearing, and only failed tasks are affected unless explicitly requested otherwise.
Before/after UX:
Before: User asks AI assistant about a failed DAG run → assistant has no MWAA access → user must open Airflow UI, navigate to the DAG, find the failed run, click into task instances, open logs, copy/paste back to assistant.
After: User asks AI assistant about a failed DAG run → assistant calls list-dag-runs → list-task-instances → get-task-logs → provides root cause analysis in seconds, all within the conversation.
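The "after" flow can be sketched as three chained calls. Here `call_airflow` is a hypothetical helper wrapping `invoke_rest_api`, and the paths follow the Airflow 2.x stable REST API shape (bare, per the no-version-prefix design); field names may need version-aware handling, as noted under challenges:

```python
def triage_failed_run(call_airflow, dag_id: str) -> dict:
    """Root-cause sketch: latest failed run -> failed tasks -> their logs.

    call_airflow(method, path, params) is assumed to return the parsed
    Airflow JSON body from mwaa:InvokeRestApi.
    """
    # 1. Most recent failed run for this DAG (list-dag-runs).
    runs = call_airflow(
        "GET", f"/dags/{dag_id}/dagRuns",
        {"state": "failed", "limit": 1, "order_by": "-execution_date"},
    )
    run_id = runs["dag_runs"][0]["dag_run_id"]

    # 2. Failed task instances within that run (list-task-instances).
    tis = call_airflow(
        "GET", f"/dags/{dag_id}/dagRuns/{run_id}/taskInstances",
        {"state": "failed"},
    )

    # 3. Logs for each failed task's latest try (get-task-logs).
    logs = {
        ti["task_id"]: call_airflow(
            "GET",
            f"/dags/{dag_id}/dagRuns/{run_id}"
            f"/taskInstances/{ti['task_id']}/logs/{ti['try_number']}",
            {},
        )
        for ti in tis["task_instances"]
    }
    return {"run_id": run_id, "logs": logs}
```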
Out of scope
- DAG code editing/deployment — This server reads and operates on existing DAGs; it does not modify DAG source files or manage S3 DAG uploads.
- Airflow provider/plugin management — No tools for installing or managing Airflow providers or custom plugins.
- Cross-region operations — Each server instance operates in a single AWS region. Multi-region setups require multiple server instances.
- Dataset and asset events — Airflow 2.x datasets / 3.x assets API endpoints are not included in this initial release.
Potential challenges
- Airflow API differences across versions. Airflow 2.x and 3.x have different REST API schemas (e.g., field names, response structures). The current implementation uses `invoke_rest_api`, which abstracts version routing, but response parsing may need version-aware handling as edge cases emerge.
- Rate limiting. Calls to `invoke_rest_api` are subject to AWS API throttling limits. Heavy tool usage (e.g., polling DAG run status in a loop) could hit these limits. The server does not currently implement retry/backoff for throttling.
- Large response payloads. Environments with hundreds of DAGs or thousands of task instances could produce large responses. The server supports `limit`/`offset` pagination but relies on the AI assistant to use them appropriately.
- IAM permissions. Users need the `airflow:InvokeRestApi` permission scoped to their MWAA environment, which is a relatively new IAM action that some organizations may not have in their policies yet.
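If throttling does become a problem, a generic exponential-backoff wrapper (or botocore's built-in retry modes, e.g. `Config(retries={"mode": "adaptive"})`) would be one mitigation; the sketch below is illustrative, not part of the proposal:

```python
import random
import time

def with_backoff(call, *, max_attempts: int = 5, base: float = 0.5):
    """Retry a zero-argument callable with exponential backoff and
    full jitter. In practice the except clause would match only the
    throttling error, not every Exception."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # budget exhausted; surface the error
            # Full jitter: sleep somewhere in [0, base * 2**attempt].
            time.sleep(random.uniform(0, base * 2 ** attempt))
```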
Dependencies and Integrations
Python dependencies (all already used by other servers in this repository):
- `mcp[cli] >=1.6.0` — MCP SDK
- `pydantic >=2.0.0` — Input validation
- `boto3 >=1.37.0`, `botocore >=1.37.0` — AWS SDK
- `loguru >=0.7.0` — Logging
AWS service integrations:
- `mwaa:ListEnvironments` — Environment discovery
- `mwaa:GetEnvironment` — Environment metadata and Airflow version detection
- `mwaa:CreateEnvironment` — Environment provisioning
- `mwaa:UpdateEnvironment` — Environment configuration changes
- `mwaa:DeleteEnvironment` — Environment teardown
- `mwaa:InvokeRestApi` — All Airflow REST API operations
Complementary MCP servers:
- `aws-iac-mcp-server` / `cdk-mcp-server` / `cfn-mcp-server` — Infrastructure-as-code provisioning of MWAA environments
- `cloudwatch-mcp-server` — CloudWatch metrics and alarms related to MWAA
- `s3-tables-mcp-server` — Managing DAG files in S3
Alternative solutions
1. Airflow CLI via `create-cli-token` — MWAA supports generating CLI tokens for direct Airflow CLI access. However, this exposes short-lived tokens through the AI assistant, requires parsing CLI text output, and doesn't work consistently across Airflow versions. The `invoke_rest_api` approach is AWS-native, returns structured JSON, and handles authentication transparently.
2. Direct Airflow REST API via `create-web-login-token` — MWAA can generate web login tokens for direct HTTP access to the Airflow REST API. This requires managing token lifecycle, constructing full API URLs, and handling version-specific paths. It also exposes the Airflow webserver URL and session tokens. `invoke_rest_api` wraps all of this behind a single AWS API call with IAM authentication.
3. Extending the existing `aws-api-mcp-server` — The generic AWS API server could theoretically call `invoke_rest_api`. However, it would lack Airflow-specific input validation, response redaction, environment auto-resolution, and the curated tool surface that makes the AI assistant effective. A dedicated server provides a much better user experience.