RFC: Amazon MWAA MCP Server #2507

@biswasbiplob

Description

Is this related to an existing feature request or issue?

No existing issue

Summary

This RFC proposes a new MCP server (awslabs.mwaa-mcp-server) for Amazon Managed Workflows for Apache Airflow (MWAA). The server exposes 20 tools (13 read-only, 7 write) that let AI coding assistants monitor, troubleshoot, and operate Airflow workflows and MWAA environments through the AWS-native APIs — without requiring direct Airflow UI/CLI access or web login tokens.

Use case

MWAA users today must context-switch between their AI coding assistant and the Airflow UI or AWS Console to investigate failed DAG runs, check import errors, review task logs, or trigger retries. This breaks flow and slows down incident response.

With this MCP server, an AI assistant can:

  • Investigate failures end-to-end — list DAG runs, find the failed run, list task instances, pull task logs — all in a single conversational flow.
  • Monitor environment health — check for DAG import errors, list environments, review DAG schedules and states.
  • Operate workflows (with --allow-write) — trigger DAG runs, pause/unpause DAGs, clear failed task instances for retry.
  • Manage MWAA environments (with --allow-write) — create, update, and delete MWAA environments.
  • Inspect Airflow metadata — list connections (with passwords redacted), variables (with sensitive values redacted).

This is particularly valuable for on-call engineers triaging pipeline failures and for data engineers iterating on DAG development.

Proposal

The server is a Python package (awslabs-mwaa-mcp-server) built on FastMCP. It communicates with MWAA through the AWS SDK (boto3) — specifically mwaa:ListEnvironments, mwaa:GetEnvironment, mwaa:CreateEnvironment, mwaa:UpdateEnvironment, mwaa:DeleteEnvironment, and mwaa:InvokeRestApi.

Architecture:

  • server.py — Entry point, CLI argument parsing (--allow-write), FastMCP server creation.
  • environment_tools.py — 5 tools for MWAA environment discovery and lifecycle management.
  • airflow_tools.py — 15 tools wrapping the Airflow REST API via invoke_rest_api.
  • consts.py — Path constants and environment variable names.
  • aws_client.py — Shared boto3 client factory with custom user-agent.
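The shared client factory in `aws_client.py` might look like the sketch below; the user-agent suffix string is an assumption, and `user_agent_extra` is botocore's standard mechanism for appending such a tag.

```python
USER_AGENT_EXTRA = "awslabs/mcp/mwaa-mcp-server"  # suffix string is assumed


def make_mwaa_client(region_name: str = None):
    """Sketch of a shared factory: one place to create boto3 MWAA clients
    tagged with a custom user-agent for usage attribution."""
    import boto3  # deferred imports keep the sketch importable without the SDK
    from botocore.config import Config

    return boto3.client(
        "mwaa",
        region_name=region_name,
        config=Config(user_agent_extra=USER_AGENT_EXTRA),
    )
```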

Tools provided (20 total):

| Category | Tools | Mode |
| --- | --- | --- |
| Environments | list-environments, get-environment | read |
| Environments | create-environment, update-environment, delete-environment | write |
| DAGs | list-dags, get-dag, get-dag-source | read |
| DAGs | pause-dag, unpause-dag | write |
| DAG Runs | list-dag-runs, get-dag-run | read |
| DAG Runs | trigger-dag-run | write |
| Task Instances | list-task-instances, get-task-instance | read |
| Task Instances | clear-task-instances | write |
| Task Logs | get-task-logs | read |
| Import Errors | get-import-errors | read |
| Connections | list-connections | read |
| Variables | list-variables | read |

Key design decisions:

  • Read-only by default. All 7 write tools (create-environment, update-environment, delete-environment, trigger-dag-run, pause-dag, unpause-dag, clear-task-instances) are gated behind --allow-write. Without it, these tools return an error explaining the flag is required.
  • No version prefix in API paths. The invoke_rest_api API handles Airflow version routing internally, so paths are bare (e.g., /dags not /api/v1/dags). This ensures compatibility with both Airflow 2.x and 3.x environments.
  • Automatic environment resolution. When environment_name is omitted: (1) check the MWAA_ENVIRONMENT env var, (2) auto-select if exactly one environment exists in the region, (3) otherwise return an error that lists the available environments. This minimizes configuration while staying explicit.
  • Security redaction. Connection passwords/extras and variable values are automatically redacted in responses to prevent credential leakage through the AI assistant.
  • Enriched error messages. RestApiClientException errors are enriched with HTTP status code and response body for faster debugging.
  • Safe defaults for clear-task-instances. Defaults to dry_run=true and only_failed=true so the LLM previews the impact before actually clearing, and only affects failed tasks unless explicitly told otherwise.
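The environment-resolution order above can be sketched as a small pure function; the name and signature are assumptions, not the server's actual API.

```python
import os


def resolve_environment(available: list, explicit: str = None) -> str:
    """Sketch of the resolution order: explicit argument first, then the
    MWAA_ENVIRONMENT variable, then auto-select when exactly one
    environment exists in the region; otherwise fail with the options."""
    if explicit:
        return explicit
    from_env = os.environ.get("MWAA_ENVIRONMENT")
    if from_env:
        return from_env
    if len(available) == 1:
        return available[0]
    raise ValueError(
        "environment_name is required; available environments: "
        + ", ".join(sorted(available))
    )
```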

Before/after UX:

Before: User asks AI assistant about a failed DAG run → assistant has no MWAA access → user must open Airflow UI, navigate to the DAG, find the failed run, click into task instances, open logs, copy/paste back to assistant.

After: User asks AI assistant about a failed DAG run → assistant calls list-dag-runs → list-task-instances → get-task-logs → provides root cause analysis in seconds, all within the conversation.
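That "after" flow is essentially a chain of three Airflow REST calls. The sketch below abstracts the transport behind a hypothetical `call(path, **query)` function (e.g. one built on InvokeRestApi) so the chaining logic stands alone; endpoint paths and response keys follow the Airflow 2.x stable REST API.

```python
def triage_failed_run(call, dag_id: str) -> dict:
    """Find the most recent failed run of dag_id, its failed task
    instances, and the first-attempt logs for each of them.

    `call(path, **query)` is any function that performs an Airflow REST
    call and returns the parsed response.
    """
    runs = call(f"/dags/{dag_id}/dagRuns", state="failed", limit=1)
    if not runs.get("dag_runs"):
        return {"dag_id": dag_id, "failed_run": None}
    run_id = runs["dag_runs"][0]["dag_run_id"]
    tasks = call(f"/dags/{dag_id}/dagRuns/{run_id}/taskInstances",
                 state="failed")
    failed = [t["task_id"] for t in tasks.get("task_instances", [])]
    logs = {
        task_id: call(
            f"/dags/{dag_id}/dagRuns/{run_id}/taskInstances/{task_id}/logs/1")
        for task_id in failed
    }
    return {"dag_id": dag_id, "failed_run": run_id,
            "failed_tasks": failed, "logs": logs}
```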

Out of scope

  • DAG code editing/deployment — This server reads and operates on existing DAGs; it does not modify DAG source files or manage S3 DAG uploads.
  • Airflow provider/plugin management — No tools for installing or managing Airflow providers or custom plugins.
  • Cross-region operations — Each server instance operates in a single AWS region. Multi-region setups require multiple server instances.
  • Dataset and asset events — Airflow 2.x datasets / 3.x assets API endpoints are not included in this initial release.

Potential challenges

  • Airflow API differences across versions. Airflow 2.x and 3.x have different REST API schemas (e.g., field names, response structures). The current implementation uses invoke_rest_api which abstracts version routing, but response parsing may need version-aware handling as edge cases emerge.
  • Rate limiting. invoke_rest_api has AWS API throttling limits. Heavy tool usage (e.g., polling DAG run status in a loop) could hit these limits. The server does not currently implement retry/backoff for throttling.
  • Large response payloads. Environments with hundreds of DAGs or thousands of task instances could produce large responses. The server supports limit/offset pagination but relies on the AI assistant to use them appropriately.
  • IAM permissions. Users need airflow:InvokeRestApi permission scoped to their MWAA environment, which is a relatively new IAM action that some organizations may not have in their policies yet.
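On the throttling point: one possible mitigation, not part of the current proposal, is to lean on botocore's built-in retry modes rather than surfacing ThrottlingException to the assistant. The retry values below are illustrative.

```python
RETRIES = {"max_attempts": 8, "mode": "adaptive"}  # illustrative values


def throttling_tolerant_mwaa_client():
    """Sketch: create an MWAA client whose botocore config retries
    throttled InvokeRestApi calls with adaptive client-side backoff."""
    import boto3  # deferred so the sketch imports without the SDK installed
    from botocore.config import Config

    return boto3.client("mwaa", config=Config(retries=RETRIES))
```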

Dependencies and Integrations

Python dependencies (all already used by other servers in this repository):

  • mcp[cli] >=1.6.0 — MCP SDK
  • pydantic >=2.0.0 — Input validation
  • boto3 >=1.37.0, botocore >=1.37.0 — AWS SDK
  • loguru >=0.7.0 — Logging

AWS service integrations:

  • mwaa:ListEnvironments — Environment discovery
  • mwaa:GetEnvironment — Environment metadata and Airflow version detection
  • mwaa:CreateEnvironment — Environment provisioning
  • mwaa:UpdateEnvironment — Environment configuration changes
  • mwaa:DeleteEnvironment — Environment teardown
  • mwaa:InvokeRestApi — All Airflow REST API operations

Complementary MCP servers:

  • aws-iac-mcp-server / cdk-mcp-server / cfn-mcp-server — For infrastructure-as-code provisioning of MWAA environments
  • cloudwatch-mcp-server — For CloudWatch metrics and alarms related to MWAA
  • s3-tables-mcp-server — For managing DAG files in S3

Alternative solutions

1. Airflow CLI via `create-cli-token` — MWAA supports generating CLI tokens for direct Airflow CLI access. However, this exposes short-lived tokens through the AI assistant, requires parsing CLI text output, and doesn't work consistently across Airflow versions. The `invoke_rest_api` approach is AWS-native, returns structured JSON, and handles authentication transparently.

2. Direct Airflow REST API via `create-web-login-token` — MWAA can generate web login tokens for direct HTTP access to the Airflow REST API. This requires managing token lifecycle, constructing full API URLs, and handling version-specific paths. It also exposes the Airflow webserver URL and session tokens. `invoke_rest_api` wraps all of this behind a single AWS API call with IAM authentication.

3. Extending the existing `aws-api-mcp-server` — The generic AWS API server could theoretically call `invoke_rest_api`. However, it would lack Airflow-specific input validation, response redaction, environment auto-resolution, and the curated tool surface that makes the AI assistant effective. A dedicated server provides a much better user experience.

Labels: RFC-proposal, needs-triage