bug: ECR integration requires AssumeRole even with IRSA authentication #2291

@sljeff

Provide environment information

System:

  • OS: Linux 6.12 Debian GNU/Linux 11 (bullseye) 11 (bullseye)
  • CPU: (4) x64 AMD EPYC 7R13 Processor
  • Memory: 4.57 GB / 7.55 GB
  • Container: Yes
  • Shell: 5.1.4 - /bin/bash

Binaries:

  • Node: 20.11.1 - /usr/local/bin/node
  • npm: 10.2.4 - /usr/local/bin/npm
  • pnpm: 8.15.5 - /usr/local/bin/pnpm

Deployment:

  • Trigger.dev: v4.0.0-beta.23
  • Helm Chart: v4.0.0-beta.18
  • Registry: AWS ECR
  • Authentication: EKS IRSA (IAM Roles for Service Accounts)

Describe the bug

When running npx trigger.dev@v4-beta deploy against a self-hosted instance on EKS using IRSA for ECR authentication, deployments fail if DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN is not set.

The CLI returns the following error:

Failed to start deployment: Failed to get deployment image ref

And the webapp logs show a ValidationError from the AWS SDK:

{
  "assumeRole":{},
  "sessionName":"TriggerWebappECRAccess_1753172908266_70oxc8",
  "error":"1 validation error detected: Value null at 'roleArn' failed to satisfy constraint: Member must not be null",
  "http":{"requestId":"P4kJ62bmgEtC8hVTFmwrk","path":"/api/v1/deployments"},
  "level":"error",
  "message":"Failed to assume role"
}
{
  "cause":"1 validation error detected: Value null at 'roleArn' failed to satisfy constraint: Member must not be null",
  "level":"error",
  "message":"Failed to get deployment image ref"
}

This appears to happen because the code always attempts an AssumeRole operation, even when the pod already has the necessary ECR permissions through the default credential chain that IRSA provides. Users are then forced into workarounds that add complexity and depart from typical IRSA practice.

Root Cause

The implementation in initializeDeployment.server.ts unconditionally passes an assumeRole object to getDeploymentImageRef, even when the corresponding environment variables are undefined:

// This always creates an object, never undefined
assumeRole: {
  roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,     // undefined in my case
  externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,  // also undefined
}

This prevents the code from using the default credential chain and causes the AWS SDK to throw a ValidationError because roleArn is null.
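
The effect can be reproduced in isolation. The following is a minimal sketch, not the webapp's actual code: `env` is a hypothetical stand-in for the parsed environment with both variables unset. It shows that the object is still truthy, and that serializing it drops the undefined properties, which would explain the empty `"assumeRole":{}` in the log above.

```typescript
// Hypothetical stand-in for the webapp's parsed environment; both variables are unset.
const env: {
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN?: string;
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID?: string;
} = {};

// Same shape as the object built in the snippet above.
const assumeRole = {
  roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
  externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
};

// An object literal is always truthy, so a presence check on `assumeRole`
// cannot distinguish "role configured" from "nothing configured".
const isPresent = Boolean(assumeRole);

// JSON.stringify omits properties whose value is undefined.
const serialized = JSON.stringify({ assumeRole });

console.log(isPresent, serialized);
```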

Use Case Comparison

There appear to be two distinct use cases for the ECR integration:

  1. Trigger.dev Cloud (Cross-Account): AssumeRole is necessary for the central webapp to access a user's ECR in another account. This works as expected.
  2. Self-Hosted on EKS (Same Account): The webapp and ECR are in the same account, and the pod already has direct permissions via IRSA. In this common setup, an AssumeRole operation is typically not needed.

The current implementation appears to be designed for the first use case.

Reproduction repo

Not applicable - this is a deployment configuration issue.

To reproduce

  1. Create IRSA for Trigger.dev on EKS:

    eksctl create iamserviceaccount \
      --cluster=my-cluster \
      --namespace=trigger-dev \
      --name=trigger-dev-webapp \
      --attach-policy-arn=arn:aws:iam::<ACCOUNT_ID>:policy/TriggerDevECRAccess \
      --approve

  2. Deploy Trigger.dev with Helm and Kustomize:

    values.yaml:

    registry:
      deploy: false
      repositoryNamespace: "trigger"
      external:
        host: "<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com"
        auth:
          enabled: false

    kustomization.yaml:

    patches:
      - path: sa-patch.yaml
        target:
          kind: ServiceAccount
          name: trigger-dev-webapp

    sa-patch.yaml:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: trigger-dev-webapp
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/eksctl-my-cluster-addon-iamserviceaccount-trigger-dev-webapp"

  3. Attempt to deploy a project:

    npx trigger.dev@v4-beta deploy

  4. Observe the ValidationError in the webapp logs.

Expected Behavior

When DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN is not set, npx trigger.dev@v4-beta deploy should succeed by using the pod's default AWS credential chain (provided by IRSA), without attempting an AssumeRole operation.
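
The expected behavior could also be enforced defensively on the receiving side. This is only a sketch under assumptions: `chooseCredentialStrategy`, `AssumeRoleConfig`, and `CredentialStrategy` are hypothetical names, not part of the Trigger.dev codebase. The idea is that a missing or blank `roleArn` selects the default credential chain, and only a real ARN leads to an AssumeRole call.

```typescript
// Hypothetical types; the real parameter shape in the webapp may differ.
type AssumeRoleConfig = { roleArn?: string; externalId?: string };

type CredentialStrategy =
  | { kind: "default-chain" }
  | { kind: "assume-role"; roleArn: string; externalId?: string };

// Decide how to obtain ECR credentials.
// A missing object, a missing roleArn, or a blank roleArn all mean
// "use whatever the SDK's default chain provides" (IRSA on EKS).
function chooseCredentialStrategy(assumeRole?: AssumeRoleConfig): CredentialStrategy {
  const roleArn = assumeRole?.roleArn?.trim();
  if (!roleArn) {
    return { kind: "default-chain" };
  }
  return { kind: "assume-role", roleArn, externalId: assumeRole?.externalId };
}

// Placeholder ARN for illustration only.
console.log(chooseCredentialStrategy({}).kind);
console.log(chooseCredentialStrategy({ roleArn: "arn:aws:iam::123456789012:role/example" }).kind);
```

With a guard like this, an empty `assumeRole` object from the caller would no longer reach STS with a null `roleArn`.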

Additional information

Suggested Fix

One possible fix is to construct the assumeRole object only when the DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN environment variable is explicitly set. When the variable is absent, assumeRole stays undefined and the default credential chain can be used.

// In apps/webapp/app/v3/services/initializeDeployment.server.ts

+ const assumeRole = env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN
+   ? {
+       roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
+       externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
+     }
+   : undefined;

  const [imageRefError, imageRefResult] = await tryCatch(
    getDeploymentImageRef({
      // ... other params
-     assumeRole: {
-       roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
-       externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
-     },
+     assumeRole,
    })
  );
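
The same conditional could be pulled into a small helper so it can be unit-tested on its own. This is a sketch, not a proposed final patch: `buildAssumeRole`, `RegistryEnv`, and `AssumeRoleParams` are hypothetical names introduced for illustration. An empty string is treated the same as an unset variable here, which is an assumption about the desired behavior.

```typescript
// Hypothetical subset of the webapp environment relevant to this fix.
type RegistryEnv = {
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN?: string;
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID?: string;
};

// Hypothetical parameter type: roleArn is required whenever the object exists.
type AssumeRoleParams = { roleArn: string; externalId?: string };

// Return assume-role parameters only when a role ARN is configured;
// otherwise return undefined so the caller can omit assumeRole entirely.
function buildAssumeRole(env: RegistryEnv): AssumeRoleParams | undefined {
  const roleArn = env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN;
  if (!roleArn) {
    return undefined;
  }
  return {
    roleArn,
    externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
  };
}

// IRSA-style setup: nothing configured, so no assumeRole object is produced.
console.log(buildAssumeRole({}));
```

Making `roleArn` non-optional in the type means an object with a missing ARN can no longer be constructed by accident.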

Discussion on Workarounds

Without this change, users in a same-account IRSA setup need to implement workarounds that have notable drawbacks:

  1. Setting the ARN to the pod's own role: This requires configuring the role's trust policy to allow self-assumption, which is an uncommon pattern and adds extra STS API calls.
  2. Creating a dedicated intermediate role: This adds the complexity of creating and maintaining an additional IAM role.

Both approaches seem to add unnecessary complexity for a standard self-hosted EKS setup. It would be great if Trigger.dev could natively support the direct use of IRSA credentials, which aligns with the "optional STS assume role support" mentioned in PR #2224.

@nicktrn
