Description
Provide environment information
System:
- OS: Linux 6.12 Debian GNU/Linux 11 (bullseye) 11 (bullseye)
- CPU: (4) x64 AMD EPYC 7R13 Processor
- Memory: 4.57 GB / 7.55 GB
- Container: Yes
- Shell: 5.1.4 - /bin/bash
Binaries:
- Node: 20.11.1 - /usr/local/bin/node
- npm: 10.2.4 - /usr/local/bin/npm
- pnpm: 8.15.5 - /usr/local/bin/pnpm
Deployment:
- Trigger.dev: v4.0.0-beta.23
- Helm Chart: v4.0.0-beta.18
- Registry: AWS ECR
- Authentication: EKS IRSA (IAM Roles for Service Accounts)
Describe the bug
When running npx trigger.dev@v4-beta deploy against a self-hosted instance on EKS using IRSA for ECR authentication, deployments fail if DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN is not set.
The CLI returns the following error:
Failed to start deployment: Failed to get deployment image ref
The webapp logs show a ValidationError from the AWS SDK:
{
"assumeRole":{},
"sessionName":"TriggerWebappECRAccess_1753172908266_70oxc8",
"error":"1 validation error detected: Value null at 'roleArn' failed to satisfy constraint: Member must not be null",
"http":{"requestId":"P4kJ62bmgEtC8hVTFmwrk","path":"/api/v1/deployments"},
"level":"error",
"message":"Failed to assume role"
}
{
"cause":"1 validation error detected: Value null at 'roleArn' failed to satisfy constraint: Member must not be null",
"level":"error",
"message":"Failed to get deployment image ref"
}
This seems to happen because the code always attempts an AssumeRole operation, even when the pod already has the necessary ECR permissions through IRSA and the default credential chain. Users then have to configure workarounds that add complexity and diverge from typical IRSA practice.
Root Cause
The implementation in initializeDeployment.server.ts unconditionally passes an assumeRole object to getDeploymentImageRef, even when the corresponding environment variables are undefined:
    // This always creates an object, never undefined
    assumeRole: {
      roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN, // undefined in my case
      externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID, // also undefined
    }
This prevents the code from using the default credential chain and causes the AWS SDK to throw a ValidationError because roleArn is null.
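The following TypeScript sketch illustrates why an object whose fields are all undefined can still trigger the AssumeRole path. It assumes the downstream code decides based on whether an assumeRole object is present, which I have not verified in detail. The type and function names are hypothetical and do not come from the Trigger.dev codebase.

```typescript
// Illustrative only: these names are hypothetical, not Trigger.dev APIs.
type AssumeRoleConfig = {
  roleArn: string | undefined;
  externalId: string | undefined;
};

// Presence check: any object is truthy, even when all of its fields are undefined.
function isConfiguredByPresence(assumeRole?: AssumeRoleConfig): boolean {
  return Boolean(assumeRole);
}

// Field check: treat the role as configured only when an ARN is actually present.
function isConfiguredByArn(assumeRole?: AssumeRoleConfig): boolean {
  return Boolean(assumeRole?.roleArn);
}

// What the call site effectively builds when neither environment variable is set.
const fromUnsetEnv: AssumeRoleConfig = { roleArn: undefined, externalId: undefined };

console.log(isConfiguredByPresence(fromUnsetEnv)); // true
console.log(isConfiguredByArn(fromUnsetEnv)); // false
```

A presence check sends the unset configuration down the AssumeRole path. A check on roleArn would let it fall through to the default credential chain.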
Use Case Comparison
It seems there might be two different use cases for ECR integration:
- Trigger.dev Cloud (Cross-Account): AssumeRole is necessary for the central webapp to access a user's ECR in another account. This works as expected.
- Self-Hosted on EKS (Same Account): The webapp and ECR are in the same account, and the pod already has direct permissions via IRSA. In this common setup, an AssumeRole operation is typically not needed.
The current implementation appears to be designed for the first use case.
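To make the two use cases concrete, here is a rough sketch of how they could be modelled as separate authentication plans. RoleArn, RoleSessionName, and ExternalId are real STS AssumeRole request parameters. The EcrAuthPlan type and the planEcrAuth function are hypothetical names introduced for illustration, not existing Trigger.dev code.

```typescript
// Subset of the STS AssumeRole request parameters that matter here.
interface AssumeRoleInput {
  RoleArn: string;
  RoleSessionName: string;
  ExternalId?: string;
}

// Hypothetical plan type: one variant per use case described above.
type EcrAuthPlan =
  | { mode: "default-chain" } // same account: rely on IRSA credentials
  | { mode: "assume-role"; input: AssumeRoleInput }; // cross account: call STS first

function planEcrAuth(
  roleArn: string | undefined,
  externalId: string | undefined,
  sessionName: string
): EcrAuthPlan {
  // No role ARN configured: do not build an STS request at all.
  if (!roleArn) {
    return { mode: "default-chain" };
  }
  const input: AssumeRoleInput = { RoleArn: roleArn, RoleSessionName: sessionName };
  // ExternalId is optional in STS, so include it only when configured.
  if (externalId) {
    input.ExternalId = externalId;
  }
  return { mode: "assume-role", input };
}
```

With this shape, a same-account IRSA deployment never produces an STS request. A cross-account deployment always carries a non-empty RoleArn.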
Reproduction repo
Not applicable - this is a deployment configuration issue.
To reproduce
- Create IRSA for Trigger.dev on EKS:

    eksctl create iamserviceaccount \
      --cluster=my-cluster \
      --namespace=trigger-dev \
      --name=trigger-dev-webapp \
      --attach-policy-arn=arn:aws:iam::<ACCOUNT_ID>:policy/TriggerDevECRAccess \
      --approve

- Deploy Trigger.dev with Helm and Kustomize:

  values.yaml:

    registry:
      deploy: false
      repositoryNamespace: "trigger"
      external:
        host: "<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com"
        auth:
          enabled: false

  kustomization.yaml:

    patches:
      - path: sa-patch.yaml
        target:
          kind: ServiceAccount
          name: trigger-dev-webapp

  sa-patch.yaml:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: trigger-dev-webapp
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT_ID>:role/eksctl-my-cluster-addon-iamserviceaccount-trigger-dev-webapp"

- Attempt to deploy a project:

    npx trigger.dev@v4-beta deploy

- Observe the ValidationError in the webapp logs.
Expected Behavior
When DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN is not set, npx trigger.dev@v4-beta deploy should succeed by using the pod's default AWS credential chain (provided by IRSA), without attempting an AssumeRole operation.
Additional information
Suggested Fix
A possible solution could be to only construct the assumeRole object when the DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN environment variable is explicitly set. This would allow the default credential chain to be used when the variable is absent.
    // In apps/webapp/app/v3/services/initializeDeployment.server.ts
    + const assumeRole = env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN
    +   ? {
    +       roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
    +       externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
    +     }
    +   : undefined;

      const [imageRefError, imageRefResult] = await tryCatch(
        getDeploymentImageRef({
          // ... other params
    -     assumeRole: {
    -       roleArn: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN,
    -       externalId: env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID,
    -     },
    +     assumeRole,
        })
      );
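A slightly more defensive variant of the same idea could be factored into a small helper. This is only a sketch. It assumes that an empty or whitespace-only value should also count as "not set", which can happen when a templated manifest renders an unset value as an empty string. The RegistryEnv type and the resolveAssumeRole function are hypothetical names, not part of the existing codebase.

```typescript
// Hypothetical shape of the relevant part of the webapp environment.
type RegistryEnv = {
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN?: string;
  DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID?: string;
};

// Hypothetical result type: roleArn is required whenever an object is returned.
type ResolvedAssumeRole = { roleArn: string; externalId?: string };

function resolveAssumeRole(env: RegistryEnv): ResolvedAssumeRole | undefined {
  // Treat undefined, "" and whitespace-only values as "not configured".
  const roleArn = env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_ARN?.trim();
  if (!roleArn) {
    return undefined;
  }
  const externalId = env.DEPLOY_REGISTRY_ECR_ASSUME_ROLE_EXTERNAL_ID?.trim();
  // Include externalId only when it carries a value.
  return externalId ? { roleArn, externalId } : { roleArn };
}
```

The call site would then pass assumeRole: resolveAssumeRole(env), so the default credential chain applies whenever the helper returns undefined.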
Discussion on Workarounds
Without this change, users in a same-account IRSA setup need to implement workarounds that have notable drawbacks:
- Setting the ARN to the pod's own role: This requires configuring the role's trust policy to allow self-assumption, which is an uncommon pattern and adds extra STS API calls.
- Creating a dedicated intermediate role: This adds the complexity of creating and maintaining an additional IAM role.
Both approaches seem to add unnecessary complexity for a standard self-hosted EKS setup. It would be great if Trigger.dev could natively support the direct use of IRSA credentials, which aligns with the "optional STS assume role support" mentioned in PR #2224.