@gnought gnought commented Aug 25, 2024

With the sample config below, the policy defined in temporary_iam_instance_profile_policy_document may not be visible immediately after an EC2 instance starts, because PutRolePolicy and AddRoleToInstanceProfile are eventually consistent. As a result, the amazon-ssm-agent service may fail to connect to SSM because the required SSM role is not available yet. Recovering requires either logging in to the instance and manually restarting the service, or waiting about 30 minutes for the agent to retry on its own (see the packer build log and the EC2 amazon-ssm-agent log below).

This PR automatically creates a custom instance profile with the AmazonSSMManagedInstanceCore managed policy attached when session_manager is used and the iam_instance_profile attribute is not specified. If the user defines temporary_iam_instance_profile_policy_document, it is added to the custom profile as an inline policy. This resolves the race condition, so the amazon-ssm-agent service can consistently connect to SSM on its first start.
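The rule described above can be sketched as a small decision function. This is only an illustration of the stated behaviour, not the code in this PR: `runConfig`, `profilePlan`, and `planInstanceProfile` are hypothetical names, and any case the description does not mention is left unmodelled.

```go
package main

import "fmt"

// runConfig is a hypothetical, simplified view of the builder settings
// discussed in this PR. It is not the plugin's real config struct.
type runConfig struct {
	SSHInterface       string // value of ssh_interface
	IamInstanceProfile string // value of iam_instance_profile
	HasPolicyDocument  bool   // temporary_iam_instance_profile_policy_document is set
}

// profilePlan records what the description says should happen.
type profilePlan struct {
	CreateProfile       bool // create a custom instance profile
	AttachManagedPolicy bool // attach AmazonSSMManagedInstanceCore
	AddInlinePolicy     bool // add the user's policy document inline
}

// planInstanceProfile applies the rule from the PR description:
// session_manager without iam_instance_profile gets a custom profile
// with the managed policy, plus the inline policy if one is defined.
// Every other combination returns the zero plan (not modelled here).
func planInstanceProfile(c runConfig) profilePlan {
	if c.SSHInterface != "session_manager" || c.IamInstanceProfile != "" {
		return profilePlan{}
	}
	return profilePlan{
		CreateProfile:       true,
		AttachManagedPolicy: true,
		AddInlinePolicy:     c.HasPolicyDocument,
	}
}

func main() {
	plan := planInstanceProfile(runConfig{
		SSHInterface:      "session_manager",
		HasPolicyDocument: true,
	})
	fmt.Printf("%+v\n", plan)
}
```

Attaching the managed policy at profile creation time, rather than relying only on the inline document, is what the description credits with avoiding the failed first connection.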

As a bonus, this PR also supports the AWS China regions, closing #50.
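One reason China regions need special handling is that managed policy ARNs there use the `aws-cn` partition instead of `aws`. The sketch below shows one way to derive the ARN from a region name. It is an assumption about the kind of change involved, not the PR's actual code; `partitionForRegion` and `ssmManagedPolicyARN` are hypothetical helpers, and other partitions are deliberately omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// partitionForRegion maps a region name to its ARN partition.
// Only the China case is handled; everything else is treated as the
// standard "aws" partition in this sketch.
func partitionForRegion(region string) string {
	if strings.HasPrefix(region, "cn-") {
		return "aws-cn"
	}
	return "aws"
}

// ssmManagedPolicyARN builds the ARN of the AmazonSSMManagedInstanceCore
// managed policy for the partition that the region belongs to.
func ssmManagedPolicyARN(region string) string {
	return fmt.Sprintf("arn:%s:iam::aws:policy/AmazonSSMManagedInstanceCore",
		partitionForRegion(region))
}

func main() {
	for _, region := range []string{"us-east-1", "cn-north-1"} {
		fmt.Println(region, ssmManagedPolicyARN(region))
	}
}
```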

Sample config:

ssh_interface           = "session_manager"
temporary_key_pair_type = "ed25519"
temporary_key_pair_bits = 384
// copied from the AmazonSSMManagedInstanceCore managed policy
temporary_iam_instance_profile_policy_document {
  Version = "2012-10-17"
  Statement {
    Action = [
      "ssm:DescribeAssociation",
      "ssm:GetDeployablePatchSnapshotForInstance",
      "ssm:GetDocument",
      "ssm:DescribeDocument",
      "ssm:GetManifest",
      "ssm:GetParameter",
      "ssm:GetParameters",
      "ssm:ListAssociations",
      "ssm:ListInstanceAssociations",
      "ssm:PutInventory",
      "ssm:PutComplianceItems",
      "ssm:PutConfigurePackageResult",
      "ssm:UpdateAssociationStatus",
      "ssm:UpdateInstanceAssociationStatus",
      "ssm:UpdateInstanceInformation"
    ]
    Effect   = "Allow"
    Resource = ["*"]
  }
  Statement {
    Action = [
      "ssmmessages:CreateControlChannel",
      "ssmmessages:CreateDataChannel",
      "ssmmessages:OpenControlChannel",
      "ssmmessages:OpenDataChannel"
    ]
    Effect   = "Allow"
    Resource = ["*"]
  }
  Statement {
    Action = [
      "ec2messages:AcknowledgeMessage",
      "ec2messages:DeleteMessage",
      "ec2messages:FailMessage",
      "ec2messages:GetEndpoint",
      "ec2messages:GetMessages",
      "ec2messages:SendReply"
    ]
    Effect   = "Allow"
    Resource = ["*"]
  }
}

packer build log:

2024/08/26 00:56:29 packer-plugin-amazon_v1.3.3-dev_x5.0_darwin_arm64 plugin: 2024/08/26 00:56:29 Retryable error: TargetNotConnected: i-011a46c740a76676e is not connected.
2024/08/26 00:56:31 packer-plugin-amazon_v1.3.3-dev_x5.0_darwin_arm64 plugin: 2024/08/26 00:56:31 [DEBUG] TCP connection to SSH ip/port failed: dial tcp [::1]:8973: connect: connection refused
2024/08/26 00:56:36 packer-plugin-amazon_v1.3.3-dev_x5.0_darwin_arm64 plugin: 2024/08/26 00:56:36 [DEBUG] TCP connection to SSH ip/port failed: dial tcp [::1]:8973: connect: connection refused
2024/08/26 00:56:41 packer-plugin-amazon_v1.3.3-dev_x5.0_darwin_arm64 plugin: 2024/08/26 00:56:41 [DEBUG] TCP connection to SSH ip/port failed: dial tcp [::1]:8973: connect: connection refused

The EC2 amazon-ssm-agent log:

status code: 404, request id:
2024-08-25 16:54:21 ERROR EC2RoleProvider Failed to connect to Systems Manager with SSM role credentials. error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 1234567890
	status code: 400, request id: 906a00a0-9eec-42b7-b385-xxxxxxxxx
2024-08-25 16:54:21 ERROR [CredentialRefresher] Retrieve credentials produced error: no valid credentials could be retrieved for ec2 identity. Default Host Management Err: error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 1234567890
	status code: 400, request id: 906a00a0-9eec-42b7-b385-xxxxxxxxx
2024-08-25 16:54:21 INFO [CredentialRefresher] Sleeping for 27m6s before retrying retrieve credentials

@gnought gnought requested a review from a team as a code owner August 25, 2024 19:04
@gnought gnought force-pushed the fix/ssm_race_condition branch from 98a5b76 to f630f5c Compare October 21, 2024 07:26
@gnought gnought force-pushed the fix/ssm_race_condition branch from f630f5c to ffbf88b Compare January 10, 2025 17:31
@gnought gnought force-pushed the fix/ssm_race_condition branch from ffbf88b to 7aba92c Compare May 2, 2025 16:32
@gnought gnought force-pushed the fix/ssm_race_condition branch from 7aba92c to 28f2951 Compare August 4, 2025 10:46
@gnought gnought force-pushed the fix/ssm_race_condition branch from 28f2951 to 0b70ffc Compare October 13, 2025 12:48