-
Notifications
You must be signed in to change notification settings - Fork 624
✨ Add support for EKSConfig LaunchTemplate bootstrapping for AL2023 using nodeadm #5553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
✨ Add support for EKSConfig LaunchTemplate bootstrapping for AL2023 using nodeadm #5553
Conversation
Hi @AmitSahastra. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
/test pull-cluster-api-provider-aws-e2e-blocking |
@@ -24,6 +24,9 @@ import ( | |||
|
|||
// EKSConfigSpec defines the desired state of Amazon EKS Bootstrap Configuration. | |||
type EKSConfigSpec struct { | |||
// NodeType specifies the type of node (e.g., "al2023") | |||
// +optional | |||
NodeType string `json:"nodeType,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to derive this from the AMI being used rather than asking the user to specify in the API?
Co-authored-by: Michel Loiseleur <[email protected]>
Code refactor for NodeTypeAL2023 enum
As per documentation at https://github.com/awslabs/amazon-eks-ami/blob/v20250813/nodeadm/api/v1alpha1/nodeconfig_types.go#L52-L53: ``` // CIDR is your cluster's service CIDR block. This value is used to infer your cluster's DNS address. CIDR string `json:"cidr,omitempty"` ``` Previously setting it to the VPC CIDR was breaking DNS resolution in pods because they were expecting CoreDNS at 10.0.0.10 (10th IP in VPC CIDR) rather than the 10th IP in the service CIDR. Also change the default service CIDR to EKS default of 172.20.0.0/12.
Hey folks - curious as to when we're looking at seeing this merged / released? |
/assign @damdo @nrb @richardcase @AndiDog @dlipovetsky @Ankitasw What do you think of this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, I think we polish a bit the API first, then we can adjust the logic according to it
// NodeType specifies the type of nodeq | ||
// +kubebuilder:validation:Enum=al2023 | ||
type NodeType string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comments below regarding the naming of this field
const ( | ||
// NodeTypeAL2023 represents the AL2023 node type. | ||
NodeTypeAL2023 NodeType = "al2023" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here a NodeTypeAL2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use the ones you defined for the AMIFamilyType also here
// AMIFamilyAL2 is the Amazon Linux 2 AMI family.
AMIFamilyAL2 = "AmazonLinux2"
// AMIFamilyAL2023 is the Amazon Linux 2023 AMI family.
AMIFamilyAL2023 = "AmazonLinux2023"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// NodeType specifies the type of node (e.g., "al2023") | ||
// +optional | ||
NodeType NodeType `json:"nodeType,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think NodeType
is a bit weak as a field name because Type is quite generic.
How about AMIFamilyType like the field we have in the NodeInput struct? So then we can directly pass that down to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would also need to specify what the default is (I guess AmazonLinux2
) and an enum validation for this.
// +kubebuilder:validation:Enum="";AmazonLinux2;AmazonLinux2023
// +optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The enum is added on the type definition (https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/5553/files#diff-64e2d3d6a1323c5ae37a84cfc459f926fe7e329e83a4f695df28de89f19a3172R26). Only al2023
is allowed at this point, no other values are required to change behaviour, default (unset) is current behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is a little confusing to me. We're essentially specifying an AMI family here. We have a field like this already in AWSMachine
EKSOptimizedLookupType *EKSAMILookupType `json:"eksLookupType,omitempty"` |
I think we shouldn't repeat this information here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing that's worth noting here is that Amazon Linux 2 will be EOL coming up very soon - june 2026 source: https://aws.amazon.com/amazon-linux-2/faqs/
It seems like EKS has largely moved away from this AL2 support to either AL2023 or bottlerocket.
Bottlerocket seems to have its own set backs see kubernetes-sigs/cluster-api#7840
so realistically the only option is going to be AL2023.
EDIT:
[!IMPORTANT] Amazon EKS will no longer publish EKS-optimized Amazon Linux 2 (AL2) AMIs after November 26th, 2025. Additionally, Kubernetes version 1.32 is the last version for which Amazon EKS will release AL2 AMIs. From version 1.33 onwards, Amazon EKS will continue to release AL2023 and Bottlerocket based AMIs. To learn more, see Guide to EKS AL2 & AL2-Accelerated AMIs transition features.
source: https://awslabs.github.io/amazon-eks-ami/
it looks like they won't be building this AMI
// AMIFamilyAL2 is the Amazon Linux 2 AMI family. | ||
AMIFamilyAL2 = "AmazonLinux2" | ||
// AMIFamilyAL2023 is the Amazon Linux 2023 AMI family. | ||
AMIFamilyAL2023 = "AmazonLinux2023" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move them in the outer API and leverage them here too
// AL2023 specific fields | ||
AMIImageID string | ||
APIServerEndpoint string | ||
Boundary string | ||
CACert string | ||
CapacityType *v1beta2.ManagedMachinePoolCapacityType | ||
ClusterCIDR string // CIDR range for the cluster | ||
ClusterDNS string | ||
MaxPods *int32 | ||
NodeGroupName string | ||
NodeLabels string // Not exposed in CRD, computed from user input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, is there a way to make this more specific to AmazonLinux2023 if that's only for it?
How about a discriminated union?
Have we thought about what's going to happen if there is going to be, for example an AmazonLinux2027 or similar, which may have different fields from this one?
@@ -56,3 +56,40 @@ clusterctl init --infrastructure aws | |||
``` | |||
|
|||
NOTE: you will need to enable the creation of the default Fargate IAM role. The easiest way is using `clusterawsadm` and using the `fargate` configuration option, for instructions see the [prerequisites](../using-clusterawsadm-to-fulfill-prerequisites.md). | |||
|
|||
### Amazon Linux 2023 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call it something more generic depending on what we agree on for the API, like AMIFamilyType, and then we can have various subsections for each type.
@@ -199,6 +203,70 @@ func TestEKSConfigReconciler(t *testing.T) { | |||
gomega.Expect(string(secret.Data["value"])).To(Not(Equal(string(expectedUserData)))) | |||
}).Should(Succeed()) | |||
}) | |||
|
|||
t.Run("Should return requeue when control plane is not initialized", func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also add webhook tests for the values we are going to be validating for the AMIFamilyType
// Set node type to AL2023 to trigger requeue | ||
config.Spec.NodeType = "al2023" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need changing based on the API we agree on
// validateAL2023Input validates the input for AL2023 user data generation. | ||
func validateAL2023Input(input *NodeInput) error { | ||
if input.APIServerEndpoint == "" { | ||
return fmt.Errorf("API server endpoint is required for AL2023") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everywhere in this PR it would be better to use the full extended AmazonLinux2023, rather than AL2023, same goes for AL2 vs AmazonLinux2.
🐛 Use cluster service CIDR in NodeConfig CIDR
Adding label Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
} | ||
|
||
// Get AMI ID from AWSManagedMachinePool's launch template if specified | ||
if configOwner.GetKind() == "AWSManagedMachinePool" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason that users who don' use managed machine pools can't set AMI IDs like users of AWSMachineTemplate
?
Generally speaking I think we can simplify this to just support the nodeadm script because the other stuff isn't going to be applicable. Please take a look at some of the work I did here that attempts to simplify some of this stuff faiq@cf6a280 |
AFAIK nodeadm will only apply to AL2023 so if using other distros they'll still use cloud-init. I think so anyway 😅 |
I think they'll still be using nodeadm 🤔 so from my POV there's 2 things happening here
i could be wrong, but that's how im interpreting the situation 🤷♂️ |
{{.}} | ||
{{- end}} | ||
{{- range .PostBootstrapCommands}} | ||
{{.}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloud-init handles the text/x-shellscript
content type and all of that is going to run before Nodeadm runs to join the cluster. I think we should remove the PostBootstrapCommands
because of this.
Running the nodeadm command and then running the postBootstrapCommands will likely be very cumbersome (disabling the service) and subject to a lot of downstream changes which will be difficult to maintain
Hey @AmitSahastra after looking and thinking about this PR a little bit more I think the right approach here would be to create a new API type that handles bootstrapping for The I think the cleanest way to support this would be to create a new type so when Amazon Linux and |
I wrote this document that outlines the need for a new bootstrap API here https://docs.google.com/document/d/1n2v0Q4D5VAzydu7aJno3gRjMlSHueJ8BlGIOrKLDaiQ/edit?tab=t.0#heading=h.ly39fzd1bgc0 |
@damdo @faiq @jimmidyson |
@pavansokkenagaraj that's a good point however it seems that bottlerocket support in cluster API is set back due to kubernetes-sigs/cluster-api#7840 as well as this kubernetes-sigs/cluster-api#5294 Given that it's not going to use a cloud-init based configuration, we should consider that out of the scope of this new API |
Description
This PR adds support for using launch template EKSConfig bootstrapping for Amazon Linux 2023 nodes. The EKSConfig controller now able to create bootstrap datasecrete with nodeConfig that will enable using AL2023 images with LaunchTemplates.
Changes
Testing
Impact
These changes allow users to use launch templates with AL2023 images. Also to specify custom AMIs through launch templates while maintaining compatibility with CAPA's auto AMI lookup mechanism.
Related Issues
Fixes #5546
What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Checklist:
Release note: