@LiangquanLi930
Contributor

@LiangquanLi930 LiangquanLi930 commented Oct 15, 2025

✨ Implement autoscaling from zero by auto-populating AWSMachineTemplate capacity and NodeInfo

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR implements the Cluster API autoscaling from zero proposal for CAPA by adding a controller that automatically populates AWSMachineTemplate.Status.Capacity with instance type information.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Implement autoscaling from zero by auto-populating AWSMachineTemplate capacity

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 15, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign richardcase for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 15, 2025
@k8s-ci-robot
Contributor

Hi @LiangquanLi930. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@LiangquanLi930 LiangquanLi930 changed the title ✨ Implement autoscaling from zero by auto-populating AWSMachineTemplate capacity WIP ✨ Implement autoscaling from zero by auto-populating AWSMachineTemplate capacity Oct 15, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 15, 2025
@elmiko
elmiko commented Oct 15, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 15, 2025
@LiangquanLi930 LiangquanLi930 force-pushed the opt-in-autoscaling-from-zero branch 7 times, most recently from f1ee365 to 3be8f4d Compare October 20, 2025 11:21
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 20, 2025
@LiangquanLi930 LiangquanLi930 force-pushed the opt-in-autoscaling-from-zero branch 2 times, most recently from 01be987 to 915f55b Compare October 20, 2025 12:57
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 20, 2025
@LiangquanLi930 LiangquanLi930 force-pushed the opt-in-autoscaling-from-zero branch from 915f55b to b3850d1 Compare October 20, 2025 14:42
@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 20, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 30, 2025
@LiangquanLi930 LiangquanLi930 force-pushed the opt-in-autoscaling-from-zero branch 3 times, most recently from b0eae44 to 6ab7428 Compare October 31, 2025 06:06
@k8s-ci-robot k8s-ci-robot added do-not-merge/contains-merge-commits and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Oct 31, 2025
@LiangquanLi930 LiangquanLi930 force-pushed the opt-in-autoscaling-from-zero branch from 6df5996 to 5b5a139 Compare October 31, 2025 07:26
@LiangquanLi930 LiangquanLi930 force-pushed the opt-in-autoscaling-from-zero branch 2 times, most recently from af3844f to 8371031 Compare November 3, 2025 08:43
@LiangquanLi930 LiangquanLi930 changed the title ✨ feat: Implement autoscaling from zero by auto-populating AWSMachineTemplate capacity WIP ✨ feat: Implement autoscaling from zero by auto-populating AWSMachineTemplate capacity Nov 3, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 3, 2025
@LiangquanLi930 LiangquanLi930 force-pushed the opt-in-autoscaling-from-zero branch 2 times, most recently from 56615b5 to 3bf0acb Compare November 3, 2025 13:23
@LiangquanLi930
Contributor Author

/test pull-cluster-api-provider-aws-e2e

@k8s-ci-robot
Contributor

@LiangquanLi930: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-cluster-api-provider-aws-e2e | 3bf0acb | link | false | /test pull-cluster-api-provider-aws-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@LiangquanLi930 LiangquanLi930 force-pushed the opt-in-autoscaling-from-zero branch 3 times, most recently from ca67a57 to 2090092 Compare November 4, 2025 06:45
@LiangquanLi930 LiangquanLi930 changed the title WIP ✨ feat: Implement autoscaling from zero by auto-populating AWSMachineTemplate capacity WIP ✨ feat: Implement autoscaling from zero by auto-populating AWSMachineTemplate Capacity and NodeInfo Nov 4, 2025
@LiangquanLi930 LiangquanLi930 changed the title WIP ✨ feat: Implement autoscaling from zero by auto-populating AWSMachineTemplate Capacity and NodeInfo ✨ feat: Implement autoscaling from zero by auto-populating AWSMachineTemplate capacity Nov 4, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2025
@LiangquanLi930
Contributor Author

@elmiko @nrb @chrischdi Thanks for your review! When you have time, could you help to review again? Thanks!

Comment on lines +207 to +205
if info.MemoryInfo != nil && info.MemoryInfo.SizeInMiB != nil {
	memoryBytes := *info.MemoryInfo.SizeInMiB * 1024 * 1024
	resourceList[corev1.ResourceMemory] = *resource.NewQuantity(memoryBytes, resource.BinarySI)
}
Contributor Author

I'm not sure whether my handling of this part is correct, and I’m also uncertain about which unit I should be using.

the machine-api annotations used a calculation of MiB for its value. for the cluster-api implementation we should simply be using a Quantity for this value, no need to convert to MiB.

you can see an example in the cluster-api autoscaler docs.

i /think/ the logic you have here is correct though, but i don't think we need to do the conversion. unless we need it for the Quantity.
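
For reference, the arithmetic under discussion can be sketched with the standard library alone. In the snippet under review, the resulting byte count is what gets passed to resource.NewQuantity, which takes an int64 count of base units (bytes for memory), so the multiplication appears to be what the Quantity needs. The helper name mibToBytes and the 16384 MiB figure are illustrative assumptions, not code from this PR.

```go
package main

import "fmt"

// mibToBytes converts a memory size reported in MiB to bytes.
// A nil pointer means the size was not reported, so the second
// return value is false and the caller should leave memory unset.
// (Hypothetical helper; not part of the PR.)
func mibToBytes(sizeInMiB *int64) (int64, bool) {
	if sizeInMiB == nil {
		return 0, false
	}
	return *sizeInMiB * 1024 * 1024, true
}

func main() {
	size := int64(16384) // assumed example: an instance type with 16 GiB of memory
	if b, ok := mibToBytes(&size); ok {
		fmt.Println(b) // 17179869184
	}
}
```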

Comment on lines 87 to 91
// Skip if capacity and nodeInfo are already set
if len(awsMachineTemplate.Status.Capacity) > 0 && awsMachineTemplate.Status.NodeInfo != nil {
	return ctrl.Result{}, nil
}

Member

Suggested change
// Skip if capacity and nodeInfo are already set
if len(awsMachineTemplate.Status.Capacity) > 0 && awsMachineTemplate.Status.NodeInfo != nil {
	return ctrl.Result{}, nil
}

Either drop the skip, or do not skip if template.Spec.Template.Spec.AMI.ID is empty.

Contributor Author

updated
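
One possible reading of the second option in the review comment above (keep the skip, but never skip when the AMI ID is empty) is sketched below. The templateState struct is a hypothetical, flattened stand-in for the fields mentioned in the thread, and shouldSkip is an illustrative name; neither is taken from the updated PR code.

```go
package main

import "fmt"

// templateState is a hypothetical stand-in for the fields discussed in the
// review; it is not the real AWSMachineTemplate type.
type templateState struct {
	CapacityEntries int    // stands in for len(Status.Capacity)
	HasNodeInfo     bool   // stands in for Status.NodeInfo != nil
	AMIID           string // stands in for Spec.Template.Spec.AMI.ID
}

// shouldSkip reports whether reconciliation may return early.
// It never skips when the AMI ID is empty, and otherwise skips only
// when both capacity and node info are already populated.
func shouldSkip(t templateState) bool {
	if t.AMIID == "" {
		return false
	}
	return t.CapacityEntries > 0 && t.HasNodeInfo
}

func main() {
	populated := templateState{CapacityEntries: 2, HasNodeInfo: true, AMIID: "ami-0123456789abcdef0"}
	noAMI := templateState{CapacityEntries: 2, HasNodeInfo: true}

	fmt.Println(shouldSkip(populated)) // true
	fmt.Println(shouldSkip(noAMI))     // false
}
```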

@LiangquanLi930 LiangquanLi930 force-pushed the opt-in-autoscaling-from-zero branch from 2090092 to b0477c1 Compare November 4, 2025 14:17
@elmiko elmiko left a comment

i think this is looking good, although i'm not sure we need to do the conversion for the memory value.

Labels

  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • kind/feature - Categorizes issue or PR as related to a new feature.
  • needs-priority
  • ok-to-test - Indicates a non-member PR verified by an org member that is safe to test.
  • release-note - Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XL - Denotes a PR that changes 500-999 lines, ignoring generated files.
