
Conversation

@iholder101 (Contributor) commented Apr 21, 2025

VEP Metadata

Tracking issue: #45
SIG label: sig/compute

What this PR does

Recently, memory and CPU hotplug was added to KubeVirt. This allows users to add memory and CPU to a running VM on the fly.

When a VM gets hotplugged, the underlying virt-launcher pod's resources need to be modified accordingly. Traditionally in Kubernetes, pods are immutable: once a pod is created, its resource requests and limits cannot be changed. Therefore, the hotplug feature was implemented by live-migrating the VM, which results in a new virt-launcher pod with the updated resources.

In the hotplug feature's original design proposal (which pre-dated the VEP process), it is written:

Implementation should be achievable today, with Kubernetes APIs that are at least in beta.
Unfortunately, at the time of writing, the Kubernetes vertical pod scaling API is only alpha.

Fortunately, the in-place pod resize feature graduated to beta in Kubernetes 1.33. Therefore, KubeVirt should aim to move away from live-migrating the VM on hotplug and instead use the in-place pod resize feature.
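To illustrate (this sketch is not part of the VEP): under the beta InPlacePodVerticalScaling feature, resource changes are applied to a running pod through its resize subresource instead of recreating the pod. The snippet below is a minimal client-go sketch, assuming Kubernetes 1.33+ with the feature enabled; the namespace, pod name, and compute container name are placeholders.

```go
// Illustrative sketch only: request an in-place resize of a virt-launcher
// pod's compute container via the pod's "resize" subresource (beta,
// Kubernetes 1.33+). Names below are placeholders, not KubeVirt code.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Bump the compute container's memory request/limit in place.
	patch := []byte(`{"spec":{"containers":[{"name":"compute",
		"resources":{"requests":{"memory":"4Gi"},"limits":{"memory":"4Gi"}}}]}}`)

	pod, err := client.CoreV1().Pods("default").Patch(
		context.TODO(), "virt-launcher-example", types.StrategicMergePatchType,
		patch, metav1.PatchOptions{}, "resize")
	if err != nil {
		panic(err)
	}
	fmt.Printf("requested in-place resize for %s\n", pod.Name)
}
```

The same change should also be possible with kubectl patch using its --subresource resize flag on recent versions.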

Notes for reviewers

This is more of a tracker VEP for this change, as there is no API change.

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 21, 2025
@kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vladikr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@iholder101 iholder101 force-pushed the vep/in-place-hotplug branch from ff667aa to 88936ee on April 21, 2025 11:31
@iholder101 iholder101 marked this pull request as ready for review April 21, 2025 11:31
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2025
@iholder101 (Contributor, Author)

FYI @vladikr @acardace @jean-edouard

@iholder101 (Contributor, Author)

/release-note-none

@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 21, 2025
@vladikr (Member) commented Apr 21, 2025

@iholder101 can you please double check if this works with guaranteed resources? As far as I know, it doesn't.

@iholder101 (Contributor, Author)

@iholder101 can you please double check if this works with guaranteed resources? As far as I know, it doesn't.

Very good observation, thank you!
So it turns out that pods of every QoS class can be resized; however, this is not the case for dedicated CPUs/memory [1]:

Resizing pods with static CPU & memory policy configured is out-of-scope for the beta release of in-place resize. If a pod is a guaranteed QOS on a node with a static CPU or memory policy configured, then the resize will be marked as infeasible.

This will be reconsidered post-beta as a future enhancement.

I will definitely try to dig into where this limitation is coming from and to emphasize its importance to us.

If it is decided that dedicated CPUs are indeed out of scope, we can fall back to the migration strategy for these specific VMs. I will update the VEP accordingly.
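For illustration only (not the actual KubeVirt implementation): one way such a fallback could be detected is by inspecting the pod conditions that the beta in-place resize feature sets when a resize cannot be satisfied. The sketch below assumes the PodResizePending condition with reason Infeasible described in KEP-1287 for Kubernetes 1.33; the helper name is hypothetical.

```go
// Hypothetical sketch: decide whether to fall back to live migration by
// checking the PodResizePending condition that the beta in-place resize
// feature (KEP-1287, Kubernetes 1.33) is expected to set with reason
// "Infeasible", e.g. for Guaranteed pods on nodes with a static CPU or
// memory manager policy.
package resize

import corev1 "k8s.io/api/core/v1"

// shouldFallBackToMigration is a hypothetical helper name.
func shouldFallBackToMigration(pod *corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if string(cond.Type) == "PodResizePending" &&
			cond.Status == corev1.ConditionTrue &&
			cond.Reason == "Infeasible" {
			return true // in-place resize rejected; hotplug would need migration
		}
	}
	return false
}
```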

@esotsal commented Apr 23, 2025

/cc ( to follow up )

For the resize question, please check PR

kubernetes/kubernetes#129719

(There is a screencast in the PR's description with the current proposal status.) Please note the solution considers the lowest threshold to be the CPU pinned during startup. Looking forward to your comments/review on that PR; your use case is very interesting.

@kubevirt-bot

@esotsal: GitHub didn't allow me to request PR reviews from the following users: esotsal.

Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc

For resize question, please check PR

kubernetes/kubernetes#129719

( there is a screencast at PRs description with current proposal status ) , please note solution considers lowest threshold to be the CPU pinned during startup. Looking forward for your comments/review in that PR, your use case is very interesting.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@iholder101 iholder101 force-pushed the vep/in-place-hotplug branch from 88936ee to 6db00f3 on April 23, 2025 07:59
@iholder101 (Contributor, Author) commented Apr 23, 2025

I will definitely try to dig into where this limitation is coming from and to emphasize its importance to us.

If it is decided that dedicated CPUs are indeed out of scope, we can fall back to the migration strategy for these specific VMs. I will update the VEP accordingly.

Updated the VEP accordingly.

@iholder101 (Contributor, Author)

/cc ( to follow up )

For the resize question, please check PR

kubernetes/kubernetes#129719

(There is a screencast in the PR's description with the current proposal status.) Please note the solution considers the lowest threshold to be the CPU pinned during startup. Looking forward to your comments/review on that PR; your use case is very interesting.

Thank you very much for the pointer! I'll review the PR soon!

Comment on lines +122 to +125
- [ ] Turn this feature on by default.

Member

I think we should add GA Kubernetes In-Place resize as a requirement

@iholder101 (Contributor, Author) commented Apr 27, 2025

My opinion is that beta features should (in most cases) be enabled by default; this is also Kubernetes' policy, and I think it makes a lot of sense. It enables testing and experimenting with a feature upstream before fully committing to it.

There's a related discussion about this in kubevirt/kubevirt#14427, which aims to enable a path forward for beta features being on by default.

Member

Yes, I understand that Beta features should be enabled by default. This is why I think it would be better to wait until this feature gate is GA in Kubernetes before making it Beta in KubeVirt. I'm not sure we should have default behavior that depends on Kubernetes Beta/Alpha feature gates. We have a similar requirement for the ImageVolume feature gate.

@iholder101 (Contributor, Author) commented Apr 27, 2025

The general rule is that our feature gate will be one graduation phase below Kubernetes'.
The ImageVolume feature is Beta in Kubernetes; therefore, we allow it to become Alpha in KubeVirt.
The same goes for GA and Beta with the pod resize FG.

* Admins / namespace owners.

## User Stories


As a user, in order to hotplug additional devices, I also need to hotplug more memory to support the new pass-through devices.

Member

Interesting, @rthallisey are we planning to support hot plug for host devices?
@iholder101, what resources are currently supported, or is it entirely open?

@iholder101 (Contributor, Author)

Thanks @rthallisey! I've added that!

@iholder101, what resources are currently supported, or is it entirely open?

@vladikr According to the feature's non-goals:

updating extended resources or any other resource types besides CPU, memory

So yeah, only memory and CPU resources are currently supported.


With PT devices, it's very difficult to support live migration - also, KVM won't allow it. So avoiding live migration will make hotplug of host devices possible. The tricky part of supporting this feature is that you will often need to hotplug memory to be used only by the compute container, not given to the guest, as part of the device hotplug.

@iholder101 (Contributor, Author)

The tricky part of supporting this feature is that you will often need to hotplug memory to be used only by the compute container, not given to the guest

That should be entirely possible!


How will you ensure, when reducing memory, that you will avoid out-of-memory situations, given that it is not possible to get the exact memory usage from the container runtime?

@iholder101 (Contributor, Author) commented May 12, 2025

How will you ensure, when reducing memory, that you will avoid out-of-memory situations, given that it is not possible to get the exact memory usage from the container runtime?

This topic is still under discussion: with cgroups v2, lowering memory limits will be accepted and can cause OOM kills, and if k8s does the validation it is bound to an inherent TOC/TOU race. More info can be found here: https://github.com/kubernetes/enhancements/tree/61abddca34caac56d22b7db48734b7040dc68b43/keps/sig-node/1287-in-place-update-pod-resources#memory-limit-decreases.

In any case, hot-unplug is not yet supported in KubeVirt in general. We can think of relaxing this limitation if the k8s feature comes up with a decent solution to this problem.
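To illustrate the TOC/TOU problem referenced in the KEP section above (this is not KubeVirt or kubelet code; the cgroup path and helper are made up for the example): on cgroup v2, checking memory.current before lowering memory.max cannot rule out an OOM kill, because usage can grow between the check and the write.

```go
// Illustrative sketch of the TOC/TOU race when shrinking a cgroup v2 memory
// limit. Reading memory.current first does not make lowering memory.max safe:
// usage can grow in the window between the read and the write, so the new
// limit can still land below actual usage and trigger the cgroup OOM killer.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

func lowerMemoryLimit(cgroupDir string, newLimitBytes int64) error {
	raw, err := os.ReadFile(filepath.Join(cgroupDir, "memory.current"))
	if err != nil {
		return err
	}
	usage, err := strconv.ParseInt(strings.TrimSpace(string(raw)), 10, 64)
	if err != nil {
		return err
	}
	if usage > newLimitBytes {
		return fmt.Errorf("refusing to shrink: usage %d > new limit %d", usage, newLimitBytes)
	}
	// TOC/TOU window: the workload may allocate more memory right here,
	// so the write below can still push the limit under real usage.
	return os.WriteFile(filepath.Join(cgroupDir, "memory.max"),
		[]byte(strconv.FormatInt(newLimitBytes, 10)), 0o644)
}

func main() {
	// Example cgroup path is hypothetical; 2<<30 is a 2 GiB limit.
	if err := lowerMemoryLimit("/sys/fs/cgroup/example.slice", 2<<30); err != nil {
		fmt.Println(err)
	}
}
```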

@iholder101 iholder101 force-pushed the vep/in-place-hotplug branch from 6db00f3 to bdccf80 on May 4, 2025 10:25
iholder101 added 2 commits May 4, 2025 13:25
Signed-off-by: Itamar Holder <[email protected]>
@iholder101 iholder101 force-pushed the vep/in-place-hotplug branch from bdccf80 to 2d2c805 on May 4, 2025 10:25
@kubevirt-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 10, 2025
@kubevirt-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@kubevirt-bot kubevirt-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 9, 2025
@rthallisey

/remove-lifecycle rotten

@kubevirt-bot kubevirt-bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 9, 2025

Labels

dco-signoff: yes Indicates the PR's author has DCO signed all their commits. release-note-none Denotes a PR that doesn't merit a release note. size/L
