
Conversation

@iholder101 (Contributor) commented Apr 21, 2025

VEP Metadata

Tracking issue: #45
SIG label: sig/compute

What this PR does

Recently, memory and CPU hotplug was added to KubeVirt. This allows users to add memory and CPU to a running VM on the fly.

When a VM gets hotplugged, the underlying virt-launcher pod's resources need to be modified accordingly. Traditionally in Kubernetes, pods are immutable: once a pod is created, its resource requests and limits cannot be changed. Therefore, the hotplug feature was implemented by live-migrating the VM, which results in a new virt-launcher pod with the updated resources.

In the hotplug feature's original design proposal (which pre-dated the VEP process), it is written:

Implementation should be achievable today, with Kubernetes APIs that are at least in beta.
Unfortunately, at the time of writing, the Kubernetes vertical pod scaling API is only alpha.

Fortunately, the in-place pod resize feature graduated to beta in Kubernetes 1.33. Therefore, KubeVirt should aim to move away from live-migrating the VM on hotplug and instead use the in-place pod resize feature.
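To illustrate (this sketch is not part of the VEP): under the beta InPlacePodVerticalScaling feature, resource changes are applied to a running pod through its resize subresource instead of recreating the pod. The snippet below is a minimal client-go sketch, assuming Kubernetes 1.33+ with the feature enabled; the namespace, pod name, and compute container name are placeholders.

```go
// Illustrative sketch only: request an in-place resize of a virt-launcher
// pod's compute container via the pod's "resize" subresource (beta,
// Kubernetes 1.33+). Names below are placeholders, not KubeVirt code.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Bump the compute container's memory request/limit in place.
	patch := []byte(`{"spec":{"containers":[{"name":"compute",
		"resources":{"requests":{"memory":"4Gi"},"limits":{"memory":"4Gi"}}}]}}`)

	pod, err := client.CoreV1().Pods("default").Patch(
		context.TODO(), "virt-launcher-example", types.StrategicMergePatchType,
		patch, metav1.PatchOptions{}, "resize")
	if err != nil {
		panic(err)
	}
	fmt.Printf("requested in-place resize for %s\n", pod.Name)
}
```

The same change should also be possible with kubectl patch using its --subresource resize flag on recent versions.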

Notes for reviewers

This is more of a tracker VEP for this change, as there is no API change.

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 21, 2025
@kubevirt-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign vladikr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@iholder101 iholder101 force-pushed the vep/in-place-hotplug branch from ff667aa to 88936ee on April 21, 2025 11:31
@iholder101 iholder101 marked this pull request as ready for review April 21, 2025 11:31
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 21, 2025
@iholder101 (Contributor, Author)

FYI @vladikr @acardace @jean-edouard

@iholder101 (Contributor, Author)

/release-note-none

@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 21, 2025
@vladikr (Member) commented Apr 21, 2025

@iholder101 can you please double check if this works with guaranteed resources? As far as I know, it doesn't.

@iholder101 (Contributor, Author)

@iholder101 can you please double check if this works with guaranteed resources? As far as I know, it doesn't.

Very good observation, thank you!
So it turns out that pods of every QoS class can be resized; however, this is not the case for dedicated CPUs/memory [1]:

Resizing pods with static CPU & memory policy configured is out-of-scope for the beta release of in-place resize. If a pod is a guaranteed QOS on a node with a static CPU or memory policy configured, then the resize will be marked as infeasible.

This will be reconsidered post-beta as a future enhancement.

I will definitely try to dig into where this limitation is coming from and to emphasize its importance to us.

If it is decided that dedicated CPUs are indeed out of scope, we can fall back to the migration strategy for these specific VMs. I will update the VEP accordingly.
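For illustration only (not the actual KubeVirt implementation): one way such a fallback could be detected is by inspecting the pod conditions that the beta in-place resize feature sets when a resize cannot be satisfied. The sketch below assumes the PodResizePending condition with reason Infeasible described in KEP-1287 for Kubernetes 1.33; the helper name is hypothetical.

```go
// Hypothetical sketch: decide whether to fall back to live migration by
// checking the PodResizePending condition that the beta in-place resize
// feature (KEP-1287, Kubernetes 1.33) is expected to set with reason
// "Infeasible", e.g. for Guaranteed pods on nodes with a static CPU or
// memory manager policy.
package resize

import corev1 "k8s.io/api/core/v1"

// shouldFallBackToMigration is a hypothetical helper name.
func shouldFallBackToMigration(pod *corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if string(cond.Type) == "PodResizePending" &&
			cond.Status == corev1.ConditionTrue &&
			cond.Reason == "Infeasible" {
			return true // in-place resize rejected; hotplug would need migration
		}
	}
	return false
}
```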

@esotsal commented Apr 23, 2025

/cc ( to follow up )

For the resize question, please check PR

kubernetes/kubernetes#129719

(There is a screencast in the PR's description with the current proposal status.) Please note the solution considers the lowest threshold to be the CPU pinned during startup. Looking forward to your comments/review on that PR; your use case is very interesting.

@kubevirt-bot

@esotsal: GitHub didn't allow me to request PR reviews from the following users: esotsal.

Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc

For resize question, please check PR

kubernetes/kubernetes#129719

( there is a screencast at PRs description with current proposal status ) , please note solution considers lowest threshold to be the CPU pinned during startup. Looking forward for your comments/review in that PR, your use case is very interesting.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@iholder101 iholder101 force-pushed the vep/in-place-hotplug branch from 88936ee to 6db00f3 on April 23, 2025 07:59
@iholder101 (Contributor, Author) commented Apr 23, 2025

I will definitely try to dig into where this limitation is coming from and to emphasize its importance to us.

If it is decided that dedicated CPUs are indeed out of scope, we can fall back to the migration strategy for these specific VMs. I will update the VEP accordingly.

Updated the VEP accordingly.

@iholder101 (Contributor, Author)

/cc ( to follow up )

For the resize question, please check PR

kubernetes/kubernetes#129719

(There is a screencast in the PR's description with the current proposal status.) Please note the solution considers the lowest threshold to be the CPU pinned during startup. Looking forward to your comments/review on that PR; your use case is very interesting.

Thank you very much for the pointer! I'll review the PR soon!

Comment on lines +122 to +125
- [ ] Turn this feature on by default.

Member

I think we should add GA Kubernetes In-Place resize as a requirement

@iholder101 (Contributor, Author) commented Apr 27, 2025

My opinion is that beta features should (in most cases) be enabled by default; this is also Kubernetes' policy, and I think it makes a lot of sense. It enables testing and experimenting with a feature upstream before fully committing to it.

There's a related discussion about this in kubevirt/kubevirt#14427, which aims to enable a path forward for beta features being on by default.

Member

Yes, I understand that Beta features should be enabled by default. This is why I think it would be better to wait until this feature gate is GA in Kubernetes before making it Beta in KubeVirt. I'm not sure we should have default behavior that depends on Kubernetes Beta/Alpha feature gates. We have a similar requirement for the ImageVolume feature gate.

@iholder101 (Contributor, Author) commented Apr 27, 2025

The general rule is that our feature gate will be one graduation phase below Kubernetes'.
The ImageVolume feature is Beta in Kubernetes; therefore, we allow it to become Alpha in KubeVirt.
The same goes for GA and Beta with the pod resize FG.

* Admins / namespace owners.

## User Stories


As a user, in order to hotplug additional devices, I also need to hotplug more memory to support the new pass-through devices.

Member

Interesting, @rthallisey are we planning to support hot plug for host devices?
@iholder101, what resources are currently supported, or is it entirely open?

@iholder101 (Contributor, Author)

Thanks @rthallisey! I've added that!

@iholder101, what resources are currently supported, or is it entirely open?

@vladikr According to the feature's non-goals:

updating extended resources or any other resource types besides CPU, memory

So yeah, only memory and CPU resources are currently supported.


With PT devices, it's very difficult to support live migration - also, KVM won't allow it. So avoiding live migration will make hotplug of host devices possible. The tricky part of supporting this feature is that you will often need to hotplug memory to be used only by the compute container, not given to the guest, as part of the device hotplug.

@iholder101 (Contributor, Author)

The tricky part of supporting this feature is that you will often need to hotplug memory to be used only by the compute container, not given to the guest

That should be entirely possible!


How will you ensure, when reducing memory, that you will avoid out-of-memory situations, given that it is not possible to get the exact memory usage from the container runtime?

@iholder101 (Contributor, Author) commented May 12, 2025

How will you ensure, when reducing memory, that you will avoid out-of-memory situations, given that it is not possible to get the exact memory usage from the container runtime?

This topic is still under discussion: with cgroups v2, lowering memory limits will be accepted and can cause OOM kills, and if k8s does the validation it is bound to an inherent TOC/TOU race. More info can be found here: https://github.com/kubernetes/enhancements/tree/61abddca34caac56d22b7db48734b7040dc68b43/keps/sig-node/1287-in-place-update-pod-resources#memory-limit-decreases.

In any case, hot-unplug is not yet supported in KubeVirt in general. We can think of relaxing this limitation if the k8s feature comes up with a decent solution to this problem.
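To illustrate the TOC/TOU problem referenced in the KEP section above (this is not KubeVirt or kubelet code; the cgroup path and helper are made up for the example): on cgroup v2, checking memory.current before lowering memory.max cannot rule out an OOM kill, because usage can grow between the check and the write.

```go
// Illustrative sketch of the TOC/TOU race when shrinking a cgroup v2 memory
// limit. Reading memory.current first does not make lowering memory.max safe:
// usage can grow in the window between the read and the write, so the new
// limit can still land below actual usage and trigger the cgroup OOM killer.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

func lowerMemoryLimit(cgroupDir string, newLimitBytes int64) error {
	raw, err := os.ReadFile(filepath.Join(cgroupDir, "memory.current"))
	if err != nil {
		return err
	}
	usage, err := strconv.ParseInt(strings.TrimSpace(string(raw)), 10, 64)
	if err != nil {
		return err
	}
	if usage > newLimitBytes {
		return fmt.Errorf("refusing to shrink: usage %d > new limit %d", usage, newLimitBytes)
	}
	// TOC/TOU window: the workload may allocate more memory right here,
	// so the write below can still push the limit under real usage.
	return os.WriteFile(filepath.Join(cgroupDir, "memory.max"),
		[]byte(strconv.FormatInt(newLimitBytes, 10)), 0o644)
}

func main() {
	// Example cgroup path is hypothetical; 2<<30 is a 2 GiB limit.
	if err := lowerMemoryLimit("/sys/fs/cgroup/example.slice", 2<<30); err != nil {
		fmt.Println(err)
	}
}
```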

@iholder101 iholder101 force-pushed the vep/in-place-hotplug branch from 6db00f3 to bdccf80 on May 4, 2025 10:25
iholder101 added 2 commits May 4, 2025 13:25
Signed-off-by: Itamar Holder <[email protected]>
@iholder101 iholder101 force-pushed the vep/in-place-hotplug branch from bdccf80 to 2d2c805 on May 4, 2025 10:25
@kubevirt-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 10, 2025
@kubevirt-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@kubevirt-bot kubevirt-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 9, 2025
@rthallisey

/remove-lifecycle rotten

@kubevirt-bot kubevirt-bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 9, 2025

Labels

dco-signoff: yes Indicates the PR's author has DCO signed all their commits. release-note-none Denotes a PR that doesn't merit a release note. size/L
