@Longchuanzheng Longchuanzheng commented Jun 25, 2025

VEP Metadata

Tracking issue: #64
SIG label: /sig/compute

What this PR does

This proposal introduces a VM hibernation mechanism for KubeVirt, enabling users to stop and start virtual machines by saving and restoring their running memory state.

Special notes for your reviewer

@kubevirt-bot kubevirt-bot requested a review from lyarwood June 25, 2025 06:11
@kubevirt-bot kubevirt-bot added the dco-signoff: yes Indicates the PR's author has DCO signed all their commits. label Jun 25, 2025
@kubevirt-bot kubevirt-bot requested review from vladikr and xpivarc June 25, 2025 06:11

## User Stories

A user runs `virtctl pmsuspendToDisk vm.name`; the VM stops with its memory state saved. The user then runs `virtctl start vm.name` to start the stopped VM.

Why not `virtctl suspend $VM_NAME` with additional flags to suspend to disk etc?

Avoids leaking the fact that we are using libvirt underneath through our API/commands etc.

@vladikr
Member

vladikr commented Aug 20, 2025

@mhenriks could you please take a look?

@vladikr
Member

vladikr commented Aug 21, 2025

I wonder what guarantees we would provide around this functionality?
How does the VM owner know that there is enough space left on the disk to save the memory state?

Also, have you considered a more gitops friendly approach?

My main concern here is that hibernation could block system operations, such as a node drain, indefinitely, since hibernation can take a very long time in some cases.
Perhaps we should think about better timeout handling?

@mhenriks
Member

@mansam may have some thoughts on this

I think the API should be gitops compatible (triggered by update to VM spec). Unlike pause, the VM will only get into this state via user request. A VM may get paused automatically by an I/O error for example. Also, unlike pause I see this as a VM only operation.

The big question is where to write the VM state and how that's configured? Can look at memory dump volume for a pattern. But I know @ShellyKa13 may be proposing a more generic "utility" volume type for uses such as this

@vladikr
Member

vladikr commented Aug 21, 2025

> @mansam may have some thoughts on this
>
> I think the API should be gitops compatible (triggered by update to VM spec). Unlike pause, the VM will only get into this state via user request. A VM may get paused automatically by an I/O error for example. Also, unlike pause I see this as a VM only operation.
>
> The big question is where to write the VM state and how that's configured? Can look at memory dump volume for a pattern. But I know @ShellyKa13 may be proposing a more generic "utility" volume type for uses such as this

Thank you!

I also like the direction of a utility volume or the larger idea of a disks and memory snapshot.

@Longchuanzheng
Author

Hi, @vladikr @mhenriks. Thanks for the discussion. Perhaps my question below is not very professional. When I proposed this VEP, I only intended to use the `dompmsuspend` interface. Is this interface not universal? From our discussion so far, it seems that we are talking about using the `save` interface instead?

@mhenriks
Member

mhenriks commented Sep 9, 2025

> Hi, @vladikr @mhenriks. Thanks for the discussion. Perhaps my question below is not very professional. When I proposed this VEP, I only intended to use the `dompmsuspend` interface. Is this interface not universal? From our discussion so far, it seems that we are talking about using the `save` interface instead?

Good question! Yes, I was assuming "save" and not "dompmsuspend target disk". Nice thing about 'dompmsuspend' is that it doesn't require any additional storage. But it does require the guest agent. The 'save' function will work for any VM. I think I'd be open to supporting dompmsuspend with disk target. But we may want to do both eventually.
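
The trade-off described above can be sketched as a small selection function. This is only an illustration: the names `HibernationBackend` and `choose_backend` are hypothetical and appear nowhere in the VEP, and preferring the guest-driven path whenever the guest agent is connected is an assumption, not something decided in this thread.

```python
from enum import Enum


class HibernationBackend(Enum):
    # Guest-driven suspend to disk (libvirt "dompmsuspend" with a disk target).
    GUEST_SUSPEND_TO_DISK = "dompmsuspend-disk"
    # Host-driven save of the memory state (libvirt "save").
    HOST_SAVE = "save"


def choose_backend(guest_agent_connected: bool,
                   state_volume_available: bool) -> HibernationBackend:
    """Pick a backend from the two constraints named in the comment above."""
    if guest_agent_connected:
        # Needs the guest agent, but no additional storage on the cluster side.
        return HibernationBackend.GUEST_SUSPEND_TO_DISK
    if state_volume_available:
        # Works for any VM, but needs somewhere to write the memory state.
        return HibernationBackend.HOST_SAVE
    raise ValueError("no usable backend: need a guest agent or a state volume")
```

If both mechanisms are eventually supported, a function like this would be the place to encode whichever preference order the VEP settles on.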

I think next step for this VEP is to design a declarative API (maybe just a new runstrategy) and map out the flow

@Longchuanzheng Longchuanzheng changed the title from "VEP 64: Stop vm with pmsuspend to disk" to "VEP 64: vm Hibernation" Sep 12, 2025
@Longchuanzheng
Author

> I wonder what guarantees we would provide around this functionality? How does the VM owner know that there is enough space left on the disk to save the memory state?
>
> Also, have you considered a more gitops friendly approach?
>
> My main concern here is that hibernation can potentially indefinitely block system operations, drain for example. Since hibernation can take a very long time in some cases. Perhaps we should think about a better timeout handling?

In my opinion, the timeout mechanism here is somewhat similar to the one used for live migration. I think it is enough to set a default timeout and make it user-configurable.
As for the size of the PVC, my current idea is that the user decides the size. If the user does not specify one, the automatically created PVC will be the memory size of the virtual machine + 1G (of course this is a rough method, and I would be very grateful for any better suggestions).
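
A minimal sketch of the two defaults proposed above, assuming a user-supplied value always wins over the default. All names here are hypothetical, the 300-second default is a placeholder rather than a value from the VEP, "+ 1G" is read as 1 GiB, and the quantity parser handles only plain byte counts and the binary suffixes `Ki`, `Mi`, `Gi`, and `Ti`.

```python
from typing import Optional

# Binary suffixes used by Kubernetes quantities; only these are handled here.
_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}

DEFAULT_TIMEOUT_SECONDS = 300  # placeholder default, not taken from the VEP
PVC_HEADROOM_BYTES = 1024**3   # the "+ 1G" margin, read here as 1 GiB


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as "4Gi" or a plain byte count to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def hibernation_pvc_bytes(vm_memory: str, requested: Optional[str] = None) -> int:
    """Use the requested size if given, else VM memory plus headroom."""
    if requested is not None:
        return parse_quantity(requested)
    return parse_quantity(vm_memory) + PVC_HEADROOM_BYTES


def hibernation_timeout(user_timeout: Optional[int] = None) -> int:
    """Use the user-configured timeout in seconds if given, else the default."""
    return DEFAULT_TIMEOUT_SECONDS if user_timeout is None else user_timeout
```

For a VM with `4Gi` of memory and no explicit request, `hibernation_pvc_bytes("4Gi")` yields 5 GiB. Whether a fixed margin is sufficient for the saved state is exactly the open question raised in the comment.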

@Longchuanzheng
Author

Longchuanzheng commented Sep 12, 2025

Hi, @vladikr @mhenriks. I've updated the flow and interface design. The process is modeled on the existing memory dump flow and is just a rough outline. Since save and restore form one complete process, I've defined quite a few phases in the VM's HibernationStatuses.phase, although I'm not sure whether this is appropriate.
I also think there are some details worth discussing:

  1. Should we delete the corresponding PVC after a successful restore?
  2. Can we start a dormant VM directly using the start method instead of the restore method? I've added this option to HibernationStrategy.

@Longchuanzheng Longchuanzheng marked this pull request as draft September 13, 2025 12:44
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 13, 2025
Signed-off-by: zhuanlan <[email protected]>
@Longchuanzheng Longchuanzheng marked this pull request as ready for review September 15, 2025 11:17
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 15, 2025
Labels

dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L

5 participants