Cleanup intercept device shutdown #2140
base: release/2505
1. Tie lifetime of the device to the vmbus parent.
   a. Remove existing channel used to control lifetime.
2. If the vmbus relay parent is dropped, there will be no opportunity to close the GPADL associated with the special ring buffer memory pages. In this case, note the error and leak the device. Trying to change the VTL permissions on these memory pages while the host still has them open will result in an error.
3. Allow intercepted devices to be revoked and reoffered from the host.
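The "note the error and leak" behavior in item 2 can be illustrated with a minimal sketch. The types `RingPages` and `GpadlTeardown` and the function `release_pages` are hypothetical stand-ins, not names from this PR; the sketch only shows the general pattern of deliberately leaking memory that the host may still reference.

```rust
use std::mem;

/// Hypothetical stand-in for ring buffer pages shared with the host.
struct RingPages {
    data: Vec<u8>,
}

/// Hypothetical result of trying to close the GPADL.
enum GpadlTeardown {
    /// The host acknowledged the teardown; the pages are private again.
    Closed,
    /// The vmbus parent was dropped, so the teardown could not be sent.
    ParentGone,
}

/// Returns the pages when they may be reused, or `None` when they were leaked.
fn release_pages(pages: RingPages, outcome: GpadlTeardown) -> Option<RingPages> {
    match outcome {
        GpadlTeardown::Closed => Some(pages),
        GpadlTeardown::ParentGone => {
            eprintln!(
                "GPADL still open on the host; leaking {} bytes",
                pages.data.len()
            );
            // Skip the destructor so the allocation is never freed or reused.
            mem::forget(pages);
            None
        }
    }
}

fn main() {
    let reusable = release_pages(RingPages { data: vec![0; 4096] }, GpadlTeardown::Closed);
    assert!(reusable.is_some());

    let leaked = release_pages(RingPages { data: vec![0; 4096] }, GpadlTeardown::ParentGone);
    assert!(leaked.is_none());
}
```

Leaking trades a bounded amount of memory for safety: nothing in the guest can later reuse pages that the host might still write to.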
Pull Request Overview
This PR refactors the shutdown handling of VMBus relay intercept devices by tying their lifetime to the VMBus parent instead of using separate channel-based lifetime control. The main purpose is to handle device cleanup more gracefully when the VMBus relay parent is dropped, particularly when GPADL teardown fails.
- Removes the manual lifetime control channel and integrates device lifecycle with the VMBus parent
- Adds proper error handling for GPADL teardown failures with memory leak detection
- Enables intercepted devices to be properly revoked and reoffered by the host
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
vm/devices/vmbus/vmbus_relay_intercept_device/src/lib.rs | Main refactor removing channel-based lifetime control, adding GPADL teardown error handling, and improving device revoke logic |
vm/devices/vmbus/vmbus_relay/src/lib.rs | Reorders channel existence check to allow intercepted devices to be reoffered |
openhcl/underhill_core/src/worker.rs | Removes manual tracking of intercept device lifetime senders |
openhcl/underhill_core/src/dispatch/mod.rs | Removes unused _vmbus_intercept_devices field from LoadedVm struct |
    // The VTL pages were not freed. This can occur if an
    // error is hit that drops the vmbus parent tasks. Just
    // pend here and let the outer error cause the VM to
    // exit.
    pending::<()>().await;
Copilot (AI), Oct 10, 2025
Using `pending::<()>().await` to intentionally hang the task is an unusual pattern. Consider adding a more explicit comment explaining why this infinite wait is the desired behavior, or using a more conventional approach like returning an error that would cause the VM to exit.
    - // The VTL pages were not freed. This can occur if an
    - // error is hit that drops the vmbus parent tasks. Just
    - // pend here and let the outer error cause the VM to
    - // exit.
    - pending::<()>().await;
    + // The VTL pages were not freed. This is an unrecoverable error.
    + // Return an error to trigger VM exit in a controlled manner.
    + anyhow::bail!("VTL pages were not freed; unrecoverable error in vmbus relay device");
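The difference between the two approaches can be seen by polling each kind of future once by hand. This is a minimal, standard-library-only sketch: `pend_forever`, `fail_fast`, `poll_once`, and `NoopWake` are hypothetical names, and a plain `String` stands in for the `anyhow` error used in the suggestion.

```rust
use std::future::{pending, ready, Future};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future once by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` a single time and reports whether it finished.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    fut.as_mut().poll(&mut cx)
}

/// Models the PR's approach: never complete, so the task parks until it is
/// dropped by whatever outer error path tears the VM down.
fn pend_forever() -> impl Future<Output = Result<(), String>> {
    pending()
}

/// Models the suggested alternative: complete immediately with an error.
fn fail_fast() -> impl Future<Output = Result<(), String>> {
    ready(Err("VTL pages were not freed".to_string()))
}

fn main() {
    assert!(poll_once(pend_forever()).is_pending());
    assert!(matches!(poll_once(fail_fast()), Poll::Ready(Err(_))));
}
```

A pending future never runs any code after the await point, which matters here if continuing would touch pages the host still has open; returning an error instead hands control back to the caller, which then decides what to tear down.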
      /// Responds to the channel being revoked by the host.
      async fn handle_revoke(&mut self, state: &mut SimpleVmbusClientDeviceTaskState) {
    -     let Some(offer) = state.offer.take() else {
    +     let Some(offer) = state.offer.as_ref() else {
Copilot (AI), Oct 10, 2025
The change from `state.offer.take()` to `state.offer.as_ref()` means the offer is no longer consumed here, but it's still taken at line 538. This could lead to unexpected behavior if `handle_revoke` is called multiple times, as subsequent calls would still find an offer present.
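The behavioral difference the comment describes comes from how `Option::take` and `Option::as_ref` treat the stored value. A minimal sketch, using a hypothetical `TaskState` with a `String` offer in place of the real device state:

```rust
/// Hypothetical stand-in for the device task state in the diff.
struct TaskState {
    offer: Option<String>,
}

/// `take()` moves the offer out and leaves `None` behind.
fn revoke_with_take(state: &mut TaskState) -> bool {
    state.offer.take().is_some()
}

/// `as_ref()` only borrows the offer; it stays in place afterwards.
fn revoke_with_as_ref(state: &TaskState) -> bool {
    state.offer.as_ref().is_some()
}

fn main() {
    let mut taken = TaskState { offer: Some("channel".to_string()) };
    assert!(revoke_with_take(&mut taken)); // first call sees the offer
    assert!(!revoke_with_take(&mut taken)); // second call finds nothing

    let borrowed = TaskState { offer: Some("channel".to_string()) };
    assert!(revoke_with_as_ref(&borrowed)); // first call sees the offer
    assert!(revoke_with_as_ref(&borrowed)); // and so does every later call
}
```

With `as_ref()`, something else has to clear the offer before a repeated revoke arrives; per the comment, that is the later `take` elsewhere in the function.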
CP of #2117