Skip to content

Conversation

tjones60
Copy link
Contributor

@tjones60 tjones60 commented Oct 14, 2025

Detect the Hyper-V worker process crash event (MSVM_VMMS_VM_TERMINATE_ERROR) so that the VM does not just continue running until it times out. When this event is detected, log the last 5 seconds worth of unfiltered Hyper-V logs to see if there were any interesting logs that did not contain the VMID. Also refactors flush_logs to allow for incremental flushing in case it doesn't get called during teardown.

Motivated by this failure: Petri test results

@tjones60 tjones60 requested a review from a team as a code owner October 14, 2025 21:47
@Copilot Copilot AI review requested due to automatic review settings October 14, 2025 21:47
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances Petri's Hyper-V VM monitoring by detecting and handling worker process crashes. It adds crash event detection for the MSVM_VMMS_VM_TERMINATE_ERROR event and improves logging by capturing unfiltered Hyper-V logs when crashes occur.

Key changes:

  • Added detection of Hyper-V worker process crash events with immediate logging of last 30 seconds of unfiltered logs
  • Refactored log flushing to support incremental updates and prevent log loss during teardown
  • Modified several methods to accept mutable references to enable periodic log flushing

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
petri/src/vm/hyperv/vm.rs Added crash detection logic, refactored flush_logs for incremental operation, and updated method signatures for mutability
petri/src/vm/hyperv/powershell.rs Modified hyperv_event_logs to accept optional VMID and added the new crash event constant

mattkur
mattkur previously approved these changes Oct 15, 2025
Copy link
Contributor

@mattkur mattkur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the minor nits from copilot (e.g. explaining why you set last flush time to 1ms greater than the created time of the current event ...), but looks good to me. Thanks for making this change!

I also agree with the TODO to have this run as an async task, just like other log streams. It's odd to not have the Hyper-V logs mixed in with other test and VM logs.

@tjones60
Copy link
Contributor Author

I agree with the minor nits from copilot (e.g. explaining why you set last flush time to 1ms greater than the created time of the current event ...), but looks good to me. Thanks for making this change!

I also agree with the TODO to have this run as an async task, just like other log streams. It's odd to not have the Hyper-V logs mixed in with other test and VM logs.

The logs do get mixed in today when viewing the test results, they just get added all at the end. I don't know of a good way to stream them but getting them more often would be nice. It would require a bit of refactoring to share some data between threads, and I didn't want to do that right now.

@tjones60 tjones60 merged commit 99042b3 into microsoft:main Oct 15, 2025
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants