netvsp: handle RNDIS packet filter OID for stopping Rx #1926
Closed: erfrimod wants to merge 102 commits into microsoft:main from erfrimod:erfrimod/netvsp-rndis-packet-filter-2505
Update mirroring logic to mirror the 2505 release branch. Co-authored-by: Ben Hillis <[email protected]>
Co-authored-by: Ben Hillis <[email protected]>
This branch will be on 1.86.0 for the foreseeable future. As per our internal support policy this is an even version.
…#1403) We support these regs on all backends, so no reason not to. Cherry-pick of microsoft#1308
…t#1387) (microsoft#1393) The added tests use [loom](https://docs.rs/loom/0.7.2/loom/) to ensure that all possible order of operations work correctly. This required tweaking the orderings we use. Cherry-pick of microsoft#1387
We are trying to write a non-JSON-formatted value to track that an env-var-sourced variable is secret. Fix this by writing `null`. Also add in some diagnostics and improve the in-memory variable representation to avoid so many allocations. Co-authored-by: John Starks <[email protected]>
…1402) (microsoft#1406) After long discussions we have decided to flip the default of our tracing filter, and to allow untagged tracing statements by default. We believe that the risks and costs of being unable to debug incidents in production are too high, and that we can manually scrub our tracing statements to ensure that no sensitive information is leaked. Cherry-pick of microsoft#1402. Part of microsoft#852.
…icrosoft#1383) (microsoft#1431) When guest memory page protections are changed (e.g., pages are transitioned between shared and private), we need to flush concurrent accesses to those pages by the paravisor before updating the page state in hardware. Otherwise, faults or cross-VTL data leaks may occur. Add this synchronization as cheaply as we can: add a simple RCU (Read-Copy-Update) mechanism that allows threads accessing guest memory to cheaply synchronize with threads mutating the page access bitmaps. Use the membarrier() syscall on Linux to allow readers to operate without memory barriers, shifting the expense to the (infrequent) bitmap update paths. Only enable this mechanism in OpenHCL, since other environments do not rely on bitmap-based guest memory access controls. Cherry-pick of microsoft#1383 Co-authored-by: John Starks <[email protected]>
…dresses (microsoft#1340) (microsoft#1434) On TDX, we see some cases where the guest attempts to access an address with an incorrect shared bit. It's unclear if this is an issue in OpenHCL, the guest, or the host, but fix OpenHCL crashing with an emulation failure due to a `GuestMemory` access failure, and instead inject a machine check into the guest. For addresses outside of mmio and ram, continue to emulate but log that the guest did something strange. In the future, we may also inject a machine check on that path. Tested via a uefi app that attempts to access a shared page at a private gpa (see https://github.com/chris-oo/openvmm/blob/uefi-tmk-write-to-shared/tmk/tmk_launch/src/main.rs) with the following crash: ``` [kmsg]: [3.058450] virt_mshv_vtl::processor::tdx: WARN guest accessed inaccessible gpa, injecting MC gpa=0x666d9000 is_shared=false [kmsg]: [3.061137] virt_mshv_vtl::processor: WARN Guest has reported system crash crash=VtlCrash { vp_index: VpIndex(0), last_vtl: Vtl0, control: GuestCrashCtl { pre_os_id: 0, no_crash_dump: false, crash_message: true, crash_notify: true }, parameters: [12, 0, 0, 6790a820, 4d7] } [kmsg]: [3.061758] virt_mshv_vtl::processor: WARN Guest has reported a system crash message "!!!! 
X64 Exception Type - 12(#MC - Machine-Check) CPU Apic ID - 00000000 !!!!\r\nRIP - 00000000666DB030, CS - 0000000000000038, RFLAGS - 0000000000010202\r\nRAX - 00000000666D9000, RCX - 0000000000000042, RDX - 3333333333333333\r\nRBX - 0000000066E00018, RSP - 0000000033D99150, RBP - 0000000033D99190\r\nRSI - 0000000066E54718, RDI - 0000000033DB8160\r\nR8 - 0000000000000000, R9 - 00000000666ED508, R10 - 00000000666ED5EE\r\nR11 - 000000000000000A, R12 - 0000000000000000, R13 - 0000000000000000\r\nR14 - 0000000033D99AA0, R15 - 0000000033D99A98\r\nDS - 0000000000000030, ES - 0000000000000030, FS - 0000000000000030\r\nGS - 0000000000000030, SS - 0000000000000030\r\nCR0 - 0000000080010073, CR2 - 0000000000000000, CR3 - 0000000033801000\r\nCR4 - 0000000000000668, CR8 - 0000000000000000\r\nDR0 - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000\r\nDR3 - ``` microsoft#1426 tracks implementing this for SNP. Backport of microsoft#1340
…rosoft#1441) The register page is not valid until it has been mapped and the hypervisor says it's valid. However, the in-memory contents of the page before it is mapped could be non-zero, especially across a servicing operation. This can cause the VMM to think the register page contents are valid, causing register corruption. Fix this by explicitly clearing the valid flag in the register page during sidecar startup. Also zero a few more potentially stale structures to avoid potential bugs. Cherry-pick of microsoft#1440 Co-authored-by: John Starks <[email protected]>
) (microsoft#1448) Update the kernel to the latest available version, which contains these sidecar fixes: microsoft/OHCL-Linux-Kernel#82 microsoft/OHCL-Linux-Kernel#83 Cherry-pick of microsoft#1442 Co-authored-by: Ben Hillis <[email protected]>
This change updates ms-tpm-20-ref-rs to a version that includes a TPM backing store size fix.
…oft#1465) Allow the memory backing to provide an error kind, which the virt backends will later use to determine whether to attempt emulation, inject a machine fault, resume the VP, or terminate the VM. Cherry-pick of microsoft#1430 Co-authored-by: John Starks <[email protected]>
…icrosoft#1458) Filter panic messages (printed to /dev/ttyprintk) out to a separate target and raise their effective trace level from verbose to critical. Cherry-pick of microsoft#1455 Co-authored-by: John Starks <[email protected]>
…ng branch (microsoft#1466) Now that we are in ask mode for the upcoming release, this change updates the mirroring workflow to submit OSS changes to a staging branch instead of directly to the release branch. This means that we will need to periodically merge staging into release (after getting approval). This is the same flow we did for the 2411 release. Co-authored-by: Ben Hillis <[email protected]>
…crosoft#1408) (microsoft#1460) Also includes some drive-by cleanups where I happened to see them. Areas I did not audit because they are not relevant to CVMs: - trace and debug level statements - Non-CVM workers (debug & VNC) - Test only code (petri, vmm_tests, tmk*) - ARM-specific code - Non-CVM virt backends (including virt_mshv_vtl/mshv) - Host-only code (openvmm, GED, igvmfilegen, etc) - Gen 1 devices (vga, chipset, etc) - VirtIO Areas that still need auditing by owners and area experts: - Mesh (@jstarks) - Networking (vm/devices/net/* & underhill_core/netvsp) (networking team) - Storage (vm/devices/storage/* & underhill_core/nvme_manager) (storage team) - VMBus (vm/devices/vmbus/*) (@SvenGroot) - VMGS (vm/vmgs/*) (@tjones60) Part of microsoft#852 Cherry-pick of microsoft#1408
…ft#1470) The Linux kernel serializes CPU hotplug. If multiple sidecar VPs need to be onlined into OpenVMM simultaneously, they will all stop running the guest while associated Linux threads call into the Linux kernel to online the CPU (which will block on the CPU hotplug lock or whatever). This means the average blackout time for a VP that's onlined early in boot is linear in the number of early-onlined VPs. And thanks to typical device configurations, this is usually linear in the total number of VPs. This is a performance problem. To avoid this, explicitly serialize VP online _before_ the target VP is stopped. This allows the VP to continue running the guest until it reaches the front of the online queue. This reduces the average blackout time to just the time to online one CPU, meaning this solution should scale to any number of VPs. Cherry-pick of microsoft#1443 Co-authored-by: John Starks <[email protected]>
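The ordering described in that commit can be illustrated with a small sketch. This is not the OpenVMM code: `online_all` and the event strings are hypothetical, and a plain `std::sync::Mutex` stands in for whatever queue the real change uses. The point it shows is that each VP takes its turn *before* it is stopped, so a waiting VP is never stopped while another VP is being onlined.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical illustration: onlines `vp_count` VPs and returns the
/// sequence of stop/online/resume events in the order they happened.
fn online_all(vp_count: usize) -> Vec<String> {
    // Held for the whole stop -> online -> resume window of one VP.
    let online_turn = Arc::new(Mutex::new(()));
    let events = Arc::new(Mutex::new(Vec::<String>::new()));

    let handles: Vec<_> = (0..vp_count)
        .map(|vp| {
            let online_turn = Arc::clone(&online_turn);
            let events = Arc::clone(&events);
            thread::spawn(move || {
                // Wait for our turn first. In the scenario from the commit
                // message, the VP would keep running the guest during this wait.
                let _turn = online_turn.lock().unwrap();
                let mut events = events.lock().unwrap();
                events.push(format!("stop vp{vp}"));
                events.push(format!("online vp{vp}"));
                events.push(format!("resume vp{vp}"));
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    let recorded = events.lock().unwrap();
    recorded.clone()
}
```

With this ordering, every stop/online/resume triple in the returned log belongs to a single VP, so each VP's blackout covers only its own online operation.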
…1477) The current size check is failing in the release/2505 branch because it's currently comparing against main and the branches have diverged. Co-authored-by: Ben Hillis <[email protected]>
…icrosoft#1474) …wide (microsoft#1370) In preparation for VTL 1 memory support for CVMs, make the shared/encrypted bitmap tracking available on a partition-level, rather than in the GuestMemoryMapping (which ends up being per-VTL). Also includes some refactoring to isolate out the bitmap logic so that it can be reused for vtl protection bitmaps. Tested: SNP +/- guest vsm boots
…1462) (microsoft#1468) tokio-rs/tracing#2519 can cause the tracing crate to mistakenly drop logs emitted after calling the `enabled!` macro. Today we only call that macro in two places; this PR removes one of them, the second is coming in another PR. We currently are using our dynamic tracing filtering to filter out what system-level messages get sent to the host, in addition to its normal purposes. After much investigation and thought I've come to the conclusion that there is no good way to work around this bug while maintaining dynamic configuration. So instead just statically code these levels. Realistically in terms of what these messages can help us diagnose this is almost certainly fine. Cherry-pick of microsoft#1462
…icrosoft#1484) WHP has a bug around partition scrub on AMD nested hosts which makes servicing tests flakey. Skip them for now. This is a targeted PR to just make these tests not flakey. I'd like to instead rework how we decide what petri tests to run based on host capabilities, but take this stopgap first. Cherry pick of microsoft#1480
…rosoft#1486) This removes functionality added in microsoft@7278a20 to avoid hitting tokio-rs/tracing#2519. While the functionality is nice to have, it is not so important as to be worth potentially dropping events, and there is no performant way to implement it that can avoid this bug. Then ban the tracing::enabled macro codebase-wide. Cherry-pick of microsoft#1469
Changes flowey config for mu_msvm to use v25.1.3 release. mu_msvm release here: https://github.com/microsoft/mu_msvm/releases/tag/v25.1.3
(This is a backport PR) This PR adds EFI Diagnostics, which is a service used to parse UEFI diagnostics data from an in-memory buffer and send it to our tracing facilities. The UEFI firmware will write the GPA of the advanced logger buffer to an Io port intercept called `SET_EFI_DIAGNOSTICS_GPA`. The diagnostics service is responsible for reading guest memory at the specified GPA and parsing the data. This gets triggered when the UEFI firmware writes to an Io port intercept called `PROCESS_EFI_DIAGNOSTICS`. The `PROCESS_EFI_DIAGNOSTICS` UefiCommand gets triggered by the following conditions: - UEFI encounters a failure (guest driven via `PROCESS_EFI_DIAGNOSTICS`) - UEFI fails to boot any device (guest driven via `PROCESS_EFI_DIAGNOSTICS`) - UEFI reaches exit boot services The simplest way to test this is to run: ``` cargo run -- --uefi ```
microsoft#1487) When VTL 2 accesses VTL 0 memory on behalf of VTL 0, it needs to be able to check whether VTL 1 has restricted access to the memory. This change introduces tracking of VTL 1 permissions using bitmaps and adds some of the enforcement of these permissions. Tested: SNP +/- guest VSM boots TVM and TDX VMs also boot
…microsoft#1532) microsoft#1480 missed this one. Cherry pick of microsoft#1530.
) Mitigate TPM corruption caused by previous VMs that had a 16K TPM NVRAM reported as 32K and committed bad state to the vTPM NVRAM. This involves the following: For every 16K TPM NVRAM, walk the dynamic section and truncate the last header if it points to data past the end. Additionally, run the following mitigation steps for 16K NVRAM:
1. Check for a 4K-byte AK cert NV index.
   1. This VM needs to be mitigated.
   2. Undefine the 4K AK cert to save space.
   3. Attempt to write a 1-byte mitigation platform marker, which can fail.
   4. Attempt to write a just-sized platform AK cert.
2. Else, check for a mitigated marker or no platform cert.
   1. Log that this VM is mitigated, and whether an AK cert is present.
3. Else, check for an owner cert.
   1. Log that this VM is in the expected state.
Co-authored-by: Chris Oo <[email protected]>
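One way to read the three-way check above is as a pure classification step that runs before any NVRAM writes. The sketch below is an interpretation, not the vTPM code: `NvramView`, `Outcome`, and `classify` are hypothetical names, and it assumes that the 4K AK cert is the platform AK cert and that "an AK cert is present" refers to that same platform cert.

```rust
/// Hypothetical summary of what was found in a 16K vTPM NVRAM.
struct NvramView {
    /// Size in bytes of the platform AK cert NV index, if one is defined.
    platform_ak_cert_len: Option<usize>,
    /// Whether the 1-byte mitigation marker is present.
    has_mitigation_marker: bool,
    /// Whether an owner-defined AK cert is present.
    has_owner_ak_cert: bool,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Step 1: undefine the 4K cert, try to write the marker, rewrite the cert.
    Mitigate,
    /// Step 2: only log, including whether a platform AK cert exists.
    AlreadyMitigated { ak_cert_present: bool },
    /// Step 3: only log that the VM is in the expected state.
    ExpectedState,
    /// None of the listed cases matched (not covered by the commit message).
    Unrecognized,
}

fn classify(nv: &NvramView) -> Outcome {
    // Assumption: "4K" means exactly 4096 bytes.
    const OVERSIZED_AK_CERT_LEN: usize = 4096;

    if nv.platform_ak_cert_len == Some(OVERSIZED_AK_CERT_LEN) {
        Outcome::Mitigate
    } else if nv.has_mitigation_marker || nv.platform_ak_cert_len.is_none() {
        Outcome::AlreadyMitigated {
            ak_cert_present: nv.platform_ak_cert_len.is_some(),
        }
    } else if nv.has_owner_ak_cert {
        Outcome::ExpectedState
    } else {
        Outcome::Unrecognized
    }
}
```

Keeping the classification separate from the writes makes the fallible steps (such as writing the marker) easier to reason about, since the decision itself has no side effects.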
…rosoft#1483) (microsoft#1516) Uses RCU implementation to synchronize reads and writes of the VTL 1 permission bitmaps. Tested: SNP +/- guest vsm boots
microsoft#1536) …mulation (microsoft#1513) Adds two guest memory objects, backed by kernel/usermode execute VTL 1 permission bitmaps (for cvms), to be used on the emulation path to enforce VTL 1 protections when accessing instructions during instruction emulation. Tested: SNP +/- guest vsm boots TVM boots
…crosoft#1535) (microsoft#1553) Nobody calls it, and they should be going through uh_mem instead anyways. We probably need to explore unifying traits between virt and uh_mem in the future, but that can wait. Cherry-pick of microsoft#1535
Update nextest to the latest release. This fixes some bugs it seems in nextest around detecting leaks on windows, which was causing test failures. Cherry pick of microsoft#1790. Fixes microsoft#1782
…microsoft#1797) On AArch64, the Performance Monitor Unit (PMU) is supposed to be supported by every platform. Add this information to the vm's topology, and correctly report a configured value in the MADT via the GICC structure. Onboard a test to verify that Linux sees the correct interrupts. Hyper-V and WHP support a hardcoded value of 0x17, so for now hardcode that value on those platforms. A follow up change will correctly report this value via a `pmu` device tree node, but take this more minimal change to backport to the release/2505 branch. Although macOS also supports this interrupt with the same value of 0x17, enabling that did not cause Linux to work as expected, so more investigation there is needed. This fixes xperf on Windows and perf on Linux which rely on this being present. microsoft#1775 Backport of microsoft#1776
Probably got accidentally added during a merge conflict.
…onnecting (microsoft#1817) This change fixes an issue where, if all channels are already reset when a disconnect happens, the server would not invoke `Notifier::modify_connection`. This means that the state such as the interrupt page and monitor pages is not reset, and in the case of OpenHCL the relay is not notified of the disconnect (which can leave host state intact, including monitor pages if MNF is handled by the host). This caused an issue where Linux would occasionally crash during resume from hibernate. When resuming, Linux makes two connections, first to read the memory image, and then to resume normal operations, both using MNF. When the first connection unloads, the overlay pages for the monitor pages were not removed until the reconnect, leading to memory corruption when Linux proceeds to use these pages as normal memory. This change also adds some tests ensuring the notifier is invoked for an unload with open channels, without open channels, and a forced disconnect when a new InitiateContact message is received. Cherry-picked from microsoft#1809 Co-authored-by: Copilot <[email protected]>
) (microsoft#1773) On CVM platforms, this self test results in logs to verify that the various bitmaps are preventing accesses as expected. Log that we're doing this self test. Cherry pick of microsoft#1772
…icrosoft#1810) Backport of: microsoft#1755 This PR focuses on allowing EfiDiagnostics to force flush through InspectMut. We will only print EfiDiagnostics to our tracing facilities **ONLY IF WE WRITE** to the `process_diagnostics` field in `UefiDevice`. To trigger this, use inspect like so: ``` openvmm> inspect -u 1 vm/uefi/process_diagnostics ```
…t#1835) This change cherry-picks microsoft#1829, and its dependent change microsoft#1815. --------- Co-authored-by: Copilot <[email protected]>
…soft#1838) This should help us catch bad memory setups earlier. Note: we're still debugging what causes failures here, but the sooner we can catch them the better. Also includes some additional tracing. Cherry-pick of microsoft#1828
…1839) If two ranges in a guest's memory layout share a bitmap backing page, then during bitmap initialization one of the range's bitmap state will be incorrectly zeroed. This causes bitmap checks to unexpectedly fail. Fix this by not re-zeroing bitmap pages during initialization. Cherry-pick of microsoft#1830 Co-authored-by: John Starks <[email protected]>
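A toy model of the bug and its fix, assuming one bit per guest page and 4 KiB bitmap backing pages. `Bitmap`, `init_range`, and `is_set` are hypothetical names and this is not the OpenHCL data structure. A backing page is zeroed only when it is first created, so a second range that lands on the same backing page keeps the bits set by the first.

```rust
use std::collections::HashMap;

/// Number of guest pages tracked by one 4 KiB bitmap backing page.
const BITS_PER_BITMAP_PAGE: u64 = 4096 * 8;

/// Hypothetical bitmap keyed by backing-page index.
#[derive(Default)]
struct Bitmap {
    pages: HashMap<u64, Vec<u8>>,
}

impl Bitmap {
    /// Marks guest pages `start_pfn..end_pfn` as accessible.
    fn init_range(&mut self, start_pfn: u64, end_pfn: u64) {
        for pfn in start_pfn..end_pfn {
            // Zero-fill only on first use; never re-zero an existing page,
            // because another range may already have set bits in it.
            let page = self
                .pages
                .entry(pfn / BITS_PER_BITMAP_PAGE)
                .or_insert_with(|| vec![0u8; 4096]);
            let bit = (pfn % BITS_PER_BITMAP_PAGE) as usize;
            page[bit / 8] |= 1 << (bit % 8);
        }
    }

    /// Returns whether the bit for `pfn` is set.
    fn is_set(&self, pfn: u64) -> bool {
        let bit = (pfn % BITS_PER_BITMAP_PAGE) as usize;
        self.pages
            .get(&(pfn / BITS_PER_BITMAP_PAGE))
            .map_or(false, |page| (page[bit / 8] & (1 << (bit % 8))) != 0)
    }
}
```

Initializing two small ranges that fall in the same backing page leaves both ranges set, which is the behavior the fix restores.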
…1847) There is a vulnerability in OpenHCL's VMGS key-rolling code that allows the host to cause the VMGS to be encrypted with a host-controlled key. unwrap_and_rotate_keys now returns a pair of egress keys: one that may have been used to previously encrypt the VMGS and can only be used for that purpose; and a second key, always derived anew, that can safely be used to re-encrypt the VMGS. CVE-2025-53781 Co-authored-by: Copilot <[email protected]>
Cherry-pick into release/2505 for microsoft#1831 Co-authored-by: Jenna Goddard <[email protected]>
Cherry pick to release/2505 ### underhill_core: factory for nvme devices + tests for nvme_manager (microsoft#1787) Add unit tests for the existing code in `nvme_manager.rs`. This requires a minor refactoring: push the code to create NVMe drivers into a factory that the tests can then mock. These basic tests already highlight the performance problems seen in production: the GetNamespace path is not concurrent. This is one part of the broader work effort. ### underhill_core: nvme_manager: make it multithreaded microsoft#1763 This PR addresses serialization in the existing underhill_core: nvme_manager. This serialization proves to be the bottleneck when performing a runtime servicing operation with multiple NVMe devices. This change leverages mesh to create a two-level hierarchy: - The existing `NvmeManager` API surface is the top level. The idea is that this keeps track of the NVMe devices that are in some state of being created, and - A new `NvmeDriverManager` that manages the lifecycle of a single NVMe device. Most NVMe devices have one namespace, but our cloud scale scenario requires supporting multiple namespaces per NVMe device. It's okay to serialize multiple calls to the same device, since the most expensive portion is loading the driver.
This release is almost ready, time to disable debug support in the builds that will end up in prod.
microsoft#1866) Allow the host to specify OpenHCL features and encryption policy for the VMGS.
A malicious admin can evict the AK from their VM's vTPM and replace it with their own key. At boot, Azure will load that key from the VMGS and then sign an AKCert with that key, allowing the admin to spoof KeyGuard and CVM attestation. CVM: This change mirrors changes in the legacy HCL: Regenerate the AK at boot from the TPM seeds, instead of loading it from VMGS. This ensures that the original AKCert is always present in the vTPM. TVM: OpenHCL currently cannot regenerate the AK for a TVM, because the original AK (provisioned by the vtpmservice) contains an auth policy; OpenHCL does not implement that policy creation. As an alternative, when OpenHCL boots, it will check the attributes on the AK that it loads from VMGS. If the attributes are wrong (indicating a possibly malicious key), it will not make any calls to renew the AKCert. CVE-2025-49707 --------- Co-authored-by: Ben Hillis <[email protected]> Co-authored-by: Copilot <[email protected]>
…icrosoft#1875) This reverts commit b768183. This apparently broke vmbus relay on tdx. We're not sure why yet, but revert it for now to unblock RIs.
…crosoft#1878) (microsoft#1879) This fixes VMBus Relay on TDX without the hw debug bit. Clean cherry-pick of microsoft#1878
…icrosoft#1875) (microsoft#1880) This reverts commit 1ff2b55, as microsoft#1879 fixes the issue found.
…) (microsoft#1894) When we are sharing a page we remove all VTL 0 permissions to that page. Later on, when we re-private the page, we were failing to reset these permissions, which led to failures when the guest tried to use pages it should have had access to. This code is a bit confusing due to conflating private/shared with VTL 1 access permissions. Add a bunch of comments, and fix the reset. Cherry-pick of microsoft#1891
…ft#1900) Continue processing events while waiting for guest response from shutdown request. Properly return errors from shutdown requests. CP from microsoft#1895 Co-authored-by: Brian Perkins <[email protected]>
Linux netvsc sends an OID to stop receiving packets on vmbus channel close. Example scenarios: hibernation and MTU change. Prior to opening a new channel and processing the packets, netvsc checks that there are no pending packets. If there are, netvsc logs an error and is unable to recover. We observe the error: `hv_netvsc eth0: Ring buffer not empty after closing rndis` in the guest syslog. Modify netvsp to handle the OID and stop processing RX traffic. This allows netvsc to successfully close and re-open the vmbus channel, even under heavy incoming traffic. --------- Co-authored-by: Sunil Muthuswamy <[email protected]>
Linux netvsc sends an OID to stop receiving packets on vmbus channel close. Example scenarios: hibernation and MTU change. Prior to opening a new channel and processing the packets, netvsc checks that there are no pending packets. If there are, netvsc logs an error and is unable to recover. We observe the error: `hv_netvsc eth0: Ring buffer not empty after closing rndis` in the guest syslog.
Modify netvsp to handle the OID and stop processing RX traffic. This allows netvsc to successfully close and re-open the vmbus channel, even under heavy incoming traffic.
Cherry pick of #1873
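A minimal sketch of the behavior this PR describes, not the netvsp implementation. `OID_GEN_CURRENT_PACKET_FILTER` (0x0001010E) is the standard NDIS OID for the packet filter. `RxGate`, `handle_set_oid`, and `should_deliver` are hypothetical names, and the sketch assumes that a filter value of 0 is the guest's request to stop RX and that RX starts out enabled.

```rust
/// Standard NDIS OID carried in RNDIS set requests for the packet filter.
const OID_GEN_CURRENT_PACKET_FILTER: u32 = 0x0001_010E;

/// Hypothetical per-channel receive gate (not the real netvsp type).
#[derive(Debug)]
struct RxGate {
    /// Last packet filter value set by the guest.
    packet_filter: u32,
}

impl RxGate {
    /// Assumption: RX is enabled until the guest says otherwise.
    fn new() -> Self {
        Self { packet_filter: u32::MAX }
    }

    /// Handles an RNDIS set request; returns true if the OID was recognized.
    fn handle_set_oid(&mut self, oid: u32, value: u32) -> bool {
        match oid {
            OID_GEN_CURRENT_PACKET_FILTER => {
                self.packet_filter = value;
                true
            }
            _ => false,
        }
    }

    /// Incoming packets are only written to the guest's ring buffer while
    /// the filter is non-zero, so the ring can drain before channel close.
    fn should_deliver(&self) -> bool {
        self.packet_filter != 0
    }
}
```

In this model, the receive path would consult `should_deliver()` before queuing a packet; once the guest sets the filter to 0, nothing new lands in the ring, which is the condition netvsc checks for before re-opening the channel.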