-
Notifications
You must be signed in to change notification settings - Fork 138
wip: Advertise optional flr support #1858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds optional Function Level Reset (FLR) support to the NVMe fault controller for testing purposes. This enables VFIO to modify the reset_method
file by advertising FLR capability through PCI Express configuration space.
Key changes:
- Implements PCI Express capability with FLR support in the fault controller
- Adds comprehensive unit tests for FLR functionality
- Updates configuration structures to include FLR support option
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
File | Description |
---|---|
vm/devices/storage/nvme_test/src/tests/flr_tests.rs | New comprehensive test suite for FLR capability advertising and triggering |
vm/devices/storage/nvme_test/src/pci.rs | Main implementation adding PCI Express capability with FLR handler and reset logic |
vm/devices/pci/pci_core/src/capabilities/pci_express.rs | New PCI Express capability implementation with FLR support and extensive tests |
vm/devices/pci/pci_core/src/spec.rs | PCI Express specification definitions for capability registers |
vm/devices/storage/nvme_test/src/lib.rs | Updated fault configuration to use Arc for shared ownership |
vm/devices/storage/nvme_resources/src/lib.rs | Added flr_support field to controller handle |
vm/devices/storage/nvme_test/src/resolver.rs | Updated resolver to pass through flr_support configuration |
vm/devices/storage/nvme_test/src/tests.rs | Added flr_tests module |
vm/devices/storage/nvme_test/src/tests/controller_tests.rs | Updated fault configuration usage and added flr_support field |
vm/devices/storage/disk_nvme/nvme_driver/src/tests.rs | Updated fault configuration usage and added flr_support field |
vm/devices/pci/pci_core/src/capabilities/mod.rs | Exposed new PCI Express capability types |
Some Open question for this PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Thanks for taking the time to work on this!
/// Creates a new PCI Express capability with FLR support. | ||
/// | ||
/// # Arguments | ||
/// * `flr_supported` - Whether Function Level Reset is supported | ||
/// * `flr_handler` - Optional handler to be called when FLR is initiated | ||
pub fn new(flr_supported: bool, flr_handler: Option<Arc<dyn FlrHandler>>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you envision that callers would use this if they specify flr_supported: true
but flr_handler: None
. I'm not sure that makes semantic sense. What do you think? (I'm happy to be wrong here, I'm still wrapping my head around the nuance of FLR behavior).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to even allow this behavior? It seems like even if you want a no-op, shouldn't you be forced to specify a handler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically I think requiring FLR handler at all times makes sense to me. However, my holdout is it would add unnecessary bloat to tests down the road (specially if we expect more PCIe capabilities to be supported in the future). What if we get rid of the bool altogether and just interpret Some(FlrHandler)
to mean that flr is supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using above approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like Some(FlrHandler) here.
PciExpressCapabilityHeader::PCIE_CAPS => { | ||
// PCIe Capabilities Register (16 bits) + Next Pointer (8 bits) + Capability ID (8 bits) | ||
// For basic endpoint: Version=2, Device/Port Type=0 (PCI Express Endpoint) | ||
let pcie_caps: u16 = 0x0002; // Version 2, Device/Port Type 0 | ||
(pcie_caps as u32) << 16 | (0x00 << 8) | CapabilityId::PCI_EXPRESS.0 as u32 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not critical, but I'd prefer to see you define the appropriate struct type for this as well. We're already doing that for the device caps, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this could be a follow up change to this? Since the PR has been sitting for a while I am trying to scope it to the mvp. Always happy to send out follow up changes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yes. That can be a follow up.
/// Creates a new PCI Express capability with FLR support. | ||
/// | ||
/// # Arguments | ||
/// * `flr_supported` - Whether Function Level Reset is supported | ||
/// * `flr_handler` - Optional handler to be called when FLR is initiated | ||
pub fn new(flr_supported: bool, flr_handler: Option<Arc<dyn FlrHandler>>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to even allow this behavior? It seems like even if you want a no-op, shouldn't you be forced to specify a handler?
d30046e
to
ece7dac
Compare
- Add PCI Express capability ID to pci_core spec - Implement PciExpressCapability with FLR support in pci_core - Add FlrHandler trait for handling FLR events - Add flr_support flag to NvmeControllerCaps and NvmeControllerHandle - Wire FLR capability to NvmeController when enabled - Implement FLR reset logic using atomic flag mechanism - Add comprehensive unit tests for FLR functionality - Update nvme_resources to include FLR configuration option Co-authored-by: mattkur <[email protected]>
Co-authored-by: mattkur <[email protected]>
…eing isolated to the pcie flr testing
…king mut variables
…ion Pr is checked in
2919774
to
8c2141e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this LGTM. We discussed offline that you're going to split this PR into one for the core capability and another for the NVMe handling of it. Looking at a smaller nvme-only PR may yield some additional comments, but I think we're good to go here. Thanks for your work on this Guramrit!
Co-authored-by: Matt LaFayette (Kurjanowicz) <[email protected]>
…th some basic functionality (#1930) Adds the PCI_Express capability option for devices to advertise PCIe support. This change will also us to add FLR support to the nvme_test crate for further perf testing. PciExpressCapability: New capability struct implementing PCI Express Device Capabilities, Device Control, and Device Status registers Spec digarams (From PCIe Base Spec 6.4) for reference: <img width="1481" height="991" alt="image" src="https://github.com/user-attachments/assets/a6f49e62-af03-4573-ab60-17637154aadc" /> <img width="1489" height="919" alt="image" src="https://github.com/user-attachments/assets/21f6996c-eb8b-4fb6-b505-a113c37d9257" /> Being made as a helper PR to #1858 since that was getting a little too large/complicated --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: Matt LaFayette (Kurjanowicz) <[email protected]>
Adding optional FLR support to the NvmeFaultController. Fixes: #1719. Description in the words of copilot:
Problem
When testing with the NVMe emulator, VFIO didn't allow OpenHCL to modify the
reset_method
file because the emulator didn't advertise FLR support. This prevented proper device reset functionality in virtualized environments.Solution
Added a minimal PCI Express capability implementation that:
Key Components
flr_support
flag toNvmeControllerCaps
andNvmeControllerHandle
Testing
Added comprehensive unit tests that verify:
Usage
Disclaimer: This is a reworked PR authored by copilot with logic fixes to how the FLR is completed and fixes to thedefinition of the DeviceConfiguration struct in PCIe capability + Moved the support to the fault emulator instead of the main one (for now).