Contributor

@gurasinghMS gurasinghMS commented Oct 10, 2025

Every QueueHandler now stores an aer_handler. The aer_handler implements the new AerHandler trait and can either be an AdminAerHandler which handles aer commands and communication properly or it can be a NoOpAerHandler that does nothing. The implementation used is determined at the time of QueuePair creation by the NvmeDriver which provides the is_admin boolean value. The bool is persisted as part of the QueuePair state so that the correct implementation is used post restore.
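
A minimal sketch of the shape described above, assuming a simplified trait. Only `AerHandler`, `AdminAerHandler`, `NoOpAerHandler`, `poll_send_aer`, and `is_admin` come from this PR; `aer_submitted`, `on_completion`, and `new_aer_handler` are hypothetical names used for illustration, and the real trait has a different set of methods.

```rust
/// Simplified stand-in for the PR's `AerHandler` trait. Every method has a
/// do-nothing default, so an empty impl behaves as the no-op handler.
pub trait AerHandler {
    /// Whether an AER should be submitted right now.
    #[inline]
    fn poll_send_aer(&self) -> bool {
        false
    }
    /// Hypothetical: record the command id of an AER that was just submitted.
    fn aer_submitted(&mut self, _cid: u16) {}
    /// Hypothetical: inspect a completion; returns true if it was the AEN.
    fn on_completion(&mut self, _cid: u16) -> bool {
        false
    }
}

/// Handler for IO queues: relies entirely on the defaults.
pub struct NoOpAerHandler;
impl AerHandler for NoOpAerHandler {}

/// Handler for the admin queue: tracks the outstanding AER, if any.
#[derive(Default)]
pub struct AdminAerHandler {
    await_aen_cid: Option<u16>,
}

impl AerHandler for AdminAerHandler {
    fn poll_send_aer(&self) -> bool {
        // Send a new AER only when none is pending in the device.
        self.await_aen_cid.is_none()
    }
    fn aer_submitted(&mut self, cid: u16) {
        self.await_aen_cid = Some(cid);
    }
    fn on_completion(&mut self, cid: u16) -> bool {
        if self.await_aen_cid == Some(cid) {
            self.await_aen_cid = None;
            true
        } else {
            false
        }
    }
}

/// Hypothetical constructor: choose the implementation from `is_admin`.
pub fn new_aer_handler(is_admin: bool) -> Box<dyn AerHandler> {
    if is_admin {
        Box::new(AdminAerHandler::default())
    } else {
        Box::new(NoOpAerHandler)
    }
}
```

The `Box<dyn AerHandler>` here mirrors the boolean-driven selection in this description; dynamic dispatch generally prevents the call from being inlined, which is what the later move to concrete types in this thread addresses.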

In the admin queue path (using AdminAerHandler), the NvmeDriver no longer drives the AER loop. It is instead handled by the QueueHandler. When looking for commands to process (in the recv channel), the admin QueueHandler prioritizes sending an AER before processing commands. So when the run loop starts, the QueueHandler automatically sends an AER as the first command, and subsequent AER commands are prioritized whenever an AER isn't already pending.
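
The prioritization described above can be sketched synchronously as follows. This is an illustration under assumptions, not the driver's actual run loop (which is async and reads from a recv channel); `AdminQueue`, `Action`, `next_action`, and `complete_aer` are hypothetical names.

```rust
use std::collections::VecDeque;

/// What the queue decides to do on one iteration of its loop.
#[derive(Debug, PartialEq)]
pub enum Action {
    SendAer,
    Process(u32),
    Idle,
}

/// Hypothetical admin queue model: a backlog of commands plus AER tracking.
pub struct AdminQueue {
    aer_pending: bool,
    commands: VecDeque<u32>,
}

impl AdminQueue {
    pub fn new(commands: Vec<u32>) -> Self {
        Self {
            aer_pending: false,
            commands: commands.into_iter().collect(),
        }
    }

    /// An AER takes priority over queued commands unless one is already
    /// pending in the device.
    pub fn next_action(&mut self) -> Action {
        if !self.aer_pending {
            self.aer_pending = true;
            return Action::SendAer;
        }
        match self.commands.pop_front() {
            Some(cmd) => Action::Process(cmd),
            None => Action::Idle,
        }
    }

    /// The device completed the outstanding AER (that is, delivered an AEN).
    pub fn complete_aer(&mut self) {
        self.aer_pending = false;
    }
}
```

With a backlog of two commands, the first action is `SendAer`, and another `SendAer` is chosen ahead of the remaining command as soon as the pending AER completes.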

AER commands are always modeled as detached rpc calls and the AdminAerHandler just scans every completion on the admin queue awaiting the AEN. This is not much of a performance deficit because such scanning only happens on the admin queue. Performance of IO queues is not impacted.

IO QueueHandlers are provided the NoOpAerHandler, which only has empty functions. These empty functions should be compiled away and thus the IO implementation should remain exactly as performant as it is today (thanks to @alandau for the insights here). This is especially true for poll_send_aer, which sits on the hot/critical path. Since it uses the #[inline] attribute and always returns false, the entire if check should be compiled away for all IO queues. (We don't need to inline the other functions because they don't actually sit on the critical path and won't be hit on IO queues anyway.)
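
A small sketch of why the check can disappear when the handler is a concrete type parameter. The names `QueueHandler<A>`, `step`, `AlwaysAer`, and the counters are hypothetical; the point is only that `QueueHandler<NoOpAerHandler>` is compiled as its own copy in which `poll_send_aer()` is the constant `false`, so the optimizer is free to drop the branch. Whether it actually does so has to be confirmed in the generated assembly.

```rust
pub trait AerHandler {
    /// On the hot path, so hint that it should be inlined.
    #[inline]
    fn poll_send_aer(&self) -> bool {
        false
    }
}

/// IO queues: keeps the constant-false default.
pub struct NoOpAerHandler;
impl AerHandler for NoOpAerHandler {}

/// Hypothetical handler that always wants to send, for contrast.
pub struct AlwaysAer;
impl AerHandler for AlwaysAer {
    fn poll_send_aer(&self) -> bool {
        true
    }
}

/// Generic over the handler, so each instantiation is compiled separately.
pub struct QueueHandler<A: AerHandler> {
    aer: A,
    pub aers_sent: u32,
    pub commands_processed: u32,
}

impl<A: AerHandler> QueueHandler<A> {
    pub fn new(aer: A) -> Self {
        Self { aer, aers_sent: 0, commands_processed: 0 }
    }

    /// One loop iteration. For `A = NoOpAerHandler` the condition is a
    /// compile-time constant after inlining.
    pub fn step(&mut self) {
        if self.aer.poll_send_aer() {
            self.aers_sent += 1;
            return;
        }
        self.commands_processed += 1;
    }
}
```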


}

pub struct NoOpAerHandler;
impl AerHandler for NoOpAerHandler {}
Contributor Author

For the reviewers: Should we add a panic in here somewhere to make sure that functions like handle_aen() are never invoked?

Contributor

The functions will be invoked, but the invocation will be inlined (well, modulo the Box<dyn> thing). If you put a panic! here, it will panic.

Contributor Author

That's right! I was wondering if we should panic if this function is called. As in, if we end up in a situation where the driver is sending GetAen commands to an IOQueue, something has gone horribly wrong somewhere ....

Contributor

Ah, sorry, missed that you're talking about handle_aer_request (as opposed to, say, poll_send_aer).
The function can be called only if the hardware decides to send an AEN without an explicit AER, and this is out of spec, right? If we don't anticipate buggy hardware (physical or virtualized), or we want to catch these bugs at the expense of a panic, this looks like a good idea. But I don't know if that's the policy for OpenVMM. As an alternative, we can log a message (that's likely to be ignored).

Member

if the intention is that these functions should never be invoked then a panic is the right thing to do.

Contributor Author

@gurasinghMS gurasinghMS Oct 14, 2025

I added 2 panics in the NoOp handler code for the functions on the non-critical path, i.e. these functions are only ever invoked if the driver requests an AEN or if the IO queue tries to send an AER, both of which should NEVER happen. This should be more of a failsafe.
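
A hedged sketch of that failsafe. The method names `handle_aen_request` and `next_aer_command` are hypothetical stand-ins for the two non-critical-path functions; only `poll_send_aer` and `NoOpAerHandler` are named in this thread.

```rust
pub trait AerHandler {
    /// Hot path: stays a cheap constant for the no-op handler.
    #[inline]
    fn poll_send_aer(&self) -> bool {
        false
    }
    /// Hypothetical: the driver asks this queue for the next AEN.
    fn handle_aen_request(&mut self);
    /// Hypothetical: build the next AER to submit; returns its command id.
    fn next_aer_command(&mut self) -> u16;
}

pub struct NoOpAerHandler;

impl AerHandler for NoOpAerHandler {
    fn handle_aen_request(&mut self) {
        // An IO queue should never be asked for an AEN.
        panic!("AEN requested from a queue with no AER support");
    }
    fn next_aer_command(&mut self) -> u16 {
        // Unreachable as long as poll_send_aer() returns false.
        panic!("AER submission attempted on a queue with no AER support");
    }
}
```

Because `poll_send_aer` keeps returning `false`, the hot path never reaches either panic; they fire only if the driver or the queue logic is wired incorrectly.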

@gurasinghMS gurasinghMS changed the title from "wip: templatize queue handler in the nvme driver to allow aer handling with low overhead" to "nvme_driver: templatize queue handler in the nvme driver to allow aer handling with low overhead" Oct 10, 2025
@gurasinghMS gurasinghMS marked this pull request as ready for review October 10, 2025 18:34
@gurasinghMS gurasinghMS requested review from a team as code owners October 10, 2025 18:34
@Copilot Copilot AI review requested due to automatic review settings October 10, 2025 18:34
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR templatizes the NVMe queue handler to optimize AER (Asynchronous Event Request) handling by introducing handler-specific behavior for admin vs IO queues. The key improvement is moving AER handling from the driver level to the queue level while maintaining performance for IO operations.

  • Introduces an AerHandler trait with AdminAerHandler for admin queues and NoOpAerHandler for IO queues
  • Moves AER command management from the NvmeDriver to the AdminAerHandler within queue processing
  • Uses templating and inlining to ensure IO queue performance remains unaffected

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
queue_pair.rs | Adds AER handler trait system, implements admin and no-op handlers, integrates handler into queue processing loop
driver.rs | Updates queue creation to specify admin/IO type, modifies AER handling to use new queue-level API

#[mesh(6)]
pub handler_data: QueueHandlerSavedState,
#[mesh(7)]
pub is_admin: Option<()>,
Contributor

nit: Why is this an Option instead of a plain bool?

Contributor

See https://openvmm.dev/guide/dev_guide/contrib/save-state.html about safely adding fields to saved state. I bet that's what led Guramrit to make this an Option. I do agree, this should be an Option<bool> rather than an Option<()>.

Contributor

I think there are three ways to go forward here:

  1. The right way: if is_admin is Some(...), then we can trust what it says. Otherwise, we need the code in the restore paths to tell the QueueHandler if it's an admin queue or not. Perhaps this can be handled in the top level restore of the NVMe controller.

  2. The okay way: detect that we're the admin queue at the time of issuing first AEN, and reconfigure ourselves for that.

  3. Ignore this problem, since we aren't going to ever save nvme driver without using keepalive, and we're not going to use keepalive until after all this code is in.

I actually think (1) is easier than (2), but let me know what you all think.
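
Option (1) could look roughly like this. It is a sketch under assumptions: `QueuePairSavedState` here is a cut-down hypothetical struct without the `mesh` attributes, and `resolve_is_admin` and `restoring_admin_queue` are invented names for the piece of restore code that knows which queue it is rebuilding.

```rust
/// Cut-down stand-in for the saved state. The field is optional so that
/// state written by an older version (which lacks it) still decodes.
pub struct QueuePairSavedState {
    pub qid: u16,
    pub is_admin: Option<bool>,
}

/// Trust the saved flag when present; otherwise fall back to what the
/// top-level restore path knows about the queue being rebuilt.
pub fn resolve_is_admin(saved: &QueuePairSavedState, restoring_admin_queue: bool) -> bool {
    saved.is_admin.unwrap_or(restoring_admin_queue)
}
```

The fallback covers only the old-to-new direction; a saved `Some(..)` always wins over the caller's hint.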

Contributor Author

I thought I was being so smart using Option<()> instead of a bool hehe!! Made it an Option because of the saved state guidelines that Matt linked. Will make this an Option<bool> instead!

Contributor Author

gurasinghMS commented Oct 10, 2025

I updated the code to use concrete types for the QueueHandler to allow for compiler optimizations in the QueueHandler::run() function. @mattkur @alandau I did have to remove the Option<()> from the saved state of the QueuePair as it is no longer needed. The driver now invokes the appropriate type for the QueuePair<T> during both new() and restore(). Let me know what you think.
I am a little concerned about what happens when the controller returns an error for an AER command. If this is unchecked on the driver side, we end up in a vicious loop where the driver just hammers the controller with AER commands. For now I added a failed state: the handler stops processing AERs upon receiving the first failure (this is exactly what we do today). I will need to think about this case more and maybe add some sort of throttling mechanism.
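
A minimal sketch of the failed-state idea, assuming a simplified handler. The field names `failed` and `last_aen` appear in the diff quoted later in this thread; `aer_in_flight`, `aer_submitted`, and the plain `u16`/`u32` parameters are simplifications introduced here.

```rust
/// Simplified AER bookkeeping for the admin queue.
#[derive(Default)]
pub struct AdminAerState {
    failed: bool,
    aer_in_flight: bool,
    last_aen: Option<u32>,
}

impl AdminAerState {
    /// Once an AER has failed, never ask to send another one. This is what
    /// prevents the driver from hammering the controller in a loop.
    pub fn poll_send_aer(&self) -> bool {
        !self.failed && !self.aer_in_flight
    }

    pub fn aer_submitted(&mut self) {
        self.aer_in_flight = true;
    }

    /// Handle the completion of the outstanding AER.
    pub fn handle_completion(&mut self, status: u16, dw0: u32) {
        self.aer_in_flight = false;
        if status != 0 {
            // Error: clean up and stop processing AENs.
            self.failed = true;
            self.last_aen = None;
            return;
        }
        self.last_aen = Some(dw0);
    }

    pub fn last_aen(&self) -> Option<u32> {
        self.last_aen
    }
}
```

A throttling mechanism, as mentioned above, would replace the permanent `failed` latch with some retry policy; that is left out of this sketch.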

last_aen,
await_aen_cid,
} = state;
self.last_aen = last_aen.map(AsynchronousEventRequestDw0::from_bits); // Restore from u32
Contributor Author

I am wondering if saving this as a u32 makes sense here. Should we instead be updating the type to allow saving it directly?

@gurasinghMS
Contributor Author

In order to verify that the poll_send_aer() function is indeed being compiled out of the driver code, I built and inspected the crate assembly using:

cargo build --release --package nvme_driver && cargo rustc --package nvme_driver --lib -- --emit=asm
rustfilt < nvme_driver-9da3de0345bb5b66.s > ../../demangled_output_release.s
grep -n "poll_send_aer" demangled_output_release.s

The output shows that no function is even created for the NoOpAerHandler type. This should mean that things are properly compiled out.

@alandau
Contributor

alandau commented Oct 13, 2025

In order to verify that the poll_send_aer() function is indeed being compiled out of the driver code, I built and inspected the crate assembly using:

cargo build --release --package nvme_driver && cargo rustc --package nvme_driver --lib -- --emit=asm
rustfilt < nvme_driver-9da3de0345bb5b66.s > ../../demangled_output_release.s
grep -n "poll_send_aer" demangled_output_release.s

I think this pretty much proves it. That said, I'd look at the disassembly of the compiled executable (with symbols) to find the function that's supposed to call poll_send_aer and see that no call happens (and nothing is inlined in its place).

Contributor

@alandau alandau left a comment

The bits around making QueuePair generic look good to me, thanks for addressing the comments.


@gurasinghMS
Contributor Author

Ah, sorry, missed that you're talking about handle_aer_request (as opposed to, say, poll_send_aer).
The function can be called only if the hardware decides to send an AEN without an explicit AER, and this is out of spec, right? If we don't anticipate buggy hardware (physical or virtualized), or we want to catch these bugs at the expense of a panic, this looks like a good idea. But I don't know if that's the policy for OpenVMM. As an alternative, we can log a message (that's likely to be ignored).

If a buggy AEN is sent to an IO queue, the code will just panic when it tries removing from the pending_commands list because the cid was never sent to the device

alandau previously approved these changes Oct 14, 2025

#[derive(Clone, Debug, Protobuf)]
#[mesh(package = "nvme_driver")]
pub struct AerHandlerSavedState {
Member

This looks fine to me but probably needs review from matt?

Member

What happens when we go to a version of the driver that does not support this? Or when we come from a version of the driver that didn't support this?

Contributor Author

@gurasinghMS gurasinghMS Oct 14, 2025

This is saved as an Option<>:

    pub struct QueueHandlerSavedState {
        #[mesh(1)]
        pub sq_state: SubmissionQueueSavedState,
        #[mesh(2)]
        pub cq_state: CompletionQueueSavedState,
        #[mesh(3)]
        pub pending_cmds: PendingCommandsSavedState,
        #[mesh(4)]
        pub aer_handler: Option<AerHandlerSavedState>,
    }

I am not sure if we go backwards with the versions (i.e. to a driver that doesn't support this) but if we go forward (i.e. coming from a version of a driver that didn't support this), during restore() the admin aer handler would do nothing and just start from a new state:

    fn restore(&mut self, state: &Option<AerHandlerSavedState>) {
        if let Some(state) = state {
            let AerHandlerSavedState {
                last_aen,
                await_aen_cid,
            } = state;
            self.last_aen = last_aen.map(AsynchronousEventRequestDw0::from_bits); // Restore from u32
            self.await_aen_cid = *await_aen_cid;
        }
    }

Contributor

We discussed this yesterday. Since this hasn't yet shipped, I don't think we need to think too hard about this. But:

new -> old: no worse than now. AER will still be pended in the device (or not), but that's the existing behavior anyways.
old -> new: no worse than now. AER will still be pended in the device, but as Guramrit mentions, the new driver won't know. (which is the same as existing behavior)

/// Returns whether an AER needs to be sent to the controller or not. Since
/// this is the only function on the critical path, attempt to inline it.
#[inline]
fn poll_send_aer(&self) -> bool {
Member

Are default impls the right thing to do here, or do you want to force trait implementers to implement the no-ops themselves? It seems like it should be the latter?

Contributor Author

Personally, I can see this going either way and I don't have a strong opinion on this. In my head, I was treating the default implementation of the trait as the NoOp-Handler (indicated this in the trait comment as well). Happy to go either way on this one

Contributor

@mattkur mattkur left a comment

few minor comments, but looks good. Thanks Guramrit!

Comment on lines +695 to +700
// If error, cleanup and stop processing AENs.
if completion.status.status() != 0 {
self.failed = true;
self.last_aen = None;
return;
}
Contributor

nit: is an error logged in this case?

Comment on lines +701 to +707
// Complete the AEN or pend it.
let aen = AsynchronousEventRequestDw0::from_bits(completion.dw0);
if let Some(send_aen) = self.send_aen.take() {
send_aen.complete(aen);
} else {
self.last_aen = Some(aen);
}
Contributor

nit: please add a comment that explains why it's safe to delay sending the AER here. AER will be sent before the command to get log page (which is what clears the bit in the driver that tells the device to send an AER).
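
For readers following the "complete the AEN or pend it" logic in the snippet above, here is a stand-alone sketch of the same pattern. It is an assumption-laden model: `AenMailbox`, `request_aen`, and `on_aen` are hypothetical names, and a `std::sync::mpsc` channel stands in for the rpc that the real `send_aen.complete(aen)` call resolves.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Holds either a waiting requester or an AEN that arrived with no one
/// waiting for it yet.
#[derive(Default)]
pub struct AenMailbox {
    waiter: Option<Sender<u32>>,
    last_aen: Option<u32>,
}

impl AenMailbox {
    /// The driver asks for the next AEN. A pended AEN is delivered
    /// immediately; otherwise the requester is parked until one arrives.
    pub fn request_aen(&mut self) -> Receiver<u32> {
        let (tx, rx) = channel();
        match self.last_aen.take() {
            Some(aen) => {
                let _ = tx.send(aen);
            }
            None => self.waiter = Some(tx),
        }
        rx
    }

    /// An AEN completion arrived from the device: complete the waiter if
    /// there is one, otherwise pend the AEN for the next request.
    pub fn on_aen(&mut self, aen: u32) {
        if let Some(tx) = self.waiter.take() {
            let _ = tx.send(aen);
        } else {
            self.last_aen = Some(aen);
        }
    }
}
```

Only a single pended AEN is kept, matching the single `last_aen` slot in the quoted code; a second AEN arriving before a request would overwrite the first in this sketch.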

@gurasinghMS gurasinghMS merged commit 0e50fcf into microsoft:main Oct 15, 2025
50 checks passed
4 participants