Add support for IOMMUFD IOCTLs by benlwalker · Pull Request #833 · nutanix/libvfio-user

benlwalker · 2026-02-05T16:04:59Z

Linux added a new uapi called IOMMUFD for managing the IOMMU. In recent Linux releases, many of the vfio ioctls are now just legacy wrappers around this new interface. However, the new interface has a couple of important features that I'd like to use with vfio-user. Namely:

Support for allowing the server to allocate the IOVA. This is important because the server can return the true IOVA to the client directly, which can be used as an actual DMA address by hardware. If the client puts this address into a command, they don't have to rely on the server process to intercept the command and translate it.
Support for registering memory described by a dma-buf (e.g. memory on another PCI device)

I've tried to make this backward compatible by adding feature negotiation. Only if both client and server report support will it be used.

Add lspci.expected.out.5 to accommodate output format differences in newer versions of the lspci utility (e.g., on Fedora 42). The test already supported multiple expected outputs for different lspci versions; this adds another variant. Signed-off-by: Ben Walker <ben@nvidia.com>

benlwalker · 2026-02-05T16:12:08Z

include/vfio-user.h

+} __attribute__((packed));
+
+/* VFIO_USER_IOMMUFD_BIND */
+struct vfio_user_iommufd_bind {


I think in the real iommufd flow you'd be selecting the device to bind to here. But in vfio-user the device is implied by the domain socket we connected to. So all this is really doing is giving us this devid value to use later (and we don't really need it later, because there's only one device). I debated whether to keep this to make the flow stay the same as when using the real iommufd, or if I should just drop it and simplify.

benlwalker · 2026-02-05T16:17:40Z

include/vfio-user.h

+    uint32_t __reserved;
+    uint64_t user_va;      /* Userspace virtual address (for tracking) */
+    uint64_t length;
+    uint64_t iova;         /* Output: server-allocated IOVA */


In the real iommufd there's an extra flag defined that tells the system whether the client specifies the iova here or whether the server fills it in. I've only implemented the mode where the server fills it in. There's a few considerations here:

If the client chooses the iova, then it's almost certainly not going to be the real DMA address that needs to be sent to the hardware. The system will assume that the server process intercepts and translates all commands.

If the server chooses the iova then we can return the real DMA address and the server doesn't need to intercept anything (so in theory, clients could get more direct access to the NVMe queues some day, if we gain additional security features in other areas). But the client needs to translate from its own vaddr to the server returned iova on every I/O.

The biggest challenge comes in knowing when the server should do a translation. By only allowing server-choose mode with iommufd, then I know if we're in iommufd mode the server doesn't have to translate. If I allow both modes like in the real iommufd ioctl, I don't have a great way to figure out if the server should translate an address or not.

I think protocol wise we should have the option, but explicitly only support server-allocated for now - and we should have those specific features in negotiation too (we'd not say we support client allocation for these)

Add support for iommufd-based memory management to the vfio-user protocol as an alternative to the legacy VFIO DMA_MAP/UNMAP model. Key changes: - Add capability negotiation for iommufd support in VFIO_USER_VERSION - Implement new protocol commands for iommufd operations: * VFIO_USER_IOMMUFD_ALLOC_IOAS - Allocate I/O Address Space * VFIO_USER_IOMMUFD_DESTROY_IOAS - Destroy I/O Address Space * VFIO_USER_IOMMUFD_BIND - Bind device to iommufd context * VFIO_USER_IOMMUFD_ATTACH_PT - Attach device to IOAS * VFIO_USER_IOMMUFD_DETACH_PT - Detach device from IOAS * VFIO_USER_IOMMUFD_IOAS_MAP - Map memory region (server-allocated IOVA) * VFIO_USER_IOMMUFD_IOAS_UNMAP - Unmap memory region - Add server-side iommufd controller implementation (lib/iommufd.c) - Update client sample to support iommufd capability negotiation - Fix build dependencies in samples and tests The iommufd model uses server-controlled IOVA allocation, where the server performs mmap() and returns the mapped virtual address as the IOVA. This simplifies address management and avoids the need for separate IOVA allocators. When both client and server advertise iommufd support during version negotiation, iommufd mode is enabled for the connection. Otherwise, the connection falls back to legacy VFIO DMA_MAP/UNMAP. Signed-off-by: Ben Walker <ben@nvidia.com>

tmakatos

Thanks for the patch, just to be clear there's no interaction with /dev/iommufd but this add iommufd-like functionality, right? I need to familiarise myself first with iommufd before I can meaningfully review this.

benlwalker · 2026-02-05T22:49:36Z

Thanks for the patch, just to be clear there's no interaction with /dev/iommufd but this add iommufd-like functionality, right? I need to familiarise myself first with iommufd before I can meaningfully review this.

Right - just like this library doesn't interact with vfio but uses the same ioctls. Portions of the functionality covered previously by vfio in Linux have moved to this new iommufd API (and those parts of vfio are now just a thin wrapper around iommufd) and I wanted the new fancy features in iommufd to be available in vfio-user.

tmakatos · 2026-02-06T12:54:38Z

Thanks for the patch, just to be clear there's no interaction with /dev/iommufd but this add iommufd-like functionality, right? I need to familiarise myself first with iommufd before I can meaningfully review this.

Right - just like this library doesn't interact with vfio but uses the same ioctls. Portions of the functionality covered previously by vfio in Linux have moved to this new iommufd API (and those parts of vfio are now just a thin wrapper around iommufd) and I wanted the new fancy features in iommufd to be available in vfio-user.

Makes sense, I think using IOMMUFD in the protocol specification is consistent with what we've been doing so far. Not sure about whether it should be called iommufd internally though, I'll defer this to @jlevon (who has more bikesheddy inclinations).
I suppose you're going to post the protocol changes to qemu-devel?
Also, do you have any relevant QEMU patches to share?

tmakatos

@benlwalker this is a very superficial, partial review, I haven't had a chance to read about iommufd just yet.

tmakatos · 2026-02-06T16:49:23Z

include/vfio-user.h

+/*
+ * vfio-user protocol commands
+ *
+ * Commands 1-18 are part of the base protocol. Commands 19+ implement


Migration read/write isn't supported in practise if the device isn't migrateable, so not sure we need to state it here.

tmakatos · 2026-02-06T16:53:31Z

lib/iommufd.h

+ */
+typedef struct iommufd_ioas {
+    uint32_t            ioas_id;
+    TAILQ_HEAD(, iommufd_mapping) mappings;


So an IOAS has multiple IOVA mappings? Just checking whether a list is suitable, how many such mappings do we generally expect?

tmakatos · 2026-02-06T16:55:13Z

lib/iommufd.h

+typedef struct iommufd_controller {
+    struct vfu_ctx      *vfu_ctx;
+    bool                enabled;        /* iommufd mode negotiated */
+    bool                device_bound;   /* Device bound to iommufd */


Since there's no /dev/iommufd, what does device_bound mean in practise?

tmakatos · 2026-02-06T21:36:36Z

lib/iommufd.h

+} iommufd_controller_t;
+
+/*
+ * Create a new iommufd controller.


I suppose returns NULL on error and sets errno?
(Here and wherever a pointer is returned?)

tmakatos · 2026-02-06T21:38:49Z

lib/iommufd.h

+/*
+ * Destroy an IOAS and all its mappings.
+ */
+int


-1 on error and sets errno? (elsewher as well?)

tmakatos · 2026-02-06T21:54:14Z

lib/libvfio-user.c

+                return ERROR_INT(ERANGE) - 1; /* Need at least 1 SG entry */
+            }
+
+            /* Use INT_MAX as special region index to mark iommufd mapping */


Isn't it cleaner to use a dedicated field for that?

tmakatos · 2026-02-06T21:55:20Z

lib/libvfio-user.c

+    for (i = 0; i < cnt; i++, sg++) {
+        if (sg->region == INT_MAX) {
+            /* This is an iommufd mapping - IOVA equals the server's virtual address */
+            if (vfu_ctx->iommufd == NULL || !iommufd_is_enabled(vfu_ctx->iommufd)) {


That should be an assert right?

tmakatos · 2026-02-06T21:56:48Z

lib/libvfio-user.c

+            iov[i].iov_len = sg->length;
+        } else {
+            /* Legacy DMA mapping - use existing dma_sgl_get logic */
+            if (sg->region >= vfu_ctx->dma->nregions) {


tmakatos · 2026-02-06T21:57:51Z

lib/libvfio-user.c

+                return ERROR_INT(EINVAL);
+            }
+
+            dma_memory_region_t *region = &vfu_ctx->dma->regions[sg->region];


Need to check why dma_sgl_get went away.

tmakatos · 2026-02-06T21:59:51Z

lib/libvfio-user.c

+        return ret;
+    }
+
+    /* Return a dummy device ID (not used in vfio-user) */


AFAIK vfio-user doesn't use device IDs at all, check.

jlevon · 2026-03-17T19:56:07Z

lib/iommufd.c

+    }
+
+    iommufd->device_bound = true;
+    iommufd->bound_ioas_id = 0;


can we use a define like INVALID_IOAS_ID instead of the zero here?

jlevon · 2026-03-17T20:01:22Z

lib/libvfio-user.c

+            /* Use INT_MAX as special region index to mark iommufd mapping */
+            /* In iommufd mode, IOVA equals the server's virtual address, so we can use it directly */
+            sgl[0].dma_addr = dma_addr;
+            sgl[0].region = INT_MAX;


a more specific define than INT_MAX for marking these please.

this will change once we merge the btree...

jlevon · 2026-03-17T20:03:36Z

lib/iommufd.c

+}
+
+int
+iommufd_bind_device(iommufd_controller_t *iommufd)


I'm not seeing any advantage to us having a bind primitive in our protocol, so I do think we can drop it.

jlevon · 2026-03-17T20:09:23Z

lib/iommufd.c

+    return 0;
+}
+
+int


what does attach/detach actually do for vfio-user servers? The only difference I could see is that we check the bound ioas first in iommufd_find_mapping() - but we already have the mapping_hint there.

What am I missing?

jlevon · 2026-03-17T20:12:33Z

lib/iommufd.c

+    mapping->vaddr = vaddr;
+    mapping->iova = (uint64_t)(uintptr_t)(vaddr + offset_in_page);
+    mapping->length = length;
+    mapping->user_va = user_va;


we just store this and don't do anything with it - this is the clients VA right? The docs could clarify this, and also give more detail as to why we even accept it in the message never mind store it?

jlevon · 2026-03-17T20:20:18Z

lib/iommufd.c

+
+    /* The IOVA is the server's vaddr (offset-adjusted) */
+    mapping->vaddr = vaddr;
+    mapping->iova = (uint64_t)(uintptr_t)(vaddr + offset_in_page);


I think the comment/explanation above struct vfu_dma_info needs updating, as it's still stating iova space is physical

jlevon · 2026-03-17T21:00:19Z

include/vfio-user.h

    VFIO_USER_MIG_DATA_READ             = 17,
    VFIO_USER_MIG_DATA_WRITE            = 18,
+    /*
+     * iommufd protocol extension


I don't think we should call this iommufd (but it's fine if we reference it for inspiration). There's really no equivalent thing to "iommufd" at all here, so it's not a useful name for this set of APIs.

I'd suggest instead (depending on disposition of attach/detach, see below) VFIO_USER_IOAS_*

jlevon · 2026-03-17T21:00:45Z

include/vfio-user.h

+     * the legacy VFIO_USER_DMA_MAP/UNMAP commands, mirroring the Linux
+     * kernel's iommufd API.
+     */
+    VFIO_USER_IOMMUFD_ALLOC_IOAS        = 19,


I wouldn't ask it of you, but we will need to update the protocol definition (which lives in qemu now). Let's make sure to file an issue at least.

jlevon · 2026-03-17T21:02:18Z

include/vfio-user.h

+    VFIO_USER_IOMMUFD_ATTACH_PT         = 25,
+    VFIO_USER_IOMMUFD_DETACH_PT         = 26,
    VFIO_USER_MAX,
 };


any chance I could talk you into updating https://github.com/nutanix/libvfio-user/blob/master/docs/memory-mapping.md with a section on these APIs, and at least a brief summary of what an "ioas" is, how it relates to the existing libvfio-user APIs, the IOVA==SVA requirement here?

Also in particular, what would we actually want multiple IOAS for in vfio-user

jlevon · 2026-03-17T21:08:53Z

lib/libvfio-user.c

+    if (vfu_ctx->iommufd != NULL && iommufd_is_enabled(vfu_ctx->iommufd)) {
+        /* Validate that the IOVA is registered (for security) */
+        mapping = iommufd_find_mapping(vfu_ctx->iommufd, (uint64_t)(uintptr_t)dma_addr, len);
+        if (mapping != NULL) {


why would we fall back to the non-iommufd implementation, if we can't find a suitable mapping here?

Also, could you expand more about the specific "security" check you're doing here? (I think it's to prevent an incoming client request from arbitrarily reading for any VA in the server process, but it's not explicit here.)

jlevon · 2026-03-20T17:57:39Z

test/test-lspci.sh

 $LSPCI | lspci -vv -F /dev/stdin >lspci.out

-for i in 1 2 3 4; do
+for i in 1 2 3 4 5; do


do you mind opening a separate PR for your lspci commit, so we can get it merged?

jlevon · 2026-03-20T17:58:31Z

samples/client.c

@@ -197,7 +197,8 @@ recv_version(int sock, int *server_max_fds, size_t *server_max_data_xfer_size,
        }

        ret = tran_parse_version_json(json_str, server_max_fds,


we will definitely want python tests for the new APIs once they're firmed up

jlevon · 2026-03-20T18:04:06Z

lib/tran.c

+        }
+        /* Enable iommufd mode if both client and server support it */
+        if (iommufd_supported && vfu_ctx->iommufd != NULL) {
+            iommufd_enable(vfu_ctx->iommufd);


I think you could instead just have ->iommufd only allocated when iommufd is enabled, and != NULL comparisons to decide if it's supported

jlevon · 2026-03-20T18:06:43Z

lib/libvfio-user.c

 static int
 vfu_reset_ctx(vfu_ctx_t *vfu_ctx, int reason);

+/* iommufd command handlers */


nit, could you just put the handlers before handle_request() and not forward declare here?

jlevon · 2026-03-20T18:08:16Z

lib/libvfio-user.c

+    /* Check iommufd mappings first if iommufd is enabled */
+    if (vfu_ctx->iommufd != NULL && iommufd_is_enabled(vfu_ctx->iommufd)) {
+        /* Validate that the IOVA is registered (for security) */
+        mapping = iommufd_find_mapping(vfu_ctx->iommufd, (uint64_t)(uintptr_t)dma_addr, len);


general nit: lots of long lines, we generally keep to 80 chars

jlevon · 2026-03-20T18:14:13Z

lib/libvfio-user.c

    return dma_sg_is_mappable(vfu_ctx->dma, sg);
 }

+/* iommufd command handler implementations */


will review all these once disposition of bind+attach is resolved

jlevon · 2026-03-20T18:14:45Z

lib/iommufd.h

+ */
+
+/*
+ * iommufd support for libvfio-user


could do with expanding - and explaining the relationship to dma.c

jlevon · 2026-03-20T18:18:45Z

include/vfio-user.h

+struct vfio_user_iommufd_ioas_unmap {
+    uint32_t argsz;
+    uint32_t flags;
+#define VFIO_USER_IOMMUFD_UNMAP_ALL  (1 << 0)


this isn't handled - and handle_iommufd_ioas_unmap() doesn't validate the flags at all

jlevon · 2026-03-20T18:21:30Z

include/vfio-user.h

+     * kernel's iommufd API.
+     */
+    VFIO_USER_IOMMUFD_ALLOC_IOAS        = 19,
+    VFIO_USER_IOMMUFD_DESTROY_IOAS      = 20,


need relevant command_needs_quiesce() otherwise client code race badly with multi-threaded code using sgls

jlevon · 2026-03-20T18:23:35Z

lib/iommufd.c

+        return 0;
+    }
+
+    /* Determine page size and round mmap parameters */


lacking the fd_get_blocksize() behaviour of the legacy controller?

benlwalker force-pushed the iommufd branch from 1534250 to 21cffb0 Compare February 5, 2026 16:09

benlwalker commented Feb 5, 2026

View reviewed changes

benlwalker force-pushed the iommufd branch from 21cffb0 to 3044074 Compare February 5, 2026 22:45

tmakatos reviewed Feb 5, 2026

View reviewed changes

tmakatos reviewed Feb 13, 2026

View reviewed changes

jlevon reviewed Mar 17, 2026

View reviewed changes

jlevon reviewed Mar 20, 2026

View reviewed changes

jlevon mentioned this pull request Apr 9, 2026

[Feature]: Add support for vfio-pci mapped memory on client side #843

Open

		@@ -197,7 +197,8 @@ recv_version(int sock, int server_max_fds, size_t server_max_data_xfer_size,
		}

		ret = tran_parse_version_json(json_str, server_max_fds,

Conversation

benlwalker commented Feb 5, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tmakatos left a comment

Choose a reason for hiding this comment

Uh oh!

benlwalker commented Feb 5, 2026

Uh oh!

tmakatos commented Feb 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tmakatos left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tmakatos commented Feb 6, 2026 •

edited

Loading