Conversation
Add lspci.expected.out.5 to accommodate output format differences in newer versions of the lspci utility (e.g., on Fedora 42). The test already supported multiple expected outputs for different lspci versions; this adds another variant. Signed-off-by: Ben Walker <ben@nvidia.com>
| } __attribute__((packed)); | ||
|
|
||
| /* VFIO_USER_IOMMUFD_BIND */ | ||
| struct vfio_user_iommufd_bind { |
There was a problem hiding this comment.
I think in the real iommufd flow you'd be selecting the device to bind to here. But in vfio-user the device is implied by the domain socket we connected to. So all this is really doing is giving us this devid value to use later (and we don't really need it later, because there's only one device). I debated whether to keep this to make the flow stay the same as when using the real iommufd, or if I should just drop it and simplify.
| uint32_t __reserved; | ||
| uint64_t user_va; /* Userspace virtual address (for tracking) */ | ||
| uint64_t length; | ||
| uint64_t iova; /* Output: server-allocated IOVA */ |
There was a problem hiding this comment.
In the real iommufd there's an extra flag defined that tells the system whether the client specifies the iova here or whether the server fills it in. I've only implemented the mode where the server fills it in. There's a few considerations here:
- If the client chooses the iova, then it's almost certainly not going to be the real DMA address that needs to be sent to the hardware. The system will assume that the server process intercepts and translates all commands.
- If the server chooses the iova then we can return the real DMA address and the server doesn't need to intercept anything (so in theory, clients could get more direct access to the NVMe queues some day, if we gain additional security features in other areas). But the client needs to translate from its own vaddr to the server returned iova on every I/O.
The biggest challenge comes in knowing when the server should do a translation. By only allowing server-choose mode with iommufd, then I know if we're in iommufd mode the server doesn't have to translate. If I allow both modes like in the real iommufd ioctl, I don't have a great way to figure out if the server should translate an address or not.
There was a problem hiding this comment.
I think protocol wise we should have the option, but explicitly only support server-allocated for now - and we should have those specific features in negotiation too (we'd not say we support client allocation for these)
Add support for iommufd-based memory management to the vfio-user protocol as an alternative to the legacy VFIO DMA_MAP/UNMAP model. Key changes: - Add capability negotiation for iommufd support in VFIO_USER_VERSION - Implement new protocol commands for iommufd operations: * VFIO_USER_IOMMUFD_ALLOC_IOAS - Allocate I/O Address Space * VFIO_USER_IOMMUFD_DESTROY_IOAS - Destroy I/O Address Space * VFIO_USER_IOMMUFD_BIND - Bind device to iommufd context * VFIO_USER_IOMMUFD_ATTACH_PT - Attach device to IOAS * VFIO_USER_IOMMUFD_DETACH_PT - Detach device from IOAS * VFIO_USER_IOMMUFD_IOAS_MAP - Map memory region (server-allocated IOVA) * VFIO_USER_IOMMUFD_IOAS_UNMAP - Unmap memory region - Add server-side iommufd controller implementation (lib/iommufd.c) - Update client sample to support iommufd capability negotiation - Fix build dependencies in samples and tests The iommufd model uses server-controlled IOVA allocation, where the server performs mmap() and returns the mapped virtual address as the IOVA. This simplifies address management and avoids the need for separate IOVA allocators. When both client and server advertise iommufd support during version negotiation, iommufd mode is enabled for the connection. Otherwise, the connection falls back to legacy VFIO DMA_MAP/UNMAP. Signed-off-by: Ben Walker <ben@nvidia.com>
tmakatos
left a comment
There was a problem hiding this comment.
Thanks for the patch, just to be clear there's no interaction with /dev/iommufd but this add iommufd-like functionality, right? I need to familiarise myself first with iommufd before I can meaningfully review this.
Right - just like this library doesn't interact with vfio but uses the same ioctls. Portions of the functionality covered previously by vfio in Linux have moved to this new iommufd API (and those parts of vfio are now just a thin wrapper around iommufd) and I wanted the new fancy features in iommufd to be available in vfio-user. |
Makes sense, I think using |
tmakatos
left a comment
There was a problem hiding this comment.
@benlwalker this is a very superficial, partial review, I haven't had a chance to read about iommufd just yet.
| /* | ||
| * vfio-user protocol commands | ||
| * | ||
| * Commands 1-18 are part of the base protocol. Commands 19+ implement |
There was a problem hiding this comment.
Migration read/write isn't supported in practise if the device isn't migrateable, so not sure we need to state it here.
| */ | ||
| typedef struct iommufd_ioas { | ||
| uint32_t ioas_id; | ||
| TAILQ_HEAD(, iommufd_mapping) mappings; |
There was a problem hiding this comment.
So an IOAS has multiple IOVA mappings? Just checking whether a list is suitable, how many such mappings do we generally expect?
| typedef struct iommufd_controller { | ||
| struct vfu_ctx *vfu_ctx; | ||
| bool enabled; /* iommufd mode negotiated */ | ||
| bool device_bound; /* Device bound to iommufd */ |
There was a problem hiding this comment.
Since there's no /dev/iommufd, what does device_bound mean in practise?
| } iommufd_controller_t; | ||
|
|
||
| /* | ||
| * Create a new iommufd controller. |
There was a problem hiding this comment.
I suppose returns NULL on error and sets errno?
(Here and wherever a pointer is returned?)
| /* | ||
| * Destroy an IOAS and all its mappings. | ||
| */ | ||
| int |
There was a problem hiding this comment.
-1 on error and sets errno? (elsewher as well?)
| return ERROR_INT(ERANGE) - 1; /* Need at least 1 SG entry */ | ||
| } | ||
|
|
||
| /* Use INT_MAX as special region index to mark iommufd mapping */ |
There was a problem hiding this comment.
Isn't it cleaner to use a dedicated field for that?
| for (i = 0; i < cnt; i++, sg++) { | ||
| if (sg->region == INT_MAX) { | ||
| /* This is an iommufd mapping - IOVA equals the server's virtual address */ | ||
| if (vfu_ctx->iommufd == NULL || !iommufd_is_enabled(vfu_ctx->iommufd)) { |
There was a problem hiding this comment.
That should be an assert right?
| iov[i].iov_len = sg->length; | ||
| } else { | ||
| /* Legacy DMA mapping - use existing dma_sgl_get logic */ | ||
| if (sg->region >= vfu_ctx->dma->nregions) { |
| return ERROR_INT(EINVAL); | ||
| } | ||
|
|
||
| dma_memory_region_t *region = &vfu_ctx->dma->regions[sg->region]; |
There was a problem hiding this comment.
Need to check why dma_sgl_get went away.
| return ret; | ||
| } | ||
|
|
||
| /* Return a dummy device ID (not used in vfio-user) */ |
There was a problem hiding this comment.
AFAIK vfio-user doesn't use device IDs at all, check.
| } | ||
|
|
||
| iommufd->device_bound = true; | ||
| iommufd->bound_ioas_id = 0; |
There was a problem hiding this comment.
can we use a define like INVALID_IOAS_ID instead of the zero here?
| /* Use INT_MAX as special region index to mark iommufd mapping */ | ||
| /* In iommufd mode, IOVA equals the server's virtual address, so we can use it directly */ | ||
| sgl[0].dma_addr = dma_addr; | ||
| sgl[0].region = INT_MAX; |
There was a problem hiding this comment.
a more specific define than INT_MAX for marking these please.
There was a problem hiding this comment.
this will change once we merge the btree...
| } | ||
|
|
||
| int | ||
| iommufd_bind_device(iommufd_controller_t *iommufd) |
There was a problem hiding this comment.
I'm not seeing any advantage to us having a bind primitive in our protocol, so I do think we can drop it.
| return 0; | ||
| } | ||
|
|
||
| int |
There was a problem hiding this comment.
what does attach/detach actually do for vfio-user servers? The only difference I could see is that we check the bound ioas first in iommufd_find_mapping() - but we already have the mapping_hint there.
What am I missing?
| mapping->vaddr = vaddr; | ||
| mapping->iova = (uint64_t)(uintptr_t)(vaddr + offset_in_page); | ||
| mapping->length = length; | ||
| mapping->user_va = user_va; |
There was a problem hiding this comment.
we just store this and don't do anything with it - this is the clients VA right? The docs could clarify this, and also give more detail as to why we even accept it in the message never mind store it?
|
|
||
| /* The IOVA is the server's vaddr (offset-adjusted) */ | ||
| mapping->vaddr = vaddr; | ||
| mapping->iova = (uint64_t)(uintptr_t)(vaddr + offset_in_page); |
There was a problem hiding this comment.
I think the comment/explanation above struct vfu_dma_info needs updating, as it's still stating iova space is physical
| VFIO_USER_MIG_DATA_READ = 17, | ||
| VFIO_USER_MIG_DATA_WRITE = 18, | ||
| /* | ||
| * iommufd protocol extension |
There was a problem hiding this comment.
I don't think we should call this iommufd (but it's fine if we reference it for inspiration). There's really no equivalent thing to "iommufd" at all here, so it's not a useful name for this set of APIs.
I'd suggest instead (depending on disposition of attach/detach, see below) VFIO_USER_IOAS_*
| * the legacy VFIO_USER_DMA_MAP/UNMAP commands, mirroring the Linux | ||
| * kernel's iommufd API. | ||
| */ | ||
| VFIO_USER_IOMMUFD_ALLOC_IOAS = 19, |
There was a problem hiding this comment.
I wouldn't ask it of you, but we will need to update the protocol definition (which lives in qemu now). Let's make sure to file an issue at least.
| VFIO_USER_IOMMUFD_ATTACH_PT = 25, | ||
| VFIO_USER_IOMMUFD_DETACH_PT = 26, | ||
| VFIO_USER_MAX, | ||
| }; |
There was a problem hiding this comment.
any chance I could talk you into updating https://github.com/nutanix/libvfio-user/blob/master/docs/memory-mapping.md with a section on these APIs, and at least a brief summary of what an "ioas" is, how it relates to the existing libvfio-user APIs, the IOVA==SVA requirement here?
There was a problem hiding this comment.
Also in particular, what would we actually want multiple IOAS for in vfio-user
| if (vfu_ctx->iommufd != NULL && iommufd_is_enabled(vfu_ctx->iommufd)) { | ||
| /* Validate that the IOVA is registered (for security) */ | ||
| mapping = iommufd_find_mapping(vfu_ctx->iommufd, (uint64_t)(uintptr_t)dma_addr, len); | ||
| if (mapping != NULL) { |
There was a problem hiding this comment.
why would we fall back to the non-iommufd implementation, if we can't find a suitable mapping here?
Also, could you expand more about the specific "security" check you're doing here? (I think it's to prevent an incoming client request from arbitrarily reading for any VA in the server process, but it's not explicit here.)
| $LSPCI | lspci -vv -F /dev/stdin >lspci.out | ||
|
|
||
| for i in 1 2 3 4; do | ||
| for i in 1 2 3 4 5; do |
There was a problem hiding this comment.
do you mind opening a separate PR for your lspci commit, so we can get it merged?
| @@ -197,7 +197,8 @@ recv_version(int sock, int *server_max_fds, size_t *server_max_data_xfer_size, | |||
| } | |||
|
|
|||
| ret = tran_parse_version_json(json_str, server_max_fds, | |||
There was a problem hiding this comment.
we will definitely want python tests for the new APIs once they're firmed up
| } | ||
| /* Enable iommufd mode if both client and server support it */ | ||
| if (iommufd_supported && vfu_ctx->iommufd != NULL) { | ||
| iommufd_enable(vfu_ctx->iommufd); |
There was a problem hiding this comment.
I think you could instead just have ->iommufd only allocated when iommufd is enabled, and != NULL comparisons to decide if it's supported
| static int | ||
| vfu_reset_ctx(vfu_ctx_t *vfu_ctx, int reason); | ||
|
|
||
| /* iommufd command handlers */ |
There was a problem hiding this comment.
nit, could you just put the handlers before handle_request() and not forward declare here?
| /* Check iommufd mappings first if iommufd is enabled */ | ||
| if (vfu_ctx->iommufd != NULL && iommufd_is_enabled(vfu_ctx->iommufd)) { | ||
| /* Validate that the IOVA is registered (for security) */ | ||
| mapping = iommufd_find_mapping(vfu_ctx->iommufd, (uint64_t)(uintptr_t)dma_addr, len); |
There was a problem hiding this comment.
general nit: lots of long lines, we generally keep to 80 chars
| return dma_sg_is_mappable(vfu_ctx->dma, sg); | ||
| } | ||
|
|
||
| /* iommufd command handler implementations */ |
There was a problem hiding this comment.
will review all these once disposition of bind+attach is resolved
| */ | ||
|
|
||
| /* | ||
| * iommufd support for libvfio-user |
There was a problem hiding this comment.
could do with expanding - and explaining the relationship to dma.c
| struct vfio_user_iommufd_ioas_unmap { | ||
| uint32_t argsz; | ||
| uint32_t flags; | ||
| #define VFIO_USER_IOMMUFD_UNMAP_ALL (1 << 0) |
There was a problem hiding this comment.
this isn't handled - and handle_iommufd_ioas_unmap() doesn't validate the flags at all
| * kernel's iommufd API. | ||
| */ | ||
| VFIO_USER_IOMMUFD_ALLOC_IOAS = 19, | ||
| VFIO_USER_IOMMUFD_DESTROY_IOAS = 20, |
There was a problem hiding this comment.
need relevant command_needs_quiesce() otherwise client code race badly with multi-threaded code using sgls
| return 0; | ||
| } | ||
|
|
||
| /* Determine page size and round mmap parameters */ |
There was a problem hiding this comment.
lacking the fd_get_blocksize() behaviour of the legacy controller?
Linux added a new uapi called IOMMUFD for managing the IOMMU. In recent Linux releases, many of the vfio ioctls are now just legacy wrappers around this new interface. However, the new interface has a couple of important features that I'd like to use with vfio-user. Namely:
I've tried to make this backward compatible by adding feature negotiation. Only if both client and server report support will it be used.