Memory coherency issue with a VFIO passthrough device in TDX #395

@CGCSpring

We observed a memory coherency issue with a VFIO passthrough device in a TDX guest, using the host/guest configuration from this repo. Any suggestions would be appreciated.

Our device has an IH (Interrupt Handler) hardware ring into which the device writes interrupt entries. The ring is allocated with dma_alloc_coherent in the kernel, and the returned DMA address is programmed into the IH ring registers so the device can access it.

  1. On the first driver attach in TDX, the IH ring works properly. When a hardware interrupt arrives, the IRQ handler reads back the correct interrupt entry from the IH buffer allocated by dma_alloc_coherent and handles it. At the same time, we have a mirrored copy of the IH data in a device register, which confirms that the device's writes to the IH buffer are working.
  2. Next, we detach the device, and the IH buffer is released.
  3. We attach the device again, with the same dma_alloc_coherent call for the IH buffer, and the issue occurs. Once a hardware interrupt arrives, the data read back from the IH buffer is always 0, while the IH data in the device register is not 0. So it looks like the data read back comes from the CPU cache instead of memory.
  4. My guess is that the IH buffer works in the first round because every CPU read misses the cache and is served from memory, so it gets the correct data. When the guest releases the IH buffer, the host TEE possibly does not invalidate the cache, so the issue appears from the second driver attach/detach cycle onward.
  5. With a non-CC guest VM and the same device, the issue does not occur across driver attach/detach cycles.
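
For anyone trying to reproduce this, the comparison in items 1 and 3 (ring entry as seen by the CPU versus the copy mirrored in the device register) can be sketched as a small standalone C helper. This is only an illustration of the check, not our driver code: the entry size of 4 dwords and the names `ih_check_entry`, `IH_ENTRY_DWORDS`, and the `ih_check` enum are hypothetical.

```c
#include <stdint.h>

/* Hypothetical entry size; the real IH entry layout is device-specific. */
#define IH_ENTRY_DWORDS 4

enum ih_check {
    IH_OK,          /* ring entry matches the register mirror */
    IH_STALE_ZERO,  /* ring entry reads back as all zeros, mirror differs */
    IH_MISMATCH     /* ring entry is non-zero but differs from the mirror */
};

/*
 * Compare one ring entry (the CPU's view of the DMA buffer) against the
 * copy of the same entry read from the device register mirror.
 * The ring pointer is volatile so each dword is actually re-read.
 */
static enum ih_check ih_check_entry(const volatile uint32_t *ring_entry,
                                    const uint32_t *mirror)
{
    int all_zero = 1;
    int equal = 1;

    for (int i = 0; i < IH_ENTRY_DWORDS; i++) {
        uint32_t v = ring_entry[i];

        if (v != 0)
            all_zero = 0;
        if (v != mirror[i])
            equal = 0;
    }

    if (equal)
        return IH_OK;
    return all_zero ? IH_STALE_ZERO : IH_MISMATCH;
}
```

In these terms, the first attach always gives `IH_OK`, and every interrupt after the second attach gives `IH_STALE_ZERO`.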

I also tried the methods below, but the issue persists:

  1. Instead of dma_alloc_coherent, I tried a combination of dma_alloc_pages and vmap to get an uncached CPU mapping, but that did not fix the issue.
  2. Flushing the memory with clflush_cache_range before the IH hardware accesses it.

Now I am blocked. Moreover, when the issue happens, there is no notable log or hint on the host platform. Please share some thoughts on how to move forward.
