
[Feature Request] Injecting GPU memory in CUDA/TensorRT EPs #25385

@pb-dseifert


Describe the feature request

Thank you for a great library! We use ONNX Runtime as part of a larger machine learning workflow in which several other stages also run on the GPU. To maximize GPU utilization, we reuse the same GPU scratch space across these stages. Would it be possible to augment the CUDA/TensorRT EP APIs with `void* ptr, size_t size` (or `std::span<std::byte>`) arguments so that the user can provide the GPU memory for the EP to use?
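One concrete shape this could take is ONNX Runtime's existing `OrtAllocator` C interface: the caller supplies an allocator whose `Alloc`/`Free` serve a pre-allocated device region, and the EP routes its scratch allocations through it. Below is a minimal bump-allocator sketch in C++; `DeviceRegion` and `RegionAllocator` are illustrative names, and whether the CUDA/TensorRT EPs could be made to consult such an allocator for their internal scratch space is exactly what this request asks for.

```cpp
#include <atomic>
#include <cstddef>
#include <onnxruntime_c_api.h>

// Caller-owned device region, e.g. allocated once with cudaMalloc and
// shared with the other pipeline stages. Illustrative, not an ORT type.
struct DeviceRegion {
  std::byte* base = nullptr;
  size_t capacity = 0;
  std::atomic<size_t> offset{0};  // reset between runs to recycle the region
};

// Minimal bump allocator implementing the OrtAllocator C interface.
// Free is a no-op; memory is reclaimed wholesale by resetting `offset`.
struct RegionAllocator : OrtAllocator {
  DeviceRegion* region;
  const OrtMemoryInfo* mem_info;  // should describe CUDA device memory

  RegionAllocator(DeviceRegion* r, const OrtMemoryInfo* info)
      : OrtAllocator{}, region(r), mem_info(info) {
    version = ORT_API_VERSION;
    Alloc = [](OrtAllocator* this_, size_t size) -> void* {
      auto* self = static_cast<RegionAllocator*>(this_);
      size_t aligned = (size + 255) & ~size_t{255};  // 256-byte alignment
      size_t old = self->region->offset.fetch_add(aligned);
      return old + aligned <= self->region->capacity
                 ? self->region->base + old
                 : nullptr;
    };
    Free = [](OrtAllocator*, void*) {};  // bump allocator: no per-block free
    Info = [](const OrtAllocator* this_) {
      return static_cast<const RegionAllocator*>(this_)->mem_info;
    };
  }
};
```

The C API already exposes a `RegisterAllocator` entry point for sharing allocators across sessions, which may suggest a natural plumbing point, though to my knowledge the EPs do not currently draw their scratch memory from it.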

Describe scenario use case

Let's say we have the following pipeline:

[A] --> [some ONNX Runtime workflow B] --> [C]

All of these stages run on the GPU. A and C already share the same GPU memory for their computation, whereas the ONNX Runtime stage B allocates its own GPU memory that it does not share. Reusing GPU memory across all three stages would let us process more data per stage, improving overall throughput.
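For the tensors that flow between the stages (as opposed to B's internal scratch space), the existing I/O binding API can already wrap caller-owned device memory so that B reads from and writes to the same buffers A and C use. A sketch, assuming a single 1x1024 float input and output; the node names and device pointers are illustrative:

```cpp
#include <array>
#include <cstdint>
#include <onnxruntime_cxx_api.h>

// Bind device buffers owned by the surrounding pipeline (illustrative
// pointers produced by stage A) so that ONNX Runtime performs no extra
// device allocations or copies for the session's inputs and outputs.
void run_stage_b(Ort::Session& session, float* input_dev, float* output_dev) {
  Ort::MemoryInfo cuda_mem("Cuda", OrtDeviceAllocator, /*device_id=*/0,
                           OrtMemTypeDefault);
  const std::array<int64_t, 2> shape{1, 1024};  // assumed tensor shape

  Ort::Value input = Ort::Value::CreateTensor<float>(
      cuda_mem, input_dev, 1024, shape.data(), shape.size());
  Ort::Value output = Ort::Value::CreateTensor<float>(
      cuda_mem, output_dev, 1024, shape.data(), shape.size());

  Ort::IoBinding binding(session);
  binding.BindInput("input", input);    // node names are illustrative
  binding.BindOutput("output", output);
  session.Run(Ort::RunOptions{}, binding);
}
```

This covers the handoff between stages, but the EP-internal workspace (cuDNN/TensorRT scratch, arena growth) is still allocated privately, which is the gap this issue is about.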


Labels

ep:CUDA (issues related to the CUDA execution provider)
ep:TensorRT (issues related to the TensorRT execution provider)
feature request (request for unsupported feature or enhancement)
