### Describe the feature request
Thank you for a great library! We use ONNX Runtime as part of a larger machine learning workflow, of which multiple other parts run on the GPU as well. In order to maximize GPU utilization, we reuse the scratch space in GPU memory across these different steps. Would it be possible to augment the CUDA/TensorRT EP APIs with `void* ptr, size_t size` (or `std::span<std::byte>`) arguments so that the user could provide the GPU memory for the EP to use?
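For concreteness, here is a minimal sketch of how such an option could be surfaced through the existing CUDA provider options. The `external_scratch_ptr`/`external_scratch_size` fields are hypothetical names for this proposal and do not exist in ONNX Runtime today; the surrounding calls are the current API:

```cpp
#include <cstddef>
#include <span>

#include <onnxruntime_cxx_api.h>

// Hypothetical: hand the CUDA EP a caller-owned GPU buffer to use as
// scratch space instead of letting it allocate its own arena.
void ConfigureExternalScratch(Ort::SessionOptions& options,
                              std::span<std::byte> gpu_scratch) {
  OrtCUDAProviderOptionsV2* cuda_options = nullptr;
  Ort::ThrowOnError(Ort::GetApi().CreateCUDAProviderOptions(&cuda_options));

  // Proposed additions (illustrative names, not part of the current API):
  // cuda_options->external_scratch_ptr  = gpu_scratch.data();
  // cuda_options->external_scratch_size = gpu_scratch.size();

  options.AppendExecutionProvider_CUDA_V2(*cuda_options);
  Ort::GetApi().ReleaseCUDAProviderOptions(cuda_options);
}
```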
### Describe scenario use case
Let's say we have the following pipeline:

[A] --> [some ONNX Runtime workflow B] --> [C]

All of these stages run on the GPU. `A` and `C` currently use the same GPU memory for their computation, whereas the ONNX Runtime stage `B` allocates its own GPU memory that it does not share. By reusing GPU memory across all three stages, we can process more data in a single stage, which improves overall data throughput; a sketch of the intended pattern follows below.
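A rough sketch of the sharing pattern we are after, assuming the hypothetical external-scratch API above. `RunStageA`/`RunStageC` are placeholders for our own kernels, and the ONNX Runtime session would draw its scratch memory from the same buffer:

```cpp
#include <cstddef>
#include <span>

#include <cuda_runtime.h>
#include <onnxruntime_cxx_api.h>

// Placeholders for our own GPU stages [A] and [C].
void RunStageA(std::span<std::byte> scratch);
void RunStageC(std::span<std::byte> scratch);

void RunPipeline(Ort::Session& session, size_t scratch_bytes) {
  // One buffer, reused by every stage of the pipeline.
  void* ptr = nullptr;
  cudaMalloc(&ptr, scratch_bytes);
  std::span<std::byte> scratch{static_cast<std::byte*>(ptr), scratch_bytes};

  RunStageA(scratch);  // [A] computes in the shared buffer.
  // [B] would run inference with its scratch space drawn from the same
  // buffer rather than from an EP-private arena:
  // session.Run(...);
  RunStageC(scratch);  // [C] computes in the shared buffer.

  cudaFree(ptr);
}
```

With this in place, the buffer only needs to be sized once for the largest stage, instead of the EP holding a second arena that stays idle while `A` and `C` run.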