Greetings,
I am currently using TF-TRT and I want to measure the performance of my models (latency, throughput).
The TensorRT C++ API supports device synchronization via the CUDA events API: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#cuda-events
On top of that, PyTorch provides torch.cuda.synchronize() as an alternative:
https://pytorch.org/docs/stable/generated/torch.cuda.synchronize.html
However, I can't find anything similar in the TF-TRT docs, and in my opinion such a synchronization primitive is essential for correctly measuring performance metrics: without it, a timer may stop after the kernel launch returns but before the GPU has actually finished the work.
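For illustration, this is the kind of measurement pattern I have in mind. It is a minimal sketch with the model call stubbed out as a plain callable so the harness stands on its own; in a real TF-TRT setup, `infer` would be the converted function, and one would have to force a host-side sync inside it (e.g. by materializing the output with `.numpy()` in eager mode) so the timer measures completed work rather than asynchronous kernel launches:

```python
import time
import statistics

def benchmark(infer, n_warmup=10, n_runs=100):
    """Time a callable end to end; report latency in ms and throughput in inferences/s.

    `infer` is a placeholder for the TF-TRT model call. It must block until
    the device has finished (e.g. `lambda: converted_func(batch).numpy()`),
    otherwise the timings only cover the asynchronous launch overhead.
    """
    for _ in range(n_warmup):       # warm-up: engine build, autotuning, caches
        infer()
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer()                     # must not return before the GPU is done
        latencies.append((time.perf_counter() - start) * 1e3)
    return {
        "mean_ms": statistics.mean(latencies),
        "p99_ms": sorted(latencies)[int(0.99 * n_runs) - 1],
        "throughput": n_runs / (sum(latencies) / 1e3),
    }
```

Forcing a sync through the output tensor works, but it mixes device-to-host copy time into the latency, which is exactly why an explicit synchronize call (as in TensorRT or PyTorch) would be preferable.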
Have I missed something, or are there plans to integrate such functionality?
Thank you