To measure the computation time of a deep learning model on the GPU, we have to keep in mind that GPU operations are asynchronous. In Python, we should therefore use torch.cuda.Event or torch.autograd.profiler.
I found torch::autograd::profiler in libtorch, but there is no API documentation explaining how to use its functions. It looks like the C++ version of torch.autograd.profiler for measuring model computation time. There are many examples of measuring computation time in Python, but it is hard to find any example of measuring computation time on the GPU in C++.
(This might be because, unlike me, most users are already familiar with C++ and the C++ library.)
Could someone provide a simple example of measuring the computation time of a TorchScript model in C++, covering both CPU time and GPU time?
FYI, to correctly measure computation time in Python 3, please refer to this link: