Hi, I’m trying to improve performance, and to do so I want to measure the accurate running time of different function calls. Does anybody know how to synchronize CUDA calls in Libtorch? In Python you can do: torch.cuda.synchronize()
Thanks!
This should work:
#include <ATen/cuda/CUDAContext.h>
#include <ATen/cuda/Exceptions.h>

at::cuda::CUDAStream stream = at::cuda::getCurrentCUDAStream();
AT_CUDA_CHECK(cudaStreamSynchronize(stream));

Note that this only synchronizes the current stream, whereas torch.cuda.synchronize() in Python waits for all streams on the device.
I’m new to libtorch myself, but isn’t it possible to simply use

#include <torch/cuda.h>

torch::cuda::synchronize();
See https://pytorch.org/cppdocs/api/function_namespacetorch_1_1cuda_1a576cf6ec2b223bcabee2f80dfdf8cdc8.html.