`time.perf_counter()` might be a good first approach to time your code.
Note that CUDA calls are asynchronous, so you would have to synchronize your code before starting and stopping the timer using `torch.cuda.synchronize()`.
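
A minimal sketch of that timing pattern (the model and input here are just placeholders for illustration):

```python
import time
import torch

# placeholder model and input, assuming a CUDA-capable device is available
model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(64, 1024, device="cuda")

# wait for all pending CUDA work before starting the timer
torch.cuda.synchronize()
start = time.perf_counter()

out = model(x)

# wait for the kernels to finish before stopping the timer
torch.cuda.synchronize()
end = time.perf_counter()
print(f"forward pass took {(end - start) * 1000:.3f} ms")
```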
`torch.utils.bottleneck` might also be a good utility to profile your code and find possible bottlenecks.
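
You can run it against your script from the command line, e.g.:

```
python -m torch.utils.bottleneck /path/to/script.py [args]
```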