I was wondering how I could enforce synchronization for all CUDA operations when profiling on the GPU, so that I can find the operations/function calls that are slow. Thanks!
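I think you want:
torch.cuda.synchronize()
CUDA kernels are launched asynchronously, so a timer on the host can stop before the GPU has finished. Calling this blocks until all queued work on the current device is done, so call it right before you start and right before you stop timing.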
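Here is a rough sketch of how that could look in practice. The helper name `timed` and the 512x512 matmul are just placeholders I made up for illustration, and the code falls back to CPU when no GPU is present so it still runs.

```python
import time

import torch


def timed(fn, *args, **kwargs):
    """Hypothetical helper: run fn and return (result, elapsed_seconds).

    On a CUDA machine it synchronizes before and after the call, so the
    measured time covers the GPU work and not just the kernel launch.
    """
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize()  # wait for any previously queued kernels
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if use_cuda:
        torch.cuda.synchronize()  # wait for the kernels launched by fn
    return result, time.perf_counter() - start


device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(512, 512, device=device)
b = torch.randn(512, 512, device=device)

out, seconds = timed(torch.matmul, a, b)
print(f"matmul took {seconds * 1e3:.3f} ms on {device}")
```

If you would rather not touch the code, setting the environment variable CUDA_LAUNCH_BLOCKING=1 before starting the process makes kernel launches synchronous everywhere. That slows the program down, so it is probably best kept for profiling and debugging runs.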