I’ve been trying to measure batch processing time, but ran into the asynchronous execution of torch CUDA operations. I know I can force a sync with the forward pass by calling loss.item(), for example, but that looks ugly.
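A minimal sketch of what I'm doing now (the model and shapes are just placeholders); the `.item<float>()` call is only there to force the GPU work to finish before I stop the timer:

```cpp
#include <torch/torch.h>
#include <chrono>
#include <iostream>

int main() {
  torch::Device device(torch::kCUDA);
  auto model = torch::nn::Linear(1024, 10);
  model->to(device);
  auto input = torch::randn({64, 1024}, device);

  auto start = std::chrono::steady_clock::now();
  auto output = model->forward(input);  // kernels are only enqueued here
  // Ugly workaround: reading a scalar back to the CPU blocks until
  // all queued CUDA work for this tensor has completed.
  auto loss = output.sum().item<float>();
  auto end = std::chrono::steady_clock::now();

  std::cout << "batch time: "
            << std::chrono::duration<double, std::milli>(end - start).count()
            << " ms\n";
}
```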
Is there an analog of torch.cuda.synchronize() in the C++ API?