Any analog of torch.cuda.synchronize() in C++ API?

I’ve been trying to record batch processing time, but ran into the asynchronous execution of torch operations. I know that I can sync with the forward pass by calling loss.item(), for example, but that looks ugly.

Is there any analog of torch.cuda.synchronize() in the C++ API?

Hi,

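You can call cudaDeviceSynchronize() directly, as the Python code does here. It blocks the calling CPU thread until all previously queued GPU work has finished, so a timer read after it includes the whole batch.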
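For timing a batch, one possible pattern is to synchronize once before starting the clock and once more before stopping it. The sketch below is only an illustration: time_synchronized_ms is a hypothetical helper, not part of LibTorch, and it takes the synchronization call as a parameter so the snippet compiles without the CUDA headers.

```cpp
#include <chrono>

// Hypothetical helper (not part of LibTorch): returns the wall-clock time of
// `work` in milliseconds. `sync` is called before the clock starts, so work
// queued earlier is not counted, and again before the clock stops, so the
// asynchronously launched work is counted in full.
template <typename Work, typename Sync>
double time_synchronized_ms(Work&& work, Sync&& sync) {
  sync();  // drain anything queued before the measured region
  const auto start = std::chrono::steady_clock::now();
  work();  // e.g. forward pass, backward pass, optimizer step
  sync();  // wait until the work launched above has finished
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In a real training loop you would include cuda_runtime.h and pass a lambda such as [] { cudaDeviceSynchronize(); } as the sync argument, with the batch step as the work argument. Without the second sync the timer would mostly measure how long it takes to enqueue the kernels.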

That’s nice, thank you!

Do you know if CPU code is also executed asynchronously?

No, all CPU code is synchronous!