A question about time consumption

[Images A, B, and C: per-line timing output, not preserved]

(The third column shows the time taken by each line, in μs.)
The three images come from three slightly different code snippets. In every case, one line takes about 60 ms.
I would like to know why this happens and how to reduce that time.

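All “slow” lines contain a cpu() call, which forces a synchronization if your script runs on the GPU. CUDA operations are launched asynchronously, so the time spent on earlier GPU work is attributed to the line that has to wait for it.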
To properly time CUDA code, you should synchronize before starting and stopping the timer (if you are manually profiling).

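import time
import torch

torch.cuda.synchronize()  # wait for pending GPU work before starting the timer
t0 = time.time()
...
torch.cuda.synchronize()  # wait for the timed GPU work to finish
t1 = time.time()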
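
The same pattern can be wrapped in a small helper. This is only a sketch: the names timed and sync are made up for illustration, the tensor ids mirrors the one mentioned in this thread with an arbitrary size, and the GPU part runs only if PyTorch and a CUDA device are available.

```python
import time


def timed(fn, sync=lambda: None):
    """Run fn() and return (result, elapsed_seconds).

    sync is called before starting and before stopping the timer.
    Pass torch.cuda.synchronize when fn launches GPU work.
    """
    sync()  # make sure earlier queued work is not counted
    t0 = time.perf_counter()
    result = fn()
    sync()  # make sure the work launched by fn has finished
    return result, time.perf_counter() - t0


try:
    import torch
except ImportError:  # the helper itself does not need PyTorch
    torch = None

if torch is not None and torch.cuda.is_available():
    # Hypothetical data: one million integer ids on the GPU.
    ids = torch.randint(0, 1000, (1_000_000,), device="cuda")
    max_ids, elapsed = timed(ids.max, sync=torch.cuda.synchronize)
    print(f"ids.max() took {elapsed * 1e3:.3f} ms")
```

Timing each suspicious line this way should show which operation actually costs the 60 ms, rather than which line happens to wait for it.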

You could also use the profiler to measure the execution of your code.

The second picture doesn’t contain cpu().data. The slow line there is max_ids=ids.max(), where ids is a torch.cuda.Tensor.

I added torch.cuda.synchronize(), and now that line takes 60 ms. Is there a way to get rid of the synchronization time?
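
A CPU-only analogy may help show why the 60 ms moves to the synchronize line instead of disappearing. This is a sketch with a thread pool standing in for the GPU queue, not real CUDA code. The names slow_kernel and measure are made up, and the 0.06 s sleep simply mirrors the 60 ms from the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_kernel():
    # Stand-in for GPU work that takes about 60 ms.
    time.sleep(0.06)
    return 7


def measure():
    with ThreadPoolExecutor(max_workers=1) as pool:
        t0 = time.perf_counter()
        future = pool.submit(slow_kernel)  # returns at once, like a kernel launch
        t1 = time.perf_counter()
        value = future.result()            # blocks, like cpu() or synchronize()
        t2 = time.perf_counter()
    return value, t1 - t0, t2 - t1


value, launch_s, wait_s = measure()
print(f"launch: {launch_s * 1e3:.2f} ms, wait: {wait_s * 1e3:.2f} ms")
```

In this analogy the waiting line is not slow in itself; it is where the cost of the earlier work becomes visible. If the same holds for the code in the screenshots, the time can only be reduced by making the preceding GPU work cheaper, not by removing the synchronization.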