A question about time consumption

Ran · August 5, 2019, 8:28am

[Images A, B, C: per-line timing screenshots]

(The third column shows the time consumed, in μs.)
The three images come from three slightly different code snippets. In each of them, there is always one line that takes about 60 ms.
I would like to know why this happens and how I can shorten that time.

ptrblck · August 5, 2019, 9:43am

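All “slow” lines contain a cpu() call, which forces a synchronization if your script runs on the GPU. CUDA kernels are launched asynchronously, so the time of the earlier GPU work gets attributed to the line that has to wait for it.
To time CUDA code properly, you should synchronize before starting and before stopping the timer (if you are profiling manually).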

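import time
import torch

torch.cuda.synchronize()  # wait for pending GPU work before starting the timer
t0 = time.time()
...
torch.cuda.synchronize()  # wait for the timed GPU work to finish
t1 = time.time()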
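
As an alternative to bracketing the region with torch.cuda.synchronize() and time.time(), CUDA events can record timestamps on the GPU stream itself. The following is only a minimal sketch: the helper name time_op, the tensor ids, and its size are made up for illustration, and the code falls back to a plain CPU timer when no GPU is available.

```python
import time
import torch

def time_op(fn, n_iters=10):
    """Return the average time per call of fn in milliseconds (hypothetical helper)."""
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        fn()                      # warm-up call, not timed
        torch.cuda.synchronize()  # make sure no earlier work is still pending
        start.record()
        for _ in range(n_iters):
            fn()
        end.record()
        torch.cuda.synchronize()  # wait until the `end` event has been reached
        return start.elapsed_time(end) / n_iters  # elapsed_time returns ms
    # CPU fallback: operations run synchronously, so a wall-clock timer is enough
    fn()
    t0 = time.perf_counter()
    for _ in range(n_iters):
        fn()
    return (time.perf_counter() - t0) * 1000.0 / n_iters

device = "cuda" if torch.cuda.is_available() else "cpu"
ids = torch.randint(0, 1000, (1_000_000,), device=device)  # illustrative data
avg_ms = time_op(lambda: ids.max())
print(f"ids.max(): {avg_ms:.3f} ms per call")
```

Averaging over several iterations after a warm-up call usually gives a more stable number than timing a single call.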

You could also use the profiler to measure the execution of your code.
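
For example, a rough sketch with torch.profiler could look like the snippet below. It assumes a reasonably recent PyTorch release that ships the torch.profiler module; the tensor ids and its size are made up for illustration and are not taken from the code in the screenshots.

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"

# Always record CPU activity; add CUDA activity only when a GPU is present.
activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

ids = torch.randint(0, 1000, (1_000_000,), device=device)  # illustrative data

with profile(activities=activities) as prof:
    max_ids = ids.max()    # on the GPU this only launches the kernel
    value = max_ids.cpu()  # copying to the host has to wait for the result

# Summarize the recorded operators, slowest (by total CPU time) first.
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(report)
```

The table lists each operator separately, so the time spent in the kernels is not lumped into whichever line happens to synchronize.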

Ran · August 5, 2019, 10:07am

The second picture doesn’t contain cpu().data; the slow line there is max_ids=ids.max(), where ids is a torch.cuda.Tensor.

Ran · August 5, 2019, 11:34am

I added torch.cuda.synchronize(), and now this line takes up 60 ms. Is there a way to remove the synchronization time?