(The third column shows the time consumption, in μs.)
The three images come from three slightly different code snippets, but in each case there is always one line that takes about 60 ms.
I would like to know why this happens and how to shorten the time.
All “slow” lines contain a cpu() call, which creates a synchronization point if your script runs on the GPU.
To properly time CUDA code, you should synchronize before starting and stopping the timer (if you are profiling manually):
torch.cuda.synchronize()
t0 = time.time()
# ... the code you want to time ...
torch.cuda.synchronize()
t1 = time.time()
You could also use the PyTorch profiler to measure the execution time of your code.
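A minimal profiler sketch (using a plain matmul as a stand-in for your workload, and falling back to CPU when no GPU is present) might look like this; the per-op table helps locate which operation actually consumes the 60 ms:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# assumption: a plain matmul stands in for your actual workload
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)

with profile(activities=activities) as prof:
    y = x @ x

# per-op timings; the slow GPU kernel will show up here instead of being
# blamed on whichever line happened to synchronize
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```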
The second picture doesn’t contain cpu().data; the line is max_ids = ids.max(), where ids is a torch.cuda.Tensor.
I added ‘torch.cuda.synchronize()’, and this line still takes 60 ms. Is there a way to remove the synchronization time?
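A note on what the synchronization measures: torch.cuda.synchronize() adds no work of its own; it only waits for kernels that were already queued asynchronously, so the 60 ms attributed to that line is the real cost of the preceding GPU work. A small sketch of a timing helper built on this idea (the ids tensor here is a made-up stand-in for yours):

```python
import time
import torch

def timed(fn):
    """Return (result, elapsed seconds), synchronizing around the timer on GPU."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain previously queued kernels first
    t0 = time.time()
    out = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the timed kernels; this adds no extra work
    return out, time.time() - t0

# assumption: a random integer tensor stands in for your `ids`
device = "cuda" if torch.cuda.is_available() else "cpu"
ids = torch.randint(0, 1000, (1_000_000,), device=device)
max_ids, elapsed = timed(lambda: ids.max())
print(max_ids.item(), elapsed)
```

Under this view, the time cannot simply be "removed": it can only be shortened by making the underlying GPU work faster, or hidden by delaying the synchronization point (e.g. avoiding cpu()/item() calls until the value is actually needed).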