CPU vs GPU timing of CUDA operations

Hi everyone,

I’m profiling a large network with torch.autograd.profiler.profile(use_cuda=True). As far as I can tell, everything runs on the GPU, but for quite a few operations the profiler still reports much more time spent on the CPU than on the GPU. Some examples:

N5torch8autograd13CopyBackwardsE:
     CPU: 87555.573us      GPU: 1532.959us
mse_loss:
     CPU: 1767.147us      GPU: 93.445us
avg_pool3d_backward:
     CPU: 2655.118us      GPU: 183.594us
add:
     CPU: 10984.388us      GPU: 131.836us
CatBackward:
     CPU: 24874.143us      GPU: 13.672us

I suspect some of this is due to the lack of synchronization between CPU and GPU (when I call torch.cuda.synchronize(), some of the values change).
I’ve tried other profilers (nvprof+nvvp) and get similar results.
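
To double-check individual ops I’ve also been comparing plain wall-clock time against CUDA events, roughly like this (the pooling op is just a stand-in, not taken from my network):

import time
import torch

x = torch.randn(64, 3, 16, 64, 64, device="cuda")
pool = torch.nn.AvgPool3d(2)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

t0 = time.time()
start.record()
y = pool(x)
end.record()
cpu_side_ms = (time.time() - t0) * 1e3  # pool() returns as soon as the kernel is queued

torch.cuda.synchronize()  # the events must have completed before elapsed_time()
print(f"CPU-side wall clock: {cpu_side_ms:.3f} ms")
print(f"GPU event time:      {start.elapsed_time(end):.3f} ms")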

My question: If this issue is due to the lack of synchronization, how do I get more accurate execution times?
If it is not due to synchronization and the values are correct, are there tricks to reduce the amount of CPU time? I expected most of these functions to be highly parallelized and benefit greatly from GPU kernels…


It seems the high CPU times were due to the asynchronous execution of CPU and GPU instructions (see also this reply).

To get better values, I ran:

CUDA_LAUNCH_BLOCKING=1 python3 profileNetwork.py

Now the CPU times for the functions reported above are almost the same as their GPU times.
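
The same thing can also be done from inside the script by setting the environment variable before any CUDA work happens (a sketch; the safest place is before importing torch at all):

import os

# CUDA_LAUNCH_BLOCKING is read when the CUDA runtime initializes,
# so it must be set before the first CUDA call in the process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

x = torch.randn(1024, 1024, device="cuda")
y = x @ x  # with the flag set, this call only returns once the kernel has finished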

If I understand correctly, the CUDA_LAUNCH_BLOCKING flag ensures that when a CPU instruction is waiting for a result from the GPU, the waiting time is no longer accumulated into the reported CPU time.

“the CUDA_LAUNCH_BLOCKING flag ensures that when a CPU instruction is waiting for a result from the GPU, the waiting time is no longer accumulated into the reported CPU time.” ← Where did you find this claim? I can’t find any support for it anywhere in the forums or discussions.

You can find info about this in the CUDA docs: Programming Guide :: CUDA Toolkit Documentation


I cannot find any such thing on that webpage. Can you please paste a screenshot or copy the exact sentence from that page that claims this → “the CUDA_LAUNCH_BLOCKING flag ensures that when a CPU instruction is waiting for a result from the GPU, the waiting time is no longer accumulated into the reported CPU time.”

Oh, what it says is a bit different:
With CUDA_LAUNCH_BLOCKING=1, each CUDA call waits for the kernel execution to finish before returning.
That means no CPU instruction has to wait for a result from the GPU anymore, and so this distortion won’t happen.
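
To make the distortion concrete, here is a small standalone example (not related to the network above): without blocking launches, the wait shows up on whatever later CPU instruction happens to need the result.

import time
import torch

a = torch.randn(8192, 8192, device="cuda")
torch.cuda.synchronize()  # make sure setup work is done before timing

# The matmul is launched asynchronously: this line returns almost immediately.
t0 = time.time()
b = a @ a
print(f"matmul call: {(time.time() - t0) * 1e3:.3f} ms")

# .item() needs the result on the CPU, so it blocks until the matmul has
# actually finished -- and gets 'charged' for that wait in CPU-side timings.
t0 = time.time()
val = b[0, 0].item()
print(f"item() call: {(time.time() - t0) * 1e3:.3f} ms")

# With CUDA_LAUNCH_BLOCKING=1 the matmul line itself would block instead,
# so the time would be attributed to the right operation.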
