Is there a minimum computation time with a GPU?

Hi, I’m just getting started with PyTorch.

I noticed that performance on the GPU (cuda:0) was not as fast as I expected.

So, as a test, I wrote a very simple script to check it, and found that
no matter how small the network is, there seems to be a minimum computation time for optimizing it…

Am I right? And if not, could you tell me how I can reduce this computation time?


The script I used for the test is here:

I’m using PyTorch 1.4 and an RTX 2060 with CUDA.

I tested the network with
(H1, H2) = (4, 2), (24, 12), (600, 300), (2000, 1000).

From (4, 2) up to (600, 300), the computation times were very similar:

modelTime: 0.17~0.19 ms
lossTime : 0.45~0.47 ms
optiTime : 0.52~0.55 ms
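For reference, here is a minimal sketch of a two-hidden-layer network matching the (H1, H2) sizes described above. The actual test script is not shown, so the input and output dimensions (and the overall structure) are assumptions:

```python
import torch

# One of the tested hidden-layer size pairs
H1, H2 = 600, 300

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical reconstruction: input dim 10 and output dim 1 are assumptions,
# only the hidden sizes (H1, H2) come from the post.
model = torch.nn.Sequential(
    torch.nn.Linear(10, H1),
    torch.nn.ReLU(),
    torch.nn.Linear(H1, H2),
    torch.nn.ReLU(),
    torch.nn.Linear(H2, 1),
).to(device)
```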

CUDA operations are asynchronous, so you would have to synchronize your code before starting and stopping your timer via torch.cuda.synchronize().
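A hedged sketch of what synchronized timing could look like (the model, input shapes, and iteration counts are illustrative, not from the original script):

```python
import time
import torch

def timed_ms(fn, device, iters=100, warmup=10):
    # Warm-up iterations exclude one-time CUDA init / allocator costs
    for _ in range(warmup):
        fn()
    if device.type == "cuda":
        torch.cuda.synchronize()  # drain queued kernels before starting the timer
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for all launched kernels to finish
    return (time.perf_counter() - start) / iters * 1e3  # average ms per call

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Sequential(
    torch.nn.Linear(10, 4), torch.nn.ReLU(), torch.nn.Linear(4, 2)
).to(device)
x = torch.randn(64, 10, device=device)

print(f"forward: {timed_ms(lambda: model(x), device):.3f} ms")
```

Without the `torch.cuda.synchronize()` calls, the timer would only measure how long it takes to *launch* the kernels, not how long they take to run.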

That being said, you can generally ignore the Python overhead and kernel launch times if your workload is large enough.
However, if you are planning to train tiny models, your GPU utilization might be low due to data bottlenecks or other issues.
