Curious why variable.cuda() is slower than gradient descent


I am profiling my training code to find the performance bottleneck. I found that the variable.cuda() operation takes much more time than the actual gradient descent step (74.1% vs. 13.6%).

Is there any specific reason for this?


Does anyone know?

Did you synchronize?

No, I didn’t add any synchronization to my timing code.

I mean, you should call torch.cuda.synchronize() around the timed region to get the “true” time. CUDA kernels are launched asynchronously, so without synchronizing, the elapsed time can get attributed to the wrong line of Python code.
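To make this concrete, here is a minimal sketch of a timing helper that synchronizes before and after the measured call (the helper name `timed` and the example workload are my own, not from the thread):

```python
import time

import torch


def timed(fn, sync=True):
    """Time fn(); optionally synchronize so pending CUDA work is counted."""
    if sync and torch.cuda.is_available():
        torch.cuda.synchronize()  # drain work queued before our region
    start = time.perf_counter()
    out = fn()
    if sync and torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for fn's async kernels to finish
    return out, time.perf_counter() - start


# Hypothetical usage: compare the host-to-device copy against a compute step.
x = torch.randn(1024, 1024)
if torch.cuda.is_available():
    _, copy_t = timed(lambda: x.cuda())
    print(f"copy to GPU: {copy_t:.4f}s")
```

Without the second synchronize, a kernel launch returns almost immediately and its cost silently shows up in whatever operation happens to block next, which is one way a profile can blame `.cuda()` for time spent elsewhere.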

oh, got it. Thank you!