Why is data.cuda() so slow? And why does its speed depend on which model I use?

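Thanks for sharing the code! CUDA operations are executed asynchronously, so you need to synchronize via torch.cuda.synchronize() both before starting and before stopping the timers. Otherwise the timer around data.cuda() also picks up the GPU kernels from the previous forward/backward pass that are still running. The blocking copy has to wait for them to finish, which is why the measured time appears to depend on the model.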
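Here is a minimal sketch of the synchronized timing pattern, assuming a recent PyTorch. `time_copy`, `demo`, the batch shape, and the matmul loop standing in for "the model" are hypothetical illustrations, not code from the original post. With `sync=False` the measured copy time typically absorbs the still-queued matmuls. With `sync=True` it reflects only the host-to-device transfer.

```python
import time
import torch


def time_copy(x: torch.Tensor, device: torch.device, sync: bool = True):
    """Copy x to device; return (copy, elapsed_seconds).

    With sync=True, queued GPU work is drained before the timer starts and the
    copy is waited on before it stops, so only the copy itself is measured.
    """
    on_cuda = device.type == "cuda"
    if sync and on_cuda:
        torch.cuda.synchronize(device)  # finish previously launched kernels
    start = time.perf_counter()
    y = x.to(device)
    if sync and on_cuda:
        torch.cuda.synchronize(device)  # wait until the copy has completed
    return y, time.perf_counter() - start


def demo() -> None:
    device = torch.device("cuda")
    data = torch.randn(64, 3, 224, 224)         # hypothetical input batch (CPU)
    w = torch.randn(2048, 2048, device=device)  # hypothetical stand-in for model work
    for sync in (False, True):
        for _ in range(50):
            _ = w @ w  # asynchronous launch: returns before the GPU finishes
        _, t = time_copy(data, device, sync=sync)
        print(f"sync={sync}: data.to('cuda') took {t * 1e3:.1f} ms")


if __name__ == "__main__":
    if torch.cuda.is_available():
        demo()
    else:
        print("CUDA not available; nothing to demonstrate.")
```

For timing GPU work specifically, `torch.cuda.Event(enable_timing=True)` with `elapsed_time()` is an alternative. The `torch.cuda.synchronize()` approach above is the one suggested in this reply.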