torch.cuda.synchronize() / .cuda() is extremely slow

Hi,
I am using Torch 1.7.1 with CUDA 10.1 on a Titan Xp,
but whenever I call .cuda(), it takes more than 10 minutes.
Following an earlier answer to the same problem, I tried calling torch.cuda.synchronize() before the .cuda() call, but the synchronize itself also takes more than 10 minutes.
Is there any way to speed this up?
Here's my code and results; thanks in advance.

import torch
from datetime import datetime
torch.cuda.set_device(2)
t1 = datetime.now()
torch.cuda.synchronize()
print(datetime.now() - t1)
for i in range(10):
    x = torch.randn(10, 10, 10, 10) # similar timings regardless of the tensor size
    t1 = datetime.now()
    x.cuda()
    print(i, datetime.now() - t1)
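To make sure the timings aren't an artifact of asynchronous execution, the measurement pattern I'm following is synchronize-before-and-after the call being timed. A minimal CPU-only sketch of that pattern (passing torch.cuda.synchronize as the hook is an assumption about the GPU path; the demo below uses plain CPU work so it runs anywhere):

```python
from datetime import datetime

def timed(fn, sync=lambda: None):
    """Time fn(), running sync() before starting and after finishing,
    so asynchronous work queued earlier or launched by fn() is not
    mis-attributed. On the GPU path you would pass
    sync=torch.cuda.synchronize (assumes torch and a visible GPU)."""
    sync()                          # drain any pending async work first
    t0 = datetime.now()
    result = fn()
    sync()                          # wait until fn()'s own async work is done
    return result, datetime.now() - t0

# Demo with plain CPU work (no GPU needed):
value, elapsed = timed(lambda: sum(range(10000)))
print(value, elapsed)
```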


Could you update to the latest release and check if you would still be running into the same issue?

Thanks, but I'm not authorized to update the GPU driver, so I can only use CUDA 10.1.
I tried updating PyTorch to 1.8.1, which is the latest release I can get with CUDA 10.1, but the problem remains.
Besides, performance used to be fine with PyTorch 1.7.1, so I assume it is not a version problem.

The issue sounds a bit as if you are JIT-compiling kernels for your architecture.
Since you are using a Titan Xp, it should have a supported compute capability of 6.1. Do you have any other devices in this machine which might be older?
If so, could you mask in only the Titan Xp via export CUDA_VISIBLE_DEVICES=ID, where ID is the GPU id, and see if you are still hitting the same issue?
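If it's more convenient, the same masking can be done from Python instead of the shell. A hedged sketch (the id "2" matches the set_device(2) call in your snippet and is an assumption; the crucial detail is that the variable must be set before the first CUDA-touching import):

```python
import os

# CUDA_VISIBLE_DEVICES is read when the CUDA runtime initializes, so it
# must be set *before* importing torch (or anything that loads CUDA).
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # assumed id of the Titan Xp; adjust as needed

# import torch                 # only after the env var is set
# torch.cuda.set_device(0)     # device 0 now refers to the masked-in GPU

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Note that after masking, the remaining GPU is renumbered to 0, so any hard-coded device ids in the script need adjusting.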

Sorry for the late response. Actually, the only other GPU I have is a 2080 Ti, but it is even slower than the Titan Xp. Anyway, thanks for the help.