Multiple GPU error: not on the same device


I am running the code below in multi-GPU mode. Whether I move the x, y variables to cuda() or leave them on the CPU, I cannot get the whole network running on the same device. Depending on the device, I get one of these errors:

RuntimeError: expected device cuda:1 but got device cuda:0
RuntimeError: expected device cuda:0 but got device cpu
import numpy as np
import torch
import torch.nn as nn

# x = torch.linspace(0, args.max - 1, args.max).cuda()
# y = torch.linspace(0, args.max - 1, 4 * args.max).cuda()

x = torch.from_numpy(np.linspace(0, args.max - 1, args.max))
y = torch.from_numpy(np.linspace(0, args.max - 1, 4 * args.max))

model = network(args.max, x, y)
model = nn.DataParallel(model)

I have tried passing x and y into the network in both CUDA and CPU mode, but the same error occurs.

I would appreciate your help.

Are you using any cuda() or to() calls inside your model (in __init__ or forward)?
If so, could you remove them, as they might be creating the device mismatch?
nn.DataParallel automatically creates model replicas for you, so you don't need to push internal model tensors to a specific device manually (only the nn.DataParallel-wrapped model to the default device).
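To illustrate the point, here is a minimal sketch with a hypothetical Net module (the layer sizes and tensors are made up): register constant tensors as buffers in __init__ instead of calling .cuda() on them, so nn.DataParallel can copy them onto each replica's device automatically.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, x, y):
        super().__init__()
        # Buffers move together with the module instead of being
        # pinned to one GPU by an explicit .cuda() call.
        self.register_buffer("x", x)
        self.register_buffer("y", y)
        self.fc = nn.Linear(4, 2)

    def forward(self, inp):
        # self.x lives on the same device as this replica's parameters.
        return self.fc(inp) + self.x.sum()

model = Net(torch.arange(4.0), torch.arange(8.0))
if torch.cuda.is_available():
    # Push only the DataParallel wrapper to the default device.
    model = nn.DataParallel(model).cuda()

out = model(torch.randn(3, 4))
```

With this pattern the replicas created by nn.DataParallel each carry their own copy of x and y on the correct GPU.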


Yes, I was using them in __init__. I have now moved them into the model's forward, and the problem seems to be solved.
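For anyone hitting the same issue, a sketch of that fix (again with a hypothetical Net module): instead of sending x and y to a fixed device in __init__, move them to the incoming batch's device inside forward, so each replica uses a copy on its own GPU.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, x, y):
        super().__init__()
        # Plain attributes stay on the CPU until forward runs.
        self.x = x
        self.y = y
        self.fc = nn.Linear(4, 2)

    def forward(self, inp):
        # Follow whichever device this replica's input is on.
        x = self.x.to(inp.device)
        return self.fc(inp) + x.sum()

model = Net(torch.arange(4.0), torch.arange(8.0))
out = model(torch.randn(3, 4))
```

Registering the tensors as buffers is the cleaner option, since the .to() copy then happens once at replication time rather than on every forward pass.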