What to do with to(device) when using DataParallel?

It’s not clear to me what to do with .to(device) when using DataParallel.

Imagine something like

device = torch.device('cuda:2')
model = nn.DataParallel(model, device_ids=[2,3])

In my model I now have:

def forward(self, input):
    input = input.to(device)

Outside of my model I have

output = output.to(device)

It raises following questions:

  • Is to(device) now unnecessary? Should I remove it?
  • If not, which device ID should I give it?
  • Is the to(device) handled the same for each case (inside the forward function, outside of it)?

For the input: You should remove input = input.to(device) from inside your forward function. Once you wrap your model with DataParallel, the wrapper will take care of splitting the inputs and placing them on the right device for you. forward will be called once for each device, so you shouldn’t move the input to a specific device.

As for the output: the documentation here DataParallel — PyTorch 1.12 documentation mentions that by default output_device is set to device_ids[0] (in your setup that would be ‘cuda:2’). But you can also pass the desired output device to the constructor of DataParallel.
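To make that concrete, here is a minimal sketch (the module, shapes, and device IDs are made up; the point is that forward itself never calls .to(device)):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

    def forward(self, x):
        # no x.to(device) here: DataParallel scatters the batch
        # to the right device before calling each replica's forward
        return self.fc(x)

model = Net()
if torch.cuda.device_count() >= 2:
    # the parameters must live on device_ids[0];
    # outputs are gathered back there by default
    model = nn.DataParallel(model, device_ids=[0, 1]).to('cuda:0')

out = model(torch.randn(16, 8))
print(out.shape)  # torch.Size([16, 4])
```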

Let me know if you have trouble with this.

Hmm, if I’m not supposed to have .to(device), then I have trouble understanding how PyTorch will know which tensors need to move to the GPU, especially for my use case: the input is actually a list in my model, and I repeatedly call a subnetwork inside the forward function (for each list entry; in each iteration I call .to(device)) before I use its output in my top-level network part (for which I also call .to(device)).

It’s a bit magical, I agree, but yes, that’s what it does. It goes into your input recursively (if it is a list, it goes into each element of the list, and so on), here: pytorch/scatter_gather.py at 557fbf4261d6517552b48c47be9aa9d289fa28d3 · pytorch/pytorch · GitHub

then for each Tensor it finds, it ultimately calls: pytorch/comm.py at 557fbf4261d6517552b48c47be9aa9d289fa28d3 · pytorch/pytorch · GitHub, which scatters the tensor across GPUs.
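A rough CPU sketch of that recursion, with tensor.chunk standing in for comm.scatter (which does the same split, just across GPUs — names here are mine, not PyTorch’s):

```python
import torch

def scatter_sketch(obj, num_replicas):
    # Rough CPU stand-in for DataParallel's scatter_map: recurse into
    # lists/tuples and split every tensor found along dim 0
    # (in the real code, comm.scatter performs this split across GPUs).
    if torch.is_tensor(obj):
        return list(obj.chunk(num_replicas, dim=0))
    if isinstance(obj, (list, tuple)):
        parts = [scatter_sketch(o, num_replicas) for o in obj]
        # transpose so each replica gets one piece per original entry
        return [list(replica) for replica in zip(*parts)]
    # non-tensor leaves are simply replicated
    return [obj] * num_replicas

batch = torch.arange(8).reshape(4, 2)
chunks = scatter_sketch(batch, 2)
print([c.shape for c in chunks])    # [torch.Size([2, 2]), torch.Size([2, 2])]

# a list input keeps its length; only the tensors inside get split
nested = scatter_sketch([batch, batch * 2], 2)
print(len(nested), len(nested[0]))  # 2 2
```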

Let me know if, after trying it out, you have issues.

I tried to use it as a drop-in replacement, so the same setup as in the first post:

device = torch.device('cuda:4')
model = nn.DataParallel(model, device_ids=[4,5])

and as you suggested I removed the to(device) calls from inside the forward function, but I ran into this error:

RuntimeError: module must have its parameters and buffers on device cuda:4 (device_ids[0]) but found one of them on device: cpu

So, after some research, it seems you have to call to(device) on the model and the input before calling forward, basically moving everything to the primary GPU, from where it will be scattered to the other ones? That’s how I understand it right now. After doing that, and moving the input tensors to the same device, the error goes away:

model = nn.DataParallel(model, device_ids=[4,5]).to(device)
model([[x.to(device) for x in y] for y in inputs], lengths.to(device), g=False)

… and that lets me reach this point in my forward function (which previously had to(device) and now doesn’t), where I dynamically create a padding mask:

mask = torch.arange(max_len).expand(len(lengths), max_len) < lengths.unsqueeze(1)

And it returns this error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:4 and cpu!

So I’m not sure if I’m totally on the wrong track now, but the last error at least suggests that I need some kind of to(device) inside forward — of course it should be the corresponding device and not a fixed one.

If you need to create new tensors in the forward method, push them to the corresponding device by using the .device attribute of any parameter or the input:

mask = ...
mask = mask.to(x.device) # x is the input to forward
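For the padding mask above, that would look like this (a sketch: I’m using the .device of the lengths tensor, so each replica builds the mask on whatever device its slice of the input lives on):

```python
import torch

def make_padding_mask(lengths, max_len):
    # build the position grid directly on the same device as `lengths`,
    # so this works unchanged inside forward on any replica
    positions = torch.arange(max_len, device=lengths.device)
    return positions.expand(len(lengths), max_len) < lengths.unsqueeze(1)

lengths = torch.tensor([2, 4, 1])
mask = make_padding_mask(lengths, max_len=4)
print(mask)
# tensor([[ True,  True, False, False],
#         [ True,  True,  True,  True],
#         [ True, False, False, False]])
```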

Moving the newly created tensor to the GPU seems to work. However, there’s a dimension mismatch.

It seems it doesn’t magically split my list of lists into two batches of lists of lists; it just passes the whole object to both GPUs. On the other hand, it does seem to work for the lengths tensor, causing a dimension mismatch at some point:

AssertionError: expecting key_padding_mask shape of (28, 49), but got torch.Size([14, 49])

Actually, looking more closely at the scatter_map function, it might be that in the case of a list (or list of lists), it doesn’t split the list in half*, but rather each individual tensor, meaning that both GPUs get a list of the same size as the original one, but with split-up tensors? Is that right? And if so, is there any way to change that behaviour?

*Assuming two GPUs.
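That hypothesis can be checked on CPU with tensor.chunk, which mimics what happens to each tensor inside the list (a sketch assuming two replicas; the sizes are made up):

```python
import torch

# a "list input" of three tensors, each with batch dimension 4
inputs = [torch.randn(4, 5) for _ in range(3)]

# mimic scatter_map on a list: chunk every tensor along dim 0,
# then transpose so each replica gets one piece per list entry
replicas = list(zip(*[t.chunk(2, dim=0) for t in inputs]))

# each replica still sees a list of 3 entries, but with halved tensors
print(len(replicas[0]))      # 3
print(replicas[0][0].shape)  # torch.Size([2, 5])
```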