Different-device error when using multiple GPUs and creating a new CUDA tensor

I get the error above, but I can’t find any good resources about it.
I wonder if there is a good solution for it.

I wrapped my model like this:

my_model = nn.DataParallel(my_model, device_ids=DEVICE_IDS).cuda()

In the middle of training, I create a new tensor like this:

new_tensor = torch.tensor([1.0]).cuda()

Then I try an operation between new_tensor and, for example, a prediction image from my_model:

after_operation = new_tensor + prediction_image

In that case I always run into an error: for example, prediction_image is always on cuda:0, while new_tensor is on cuda:7 when I use 8 GPUs.
(It’s also strange that, if I remember correctly, I always see the pair [0, 7] on every try.)
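For reference, this is how I check which device each tensor is on (the printed devices are what I observe in my runs):

print(prediction_image.device)  # cuda:0
print(new_tensor.device)        # cuda:7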

So I tried moving new_tensor onto GPU 0 explicitly, but then PyTorch uses only GPU 0, resulting in an out-of-memory error.

And when I tried to move prediction_image onto GPU 7, as far as I remember, I ran into an illegal memory access.

Please advise on this. I’ll add more relevant code if needed.

new_tensor is being created on the default device, while the model and data could be on another device when using nn.DataParallel.
If you need to create new tensors inside your forward method, you should push them to the device your model and data are currently on, e.g.:

new_tensor = torch.tensor([1.0], device=input.device)
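As a minimal sketch of that pattern (the module and the constant here are made up for illustration), the tensor is created inside forward on whatever device the current input lives on:

import torch
import torch.nn as nn

class MyModule(nn.Module):  # hypothetical example module
    def forward(self, input):
        # create the constant on the same device as this replica’s input,
        # so each nn.DataParallel replica gets a tensor on its own GPU
        new_tensor = torch.tensor([1.0], device=input.device)
        return input + new_tensor

Each replica created by nn.DataParallel then builds the constant on the GPU it is running on, so the device mismatch disappears.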

Thank you. The device=input.device argument is not something I tried before, so I hope it’ll work.

As additional information for other readers: I guess input is, for example, a tensor I got from the model. Should I also call .cuda() to perform an operation between a CUDA tensor and the new tensor, like this?

new_tensor = torch.tensor([1.0], device=input.device).cuda()

The .cuda() call is not necessary anymore, as you’ve already created the tensor on the appropriate device.
If you don’t specify a device id, cuda() will push the tensor back to the default device, e.g. 'cuda:0'.
I would recommend using to(device) or the device attribute now instead of cuda() and cpu().
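For example (a small sketch; the device id is just for illustration and should exist on your machine):

device = torch.device('cuda:3' if torch.cuda.is_available() else 'cpu')
x = torch.randn(4)                  # created on the CPU
x = x.to(device)                    # moved to cuda:3 (or kept on the CPU)
y = torch.ones(4, device=device)    # created directly on the target device

This way the same script also runs on a CPU-only machine without any code changes.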
