Tensors are on different GPUs

For:
.cuda(device_id=gpus[0])

What do you mean by gpus? Obviously it throws an undefined-name error.

gpus is a list of the GPU IDs you want to use, e.g. gpus = [1, 3].
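For example, a minimal sketch (the model here is just a placeholder, and newer PyTorch versions take the device directly in .cuda() rather than a device_id= keyword):

```python
import torch
import torch.nn as nn

# gpus is simply a Python list of device indices you want to use.
gpus = [1, 3]

model = nn.Linear(10, 2)

# Replicate the model across the listed GPUs; the master copy lives on gpus[0].
model = nn.DataParallel(model, device_ids=gpus).cuda(gpus[0])

# Inputs only need to be on the first listed GPU; DataParallel scatters them.
x = torch.randn(8, 10).cuda(gpus[0])
out = model(x)
```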

I am also wondering if this issue is related to the problem I just posted here: [Solved] nn.DataParallel with ModuleList of custom modules fails on Multiple GPUs

Yes, in my case the tensors were placed on the GPU but the net was not moved to CUDA, so I got this error.
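For reference, a minimal illustration of that mismatch and the fix (using a toy nn.Linear model as an assumption):

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 1)          # parameters still on the CPU
x = torch.randn(2, 4).cuda()   # input on the GPU -> device mismatch on forward

# Fix: move the network to CUDA as well before calling it.
net = net.cuda()
out = net(x)
```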

Hi, I also encountered this error. Did you find a solution? Thanks.


Hi, can you please help me with how to use nn.ModuleList to get around the following error: “RuntimeError: tensors are on different GPUs”?

Hi @smth, I am facing this issue. I have a custom layer that undoes the standardization of the output, i.e. it computes x * std + mean (the inverse of standardization). The tensors std and mean are class attributes of this layer, and an error is thrown when the input reaches it: ‘RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:1 and input b is on cuda:0’.
Do I need to copy both tensors to gpu:0, or is there another way?
Thanks.

Did you register mean and std as buffers using .register_buffer? That will move them to the correct GPU when the module is wrapped in DataParallel; otherwise PyTorch has no way of knowing that they need to be moved.

Reference: https://pytorch.org/docs/stable/nn.html#torch.nn.Module.register_buffer
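For example, a minimal sketch of such a layer (the class name Destandardize and its constructor arguments are just illustrative):

```python
import torch
import torch.nn as nn

class Destandardize(nn.Module):
    """Undo standardization: x * std + mean."""
    def __init__(self, mean, std):
        super().__init__()
        # Registering the tensors as buffers (instead of plain Python
        # attributes) lets .cuda()/.to() and nn.DataParallel move them
        # to the right device automatically.
        self.register_buffer("mean", torch.as_tensor(mean))
        self.register_buffer("std", torch.as_tensor(std))

    def forward(self, x):
        return x * self.std + self.mean
```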


I missed this. So this is the correct way to keep such tensors in custom layers. Thanks @smth, it works.