Tensors are on different GPUs

For:
.cuda(device_id=gpus[0])

What do you mean by gpus? Obviously it throws an undefined-name error.

gpus is a list of the GPU IDs you want to use, e.g. gpus = [1, 3].
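For example, a minimal sketch (the model here is just a placeholder, and newer PyTorch versions take the device directly in .cuda() rather than a device_id= keyword):

```python
import torch
import torch.nn as nn

# gpus is simply a Python list of device indices you want to use.
gpus = [1, 3]

model = nn.Linear(10, 2)

# Replicate the model across the listed GPUs; the master copy lives on gpus[0].
model = nn.DataParallel(model, device_ids=gpus).cuda(gpus[0])

# Inputs only need to be on the first listed GPU; DataParallel scatters them.
x = torch.randn(8, 10).cuda(gpus[0])
out = model(x)
```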

I am also wondering if this issue is related to the problem I just posted here: [Solved] nn.DataParallel with ModuleList of custom modules fails on Multiple GPUs

Yes, in my case the tensors were placed on the GPU but the net was not moved to CUDA, so I got this error.
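For reference, a minimal illustration of that mismatch and the fix (using a toy nn.Linear model as an assumption):

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 1)          # parameters still on the CPU
x = torch.randn(2, 4).cuda()   # input on the GPU -> device mismatch on forward

# Fix: move the network to CUDA as well before calling it.
net = net.cuda()
out = net(x)
```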

Hi, I also encountered this error. Did you find a solution? Thanks.


Hi, can you please help me with how to use nn.ModuleList to get around the following error: “RuntimeError: tensors are on different GPUs”?

Hi @smth, I am facing this issue. I have a custom layer that undoes the standardization of the output, i.e. it computes x * std + mean (the inverse of standardization). The tensors std and mean are class attributes of this layer, and an error is thrown when the input reaches it: ‘RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:1 and input b is on cuda:0’.
Do I need to copy both tensors to gpu:0, or is there another way?
Thanks.

Did you register mean and std as buffers using .register_buffer? That will move them to the correct GPU when the module is wrapped in DataParallel; otherwise PyTorch has no way of knowing that they need to be moved.

Reference: https://pytorch.org/docs/stable/nn.html#torch.nn.Module.register_buffer
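For example, a minimal sketch of such a layer (the class name Destandardize and its constructor arguments are just illustrative):

```python
import torch
import torch.nn as nn

class Destandardize(nn.Module):
    """Undo standardization: x * std + mean."""
    def __init__(self, mean, std):
        super().__init__()
        # Registering the tensors as buffers (instead of plain Python
        # attributes) lets .cuda()/.to() and nn.DataParallel move them
        # to the right device automatically.
        self.register_buffer("mean", torch.as_tensor(mean))
        self.register_buffer("std", torch.as_tensor(std))

    def forward(self, x):
        return x * self.std + self.mean
```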


I missed this. So this is the correct way to keep such tensors in custom layers. Thanks @smth, it works.