For:
.cuda(device_id=gpus[0])
what do you mean by gpus? Obviously it throws an undefined-name error.
gpus is a list of the GPU ids you want to use, e.g. gpus=[1, 3].
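To make that concrete, here is a minimal sketch of how such a gpus list is typically combined with nn.DataParallel. The model net is a placeholder, and the .cuda(device_id=...) keyword in the original snippet comes from an older API; current PyTorch takes the device index directly:

```python
import torch.nn as nn

net = nn.Linear(10, 2)   # placeholder model for illustration

gpus = [1, 3]            # the list of GPU ids you want to use

# Put the model on the first listed GPU (current API takes the index directly)
net = net.cuda(gpus[0])

# Replicate the model across the listed GPUs for data-parallel forward passes
net = nn.DataParallel(net, device_ids=gpus)
```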
I am also wondering if this issue is related to the problem I just posted here: [Solved] nn.DataParallel with ModuleList of custom modules fails on Multiple GPUs
Yes. In my case the tensors were placed on the GPU but the net was not moved to CUDA, so I got this error.
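For anyone hitting the same thing, a small sketch of that failure mode and the fix as described above (net and x are just placeholders):

```python
import torch
import torch.nn as nn

net = nn.Linear(10, 2)   # placeholder model
x = torch.randn(4, 10)   # placeholder input batch

# Symptom described above: the input is moved to the GPU but the model is not,
# so the forward pass complains about mismatched devices:
#   out = net(x.cuda())
# Fix: move the model to CUDA as well, so model and input share a device
net = net.cuda()
out = net(x.cuda())
```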
Hi, I also encountered this error. Did you find any solution? Thanks!
Hi, can you please help me with how to use nn.ModuleList to get around the following error: "RuntimeError: tensors are on different GPUs"?
Hi @smth, I am facing this issue. I have a custom layer that undoes the standardization of the output, i.e. it computes x*std + mean (the inverse of standardization). The tensors std and mean are class variables of this layer, and an error is thrown when the input passes through it: "RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:1 and input b is on cuda:0".
Do I need to copy both tensors to cuda:0, or is there another way?
Thanks.
Did you register mean and std as buffers using .register_buffer? That will help move them to GPU-x when wrapped in a DataParallel; otherwise PyTorch wouldn't know that they have to be moved.
Reference: https://pytorch.org/docs/stable/nn.html#torch.nn.Module.register_buffer
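To illustrate, a minimal sketch of the register_buffer approach for a de-standardization layer like the one described above; the class name, shapes, and device ids are assumptions:

```python
import torch
import torch.nn as nn

class DeStandardize(nn.Module):
    """Undoes standardization: x * std + mean. Name and shapes are illustrative."""
    def __init__(self, mean, std):
        super().__init__()
        # Buffers (unlike plain attributes) are moved by .cuda()/.to() and are
        # replicated onto each device by nn.DataParallel.
        self.register_buffer("mean", mean)
        self.register_buffer("std", std)

    def forward(self, x):
        return x * self.std + self.mean

# Usage sketch: the buffers follow the module when it is moved and wrapped
layer = DeStandardize(mean=torch.tensor(0.5), std=torch.tensor(2.0))
layer = nn.DataParallel(layer.cuda(), device_ids=[0, 1])
out = layer(torch.randn(8, 3).cuda())
```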
I missed this. So this is the correct way to keep such tensors in custom layers. Thanks @smth, it works.