Hi, I have a model with a grid_sample layer, I tried to train my model on multiple GPUs, but got the following error:
RuntimeError: grid_sampler(): expected input and grid to be on same device, but input is on cuda:1 and grid is on cuda:0
Is there any way to use this layer on multiple GPUs? Thanks
The `input` and `grid` should be on the same device. If you are creating one of these tensors manually in the `forward` method, or passing it to `forward`, make sure to transfer it to the same device, e.g. by using:

`grid = grid.to(x.device)`
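For example, a minimal sketch (the module and the input name `x` are just placeholders):

```python
import torch.nn.functional as F

def forward(self, x):
    # under nn.DataParallel, x can land on a different replica/device
    # (e.g. cuda:1), so align the grid with it explicitly
    grid = self.grid.to(x.device)
    return F.grid_sample(x, grid, align_corners=False)
```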
Thanks for the reply. I got a segmentation fault when moving either the grid or the input to the same device via `input = input.to(grid.get_device())`.
My grid is actually the same for all inputs, so I stored it with `self.grid = grid` and call `grid_sample(input, self.grid)`. Do you think this causes the problem? Passing the grid into `forward` every time seems inefficient to me.
Try to use `grid.device` instead.
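The difference matters for CPU tensors: in recent PyTorch versions `Tensor.get_device()` returns -1 for them (older versions raised an error), which is not a valid target device, while `Tensor.device` always holds a usable `torch.device`. A quick check:

```python
import torch

t = torch.randn(1)        # a CPU tensor
print(t.get_device())     # -1 on recent versions, not a valid target device
print(t.device)           # device(type='cpu'), always valid

# so the safer pattern is:
# input = input.to(grid.device)
```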
It might be, but you should stick to your workflow, as I was just using it as an example.
You could also try to register `self.grid` as a buffer using `self.register_buffer`, which would move the tensor automatically when calling `model.to()`.
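A minimal sketch of that approach (the class name and grid shape here are my own assumptions, not from the thread):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GridSampler(nn.Module):
    def __init__(self, grid):
        super().__init__()
        # a buffer is part of the module's state and follows model.to(device);
        # nn.DataParallel also replicates it onto each GPU
        self.register_buffer("grid", grid)

    def forward(self, x):
        return F.grid_sample(x, self.grid, align_corners=False)

# hypothetical usage: sampling coordinates in [-1, 1]
grid = torch.rand(1, 8, 8, 2) * 2 - 1
model = GridSampler(grid)   # model.to('cuda:0') moves the grid along
```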
`register_buffer` solves my problem. The segmentation fault actually came from another part of the code. It seems that when training on multiple GPUs, we cannot call `.cuda()` during the forward pass, so everything should be registered as a buffer.
Thanks so much for your help!
You could call `.cuda()` or `.to()`, but you should specify the right device to push the tensor to.
E.g. if you would like to create some tensors inside the `forward` method, you could use the device of some buffers/parameters or of the incoming tensor to create the new one.
However, if `self.grid` is treated as an attribute of the model, registering it as a buffer is the cleaner and better approach.
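As a sketch of that pattern (the module here is hypothetical):

```python
import torch
import torch.nn as nn

class AddNoise(nn.Module):
    def forward(self, x):
        # create the new tensor directly on the incoming tensor's device
        # instead of hard-coding .cuda(); this stays correct on every
        # replica when running on multiple GPUs
        noise = torch.randn(x.shape, device=x.device, dtype=x.dtype)
        return x + noise
```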
Right, that makes so much sense now. Thanks!