Hi, I have a model with a grid_sample layer, I tried to train my model on multiple GPUs, but got the following error:
RuntimeError: grid_sampler(): expected input and grid to be on same device, but input is on cuda:1 and grid is on cuda:0
Is there any way to use this layer on multiple GPUs? Thanks
The `input` and `grid` should be on the same device. If you are creating one of these tensors manually in the `forward` method, or passing it to `forward`, make sure to transfer it to the same device, e.g. by using:

`grid = grid.to(x.device)`
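For example, a minimal sketch (the module and the input name `x` are just placeholders):

```python
import torch.nn.functional as F

def forward(self, x):
    # under nn.DataParallel, x can land on a different replica/device
    # (e.g. cuda:1), so align the grid with it explicitly
    grid = self.grid.to(x.device)
    return F.grid_sample(x, grid, align_corners=False)
```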
Thanks for the reply. I got a segmentation fault when moving either the grid or the input to the same device via `input = input.to(grid.get_device())`.
My grid is actually the same for all inputs, so I stored it with `self.grid = grid` and call `grid_sample(input, self.grid)`. Do you think this causes the problem? Passing the grid into `forward` every time seems inefficient to me.
Try to use `grid.device` instead.
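The difference matters for CPU tensors: in recent PyTorch versions `Tensor.get_device()` returns -1 for them (older versions raised an error), which is not a valid target device, while `Tensor.device` always holds a usable `torch.device`. A quick check:

```python
import torch

t = torch.randn(1)        # a CPU tensor
print(t.get_device())     # -1 on recent versions, not a valid target device
print(t.device)           # device(type='cpu'), always valid

# so the safer pattern is:
# input = input.to(grid.device)
```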
It might be, but you should stick to your workflow, as I was just using it as an example.
You could also try to register `self.grid` as a buffer using `self.register_buffer`, which would move the tensor automatically when calling `model.to()`.
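A minimal sketch of that approach (the class name and grid shape here are my own assumptions, not from the thread):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GridSampler(nn.Module):
    def __init__(self, grid):
        super().__init__()
        # a buffer is part of the module's state and follows model.to(device);
        # nn.DataParallel also replicates it onto each GPU
        self.register_buffer("grid", grid)

    def forward(self, x):
        return F.grid_sample(x, self.grid, align_corners=False)

# hypothetical usage: sampling coordinates in [-1, 1]
grid = torch.rand(1, 8, 8, 2) * 2 - 1
model = GridSampler(grid)   # model.to('cuda:0') moves the grid along
```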
`register_buffer` solves my problem. The segmentation fault actually came from another part of the code. It seems that when training on multiple GPUs, we cannot call `.cuda()` during the forward pass, so everything should be registered as a buffer.
Thanks so much for your help!
You could call `.cuda()` or `.to()`, but you should specify the right device to push the tensor to.
E.g. if you would like to create some tensors inside the `forward` method, you could use the device of some buffers/parameters or of the incoming tensor to create the new one.
However, if `self.grid` is treated as an attribute of the model, registering it as a buffer is the cleaner and better approach.
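As a sketch of that pattern (the module here is hypothetical):

```python
import torch
import torch.nn as nn

class AddNoise(nn.Module):
    def forward(self, x):
        # create the new tensor directly on the incoming tensor's device
        # instead of hard-coding .cuda(); this stays correct on every
        # replica when running on multiple GPUs
        noise = torch.randn(x.shape, device=x.device, dtype=x.dtype)
        return x + noise
```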
Right, that makes so much sense now. Thanks!