In my code, I want to keep the latest input feature map to a layer, subtract it from the current input feature map, and then update the stored value.
The code works on a single GPU, but I get the following error when running on multiple GPUs. Any thoughts?
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
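For reference, here is a simplified sketch of the layer (names and shapes are made up; this is not my exact code):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

class DiffLayer(nn.Module):
    def __init__(self, shape=(1, 64, 32, 32)):
        super().__init__()
        # Stored feature map, explicitly created on cuda:0
        self.latest_input = Variable(torch.zeros(shape)).cuda()

    def forward(self, input):
        # Subtract the stored feature map from the current input
        out = input - self.latest_input
        # Update the stored value with the current input
        self.latest_input = input.clone()
        return out
```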
If you are using DataParallel or DistributedDataParallel, remove the .cuda() call when creating self.latest_input.
These wrappers push the model replica to the corresponding device for you, whereas you are explicitly creating this tensor on cuda:0, which can yield the device mismatch error.
PS: Variables have been deprecated since PyTorch 0.4, so you can use tensors directly in newer versions.
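Applied to the sketch above (hypothetical code, assuming the tensor is created in __init__), the change would be:

```python
# Before: pins the tensor to cuda:0, regardless of the replica's device
self.latest_input = Variable(torch.zeros(shape)).cuda()

# After: no explicit device (and no deprecated Variable wrapper)
self.latest_input = torch.zeros(shape)
```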
Thanks for your help. I’m using DataParallel. I removed “.cuda()” in the “self.latest_input” statement, but it seems that “input.clone()” copies the tensor on the CPU, as I get the following error.
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Sorry, I forgot the second part of the right approach.
The new error is not raised by input.clone(), but most likely still by self.latest_input, which is now on the CPU, as it wasn’t properly registered.
To make sure model.to() pushes all internal parameters and buffers to the desired device, you have to register self.latest_input as an nn.Parameter, if it should be trained, or as a buffer (via register_buffer), if it shouldn’t.
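A minimal sketch, reusing the hypothetical layer from above (names and shapes are assumptions, not your original code):

```python
import torch
import torch.nn as nn

class DiffLayer(nn.Module):
    def __init__(self, shape=(1, 64, 32, 32)):
        super().__init__()
        # If the stored map should be trained, use a parameter instead:
        # self.latest_input = nn.Parameter(torch.zeros(shape))
        # Otherwise register it as a buffer, so that model.to() /
        # model.cuda() move it together with the parameters:
        self.register_buffer('latest_input', torch.zeros(shape))

    def forward(self, input):
        out = input - self.latest_input
        # Assigning a tensor to a registered buffer name replaces the
        # buffer; detach() keeps the stored map out of autograd
        self.latest_input = input.detach().clone()
        return out
```

With this, model.cuda() moves latest_input along with the parameters, and DataParallel gives each replica a copy of the buffer on its own device, so the device mismatch should disappear.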