RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 2 (while checking arguments for cudnn_convolution)

Hi, all
Why do I have to use the same GPU number as when I trained the model?
Otherwise, this error occurs:
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 0 does not equal 2 (while checking arguments for cudnn_convolution)

The input data and parameters of a module need to be located on the same GPU, since the calculation is performed on this device.
In your current code it seems your input is on GPU0, while the weight parameter is on GPU2.
Make sure to push the input to the same device where the model is located.
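
A minimal sketch of that advice, assuming a small stand-in `nn.Conv2d` model (not the original poster's network): look up the device the model's parameters live on and move the input there before the forward pass. It falls back to the CPU when no GPU is visible.

```python
import torch
import torch.nn as nn

# Stand-in model for illustration; the original model is not shown in the thread.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Pick a device; fall back to the CPU when CUDA is unavailable.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# The input starts on the CPU, as it typically does coming out of a DataLoader.
x = torch.randn(2, 3, 16, 16)

# Move the input to wherever the model's parameters are located.
model_device = next(model.parameters()).device
x = x.to(model_device)

out = model(x)
```

Reading the device from the parameters avoids hard-coding a GPU index in two places.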


Just want to add something here:
I ran into the same problem when I resumed training.

“When you call torch.load() on a file which contains GPU tensors, those tensors will be loaded to GPU by default.”

In my case, I run training jobs on different GPUs of a multi-GPU system, so the GPU I use changes from time to time.

So, to avoid these loading issues:

torch.load(checkpoint_file, map_location='cpu')

Later you can push everything to the right device.
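
A short sketch of the full round trip, assuming a stand-in `nn.Linear` model and a temporary checkpoint file (both are placeholders, not the setup from this post): load onto the CPU first, restore the weights, then move the model to whichever device the current run uses.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in model and checkpoint file, for illustration only.
model = nn.Linear(4, 2)
checkpoint_file = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(model.state_dict(), checkpoint_file)

# Load every tensor onto the CPU, regardless of the GPU it was saved from.
state_dict = torch.load(checkpoint_file, map_location="cpu")

# Restore the weights, then move the model to the device used for this run.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
resumed = nn.Linear(4, 2)
resumed.load_state_dict(state_dict)
resumed = resumed.to(device)
```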


I have the same error with DistributedDataParallel. I am sending the model to DDP as follows:

models = DDP(models.cuda(), device_ids=[rank],
             broadcast_buffers=False, find_unused_parameters=True)

and I am sending the inputs to the same rank as well. This error doesn’t occur while using DataParallel, though.

Could you post the code snippet which shows how you are transferring the input tensors to the right device?
I guess you might transfer it to the default device instead of the currently used one.

Sending the model to the rank instead of cuda solved it for me. Thanks
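
For anyone landing here later, a sketch of what that change might look like. The helper names `device_for_rank` and `wrap_for_rank` are made up for this example, and it assumes one process per GPU on a single node (so the rank doubles as the GPU index) with `torch.distributed.init_process_group()` already called.

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP


def device_for_rank(rank: int) -> torch.device:
    """Return the CUDA device that belongs to this process's rank."""
    return torch.device("cuda", rank)


def wrap_for_rank(model: torch.nn.Module, rank: int) -> DDP:
    """Move the model to its rank's GPU, then wrap it in DDP.

    Assumes the process group is already initialized and that one
    process drives one GPU (hypothetical helper, not from the thread).
    """
    device = device_for_rank(rank)
    # model.cuda() without an index uses the current default device
    # (cuda:0 unless changed), which is wrong for every rank except 0.
    model = model.to(device)
    return DDP(model, device_ids=[rank],
               broadcast_buffers=False, find_unused_parameters=True)
```

The inputs then need the same treatment, e.g. `x = x.to(device_for_rank(rank))`.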

I was suffering from the same issue, but it does not seem to be a GPU0 problem: if you use CUDA_VISIBLE_DEVICES=1,3,4 to specify the GPUs to use, they are reindexed inside the PyTorch process as range(#available_GPUs) (same as in TensorFlow).
After many tries, I found the problem in the definition of my model and solved it as follows. I originally defined my model as:

# inside __init__
self.features, self.layer4 = load_resnet()
self.roi_fmap = self._roi_map

def forward(self, x):
    return self.roi_fmap(x)

# for convenience, I wrap the operation here
def _roi_map(self, features):
    return self.layer4(features).mean(3).mean(2)

Then I solved the problem by changing it to:

# inside __init__
self.features, self.layer4 = load_resnet()
self.roi_fmap = self.layer4

def forward(self, x):
    out = self.roi_fmap(x)
    return out.mean(3).mean(2)

I guess that when I wrap self.layer4(features).mean(3).mean(2) in a function _roi_map(), PyTorch does not assign this part to the different GPUs when it runs replicas = nn.parallel.replicate(model, devices=list(range(num_gpus))).
Any layer/module/model should be exposed as an attribute in the __init__() part of an nn.Module class. PyTorch seems to search for relevant layers/modules/models only there and does not dive into any functions inside or outside the class definition.
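
One plausible reading, not confirmed in this thread: the attribute self.roi_fmap holds a bound method, and a bound method keeps pointing at the instance it was taken from. The sketch below uses `copy.copy` as a stand-in for a per-GPU replica (an assumption for illustration; it is not the actual `replicate` call) and a hypothetical `Net` class to show that the copied attribute is still tied to the original module.

```python
import copy

import torch.nn as nn


class Net(nn.Module):
    """Hypothetical module mirroring the pattern described above."""

    def __init__(self):
        super().__init__()
        self.layer4 = nn.Linear(4, 4)   # registered submodule
        self.roi_fmap = self._roi_map   # bound method, tied to this instance

    def _roi_map(self, features):
        return self.layer4(features)


original = Net()
# Shallow copy as a stand-in for a per-GPU replica.
replica = copy.copy(original)

# The copied attribute is still bound to the original module, so calling it
# would run the original's layer4 rather than the replica's.
bound_to_original = replica.roi_fmap.__self__ is original
```

If the same thing happens to real replicas, every GPU would call the layer4 that sits on the first device, which would match the error message.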

I am not sure whether my analysis is correct. If you know more, please correct me. Thanks a lot.

Hi, @ptrblck, I use the code below:

device = torch.device("cuda", gpu_ids[0]) if args.gpu_ids is not None else torch.device("cuda")
x = x.to(device = device, dtype = torch.float32)
net = torch.nn.DataParallel(net, device_ids = gpu_ids, output_device = device).to(device=device)

I got the same error. Could you help?
If I comment out the DataParallel part, the code works fine.

Could you post your model definition and the complete stack trace?
I guess you might create tensors on a specific device inside your model, or are using plain tensors instead of buffers or parameters.
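
To illustrate the difference, here is a small sketch with a hypothetical `Scaler` module (not your model): a tensor registered via `register_buffer` moves together with the module on `.to(device)`, while a plain tensor attribute would stay on the device where it was created.

```python
import torch
import torch.nn as nn


class Scaler(nn.Module):
    """Hypothetical module used only to illustrate buffers."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # A plain attribute such as `self.scale = torch.ones(4)` would stay on
        # the device where it was created; a buffer follows the module.
        self.register_buffer("scale", torch.ones(4))

    def forward(self, x):
        return self.linear(x) * self.scale


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = Scaler().to(device)
out = net(torch.randn(2, 4, device=device))
```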

I use a pretrained network from GitHub.

net = WSDAN(num_classes=2, M=8, net="efficientnet-b3", pretrained=False)
ckpt = torch.load("./pretrained/abcd.pth", map_location="cpu")