Hi there, I’m trying to run my code across multiple GPUs and am getting the following error:
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
I’ve seen a few posts here and on https://github.com/pytorch/pytorch/, but nothing seems to apply to my case. I’m using a pre-trained model from https://github.com/osmr/imgclsmob, and I’ve modified the forward function to return the intermediate activations as well as the output. Here’s a simplified version of my code:
from pytorchcv.model_provider import get_model as ptcv_get_model
import torch
import torch.nn as nn
import types

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = ptcv_get_model("densenet40_k12_cifar10", root='loc', pretrained=True)

def my_forward(self, x):
    activations = []
    for module in self.features._modules.values():
        x = module(x)  # error happens here
        activations.append(x)
    x = x.view(x.size(0), -1)
    x = self.output(x)
    return x, activations

net.forward = types.MethodType(my_forward, net)

if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net, device_ids=[0, 1, 2, 3])
net.to(device)
net.eval()
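For completeness, here’s a self-contained variant of the same logic written as a wrapper nn.Module subclass instead of a monkey-patched bound method, in case that changes how DataParallel replicates things. This is just a sketch of my setup: ActivationWrapper is my own placeholder name, not anything from pytorchcv, and I haven’t confirmed it actually behaves differently:

import torch
import torch.nn as nn
from pytorchcv.model_provider import get_model as ptcv_get_model

class ActivationWrapper(nn.Module):
    # Hypothetical wrapper of mine: defines forward on the class,
    # so DataParallel replicas should pick it up along with the weights.
    def __init__(self, base):
        super().__init__()
        self.features = base.features  # pytorchcv models expose .features
        self.output = base.output      # and .output

    def forward(self, x):
        activations = []
        for module in self.features._modules.values():
            x = module(x)
            activations.append(x)
        x = x.view(x.size(0), -1)
        x = self.output(x)
        return x, activations

net = ActivationWrapper(ptcv_get_model("densenet40_k12_cifar10", pretrained=True))
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net, device_ids=[0, 1, 2, 3])
net = net.to("cuda")
net.eval()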
And my full error message is:
Traceback (most recent call last):
  File "main.py", line 471, in <module>
    train_student(student, teach)
  File "main.py", line 155, in train_student
    outputs_teacher, ints_teacher = teach(inputs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "main.py", line 364, in my_forward
    x = module_val(x)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
From the traceback, the failure seems to happen inside parallel_apply on device 1, so my guess is that the patched forward is still bound to the original model, whose weights live on device 0, but I’m not sure how to work around that. Any suggestions would be really appreciated. Thanks!