DataParallel on multi-GPU: RuntimeError ('input' does not have the same device as 'weight')

Hi, when I use DataParallel on multiple GPUs, I recently started getting the following error message:

PS: all the code runs normally on a single GPU.

RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

It seems my data are on cuda:1 but the network parameters are on cuda:0.
I googled the error message to figure out what happened, but I could not find a specific solution and am still confused, so I am posting it here and would really appreciate your help.

I want to train a custom encoder-decoder model, namely EncoderDecoder, which consists of multiple custom modules, e.g., custom_module1, custom_module2, …; each module only has an __init__ and a forward function.

Basically I use it like this:

device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
...
class Model:
    def __init__(self, config: Configuration, device: torch.device):
        super(Model, self).__init__()    
        ...
        self.encoder_decoder = EncoderDecoder(config, device).to(device)
        self.encoder_decoder = torch.nn.DataParallel(self.encoder_decoder) 
    
    def train_on_batch(self, ...):
        ...
        output = self.encoder_decoder(input)

I am using PyTorch 1.3.

I am not sure why this happened, so I checked the devices used both outside and inside the model. Outside (in the class Model), the input data and self.encoder_decoder are both on cuda:0 (the device). The issue happens inside the model (self.encoder_decoder).
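For reference, such a device check can be sketched as follows (CPU-runnable; the module and names here are illustrative, not the actual code from this model):

```python
import torch
import torch.nn as nn

class DebugConv(nn.Module):
    """Illustrative module that reports where its input and weight live."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3, padding=1)

    def forward(self, x):
        # Under DataParallel each replica should report matching devices;
        # a mismatch here is exactly what triggers the cudnn_convolution error.
        print("input:", x.device, "| weight:", self.conv.weight.device)
        return self.conv(x)

model = DebugConv()
out = model(torch.randn(2, 1, 8, 8))
print(out.shape)
```

On a CPU-only machine both devices print as cpu; under DataParallel on two GPUs you would see one line per replica (cuda:0 and cuda:1), each with matching input and weight devices.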

In the custom_module1 of self.encoder_decoder, I checked the input data:

audio: torch.Size([5, 1, 257, 376]) cuda:0 torch.float16
audio: torch.Size([5, 1, 257, 376]) cuda:1 torch.float16

which should be correct, right? The data are on both cuda:0 and cuda:1.

But this seems to conflict with the model weights, which presumably are stored only on cuda:0? I guess this is why the bug happened?

And I checked the device inside EncoderDecoder(config, device):

device in EncoderDecoder (the DataParallel model's device): cuda:0
device in EncoderDecoder (the DataParallel model's device): cuda:0

Is this why the bug happened? I still don't know how to handle it. Or is the above output normal and correct when using DataParallel? If so, I don't know why this issue appears.

Thanks for your help!

Here is the raw error message:

Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/workspace/encoder-decoder/encoder_decoder.py", line 29, in forward
    decodeded_carrier = self.encoder_module(carrier, msg, carrier_seqlen)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/encoder-decoder/encoder.py", line 44, in forward
    encoded_audio = self.conv_layers(audio, lengths)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/encoder-decoder/mask_conv.py", line 25, in forward
    x = module(x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/encoder-decoder/glu.py", line 22, in forward
    x1 = self.convx1a(padded_x)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

The code snippet looks alright (although you don’t need to wrap encoder_decoder in another class, I assume your code is more complicated).
All submodules in encoder_decoder should be scattered to the GPUs during the forward pass.
Did you push any parameter or tensor inside your custom modules to a specific GPU?
This would break the automatic scattering, as you would be forcing this tensor to be e.g. on GPU0.
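A CPU-runnable sketch of that anti-pattern next to a device-agnostic version (these modules are illustrative, not the actual code from the question):

```python
import torch
import torch.nn as nn

class BadModule(nn.Module):
    """Anti-pattern: pins a new tensor to a fixed device inside forward."""
    def forward(self, x):
        # Hard-coding 'cuda:0' breaks DataParallel: the replica running on
        # cuda:1 receives x on cuda:1, but `mask` is forced onto cuda:0,
        # so the two operands end up on different devices.
        mask = torch.ones(x.shape[0], device='cuda:0')  # DON'T do this
        return x * mask.unsqueeze(1)

class GoodModule(nn.Module):
    """Device-agnostic: new tensors follow the input's device."""
    def forward(self, x):
        # x.device is cuda:0 in one replica and cuda:1 in the other,
        # so the new tensor always lands next to its operand.
        mask = torch.ones(x.shape[0], device=x.device)
        return x * mask.unsqueeze(1)

x = torch.randn(4, 3)   # CPU tensor, just for the demo
y = GoodModule()(x)
print(y.shape, y.device)
```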

Thanks for your kind help!

So you mean I shouldn't force any tensor onto a specific GPU device in the forward function? What if I just create some new tensors in the sub-modules and place them on the same device as the existing tensors (which should be on the correct device thanks to the automatic scattering, as long as I don't assign any device to them, right)?

If you need to create tensors in the forward method, your suggestion of using the .device attribute of the input tensor or any parameter will work.
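Two device-agnostic options in one small sketch (module and names are hypothetical): derive the device from the input tensor or from a parameter, or register constant tensors as buffers so DataParallel replicates them to each GPU automatically:

```python
import torch
import torch.nn as nn

class Agnostic(nn.Module):
    def __init__(self, features=3):
        super().__init__()
        self.linear = nn.Linear(features, features)
        # Buffers are moved/replicated together with the parameters,
        # so they are always on the right device inside forward.
        self.register_buffer('offset', torch.zeros(features))

    def forward(self, x):
        # Option 1: inherit device (and dtype) from the input tensor.
        noise = torch.zeros_like(x)
        # Option 2: inherit the device from any parameter of the module.
        scale = torch.ones(x.shape[-1], device=self.linear.weight.device)
        return self.linear(x + noise) * scale + self.offset

m = Agnostic()
y = m(torch.randn(2, 3))
print(y.shape)
```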

Hey ptrblck,

I found an interesting bug in my code. I have a tensor named carrier_seqlen, whose size equals the batch size; e.g., when the batch size is 10, the size of carrier_seqlen is 10 too.

Here is the issue: when I print the tensor carrier_seqlen both inside and outside my encoder-decoder model, the values on device 0 change:

carrier_seqlen outside encoder-decoder: tensor([504, 385, 332, 324, 271, 271, 260, 245, 238, 169], device='cuda:0',
dtype=torch.int32) torch.Size([10]) cuda:0 torch.int32

carrier_seqlen in encoder-decoder: tensor([257, 257, 257, 257, 257], device='cuda:0', dtype=torch.int32) torch.Size([5]) cuda:0 torch.int32
carrier_seqlen in encoder-decoder: tensor([271, 260, 245, 238, 169], device='cuda:1', dtype=torch.int32) torch.Size([5]) cuda:1 torch.int32

I am not sure why this happened. I just pass this tensor into the model without doing any processing on it. Its size is torch.Size([10]), and it should be divided into two torch.Size([5]) chunks since I have two GPUs, right? But why did the values change?
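For reference, DataParallel's scatter splits each input along dim 0, roughly like torch.chunk, so a size-10 tensor becoming two size-5 chunks is expected; the split itself should preserve the values, as this CPU sketch shows:

```python
import torch

carrier_seqlen = torch.tensor([504, 385, 332, 324, 271, 271, 260, 245, 238, 169],
                              dtype=torch.int32)
# DataParallel's scatter is roughly a chunk along dim 0, one chunk per GPU.
chunks = torch.chunk(carrier_seqlen, 2, dim=0)
print(chunks[0])  # first half, would land on cuda:0
print(chunks[1])  # second half, would land on cuda:1
```

So the first chunk should still read [504, 385, 332, 324, 271]; the [257, ...] values printed above are not something the scatter itself would produce.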

Thanks for your help!

I cannot reproduce this issue locally with a dummy model.
Could you post a code snippet to reproduce this issue?