Hi all,
First, I followed the multi-GPU tutorial: Optional: Data Parallelism — PyTorch Tutorials 2.1.1+cu121 documentation
It works for me. Unfortunately, things break when I try my own dataset and my own network architecture with multi-GPU training.
Single GPU works fine (I think).
With multiple GPUs on PyTorch 0.4.0 I got this error:
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
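If I understand correctly, DataParallel replicates the model from GPU 0, so the whole model has to live on cuda:0 before wrapping, and no tensor may be pinned to a fixed device inside forward(). A minimal standalone sketch of what I believe the expected setup is (toy layer, not my actual model):

```python
import torch
import torch.nn as nn

# Toy model (hypothetical, stands in for my real network).
net = nn.Sequential(nn.Conv1d(8, 16, kernel_size=3, padding=1))

if torch.cuda.is_available():
    net = net.to("cuda:0")        # weights must be on device 0 first
    net = nn.DataParallel(net)    # then replicate across all visible GPUs
    x = torch.randn(4, 8, 32, device="cuda:0")
else:
    x = torch.randn(4, 8, 32)     # CPU fallback for testing

out = net(x)                      # (batch, channels, length) -> (4, 16, 32)
```

In my case I suspect some tensor ends up created on a different device inside forward, which would explain the device-1-vs-0 mismatch.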
On PyTorch 0.4.1 I now get a different error:
TypeError: 'float' object cannot be interpreted as an integer
Here is the forward method of my nn.Module subclass: a simple dilated convolution, batch norm, and ReLU.
```python
def forward(self, x):
    out = self.dil_conv(x)
    out = self.bn1(out)
    out = self.relu(out)
    return out
```
The error occurs in this function. I wrap the model with torch.nn.DataParallel(net):
```python
def forward(self, input):
    return F.conv1d(input, self.weight, self.bias, self.stride,
                    self.padding, self.dilation, self.groups)
```
The conv1d call fails with:
TypeError: 'float' object cannot be interpreted as an integer
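My guess is that one of the integer arguments (padding, most likely) is being computed as a float somewhere. A minimal standalone reproduction of that suspicion (hypothetical numbers, not my actual layer): in Python 3, `/` always returns a float, so a "same"-padding formula for a dilated conv silently produces e.g. 4.0 instead of 4, and conv1d rejects it.

```python
import torch
import torch.nn.functional as F

kernel_size, dilation = 5, 2
x = torch.randn(1, 3, 16)            # (batch, channels, length)
w = torch.randn(6, 3, kernel_size)   # (out_ch, in_ch, kernel)

# True division: (5 - 1) * 2 / 2 == 4.0, a float, not an int.
bad_padding = (kernel_size - 1) * dilation / 2
failed = False
try:
    F.conv1d(x, w, padding=bad_padding, dilation=dilation)
except TypeError:
    failed = True                    # conv1d rejects the float padding

# Floor division keeps it an integer and the call succeeds.
good_padding = (kernel_size - 1) * dilation // 2
out = F.conv1d(x, w, padding=good_padding, dilation=dilation)
```

If this is the cause, replacing `/` with `//` wherever padding or stride is computed should fix the 0.4.1 error.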