Hi all,
First, I followed the multi-GPU tutorial: Optional: Data Parallelism — PyTorch Tutorials 2.1.1+cu121 documentation
It works for me. Unfortunately, things break when I try my own dataset and my own network architecture with multi-GPU training.
Single GPU works fine (I think).
With multiple GPUs on PyTorch 0.4.0 I got this error:
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
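If I understand correctly, DataParallel replicates the model from GPU 0, so the whole model has to live on cuda:0 before wrapping, and no tensor may be pinned to a fixed device inside forward(). A minimal standalone sketch of what I believe the expected setup is (toy layer, not my actual model):

```python
import torch
import torch.nn as nn

# Toy model (hypothetical, stands in for my real network).
net = nn.Sequential(nn.Conv1d(8, 16, kernel_size=3, padding=1))

if torch.cuda.is_available():
    net = net.to("cuda:0")        # weights must be on device 0 first
    net = nn.DataParallel(net)    # then replicate across all visible GPUs
    x = torch.randn(4, 8, 32, device="cuda:0")
else:
    x = torch.randn(4, 8, 32)     # CPU fallback for testing

out = net(x)                      # (batch, channels, length) -> (4, 16, 32)
```

In my case I suspect some tensor ends up created on a different device inside forward, which would explain the device-1-vs-0 mismatch.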
On PyTorch 0.4.1 I now get a different error:
TypeError: 'float' object cannot be interpreted as an integer
Here is the forward method of my nn.Module subclass: a simple dilated convolution, batch norm, and ReLU.
```python
def forward(self, x):
    out = self.dil_conv(x)
    out = self.bn1(out)
    out = self.relu(out)
    return out
```
The error occurs in this function. I wrap the model with torch.nn.DataParallel(net):
```python
def forward(self, input):
    return F.conv1d(input, self.weight, self.bias, self.stride,
                    self.padding, self.dilation, self.groups)
```
The conv1d call fails with:
TypeError: 'float' object cannot be interpreted as an integer
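My guess is that one of the integer arguments (padding, most likely) is being computed as a float somewhere. A minimal standalone reproduction of that suspicion (hypothetical numbers, not my actual layer): in Python 3, `/` always returns a float, so a "same"-padding formula for a dilated conv silently produces e.g. 4.0 instead of 4, and conv1d rejects it.

```python
import torch
import torch.nn.functional as F

kernel_size, dilation = 5, 2
x = torch.randn(1, 3, 16)            # (batch, channels, length)
w = torch.randn(6, 3, kernel_size)   # (out_ch, in_ch, kernel)

# True division: (5 - 1) * 2 / 2 == 4.0, a float, not an int.
bad_padding = (kernel_size - 1) * dilation / 2
failed = False
try:
    F.conv1d(x, w, padding=bad_padding, dilation=dilation)
except TypeError:
    failed = True                    # conv1d rejects the float padding

# Floor division keeps it an integer and the call succeeds.
good_padding = (kernel_size - 1) * dilation // 2
out = F.conv1d(x, w, padding=good_padding, dilation=dilation)
```

If this is the cause, replacing `/` with `//` wherever padding or stride is computed should fix the 0.4.1 error.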