How to use multiple GPUs in customized network layers?

I built a network using customized layers. It runs fine on a single GPU but crashes when run on two GPUs of a server. The code and error message are shown below. It seems that one of the tensors was split across the two GPUs while the other was not. Was this caused by the customized forward function? How should I solve it? Thanks!

Code:

import os
import torch

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

# Fdim, Vsets, and the `layers` base class (which provides in_dim, weight, and bias)
# are defined elsewhere in my code.
class Cov1(layers):

    def __init__(self, in_dim=Fdim, out_dim=Fdim, bias=True):
        super(Cov1, self).__init__(in_dim, out_dim, bias)

    def forward(self, seq, sum_idx):
        simcov1 = torch.zeros(seq.shape).cuda()

        for i in range(self.in_dim):
            SeqDist = Vsets(seq[:, i].unsqueeze(1))
            simcov1[:, i] = torch.mean(SeqDist * sum_idx, 1)

        simcov1 = 1 - simcov1
        if self.bias is not None:
            mean_dist = simcov1.matmul(self.weight) + self.bias
        else:
            mean_dist = simcov1.matmul(self.weight)
        return mean_dist

Error screen:

If you are using DataParallel, the assumption is that all input tensors have the same size along the first (batch) dimension, because that is the dimension DataParallel splits and scatters across the GPUs. Otherwise the splitting behavior becomes tricky to reason about.

What are the input shapes (and the meaning of each dimension) being passed, and is DataParallel being used?
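For example, something like this toy sketch (the ShapeProbe module and the shapes here are made up for illustration) shows what each replica receives:

import torch
import torch.nn as nn

class ShapeProbe(nn.Module):
    # Toy module (illustration only): it just reports what each replica receives.
    def forward(self, seq, sum_idx):
        # DataParallel scatters every positional argument along dim 0,
        # so each replica sees its own slice of the batch.
        print(seq.shape, sum_idx.shape, seq.device)
        return seq

model = nn.DataParallel(ShapeProbe()).cuda()
seq = torch.randn(8, 446, 64).cuda()      # 64 is an arbitrary feature size
sum_idx = torch.randn(8, 446).cuda()
out = model(seq, sum_idx)
# With two GPUs, each replica prints seq (4, 446, 64) and sum_idx (4, 446);
# the gathered output has the full batch shape (8, 446, 64) again.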

Thanks eqy. Yes, DataParallel is used for the model: model = nn.DataParallel(model).cuda(). The first tensor, SeqDist, has shape 446x446, and the second tensor, sum_idx, has shape 446. The multiplication SeqDist * sum_idx selects the rows specified by sum_idx and computes each row's average.
If the first dimension of the tensors changes, I don't know how to make the multiplication work…
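For reference, the unbatched computation looks roughly like this (assuming sum_idx is a 0/1 selection mask; the values below are placeholders):

import torch

SeqDist = torch.randn(446, 446)
sum_idx = torch.zeros(446)
sum_idx[:10] = 1.0                        # placeholder 0/1 selection mask

# The same 446-vector broadcasts across every row of SeqDist;
# averaging over dim 1 then gives one value per row.
row_means = torch.mean(SeqDist * sum_idx, 1)   # shape: (446,)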

In this case, can you simply make this data parallel by doing something like making SeqDist (N, 446, 446) and sum_idx (N, 446)?
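A minimal sketch of that batched multiplication (N is just an example batch size; this keeps the same broadcasting as the unbatched version):

import torch

N = 8                                     # example per-forward batch size
SeqDist = torch.randn(N, 446, 446)
sum_idx = torch.randn(N, 446)

# Unsqueeze a singleton row dimension so each sample's sum_idx broadcasts over
# the rows of its own 446x446 matrix, then average over the last dimension.
row_means = torch.mean(SeqDist * sum_idx.unsqueeze(1), dim=2)   # shape: (N, 446)

# Because both tensors now have N as dim 0, DataParallel splits them consistently
# across the GPUs.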

It works~ Thanks a lot~ I revised my code to match the splitting mechanism. :v: