# How to use multiple GPUs in customized network layers?

I built a network using customized layers. It runs fine on a single GPU but crashes when using two GPUs on a server. The code and error message are shown below. It seems that one of the tensors was split across the two GPUs while the other was not. Was this caused by the customized forward function? How should I solve it? Thanks!

Code:

```python
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

class Cov1(layers):

    def __init__(self, in_dim=Fdim, out_dim=Fdim, bias=True):
        super(Cov1, self).__init__(in_dim, out_dim, bias)

    def forward(self, seq, sum_idx):
        simcov1 = torch.zeros(seq.shape).cuda()

        for i in range(self.in_dim):
            SeqDist = Vsets(seq[:, i].unsqueeze(1))
            simcov1[:, i] = torch.mean(SeqDist * sum_idx, 1)

        simcov1 = 1 - simcov1
        if self.bias is not None:
            mean_dist = simcov1.matmul(self.weight) + self.bias
        else:
            mean_dist = simcov1.matmul(self.weight)
        return mean_dist
```

Error screen: (screenshot not reproduced)

If you are using `DataParallel`, the assumption is that all input tensors have the same size along the first (batch) dimension; otherwise the splitting behavior becomes tricky to reason about.

What are the input shapes (and the meaning of each dimension) being passed, and is `DataParallel` being used?
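To see why mismatched first dimensions break things: `DataParallel` scatters every tensor argument along dim 0 across the replicas, so a (446, 446) matrix and a (446,) vector each get cut in half along their own first dimension, and the halves no longer broadcast together. A minimal CPU sketch of the same arithmetic, using `torch.chunk` to mimic the two-GPU scatter (variable names are illustrative, not from the original code):

```python
import torch

# Illustrative shapes from this thread: SeqDist is (446, 446), sum_idx is (446,)
seq_dist = torch.randn(446, 446)
sum_idx = torch.randn(446)

# On a single GPU, (446, 446) * (446,) broadcasts fine:
# sum_idx is applied to every row of seq_dist.
ok = seq_dist * sum_idx

# DataParallel scatters each input along dim 0. With two GPUs,
# each replica receives one half of each argument:
halves_a = torch.chunk(seq_dist, 2, dim=0)  # two (223, 446) pieces
halves_b = torch.chunk(sum_idx, 2, dim=0)   # two (223,) pieces

# Inside a replica, (223, 446) * (223,) fails to broadcast:
# the trailing dimensions 446 and 223 do not match, so the forward crashes.
```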

Thanks eqy. Yes, `DataParallel` is used for the model: `model = nn.DataParallel(model).cuda()`. The first tensor `SeqDist` is 446x446 and the second tensor `sum_idx` is 446. The multiplication `SeqDist * sum_idx` selects the rows specified by `sum_idx` and computes each row's average.
If the first dimension of the tensors changes, I don't know how to make the multiplication work…
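For reference, the masked row-average that this multiplication performs can be sketched on CPU like so (the concrete mask here is hypothetical; `sum_idx` stands in for whatever 0/1 selection the original code builds):

```python
import torch

seq_dist = torch.randn(446, 446)  # pairwise values, one row per sequence
sum_idx = torch.zeros(446)
sum_idx[:10] = 1.0                # hypothetical mask selecting 10 entries

# sum_idx broadcasts across rows: each row of seq_dist is masked
# elementwise, then averaged along dim 1, as in the forward() loop body.
row_means = torch.mean(seq_dist * sum_idx, dim=1)  # shape (446,)
```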

In this case, can you simply make this data-parallel by doing something like making `SeqDist` (N, 446, 446) and `sum_idx` (N, 446)?
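That suggestion can be sketched as follows: give both tensors a leading batch dimension `N` (the value 8 below is just an example), so that `DataParallel`'s scatter along dim 0 cuts both tensors into consistent halves of the same batch. The per-sample broadcast is then recovered by inserting the row axis with `unsqueeze`:

```python
import torch

N = 8  # hypothetical batch size
seq_dist = torch.randn(N, 446, 446)
sum_idx = torch.randn(N, 446)

# After DataParallel's scatter along dim 0, each replica would see
# (N/2, 446, 446) and (N/2, 446) -- matching halves of the same batch.
# Per sample, the original (446, 446) * (446,) broadcast is recovered
# by unsqueezing sum_idx to (N, 1, 446) and averaging over the last dim:
out = torch.mean(seq_dist * sum_idx.unsqueeze(1), dim=2)  # shape (N, 446)
```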

It works! Thanks a lot! I revised my code to fit the splitting mechanism.