I have a simple network that produces a tuple in its forward() call, like below:
import torch
import torch.nn as nn
import torch.nn.functional as F

class NN4(nn.Module):
    def __init__(self):
        super(NN4, self).__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc21 = nn.Linear(4, 1)

    def forward(self, x):
        x = F.selu(self.fc1(x))
        x1 = torch.sigmoid(self.fc21(x))
        # return x, x   # with this, grads are not None
        return x, x1    # with this, fc1 grad is None under DataParallel
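On its own, the tuple output looks fine; a quick CPU-only sanity check (my own snippet, no DataParallel involved) returns the expected shapes:

# Quick CPU sanity check of the tuple output (no DataParallel involved)
net = NN4()
out, out1 = net(torch.randn(4, 8))
print(out.shape, out1.shape)  # torch.Size([4, 4]) torch.Size([4, 1])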
Then I test this network with the following code (note that in the first half NN4 is wrapped with nn.DataParallel, while in the second half it is not):
DEVICE = torch.device('cuda:0')

def test_NN4():
    images = torch.randn(4, 8).to(DEVICE)
    fimages = torch.randn(4, 8).to(DEVICE)

    # First half: wrapped with nn.DataParallel
    D = NN4().to(DEVICE)
    D = nn.DataParallel(D)
    D.zero_grad()
    d_loss = D(images)[0].mean() - D(fimages)[0].mean()
    print('d_loss: -->', d_loss)
    d_loss.backward()
    print('-------->>>')
    aaa = list(D.named_parameters())
    print(aaa[0][0])
    print(aaa[0][1].grad)

    # Second half: the same network, not wrapped
    D2 = NN4().to(DEVICE)
    D2.zero_grad()
    d2_loss = D2(images)[0].mean() - D2(fimages)[0].mean()
    print('d2_loss: -->', d2_loss)
    d2_loss.backward()
    print('-------->>>')
    aaa2 = list(D2.named_parameters())
    print(aaa2[0][0])
    print(aaa2[0][1].grad)
I run this code on two GPUs, ids [0, 1] (with CUDA_VISIBLE_DEVICES=0,1 python test.py), and the result is:
d_loss: --> tensor(0.0098, device='cuda:0', grad_fn=<SubBackward0>)
-------->>>
module.fc1.weight
None
d2_loss: --> tensor(-0.0592, device='cuda:0', grad_fn=<SubBackward0>)
-------->>>
fc1.weight
tensor([[ 0.2356, -0.1217, 0.0502, -0.2524, 0.1167, 0.0295, 0.1135, 0.1423],
[ 0.3054, -0.2515, 0.0074, -0.2933, 0.1163, 0.0952, 0.1906, 0.2290],
[ 0.3524, -0.1401, 0.0276, -0.2763, 0.1148, 0.0307, 0.3021, 0.1994],
[ 0.2883, -0.2090, -0.0485, -0.1937, 0.0650, 0.0781, 0.3529, 0.2433]],
device='cuda:0')
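(As an aside, the module. prefix in module.fc1.weight just comes from the DataParallel wrapper, which exposes the wrapped model as .module; both names refer to the same underlying parameter. A small check, assuming the setup above:)

# D.module is the original NN4 instance, so these are the same Parameter object
print(D.module.fc1.weight is dict(D.named_parameters())['module.fc1.weight'])  # True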
I expected the gradient of fc1 under nn.DataParallel to be a valid tensor rather than None (just as when the network is not wrapped with nn.DataParallel). The strange thing is that if I switch the output of the NN4 forward() call to
return x, x
then the result is OK:
d_loss: --> tensor(0.1056, device='cuda:0', grad_fn=<SubBackward0>)
-------->>>
module.fc1.weight
tensor([[ 0.1904, 0.0461, -0.2445, 0.0530, -0.0502, 0.0738, 0.0506, -0.1648],
[ 0.2761, 0.1007, -0.2761, 0.0436, -0.0724, 0.0660, 0.0267, -0.1630],
[ 0.2097, 0.0416, -0.2006, 0.0426, -0.0496, 0.0706, -0.0654, -0.1262],
[ 0.1848, 0.0789, -0.3042, 0.0943, -0.0567, 0.1234, -0.0341, -0.2012]],
device='cuda:0')
d2_loss: --> tensor(0.1202, device='cuda:0', grad_fn=<SubBackward0>)
-------->>>
fc1.weight
tensor([[ 0.1592, 0.0493, -0.2680, 0.0611, -0.0546, 0.1066, 0.0206, -0.1425],
[ 0.2109, 0.0573, -0.2443, 0.0503, -0.0348, 0.0786, 0.0665, -0.2017],
[ 0.2091, 0.0704, -0.3194, 0.0410, -0.0809, 0.1483, -0.0061, -0.1214],
[ 0.2173, 0.0366, -0.2628, 0.0207, -0.0380, 0.1162, 0.0384, -0.1626]],
device='cuda:0')
Can anybody explain this? What is the correct way to return a tuple from nn.Module? I am using PyTorch 1.0.0.
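For what it's worth, one diagnostic I can think of (my own untested sketch, not a confirmed fix) is to tie the second output into the loss with a zero-weight term, so that every element of the returned tuple participates in the backward graph; if the gradients then become valid, the unused output would seem to be the trigger:

# Hedged experiment: make both tuple elements participate in the loss,
# using a zero multiplier so the loss value itself is unchanged.
out0, out1 = D(images)
fout0, fout1 = D(fimages)
d_loss = out0.mean() - fout0.mean() + 0.0 * (out1.sum() + fout1.sum())
d_loss.backward()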