Hi, I am new to PyTorch (transferred from Torch), and I am really confused by this example. Basically, the same module produces different gradients for the same input across two identical forward/backward passes. Any idea why this is happening? Thanks a lot in advance.
import torch
from torch.autograd import Variable

modL = torch.nn.Linear(2, 1).cuda()
a = torch.rand(3, 2).cuda()

x = Variable(a)
y = modL(x)
z = y.sum()
z.backward()
modL.weight.grad
Variable containing:
3.2878 7.0024
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]
# repeat without updating the weight
x = Variable(a)
y = modL(x)
z = y.sum()
z.backward()
modL.weight.grad
Variable containing:
4.5920 8.7185
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]
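For anyone who wants to reproduce this without a GPU, here is a minimal CPU sketch of the same experiment, written against the current PyTorch API (where `Variable` is merged into `Tensor`); the seed and shapes are just illustrative choices, not from the original run:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
modL = torch.nn.Linear(2, 1)
a = torch.rand(3, 2)

# First forward/backward pass.
y = modL(a)
y.sum().backward()
g1 = modL.weight.grad.clone()

# Second pass on the same input, without clearing .grad in between.
y = modL(a)
y.sum().backward()
g2 = modL.weight.grad.clone()

# .grad accumulates across backward() calls, so the second reading
# holds the sum of both passes' (identical) gradients.
print(torch.allclose(g2, 2 * g1))  # True
```

Calling `modL.zero_grad()` between the two passes makes both readings of `.grad` come out identical.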