A module produces different gradients for the same input?

Hi, I am new to PyTorch (I transferred from Torch), and I am really confused by this example. Basically, a module produces different gradients for the same input. Any idea why this is happening? Thanks a lot in advance.

import torch
from torch.autograd import Variable

modL = torch.nn.Linear(2, 1).cuda()
a = torch.rand(3, 2).cuda()

x = Variable(a)
y = modL(x)
z = y.sum()
z.backward()
modL.weight.grad

Variable containing:
 3.2878  7.0024
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]


# repeat without updating the weight

x = Variable(a)
y = modL(x)
z = y.sum()
z.backward()
modL.weight.grad

Variable containing:
 4.5920  8.7185
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]

Gradients are accumulated by default. So in the repeat case, what you are seeing is the gradient of the first call plus the gradient of the second call.

Before the line # repeat without updating the weight, insert this call:

modL.zero_grad()
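
To make the behaviour concrete, here is a minimal sketch based on the snippet above (the helper run_backward is just for illustration, and Variable matches the original code even though recent PyTorch no longer requires it). With the same input, the accumulated gradient after the second backward is exactly twice the first one, and calling zero_grad() in between restores the original value:

import torch
from torch.autograd import Variable

modL = torch.nn.Linear(2, 1).cuda()
a = torch.rand(3, 2).cuda()

def run_backward():
    # forward + backward on the same input; gradients accumulate into .grad
    x = Variable(a)
    modL(x).sum().backward()

run_backward()
first = modL.weight.grad.clone()   # gradient of the first call

run_backward()                     # no zero_grad(): .grad now holds the sum of both calls
print(modL.weight.grad)            # equals 2 * first, since the input did not change

modL.zero_grad()                   # reset the accumulated gradients
run_backward()
print(modL.weight.grad)            # equals first again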