Puzzling output of my code

1114 · May 30, 2017, 10:24am

Here is a piece of my code. For the vector a=(1,2), I want to calculate the gradient of exp(-2(a_0-a_1)^2). I implemented it in two ways, selected by flag. The only difference is how the matrix B is generated. In both settings the B matrices are the same, but the gradients differ. I don’t know why. Can anyone help me?

Here is my code:

import torch

def work(flag):
    a = torch.autograd.Variable(torch.Tensor((1,2)), requires_grad=True)
    A = a.unsqueeze(dim=1).expand((2,2))
    if flag: B = a.unsqueeze(dim=0).expand((2,2))
    else: B = A.permute(1, 0)
    print('B:\n', B.cpu().data.numpy())
    loss = torch.sum(torch.exp((A-B)**2))
    loss.backward()
    print('loss:', loss.cpu().data.numpy())
    print('grad:', a.grad.cpu().data.numpy())

work(False)
work(True)

My output is:

B:
 [[ 1.  2.]
 [ 1.  2.]]
loss: [ 7.43656349]
grad: [[ 10.87312698 -10.87312698]]
B:
 [[ 1.  2.]
 [ 1.  2.]]
loss: [ 7.43656349]
grad: [[ 0.  0.]]

albanD · May 30, 2017, 10:52am

Hi,

You need to be careful: a function having a value of 0 does not mean that its gradient is 0.
Why don’t you just do something like this:

    import torch

    # Differentiate the target function directly, without building 2x2 matrices.
    a = torch.autograd.Variable(torch.Tensor((1,2)), requires_grad=True)
    loss = torch.sum(torch.exp(-2*(a[0]-a[1])**2))
    loss.backward()
    print('loss:', loss)
    print('grad:', a.grad)
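
As a sanity check on what the gradient of this simplified loss should be, here is a minimal sketch in plain Python (no PyTorch) that compares the hand-derived gradient of exp(-2(a_0-a_1)^2) with a central finite difference. The helper names `loss`, `analytic_grad` and `numeric_grad` are made up for this illustration and are not part of any library.

```python
import math

def loss(a0, a1):
    # The scalar function from the question: exp(-2 * (a0 - a1)^2).
    return math.exp(-2.0 * (a0 - a1) ** 2)

def analytic_grad(a0, a1):
    # Chain rule: d/da0 = -4 * (a0 - a1) * exp(-2 * (a0 - a1)^2),
    # and d/da1 is the same with the opposite sign.
    d = a0 - a1
    g0 = -4.0 * d * math.exp(-2.0 * d ** 2)
    return g0, -g0

def numeric_grad(a0, a1, h=1e-6):
    # Central finite differences, used only to cross-check the formula.
    g0 = (loss(a0 + h, a1) - loss(a0 - h, a1)) / (2 * h)
    g1 = (loss(a0, a1 + h) - loss(a0, a1 - h)) / (2 * h)
    return g0, g1

print('loss:', loss(1.0, 2.0))
print('analytic grad:', analytic_grad(1.0, 2.0))
print('numeric grad: ', numeric_grad(1.0, 2.0))
```

At a=(1,2) the two components are equal in magnitude, opposite in sign, and non-zero, so an all-zero gradient from autograd would point to a problem in how the graph was built rather than in the math.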

1114 · May 30, 2017, 1:38pm

Thank you for replying, but I want to know what happens in the code. I think “work(True)” should not give a gradient of 0. It’s nearly the same as “work(False)”, isn’t it?

albanD · May 30, 2017, 2:07pm

Oh,

The problem is that expand does not take a tuple as an argument: you should write .expand(2,2), not .expand((2,2)).
I think a check is missing here to ensure that we have a single list/tuple of ints.
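
A minimal sketch of the original function with the sizes passed to expand as separate ints. It assumes a recent PyTorch release in which tensors carry requires_grad directly (the code in this thread uses the older Variable wrapper); the names `grad_of_loss` and `use_unsqueeze_for_b` are made up for this illustration.

```python
import math
import torch

def grad_of_loss(use_unsqueeze_for_b):
    # Fresh leaf tensor on each call so gradients do not accumulate across runs.
    a = torch.tensor([1.0, 2.0], requires_grad=True)
    # Sizes are passed as separate ints: expand(2, 2), not expand((2, 2)).
    A = a.unsqueeze(1).expand(2, 2)      # A[i][j] = a[i]
    if use_unsqueeze_for_b:
        B = a.unsqueeze(0).expand(2, 2)  # B[i][j] = a[j]
    else:
        B = A.permute(1, 0)              # transpose of A, same values as above
    loss = torch.sum(torch.exp((A - B) ** 2))
    loss.backward()
    return loss.item(), a.grad.clone()

loss_permute, grad_permute = grad_of_loss(False)
loss_unsqueeze, grad_unsqueeze = grad_of_loss(True)
print('loss:', loss_permute, loss_unsqueeze)
print('grad (permute):  ', grad_permute)
print('grad (unsqueeze):', grad_unsqueeze)
```

Under these assumptions both ways of building B should report the same loss and the same non-zero gradient, since they describe the same function of a.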

1114 · May 30, 2017, 2:14pm

Very interesting! Thanks very much.