Puzzling output of my code

Here is a piece of my code. For the vector a = (1, 2), I want to calculate the gradient of exp(-2(a_0-a_1)^2). I implemented it in two ways, selected by flag. The only difference is how the matrix B is generated. In both settings the matrix B is the same; however, the gradients differ. I don’t know why. Can anyone help?

Here is my code:

import torch

def work(flag):
    a = torch.autograd.Variable(torch.Tensor((1,2)), requires_grad=True)
    A = a.unsqueeze(dim=1).expand((2,2))
    # The two branches differ only in how B is built.
    if flag:
        B = a.unsqueeze(dim=0).expand((2,2))
    else:
        B = A.permute(1, 0)
    print('B:\n', B.cpu().data.numpy())
    loss = torch.sum(torch.exp((A-B)**2))
    print('loss:', loss.cpu().data.numpy())
    loss.backward()  # populate a.grad before reading it
    print('grad:', a.grad.cpu().data.numpy())


My output is:

 [[ 1.  2.]
 [ 1.  2.]]
loss: [ 7.43656349]
grad: [[ 10.87312698 -10.87312698]]
 [[ 1.  2.]
 [ 1.  2.]]
loss: [ 7.43656349]
grad: [[ 0.  0.]]


Be careful: a function having a value of 0 does not mean that its gradient is 0.
Why don’t you just do something like this:

    a = torch.autograd.Variable(torch.Tensor((1,2)), requires_grad=True)
    loss = torch.sum(torch.exp(-2*(a[0]-a[1])**2))
    print('loss:', loss)
    loss.backward()  # compute a.grad; without this call it stays None
    print('grad:', a.grad)

Thank you for replying, but I want to understand what happens in my code. I think “work(True)” should not give a gradient of 0. It’s nearly the same as “work(False)”, isn’t it?
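
One way to see what gradient to expect, independently of autograd, is a finite-difference check. The sketch below is plain Python and makes one assumption: it uses the loss as the posted work() computes it, the sum over i, j of exp((a[i] - a[j])**2). The helper names loss and numeric_grad are hypothetical.

```python
import math

def loss(a):
    # Same quantity as the posted code: sum over i, j of exp((a[i] - a[j])**2).
    return sum(math.exp((ai - aj) ** 2) for ai in a for aj in a)

def numeric_grad(f, a, h=1e-6):
    # Central differences: (f(a + h*e_k) - f(a - h*e_k)) / (2h) for each index k.
    grad = []
    for k in range(len(a)):
        up = list(a)
        up[k] += h
        down = list(a)
        down[k] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

a = [1.0, 2.0]
print('loss:', loss(a))
print('numeric grad:', numeric_grad(loss, a))
```

Because this loss depends only on the difference a[0] - a[1], the two gradient components should be equal in magnitude and opposite in sign, and neither should be zero when a[0] != a[1].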


The problem is that expand does not take a tuple as an argument: you should write .expand(2,2), not .expand((2,2)).
I think a check is missing here to ensure that we have a single list/tuple of ints.
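
A minimal sketch of the suggested fix, assuming a recent PyTorch release where a plain tensor created with requires_grad=True stands in for Variable. The function name pairwise_loss_grad is hypothetical. It passes the sizes to expand as separate ints and builds B both ways, so the two gradients can be compared directly:

```python
import torch

def pairwise_loss_grad(use_unsqueeze):
    # Leaf tensor that tracks gradients (assumed modern replacement for Variable).
    a = torch.tensor([1.0, 2.0], requires_grad=True)
    # Sizes are passed to expand as separate ints, not as one tuple.
    A = a.unsqueeze(1).expand(2, 2)        # A[i, j] = a[i]
    if use_unsqueeze:
        B = a.unsqueeze(0).expand(2, 2)    # B[i, j] = a[j]
    else:
        B = A.permute(1, 0)                # transpose of A, same values
    loss = torch.sum(torch.exp((A - B) ** 2))
    loss.backward()                        # populate a.grad
    return loss.item(), a.grad.clone()

loss_a, grad_a = pairwise_loss_grad(True)
loss_b, grad_b = pairwise_loss_grad(False)
print('losses:', loss_a, loss_b)
print('grads match:', torch.allclose(grad_a, grad_b))
```

If the two constructions of B are really equivalent, both the losses and the gradients should agree.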

Very interesting! Thanks very much.