Here is a piece of my code. For the vector a = (1, 2), I want to compute the gradient of the loss sum_{i,j} exp((a_i - a_j)^2), built from the matrices A and B below. I implement it in two ways, selected by `flag`; the only difference is how the matrix B is generated. In both settings the matrices B are identical, yet the gradients differ. I don't know why. Can anyone help me?
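For reference, here is a small finite-difference check I wrote separately (it is not part of the script below, and uses only NumPy) to see what the gradient of this loss should be, independent of autograd:

```python
import numpy as np

def loss_fn(a):
    # Same loss as the PyTorch script: sum_{i,j} exp((a_i - a_j)^2),
    # built here with NumPy broadcasting instead of expand().
    A = a[:, None]  # shape (2, 1), broadcasts to (2, 2) with A[i][j] = a[i]
    B = a[None, :]  # shape (1, 2), broadcasts to (2, 2) with B[i][j] = a[j]
    return np.exp((A - B) ** 2).sum()

def numerical_grad(f, a, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(a)
    for i in range(a.size):
        ap, am = a.copy(), a.copy()
        ap[i] += eps
        am[i] -= eps
        g[i] = (f(ap) - f(am)) / (2 * eps)
    return g

a = np.array([1.0, 2.0])
print(loss_fn(a))                   # 2 + 2*e, approximately 7.4366
print(numerical_grad(loss_fn, a))   # approximately [-10.873, 10.873], i.e. [-4e, 4e]
```

So analytically the gradient is nonzero, which makes the disagreement between the two flags even more confusing to me.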

Here is my code:

```python
import torch

def work(flag):
    a = torch.autograd.Variable(torch.Tensor((1, 2)), requires_grad=True)
    # A[i][j] = a[i]: column vector expanded to a 2x2 matrix
    A = a.unsqueeze(dim=1).expand((2, 2))
    if flag:
        # B[i][j] = a[j]: row vector expanded to a 2x2 matrix
        B = a.unsqueeze(dim=0).expand((2, 2))
    else:
        # B is the transpose of A, so B[i][j] = a[j] as well
        B = A.permute(1, 0)
    print('B:\n', B.cpu().data.numpy())
    loss = torch.sum(torch.exp((A - B) ** 2))
    loss.backward()
    print('loss:', loss.cpu().data.numpy())
    print('grad:', a.grad.cpu().data.numpy())

work(False)
work(True)
```

My output is:

```
B:
[[ 1. 2.]
[ 1. 2.]]
loss: [ 7.43656349]
grad: [[ 10.87312698 -10.87312698]]
B:
[[ 1. 2.]
[ 1. 2.]]
loss: [ 7.43656349]
grad: [[ 0. 0.]]
```