Hello, I am trying to compute the gradient of the quadratic form `x @ A @ x.T`, where `x` is a matrix of n samples by m features and `A` is an m × m matrix. The manual derivative of this expression with respect to `x` is `x @ (A + A.T)`. I tried to obtain the same derivative with autograd but couldn't. I implemented it this way:
```python
import torch

n_samples = 5
n_features = 2
x = torch.randint(0, 5, (n_samples, n_features)).to(torch.float32).requires_grad_(True)
A = torch.randint(0, 5, (n_features, n_features)).to(torch.float32)  # .requires_grad_(True)

# Using autograd
loss = (x @ A @ x.T).sum()
loss.backward()
grad = x.grad

# Manual derivation
grad2 = x @ (A + A.T)
print(grad)
print(grad2)
```
However, the two results are different: every row of `grad` equals the column-wise sum of `grad2`, i.e. `grad2.sum(dim=0)`. For example:
```
grad = tensor([[96., 78.],
        [96., 78.],
        [96., 78.],
        [96., 78.],
        [96., 78.]])
grad2 = tensor([[12., 12.],
        [16., 14.],
        [20., 18.],
        [28., 22.],
        [20., 12.]], grad_fn=<MmBackward0>)
grad2.sum(dim=0) = tensor([96., 78.], grad_fn=<SumBackward1>)
```
I understand this behavior may be due to applying the `.sum()` operator before `backward()`, but I would like to know whether I can get the same result as in the manual derivation. I appreciate any help and further explanation about this. Thanks in advance.
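For context, here is a minimal sketch of what I believe the per-sample formulation would look like, assuming the intended loss is the sum of the per-sample quadratic forms `x_i.T @ A @ x_i` (the diagonal of `x @ A @ x.T`) rather than the full matrix; under that assumption the autograd result does seem to match the manual derivative:

```python
import torch

torch.manual_seed(0)
n_samples, n_features = 5, 2
x = torch.randint(0, 5, (n_samples, n_features)).to(torch.float32).requires_grad_(True)
A = torch.randint(0, 5, (n_features, n_features)).to(torch.float32)

# Keep only the diagonal terms of x @ A @ x.T, i.e. sum_i x_i.T @ A @ x_i;
# (x * (x @ A)).sum() is equivalent to torch.einsum('ni,ij,nj->', x, A, x)
loss = (x * (x @ A)).sum()
loss.backward()

# Manual derivative of x_i.T @ A @ x_i with respect to each row x_i
grad_manual = x @ (A + A.T)
print(torch.allclose(x.grad, grad_manual))  # True
```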