Hello, I am trying to compute the gradient of the quadratic form `x @ A @ x.T`, where `x` is a matrix of n samples by m features and `A` is an m × m matrix. The manual derivative of this expression with respect to `x` is `x @ (A + A.T)`. I tried to obtain the same derivative with autograd but couldn't. I implemented it this way:
```python
import torch

n_samples = 5
n_features = 2
x = torch.randint(0, 5, (n_samples, n_features)).to(torch.float32).requires_grad_(True)
A = torch.randint(0, 5, (n_features, n_features)).to(torch.float32)  # .requires_grad_(True)

# Using autograd
loss = (x @ A @ x.T).sum()
loss.backward()
grad = x.grad

# Manual derivation
grad2 = x @ (A + A.T)
print(grad)
print(grad2)
```
However, the two results are different: every row of `grad` equals the column-wise sum of `grad2`, i.e. `grad2.sum(dim=0)`. For example:
```
grad = tensor([[96., 78.],
        [96., 78.],
        [96., 78.],
        [96., 78.],
        [96., 78.]])
grad2 = tensor([[12., 12.],
        [16., 14.],
        [20., 18.],
        [28., 22.],
        [20., 12.]], grad_fn=<MmBackward0>)
grad2.sum(dim=0) = tensor([96., 78.], grad_fn=<SumBackward1>)
```
I understand this behavior may be due to applying the `.sum()` operator before `backward()`, but I would like to know whether I can get the same result as in the manual derivation. I appreciate any help and further explanation about this. Thanks in advance.
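For context, here is a minimal sketch of what I believe the per-sample formulation would look like, assuming the intended loss is the sum of the per-sample quadratic forms `x_i.T @ A @ x_i` (the diagonal of `x @ A @ x.T`) rather than the full matrix; under that assumption the autograd result does seem to match the manual derivative:

```python
import torch

torch.manual_seed(0)
n_samples, n_features = 5, 2
x = torch.randint(0, 5, (n_samples, n_features)).to(torch.float32).requires_grad_(True)
A = torch.randint(0, 5, (n_features, n_features)).to(torch.float32)

# Keep only the diagonal terms of x @ A @ x.T, i.e. sum_i x_i.T @ A @ x_i;
# (x * (x @ A)).sum() is equivalent to torch.einsum('ni,ij,nj->', x, A, x)
loss = (x * (x @ A)).sum()
loss.backward()

# Manual derivative of x_i.T @ A @ x_i with respect to each row x_i
grad_manual = x @ (A + A.T)
print(torch.allclose(x.grad, grad_manual))  # True
```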