Hello, I am trying to compute the gradient of the quadratic form `x @ A @ x.T`, where `x` is a matrix of n samples by m features and `A` is an m × m matrix. The manual derivative of this expression with respect to `x` is `x @ (A + A.T)`. I tried to obtain the same derivative with autograd but couldn't. I implemented it this way:
```python
import torch

n_samples = 5
n_features = 2
x = torch.randint(0, 5, (n_samples, n_features)).to(torch.float32).requires_grad_(True)
A = torch.randint(0, 5, (n_features, n_features)).to(torch.float32)  # .requires_grad_(True)

# Using autograd
loss = (x @ A @ x.T).sum()
loss.backward()
grad = x.grad

# Manual derivation
grad2 = x @ (A + A.T)
print(grad)
print(grad2)
```
However, the two results are different: every row of `grad` equals the column-wise sum of `grad2`, i.e. `grad2.sum(dim=0)`. For example:
```
grad = tensor([[96., 78.],
        [96., 78.],
        [96., 78.],
        [96., 78.],
        [96., 78.]])
grad2 = tensor([[12., 12.],
        [16., 14.],
        [20., 18.],
        [28., 22.],
        [20., 12.]], grad_fn=<MmBackward0>)
grad2.sum(dim=0) = tensor([96., 78.], grad_fn=<SumBackward1>)
```
I understand this behavior may be due to applying the `.sum()` operator before `backward()`, but I would like to know whether I can get the same result as in the manual derivation. I appreciate any help and further explanation about this. Thanks in advance.
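For context, here is a minimal sketch of what I believe the per-sample formulation would look like, assuming the intended loss is the sum of the per-sample quadratic forms `x_i.T @ A @ x_i` (the diagonal of `x @ A @ x.T`) rather than the full matrix; under that assumption the autograd result does seem to match the manual derivative:

```python
import torch

torch.manual_seed(0)
n_samples, n_features = 5, 2
x = torch.randint(0, 5, (n_samples, n_features)).to(torch.float32).requires_grad_(True)
A = torch.randint(0, 5, (n_features, n_features)).to(torch.float32)

# Keep only the diagonal terms of x @ A @ x.T, i.e. sum_i x_i.T @ A @ x_i;
# (x * (x @ A)).sum() is equivalent to torch.einsum('ni,ij,nj->', x, A, x)
loss = (x * (x @ A)).sum()
loss.backward()

# Manual derivative of x_i.T @ A @ x_i with respect to each row x_i
grad_manual = x @ (A + A.T)
print(torch.allclose(x.grad, grad_manual))  # True
```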