Why does ".backward(retain_graph=True)" give different values each time it is called?

Starting with a simple example from here.

from torch import tensor, empty, zeros
x = tensor([1., 2.], requires_grad=True)
y = empty(3)
y[0] = 3*x[0]**2
y[1] = x[0]**2 + 2*x[1]**3
y[2] = 10*x[1]

This is a 2-input, 3-output model. I’m interested in getting the full Jacobian matrix. To do that, I was thinking:

J = zeros((y.shape[0],x.shape[0]))
for i in range(y.shape[0]):
	xU = zeros(y.shape[0])
	xU[i] = 1.0
	y.backward(xU,retain_graph=True)
	J[i,:] = x.grad

However, the output of this is:

tensor([[ 6.,  0.],
        [ 8., 24.],
        [ 8., 34.]])

while it should be:

tensor([[ 6.,  0.],
        [ 2., 24.],
        [ 0., 10.]])

Trying to debug what’s going on, I found out the following:

y[0].backward(retain_graph=True); print(x.grad)
y[0].backward(retain_graph=True); print(x.grad)

gives:

tensor([6., 0.])
tensor([12.,  0.])

Can someone explain what is going on under the hood with the call to “.backward(retain_graph=True)”? Why does it give different values when called twice?

I think that’s the key to why my calculation of the Jacobian matrix is not correct.
Thanks

Hi,

Whenever you call backward, it accumulates gradients into the .grad field of the leaf tensors (here, x). That’s why you call optimizer.zero_grad() before calling loss.backward(). The same thing is happening here: the second call to backward adds more gradients on top of the gradients left by the first call, which is why each row of your J contains the sum of all the rows computed so far.
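As a concrete illustration, here is your loop with the accumulated gradients cleared before each row (a minimal sketch, reusing the x and y from your post):

from torch import tensor, empty, zeros

x = tensor([1., 2.], requires_grad=True)
y = empty(3)
y[0] = 3*x[0]**2
y[1] = x[0]**2 + 2*x[1]**3
y[2] = 10*x[1]

J = zeros((y.shape[0], x.shape[0]))
for i in range(y.shape[0]):
    if x.grad is not None:
        x.grad.zero_()       # clear gradients accumulated for the previous row
    xU = zeros(y.shape[0])
    xU[i] = 1.0              # pick out the i-th output
    y.backward(xU, retain_graph=True)
    J[i, :] = x.grad         # now this row only contains dy[i]/dx

print(J)  # tensor([[ 6.,  0.], [ 2., 24.], [ 0., 10.]])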

Another way to solve this is to use autograd.grad(), which returns the gradients instead of accumulating them.
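For example, something along these lines (a sketch with the same toy x and y; the grad_outputs argument plays the role of your xU vector):

import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = torch.empty(3)
y[0] = 3*x[0]**2
y[1] = x[0]**2 + 2*x[1]**3
y[2] = 10*x[1]

J = torch.zeros((y.shape[0], x.shape[0]))
for i in range(y.shape[0]):
    v = torch.zeros(y.shape[0])
    v[i] = 1.0
    # grad() returns a tuple with one entry per input tensor and does not
    # touch x.grad, so there is nothing to zero between iterations
    (row,) = torch.autograd.grad(y, x, grad_outputs=v, retain_graph=True)
    J[i, :] = row

print(J)  # tensor([[ 6.,  0.], [ 2., 24.], [ 0., 10.]])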

How would one use the gradient returned by autograd.grad() with the backward function?

I’m not sure I understand your question. This function returns the gradients instead of populating the .grad field on the tensors, as .backward() does.
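A tiny sketch with a toy function to illustrate the difference:

import torch

x = torch.tensor([1., 2.], requires_grad=True)
out = (3 * x ** 2).sum()

g, = torch.autograd.grad(out, x)  # the gradient is returned as a tuple entry...
print(g)       # tensor([ 6., 12.])
print(x.grad)  # ...while x.grad is left untouched: None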


Ah, I understand now. I was misreading your explanation and thinking about it incorrectly, my bad. It is clear now.