Why does ".backward(retain_graph=True)" give different values each time it is called?

Starting with a simple example from here.

from torch import tensor,empty,zeros
x = tensor([1., 2.], requires_grad=True)
y = empty(3)
y[0] = 3*x[0]**2
y[1] = x[0]**2 + 2*x[1]**3
y[2] = 10*x[1]

This is a 2-input, 3-output model. I’m interested in getting the full Jacobian matrix. To do that, I was thinking:

J = zeros((y.shape[0], x.shape[0]))
for i in range(y.shape[0]):
    xU = zeros(y.shape[0])
    xU[i] = 1.0
    y.backward(xU, retain_graph=True)
    J[i, :] = x.grad

However, the output of this is:

tensor([[ 6.,  0.],
        [ 8., 24.],
        [ 8., 34.]])

while it should be:

tensor([[ 6.,  0.],
        [ 2., 24.],
        [ 0, 10.]])

Trying to debug what’s going on, I found out the following:

y[0].backward(retain_graph=True); print(x.grad)
y[0].backward(retain_graph=True); print(x.grad)


tensor([6., 0.])
tensor([12.,  0.])

Can someone explain what is going on under the hood with the call to “.backward(retain_graph=True)”? Why does it give different values when called twice?

I think that’s the key to why my calculation of the Jacobian matrix is not correct.


Whenever you call backward, it accumulates gradients on the leaf tensors (your parameters). That’s why you call optimizer.zero_grad() before calling loss.backward(). The same thing is happening here: the second call to backward adds more gradients on top of those accumulated by the first call.
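To illustrate, zeroing x.grad before each backward call in the loop makes the Jacobian come out right. A minimal sketch, using the same x and y as above:

```python
from torch import tensor, empty, zeros

x = tensor([1., 2.], requires_grad=True)
y = empty(3)
y[0] = 3*x[0]**2
y[1] = x[0]**2 + 2*x[1]**3
y[2] = 10*x[1]

J = zeros((y.shape[0], x.shape[0]))
for i in range(y.shape[0]):
    if x.grad is not None:
        x.grad.zero_()       # clear gradients accumulated by the previous row
    xU = zeros(y.shape[0])
    xU[i] = 1.0              # select the i-th output
    y.backward(xU, retain_graph=True)
    J[i, :] = x.grad
print(J)
# tensor([[ 6.,  0.],
#         [ 2., 24.],
#         [ 0., 10.]])
```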

Another way to solve this is to use autograd.grad(), which returns the gradients instead of accumulating them.
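For example, the full Jacobian can be built with torch.autograd.grad() without ever touching x.grad. A sketch, again assuming the same x and y as above:

```python
import torch
from torch import tensor, empty, zeros

x = tensor([1., 2.], requires_grad=True)
y = empty(3)
y[0] = 3*x[0]**2
y[1] = x[0]**2 + 2*x[1]**3
y[2] = 10*x[1]

J = zeros((y.shape[0], x.shape[0]))
for i in range(y.shape[0]):
    xU = zeros(y.shape[0])
    xU[i] = 1.0
    # grad() returns a tuple of gradients, one per input tensor;
    # nothing is written to x.grad, so no zeroing is needed
    J[i, :] = torch.autograd.grad(y, x, grad_outputs=xU, retain_graph=True)[0]
print(J)
# tensor([[ 6.,  0.],
#         [ 2., 24.],
#         [ 0., 10.]])
```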

How would one use said returned gradient from autograd.grad() for the backward function?

I’m not sure I understand your question. This function returns the gradients instead of populating the .grad field on Tensors as .backward() does.


Ah, I understand now. I was misreading your explanation; my bad. It is clear now.