# Trying to backward through the graph a second time in a loop

Hi guys, I am new to PyTorch, and I get this error in a simple loop.

To be concise, the loop computes the gradient of a loss function, L, with respect to a vector, V, and updates V until convergence. Here is some simplified fake code that illustrates my problem.

```python
import torch
import torch.nn.functional as F
import numpy as np

# functions
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def argmax(x, axis=-1):
    # one-hot encoding of the argmax along `axis`
    return F.one_hot(torch.argmax(x, dim=axis), list(x.shape)[axis]).float()

def gradient(I, U, X):
    Qp        =  argmax(U + X)
    gVp       =  torch.inner(Qp, U) + torch.inner(Qp, X)
    gVp_norm  =  torch.linalg.norm(gVp)
    gVp_norm.backward(torch.ones_like(gVp_norm))   # fills X.grad, frees this graph

    return (torch.inner(I,X) - gVp) * (I - X.grad)

# parameters
beta         =  0.98
alpha        =  0.03
delta        =  0.1
T            =  10
kss          =  ((1 / beta - (1 - delta)) / alpha)**(1 / (alpha - 1))
k            =  np.linspace(0.5 * kss, 1.8 * kss, T)
k_reshaped   =  k.reshape(-1, 1)
c            =  k_reshaped ** alpha + (1 - delta) * k_reshaped - k

# iterations
V            =  torch.zeros(T, dtype = torch.float32, device = device, requires_grad = True)
VS           =  V
U            =  torch.tensor(c, dtype = torch.float32, device = device, requires_grad = True)
I            =  torch.eye(T, dtype = torch.float32, device = device, requires_grad = True)

for i in range(T):
    V_grad   =  gradient(I[i,:], U[i,:], V)
    VS_grad  =  gradient(I[i,:], U[i,:], VS)
    V        =  V - 0.1 * (V_grad - VS_grad)   # the error is raised on the 2nd pass through the loop
```

I think the error is most likely caused by the last line of the code, `V = V - 0.1 * (V_grad - VS_grad)`, maybe because the vector `V` is reused to compute the gradient multiple times. I tried `retain_graph=True`, but it did not seem to work.
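To see what `retain_graph=True` does and does not change, here is a minimal sketch on a hypothetical two-line graph (`w` and the squared loss are illustrative only, not from the code above). A second `backward()` over the same graph fails because the first pass freed the tensors the graph saved; `retain_graph=True` keeps them alive. In a loop like the one above, though, keeping the graph alive would only postpone the problem: each rebinding of `V` chains a new graph onto the old ones, and since the rebound `V` is no longer a leaf, `X.grad` inside `gradient` would be `None`.

```python
import torch

# Hypothetical tiny graph: y = sum(w ** 2); pow saves w for its backward pass.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (w ** 2).sum()

y.backward()                      # first pass frees the graph's saved tensors
second_pass_error = ""
try:
    y.backward()                  # second pass over the SAME graph
except RuntimeError as err:
    second_pass_error = str(err)
print(second_pass_error)

# With retain_graph=True the saved tensors survive the first pass,
# so a second pass works; the two gradients accumulate in w.grad.
w.grad = None
z = (w ** 2).sum()
z.backward(retain_graph=True)
z.backward()
print(w.grad)                     # 2 * (2 * w)
```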

Hi Jazzy!

Yes, this is the cause of your problem.

There are two things going on here:

First, this line of code creates a new tensor (and then sets the name `V` to
refer to it). You almost certainly don't want this.
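A quick way to see the difference, sketched with illustrative names (`original` and `step` are not from the thread): the out-of-place update produces a non-leaf tensor with a `grad_fn`, whose `.grad` autograd does not populate, while an in-place update under `torch.no_grad()` keeps the very same leaf tensor.

```python
import torch

V = torch.zeros(3, requires_grad=True)
original = V
step = torch.ones(3)

# Out-of-place update: a NEW tensor, produced by an op, so it is not a leaf
# and its grad_fn links it back to the graph that produced it.
V_new = V - 0.1 * step
print(V_new.is_leaf, V_new.grad_fn is not None)  # False True

V_new.sum().backward()
print(V_new.grad)  # None: .grad is only populated for leaf tensors (warns)

# In-place update under no_grad: same tensor object, still a leaf.
with torch.no_grad():
    V -= 0.1 * step
print(V is original, V.is_leaf)  # True True
```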

Second, `V_grad` depends on (the original) `V` (as well as on `I` and `U`), and
therefore on the computation graph that was built the first time through, but
whose intermediate results have been freed, hence the "backward through
the graph a second time" error.
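The same failure can be reproduced in a few lines. In this hypothetical sketch (`w` and the squared loss are placeholders), `loss` plays the role of `gVp`: it is backpropagated, then reused to build the update, so the rebound, non-leaf `V` drags that freed graph into the next iteration's `backward()` call.

```python
import torch

V = torch.zeros(3, requires_grad=True)
w = torch.tensor([1.0, 2.0, 3.0])

error_message = ""
try:
    for i in range(2):
        loss = ((V * w) ** 2).sum()
        loss.backward()          # frees the saved tensors of loss's graph
        V_grad = loss * w        # hypothetical update direction that reuses `loss`
        V = V - 0.1 * V_grad     # V is now a non-leaf tied to the freed graph
except RuntimeError as err:
    error_message = str(err)

print(i, error_message)          # fails on the second iteration (i == 1)
```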

You may solve the first problem by modifying `V` in place, and solve the second
by wrapping the update of `V` in a `with torch.no_grad():` block. Thus:

```python
    with torch.no_grad():
        V -= 0.1 * (V_grad - VS_grad)   # in-place: V stays the same leaf tensor
```
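A minimal, self-contained sketch of the same pattern on a hypothetical toy loss (the loss, `w`, `target`, and the learning rate are assumptions, not from the original code). It also zeros `V.grad` after every step, which the original loop does not do: `.backward()` accumulates into `.grad`, so without this the gradients of all previous iterations would be summed.

```python
import torch

# Hypothetical toy objective: sum_j (V_j * w_j - target_j)^2, minimized at target / w
w = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 1.0, 1.0])
V = torch.zeros(3, requires_grad=True)

for i in range(200):
    loss = ((V * w - target) ** 2).sum()
    loss.backward()              # fresh graph every iteration; fills V.grad
    with torch.no_grad():
        V -= 0.05 * V.grad       # in-place update, not recorded by autograd
    V.grad.zero_()               # .backward() accumulates, so clear it each step

print(V)
```

Because the update is never recorded by autograd, each iteration's graph is independent of the previous one, and `V` remains a leaf whose `.grad` keeps being populated.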

As an aside, although I don't understand what you are trying to do, `VS = V`
is probably a mistake, as `VS` and `V` are simply two names that reference the
same tensor, so, for example, `VS_grad = gradient(I[i,:],U[i,:],VS)`
simply computes the same value as `V_grad`.
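A quick check of the aliasing point, with illustrative variable names: plain assignment only binds a second name, whereas `detach().clone()` creates an independent copy. Calling `requires_grad_(True)` on the copy is an assumption, made here so the snapshot can be differentiated on its own, the way `V` is.

```python
import torch

V = torch.zeros(3, requires_grad=True)
VS_alias = V                                           # a second name for the same tensor
VS_snapshot = V.detach().clone().requires_grad_(True)  # independent leaf copy

with torch.no_grad():
    V += 1.0                                           # in-place update of V

print(VS_alias is V)   # True: the alias sees the update
print(VS_snapshot)     # still zeros: the copy does not
```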

Also, at least in your sample code, setting `requires_grad = True` for `U` and
`I` doesn't do anything (except add some overhead) because you never use
`U.grad` or `I.grad`.
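A sketch of the same point with plain data tensors (`T = 4` and the random `U` are placeholders): gradients still flow to `V`, and `U.grad` is simply never populated, because autograd only tracks tensors that require grad.

```python
import torch

T = 4
V = torch.zeros(T, requires_grad=True)
U = torch.rand(T, T)   # plain data: requires_grad is not needed
I = torch.eye(T)

loss = torch.inner(I[0], V) + torch.inner(U[0], V)
loss.backward()
print(V.grad)          # I[0] + U[0]
print(U.grad)          # None: autograd never tracked U
```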

Best.

K. Frank


Thank you, sir! You are a lifesaver, and thanks again for your insightful and thorough answer!

Sorry, I did not mention that I am also trying to perform stochastic variance reduced gradient (SVRG) in this loop, where `VS` is a snapshot occasionally taken from `V` and used to compute both the average gradient and the regular gradient, reducing the stochastic variance across iterations. For simplicity, I left these details out of the fake code.
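For reference, a minimal sketch of textbook SVRG on a hypothetical least-squares problem (the data, learning rate, and epoch counts are all assumptions, and this is not the poster's actual code). The snapshot `VS` is taken with `V.clone()`, so it is a separate tensor rather than a second name for `V`, and each inner step uses the standard variance-reduced direction g_i(V) - g_i(VS) + mu, where mu is the full (average) gradient at the snapshot. Per-sample gradients come from `torch.autograd.grad` on detached copies, so no graph outlives a single call.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy problem: f(v) = mean_i 0.5 * (a_i . v - b_i)^2
n, d = 50, 3
A = torch.randn(n, d, dtype=torch.float64)
v_true = torch.tensor([1.0, -2.0, 0.5], dtype=torch.float64)
b = A @ v_true + 0.1 * torch.randn(n, dtype=torch.float64)

def sample_grad(v, i):
    """Gradient of the i-th term 0.5 * (a_i . v - b_i)^2, via autograd."""
    v = v.detach().requires_grad_(True)
    loss = 0.5 * (A[i] @ v - b[i]) ** 2
    (g,) = torch.autograd.grad(loss, v)
    return g

def full_grad(v):
    """Average of all per-sample gradients at v."""
    v = v.detach().requires_grad_(True)
    loss = 0.5 * ((A @ v - b) ** 2).mean()
    (g,) = torch.autograd.grad(loss, v)
    return g

V = torch.zeros(d, dtype=torch.float64)
lr, epochs, inner_steps = 0.01, 50, 2 * n

for epoch in range(epochs):
    VS = V.clone()       # snapshot: a separate tensor, not an alias of V
    mu = full_grad(VS)   # full (average) gradient at the snapshot
    for _ in range(inner_steps):
        i = torch.randint(n, (1,)).item()
        # variance-reduced direction: g_i(V) - g_i(VS) + mu
        V -= lr * (sample_grad(V, i) - sample_grad(VS, i) + mu)

# Exact least-squares solution from the normal equations, for comparison
v_star = torch.linalg.solve(A.T @ A, A.T @ b)
print((V - v_star).abs().max().item())
```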