Trying to backward through the graph a second time in a loop

Hi guys, I am new to PyTorch, and this error is raised in a simple loop.

To be concise, the loop aims to compute the gradient of a loss function, L, with respect to a vector, V, and update V until convergence. Here is some simplified code to help explain my problem.

import torch
import torch.nn.functional as F
import numpy as np

# functions
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def argmax(x, axis=-1):
    return F.one_hot(torch.argmax(x, dim=axis), list(x.shape)[axis]).float()

def gradient(I,U,X):
    del X.grad
    Qp        =  argmax(U + X)
    gVp       =  torch.inner(Qp, U) + torch.inner(Qp, X)
    gVp_norm  =  torch.linalg.norm(gVp)
    gVp_norm.backward(torch.ones_like(gVp_norm))

    return (torch.inner(I,X) - gVp) * (I - X.grad)

# parameters
beta         =  0.98 
alpha        =  0.03
delta        =  0.1
T            =  10
kss          =  ((1 / beta - (1 - delta)) / alpha)**(1 / (alpha - 1))
k            =  np.linspace(0.5 * kss, 1.8 * kss, T)
k_reshaped   =  k.reshape(-1, 1)
c            =  k_reshaped ** alpha + (1 - delta) * k_reshaped - k

# iterations
V            =  torch.zeros(T, dtype = torch.float32, device = device, requires_grad = True)
VS           =  V
U            =  torch.tensor(c, dtype = torch.float32, device = device, requires_grad = True)
I            =  torch.eye(T, dtype = torch.float32, device = device, requires_grad = True)

for i in range(T):
    V_grad   =  gradient(I[i,:],U[i,:],V)
    VS_grad  =  gradient(I[i,:],U[i,:],VS)

    V  =  V - 0.1 * (V_grad - VS_grad)

I think the error is most likely caused by the last line of the code, V = V - 0.1 * (V_grad - VS_grad), perhaps because the vector V is reused to compute the gradient multiple times. I tried retain_graph=True, but it did not seem to work.

Thanks in advance.

Hi Jazzy!

Yes, this is the cause of your problem.

There are two things going on here:

First, this line of code creates a new tensor (and then sets the name V to refer to it). You almost certainly don’t want this: the new V is not a leaf tensor, so autograd will not populate V.grad for it.

Second, V_grad depends on (the original) V (as well as on I and U), and therefore on the computation graph that was built the first time through the loop. The first call to backward() freed that graph’s intermediate results, so when the next backward() reaches it through the new V, you get the “backward through the graph a second time” error.
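A minimal sketch that reproduces the same failure, assuming a hypothetical toy loss (V.exp().sum()) in place of the question’s gradient() function. The update term is still attached to the first graph, so the second backward() walks into it after its saved values were freed:

```python
import torch

# Hypothetical toy loss standing in for the question's gradient() function.
V = torch.zeros(3, requires_grad=True)
first_is_leaf = V.is_leaf                  # created directly by the user

# First pass through the "loop"
loss = V.exp().sum()                       # exp() saves its output for the backward pass
loss.backward()                            # backward() then frees that saved output
update = loss * V.grad                     # still attached to the first (freed) graph via `loss`
V = V - 0.1 * update                       # rebinds V to a new, non-leaf tensor
second_is_leaf = V.is_leaf

# Second pass: backward() now walks from the new V back into the freed graph
loss = V.exp().sum()
try:
    loss.backward()
    message = ""
except RuntimeError as err:
    message = str(err)

print(first_is_leaf, second_is_leaf)
print(message.splitlines()[0] if message else "no error")
```

Passing retain_graph=True to the first backward() would avoid the error, but the graph would then keep growing with every iteration, and V would still be a non-leaf tensor whose .grad is not populated.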

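You can solve the first problem by modifying V in place, and the second by wrapping the update of V in a with torch.no_grad(): block, so that the update is not recorded in the computation graph. The no_grad() block is also what makes the in-place update legal, because autograd does not allow in-place operations on a leaf tensor that requires grad. Thus: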

    with torch.no_grad():
        # update V in place; no_grad() keeps this step out of the autograd graph
        V.sub_(0.1 * (V_grad - VS_grad))
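For reference, a sketch of the question’s example with only the suggested update applied (plus X.grad = None in place of del X.grad, which has the same effect). It is a lightly edited copy of the code above and only shows that the loop now runs without the error:

```python
import numpy as np
import torch
import torch.nn.functional as F

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def argmax(x, axis=-1):
    # one-hot encoding of the position of the maximum along `axis`
    return F.one_hot(torch.argmax(x, dim=axis), x.shape[axis]).float()

def gradient(I, U, X):
    X.grad = None                          # clear the gradient left by a previous call
    Qp = argmax(U + X)
    gVp = torch.inner(Qp, U) + torch.inner(Qp, X)
    gVp_norm = torch.linalg.norm(gVp)
    gVp_norm.backward()                    # scalar output, so no gradient argument is needed
    return (torch.inner(I, X) - gVp) * (I - X.grad)

# parameters (unchanged from the question)
beta, alpha, delta, T = 0.98, 0.03, 0.1, 10
kss = ((1 / beta - (1 - delta)) / alpha) ** (1 / (alpha - 1))
k = np.linspace(0.5 * kss, 1.8 * kss, T)
k_reshaped = k.reshape(-1, 1)
c = k_reshaped ** alpha + (1 - delta) * k_reshaped - k

V = torch.zeros(T, dtype=torch.float32, device=device, requires_grad=True)
VS = V                                     # kept as in the question (see the aside about aliasing)
U = torch.tensor(c, dtype=torch.float32, device=device, requires_grad=True)
I = torch.eye(T, dtype=torch.float32, device=device, requires_grad=True)

for i in range(T):
    V_grad = gradient(I[i, :], U[i, :], V)
    VS_grad = gradient(I[i, :], U[i, :], VS)
    # in-place update that autograd does not record, so V stays a leaf
    with torch.no_grad():
        V.sub_(0.1 * (V_grad - VS_grad))

print(V.is_leaf)
```

Because VS = V is kept here, V_grad - VS_grad is zero and V stays at zeros, which is the aliasing issue raised in the aside.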

As an aside, although I don’t understand what you are trying to do, VS = V is probably a mistake. VS and V are simply two names that reference the same tensor, so VS_grad = gradient(I[i,:],U[i,:],VS) computes exactly the same value as V_grad. As a result, V_grad - VS_grad is always zero and the update never changes V.
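A small sketch of the difference, using a hypothetical 3-element V: plain assignment only creates a second name, while detach().clone() makes an independent snapshot (requires_grad_() is added only in case the snapshot needs its own gradient):

```python
import torch

V = torch.zeros(3, requires_grad=True)

VS = V                                         # just a second name for the same tensor
same_object = VS is V

VS_snap = V.detach().clone().requires_grad_()  # independent copy with its own storage

with torch.no_grad():
    V.add_(1.0)                                # modify V in place

print(same_object)
print(VS.tolist(), VS_snap.tolist())           # the alias follows V; the snapshot does not
```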

Also, at least in your sample code, setting requires_grad = True for U and I doesn’t do anything (except add some overhead), because you never use U.grad or I.grad.
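A quick illustration with hypothetical small tensors: the gradient with respect to X is computed whether or not the constant input has requires_grad = True, and nothing is tracked for a tensor created without it:

```python
import torch

X = torch.zeros(3, requires_grad=True)   # the tensor we differentiate with respect to
U = torch.tensor([1.0, 2.0, 3.0])        # constant data; requires_grad defaults to False

loss = torch.inner(U, X)                 # d(loss)/dX = U
loss.backward()

print(X.grad)   # tensor([1., 2., 3.])
print(U.grad)   # None
```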

Best.

K. Frank


Thank you, sir! You are a lifesaver, and thanks again for your insightful and thorough answer!

Sorry, I did not mention that I am also trying to implement stochastic variance reduced gradient (SVRG) in this loop, where VS is occasionally set from V and used to compute the average gradient and the regular gradient, reducing the stochastic variance across iterations. For the sake of simplicity, I did not include all these details in the simplified code.
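On the SVRG point, the usual structure keeps the snapshot as a separate copy of the iterate and combines three gradients per step. Below is a minimal, hypothetical sketch of the standard SVRG update on a toy least-squares problem; the data, step size, and loop lengths are assumptions for illustration, not the poster’s model:

```python
import torch

torch.manual_seed(0)

# Hypothetical toy problem: f_i(w) = 0.5 * (a_i . w - b_i)^2, averaged over n samples
n, d = 50, 5
A = torch.randn(n, d)
w_true = torch.randn(d)
b = A @ w_true

def grad_i(w, i):
    """Gradient of the i-th sample loss at w, computed on a fresh graph."""
    w = w.detach().requires_grad_()
    loss = 0.5 * (A[i] @ w - b[i]) ** 2
    (g,) = torch.autograd.grad(loss, w)
    return g

def full_grad(w):
    """Average gradient over all n samples at w."""
    w = w.detach().requires_grad_()
    loss = 0.5 * ((A @ w - b) ** 2).mean()
    (g,) = torch.autograd.grad(loss, w)
    return g

w = torch.zeros(d)                 # plain tensor: the iterate itself carries no graph
lr, epochs, m = 0.02, 50, 2 * n    # step size and loop lengths are illustrative guesses

for s in range(epochs):
    w_snap = w.clone()             # snapshot (the role of VS): a copy, not an alias
    mu = full_grad(w_snap)         # average gradient at the snapshot
    for t in range(m):
        i = int(torch.randint(n, (1,)))
        # variance-reduced estimate, unbiased for the full gradient at w
        g = grad_i(w, i) - grad_i(w_snap, i) + mu
        w = w - lr * g

print(torch.dist(w, w_true).item())
```

Computing each gradient with torch.autograd.grad on a freshly detached leaf means the iterate w never carries an autograd graph, so the “backward through the graph a second time” problem cannot arise.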