Hi everyone, I am new to using torch, and an error gets raised in a simple loop.
To be concise, the loop aims to compute the gradient of a loss function, L, with respect to a vector, V, and update V until convergence. Here is some simplified fake code that helps illustrate my problem:
```python
import torch
import torch.nn.functional as F
import numpy as np

# functions
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def argmax(x, axis=-1):
    return F.one_hot(torch.argmax(x, dim=axis), list(x.shape)[axis]).float()

def gradient(I, U, X):
    del X.grad
    Qp = argmax(U + X)
    gVp = torch.inner(Qp, U) + torch.inner(Qp, X)
    gVp_norm = torch.linalg.norm(gVp)
    gVp_norm.backward(torch.ones_like(gVp_norm))
    return (torch.inner(I, X) - gVp) * (I - X.grad)

# parameters
beta = 0.98
alpha = 0.03
delta = 0.1
T = 10
kss = ((1 / beta - (1 - delta)) / alpha) ** (1 / (alpha - 1))
k = np.linspace(0.5 * kss, 1.8 * kss, T)
k_reshaped = k.reshape(-1, 1)
c = k_reshaped ** alpha + (1 - delta) * k_reshaped - k

# iterations
V = torch.zeros(T, dtype=torch.float32, device=device, requires_grad=True)
VS = V
U = torch.tensor(c, dtype=torch.float32, device=device, requires_grad=True)
I = torch.eye(T, dtype=torch.float32, device=device, requires_grad=True)

for i in range(T):
    V_grad = gradient(I[i, :], U[i, :], V)
    VS_grad = gradient(I[i, :], U[i, :], VS)
    V = V - 0.1 * (V_grad - VS_grad)
```
I find that the error is most likely caused by the last line of the code,
`V = V - 0.1 * (V_grad - VS_grad)`; perhaps it is because the vector V is reused to compute the gradient multiple times. I tried
`retain_graph=True`, but it did not seem to work.
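To check my understanding, here is a minimal sketch of the pattern I suspect is failing (the tensor and loss here are made up, not from my real code): after `backward()` frees a graph, the updated V still carries that graph in its history, so the next `backward()` tries to walk back through it again.

```python
import torch

# made-up minimal example of the suspected failure mode
V = torch.zeros(3, requires_grad=True)
loss = (V ** 2).sum()
loss.backward()            # backward() frees the saved tensors of this graph
V = V - 0.1 * loss         # the new V's history still references the freed graph
try:
    (V ** 2).sum().backward()  # re-enters the freed graph
except RuntimeError as e:
    print(type(e).__name__)    # prints RuntimeError ("backward through the graph a second time")
```

Is this the same thing that is happening in my loop, and if so, what is the right way to do the update?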
Thanks in advance.