Hi guys, I am new to using torch, and an error gets raised in a simple loop. To be concise, the loop computes the gradient of a loss function, L, with respect to a vector, V, and updates V until convergence. Here is a simplified mock-up that reproduces my problem.
import torch
import torch.nn.functional as F
import numpy as np
# functions
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
def argmax(x, axis=-1):
    return F.one_hot(torch.argmax(x, dim=axis), list(x.shape)[axis]).float()

def gradient(I, U, X):
    del X.grad
    Qp = argmax(U + X)
    gVp = torch.inner(Qp, U) + torch.inner(Qp, X)
    gVp_norm = torch.linalg.norm(gVp)
    gVp_norm.backward(torch.ones_like(gVp_norm))
    return (torch.inner(I, X) - gVp) * (I - X.grad)
# parameters
beta = 0.98
alpha = 0.03
delta = 0.1
T = 10
kss = ((1 / beta - (1 - delta)) / alpha)**(1 / (alpha - 1))
k = np.linspace(0.5 * kss, 1.8 * kss, T)
k_reshaped = k.reshape(-1, 1)
c = k_reshaped ** alpha + (1 - delta) * k_reshaped - k
# iterations
V = torch.zeros(T, dtype = torch.float32, device = device, requires_grad = True)
VS = V
U = torch.tensor(c, dtype = torch.float32, device = device, requires_grad = True)
I = torch.eye(T, dtype = torch.float32, device = device, requires_grad = True)
for i in range(T):
    V_grad = gradient(I[i,:], U[i,:], V)
    VS_grad = gradient(I[i,:], U[i,:], VS)
    V = V - 0.1 * (V_grad - VS_grad)
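To illustrate what I suspect is happening, here is a tiny check I ran separately: rebinding V to the result of an operation seems to turn it into a non-leaf tensor, so its .grad is no longer populated on later iterations.

```python
import torch

# Check whether the update line changes V's status in autograd.
V = torch.zeros(3, requires_grad=True)
print(V.is_leaf)   # True: V is a leaf tensor created by the user

V = V - 0.1        # same pattern as V = V - 0.1 * (V_grad - VS_grad)
print(V.is_leaf)   # False: V is now the output of an op, not a leaf
```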
I find that the error is most likely caused by the last line of the code, V = V - 0.1 * (V_grad - VS_grad); maybe it is because the vector V is reused to compute the gradient multiple times. I tried retain_graph=True, but it did not seem to work.
Thanks in advance.
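In case it helps, this is the kind of update I also experimented with; doing the step under torch.no_grad() and then re-enabling gradients is my own workaround guess, not something I found in the docs:

```python
import torch

# Minimal sketch of a single update step, assuming the problem is that
# rebinding V makes it a non-leaf tensor so V.grad stays None afterwards.
V = torch.zeros(5, requires_grad=True)
loss = torch.linalg.norm(V + 1.0)
loss.backward()
step = 0.1 * V.grad

# Perform the update outside the graph, then make V a fresh leaf again.
with torch.no_grad():
    V = V - step
V.requires_grad_()  # V is once more a leaf with requires_grad=True
```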