Consider the following setup. I have a tensor x with requires_grad=True (a Variable, in the old terminology), and:

y = f(x)
z = Q(y)  # Q here is a neural net
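For concreteness, here is a minimal stand-in for this setup (f, Q, and the shapes are placeholders I made up, not my real model):

import torch
import torch.nn as nn

x = torch.randn(8, 4, requires_grad=True)   # placeholder input
f = torch.sin                               # placeholder for f
Q = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))  # placeholder net

y = f(x)
z = Q(y).sum()   # reduced to a scalar so that z.backward() below is well-defined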
Step (1): compute the gradient of z w.r.t. x:

z.backward(retain_graph=True)
g = x.grad.clone()   # save a copy of x's gradient
x.grad.zero_()       # clear x.grad before the next backward pass
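A quick sanity check on g at this point (run with the stand-in setup above):

print(g.requires_grad, g.grad_fn)
# In this sketch both come out as False / None: the x.grad filled in by
# backward(retain_graph=True) does not itself carry autograd history.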
Step (2): feed the gradient we just computed into another function L, i.e. compute L(g), and take the gradient of that w.r.t. the weights of the neural net Q, as follows:
var_opt = torch.optim.Adam(Q.parameters(), lr=lr)
while not converged:
    var_opt.zero_grad()
    variance_loss = torch.mean(L(g))
    variance_loss.backward()
    var_opt.step()
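Here L is left abstract; as a placeholder that makes the loop above concrete, think of something like an elementwise squared-magnitude penalty (my real L is different):

def L(g):
    # placeholder: any differentiable elementwise function of the gradient,
    # so that torch.mean(L(g)) gives a scalar loss
    return g.pow(2)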
What worries me is that first backward() call: z.backward(retain_graph=True). If I don't set retain_graph=True, the second backward() raises a runtime error. That feels wrong to me, because the first backward is w.r.t. x, while the second time I'm going for Q's parameters.
Is this code written correctly? I'm currently hitting bugs and I suspect this might be the reason.
The overall code is pretty much like this:
var_opt = torch.optim.Adam(Q.parameters(), lr=lr)
while not converged:
    z.backward(retain_graph=True)
    g = x.grad.clone()
    x.grad.zero_()
    # ... do some other things with g here ...
    var_opt.zero_grad()
    variance_loss = torch.mean(L(g))
    variance_loss.backward()
    var_opt.step()
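For reference, here is the same structure assembled into a self-contained script, using the placeholder f, Q, and L from above (again, these stand in for my real model, and a fixed step count stands in for "while not converged"):

import torch
import torch.nn as nn

x = torch.randn(8, 4, requires_grad=True)   # placeholder input
f = torch.sin                               # placeholder for f
Q = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))  # placeholder net
L = lambda g: g.pow(2)                      # placeholder for L
lr = 1e-3

var_opt = torch.optim.Adam(Q.parameters(), lr=lr)
z = Q(f(x)).sum()                # z is built once, outside the loop
for step in range(100):          # stand-in for "while not converged"
    z.backward(retain_graph=True)
    g = x.grad.clone()
    x.grad.zero_()
    var_opt.zero_grad()
    variance_loss = torch.mean(L(g))
    variance_loss.backward()     # second backward: where I suspect things go wrong
    var_opt.step()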