Consider the following setup. I have a Variable `x`, and:

```python
y = f(x)
z = Q(y)  # Q here is a neural net
```
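For concreteness, here is a minimal runnable sketch of this setup (the `f`, `Q`, and shapes below are made-up placeholders, not my real code):

```python
import torch
import torch.nn as nn

f = lambda t: t ** 2                    # placeholder differentiable function
Q = nn.Linear(3, 1)                     # placeholder neural net
x = torch.randn(3, requires_grad=True)  # the Variable x

y = f(x)
z = Q(y).sum()  # reduced to a scalar so z.backward() needs no grad argument
```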

Step (1): compute the gradient of `z` w.r.t. `x`:

```python
z.backward(retain_graph=True)
g = x.grad.clone()
x.grad.data.zero_()
```

Step (2): I have another function `L` that takes the gradient we just computed:

```python
L(g)
```

I want to take the gradient of this w.r.t. the weights of the neural net `Q`, as follows:

```python
var_opt = torch.optim.Adam(Q.parameters(), lr=lr)
while not converge:
    var_opt.zero_grad()
    variance_loss = torch.mean(L(g))
    variance_loss.backward()
    var_opt.step()
```

What worries me is the first call to `backward()`: I use `z.backward(retain_graph=True)`. If I don’t set `retain_graph` to `True`, the second `backward()` raises a runtime error. This feels wrong, though, because the first backward is w.r.t. `x`, while the second backward is w.r.t. `Q`’s parameters.
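To make the worry concrete, here is a toy repro (made-up `f` and `Q`) showing that `g` obtained this way carries no graph back to `Q`:

```python
import torch
import torch.nn as nn

Q = nn.Linear(2, 1)                     # toy stand-in for the real Q
x = torch.randn(2, requires_grad=True)
z = Q(x ** 2).sum()                     # toy f(x) = x ** 2

z.backward(retain_graph=True)           # first backward, w.r.t. x
g = x.grad.clone()
x.grad.data.zero_()

# backward() computes gradient values but does not record how they were
# computed, so g is detached from Q's parameters.
print(g.requires_grad)  # False
```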

Is this code correctly written? I’m currently hitting bugs, and I suspect this might be the reason.

The overall code is pretty much like this:

```python
var_opt = torch.optim.Adam(Q.parameters(), lr=lr)
while not converge:
    z.backward(retain_graph=True)
    g = x.grad.clone()
    x.grad.data.zero_()
    # do some other things with g here
    var_opt.zero_grad()
    variance_loss = torch.mean(L(g))
    variance_loss.backward()
    var_opt.step()
```
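For comparison, here is a variant I am considering, using `torch.autograd.grad` with `create_graph=True` so that `g` stays connected to `Q`’s parameters (toy `f`, `Q`, and `L`, and a fixed iteration count standing in for the convergence check):

```python
import torch
import torch.nn as nn

Q = nn.Linear(2, 1)                     # toy stand-in for Q
L = lambda t: t ** 2                    # toy stand-in for L
x = torch.randn(2, requires_grad=True)

var_opt = torch.optim.Adam(Q.parameters(), lr=1e-3)
for _ in range(3):                      # stand-in for "while not converge"
    z = Q(x ** 2).sum()                 # toy f(x) = x ** 2
    # create_graph=True records the graph of the gradient computation,
    # so a loss built from g can backpropagate into Q's weights.
    g = torch.autograd.grad(z, x, create_graph=True)[0]
    var_opt.zero_grad()
    variance_loss = torch.mean(L(g))
    variance_loss.backward()
    var_opt.step()
```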