Consider the following setup. I have a tensor x with requires_grad=True (a Variable, in the old terminology), and:

y = f(x)
z = Q(y)  # Q here is a neural net
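For concreteness, here is a minimal stand-in for this setup (f, Q, and the shapes are placeholders I made up, not my real model):

import torch
import torch.nn as nn

x = torch.randn(8, 4, requires_grad=True)   # placeholder input
f = torch.sin                               # placeholder for f
Q = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))  # placeholder net

y = f(x)
z = Q(y).sum()   # reduced to a scalar so that z.backward() below is well-defined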
Step (1): compute the gradient of z w.r.t. x:

z.backward(retain_graph=True)
g = x.grad.clone()   # save a copy of x's gradient
x.grad.zero_()       # clear x.grad before the next backward pass
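A quick sanity check on g at this point (run with the stand-in setup above):

print(g.requires_grad, g.grad_fn)
# In this sketch both come out as False / None: the x.grad filled in by
# backward(retain_graph=True) does not itself carry autograd history.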
Step (2): feed the gradient we just computed into another function L, i.e. compute L(g), and take the gradient of that w.r.t. the weights of the neural net Q, as follows:
var_opt = torch.optim.Adam(Q.parameters(), lr=lr)
while not converged:
    var_opt.zero_grad()
    variance_loss = torch.mean(L(g))
    variance_loss.backward()
    var_opt.step()
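Here L is left abstract; as a placeholder that makes the loop above concrete, think of something like an elementwise squared-magnitude penalty (my real L is different):

def L(g):
    # placeholder: any differentiable elementwise function of the gradient,
    # so that torch.mean(L(g)) gives a scalar loss
    return g.pow(2)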
What worries me is that first backward() call: z.backward(retain_graph=True). If I don't set retain_graph=True, the second backward() raises a runtime error. That feels wrong to me, because the first backward is w.r.t. x, while the second time I'm going for Q's parameters.
Is this code written correctly? I'm currently hitting bugs and I suspect this might be the reason.
The overall code is pretty much like this:
var_opt = torch.optim.Adam(Q.parameters(), lr=lr)
while not converged:
    z.backward(retain_graph=True)
    g = x.grad.clone()
    x.grad.zero_()
    # ... do some other things with g here ...
    var_opt.zero_grad()
    variance_loss = torch.mean(L(g))
    variance_loss.backward()
    var_opt.step()
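For reference, here is the same structure assembled into a self-contained script, using the placeholder f, Q, and L from above (again, these stand in for my real model, and a fixed step count stands in for "while not converged"):

import torch
import torch.nn as nn

x = torch.randn(8, 4, requires_grad=True)   # placeholder input
f = torch.sin                               # placeholder for f
Q = nn.Sequential(nn.Linear(4, 16), nn.Tanh(), nn.Linear(16, 1))  # placeholder net
L = lambda g: g.pow(2)                      # placeholder for L
lr = 1e-3

var_opt = torch.optim.Adam(Q.parameters(), lr=lr)
z = Q(f(x)).sum()                # z is built once, outside the loop
for step in range(100):          # stand-in for "while not converged"
    z.backward(retain_graph=True)
    g = x.grad.clone()
    x.grad.zero_()
    var_opt.zero_grad()
    variance_loss = torch.mean(L(g))
    variance_loss.backward()     # second backward: where I suspect things go wrong
    var_opt.step()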