Thanks for all the wonderful discussions.
I just wanted to confirm that the following two solutions give the same, and correct, optimization results (PyTorch 1.9.0).
Consider a pipeline like this:
# Construct the models first, then hand their parameters to the optimizers
G = Model1()
D = Model2()
optim1 = optim.Adam(G.parameters())  # updates G only
optim2 = optim.Adam(D.parameters())  # updates D only
recons, z = G(input)
loss1 = loss_func1(recons)
diff = D(z)  # note: z depends on G’s parameters
loss2 = loss_func2(diff)
loss3 = loss_func3(diff)
loss_G = loss1 + loss2 # we don’t want to update D parameters here
loss_D = loss3
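For reference, Model1, Model2, and the loss functions above are placeholders. A minimal hypothetical set of stand-ins that makes the snippet runnable could look like this (every definition below is illustrative, not the real models):

import torch
import torch.nn as nn
import torch.optim as optim

class Model1(nn.Module):  # "G": returns a reconstruction and a latent code z
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(8, 4)
        self.dec = nn.Linear(4, 8)
    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

class Model2(nn.Module):  # "D": scores the latent code
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)
    def forward(self, z):
        return self.fc(z)

def loss_func1(recons):  # placeholder reconstruction loss
    return recons.pow(2).mean()

def loss_func2(diff):    # placeholder G-side loss on D's output
    return diff.mean()

def loss_func3(diff):    # placeholder D-side loss
    return -diff.mean()

input = torch.randn(2, 8)  # dummy batch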
Solution #1
optim1.zero_grad()
loss_G.backward(retain_graph=True)  # writes grads into both G and D
optim2.zero_grad()                  # clears the grads loss_G left in D
loss_D.backward()                   # this also backprops through z into G
optim1.step()
optim2.step()
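A quick way to check whether loss_D.backward() in Solution #1 really touches G’s gradients (a diagnostic sketch assuming the hypothetical stand-ins above; drop it in where loss_D.backward() is called):

# Snapshot the grads that loss_G.backward() left in G, run loss_D.backward(),
# then see whether any of G's grads changed.
g_before = [p.grad.detach().clone() for p in G.parameters()]
loss_D.backward()
touched = any(not torch.equal(b, p.grad)
              for b, p in zip(g_before, G.parameters()))
print("loss_D.backward() changed G's grads:", touched)

If this prints True, the two solutions can produce different updates for G, because in Solution #1 optim1.step() only runs after both backward calls.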
Solution #2
optim1.zero_grad()
loss_G.backward(retain_graph=True, inputs=list(G.parameters()))  # accumulate into G only
optim1.step()
optim2.zero_grad()
loss_D.backward(inputs=list(D.parameters()))  # accumulate into D only
optim2.step()
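To actually compare the two, here is a self-contained sketch (again assuming the hypothetical stand-ins above) that runs each solution from identical initial weights and checks whether the resulting parameters match:

import torch
import torch.optim as optim

def run(solution):
    torch.manual_seed(0)  # identical initial weights for both runs
    G, D = Model1(), Model2()
    optim1 = optim.Adam(G.parameters())
    optim2 = optim.Adam(D.parameters())
    recons, z = G(input)
    diff = D(z)
    loss_G = loss_func1(recons) + loss_func2(diff)
    loss_D = loss_func3(diff)
    if solution == 1:
        optim1.zero_grad()
        loss_G.backward(retain_graph=True)
        optim2.zero_grad()
        loss_D.backward()
        optim1.step()
        optim2.step()
    else:
        optim1.zero_grad()
        loss_G.backward(retain_graph=True, inputs=list(G.parameters()))
        optim1.step()
        optim2.zero_grad()
        loss_D.backward(inputs=list(D.parameters()))
        optim2.step()
    return [p.detach().clone()
            for p in list(G.parameters()) + list(D.parameters())]

same = all(torch.allclose(a, b) for a, b in zip(run(1), run(2)))
print("identical parameters after one step:", same)

If this prints False, the difference would come from loss_D.backward() in Solution #1 leaking gradients into G, as in the check above.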
Both solutions are based on suggestions from earlier replies in this thread. Thanks again.