Separate optimizer elements

Hi all,
My original code is CASE 1. Then I saw the GAN code here (https://github.com/eriklindernoren/PyTorch-GAN/blob/master/implementations/gan/gan.py), where g_loss and d_loss are optimized separately.
I wasn't sure about my implementation, so I tried to change my code to that style.
I think CASE 1 and CASE 2 below should be the same, but CASE 1 converges while CASE 2 does not.
Are there any differences between them?

CASE 1.
optimizer = optim.Adam(list(generator.parameters()) + list(discriminator.parameters()), lr=utils.lr, betas=(0.9, 0.999))
...
optimizer.zero_grad()
total_loss = g_loss + d_loss
total_loss.backward()
optimizer.step()
CASE 2.
optimizer_g = optim.Adam(generator.parameters(), lr=utils.lr, betas=(0.9,0.999))
optimizer_d = optim.Adam(discriminator.parameters(), lr=utils.lr, betas=(0.9,0.999))
...
optimizer_g.zero_grad()
g_loss.backward(retain_graph=True)
optimizer_g.step()

optimizer_d.zero_grad()
d_loss.backward()
optimizer_d.step()

This is weird.
CASE 2 and CASE 1 should be equivalent.
@apaszke can you comment?

They’re not completely the same. If you use an optimizer with momentum, parameters can be updated even when they are not actually used by the current loss, because the momentum term keeps moving them. It also matters whether you detach the tensors before forwarding them through the generator and discriminator modules.
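
For the momentum point, a minimal sketch: once Adam has accumulated state for a parameter, it keeps updating that parameter even on a step where the gradient it sees is exactly zero.

import torch
import torch.optim as optim

# One scalar parameter and an Adam optimizer.
p = torch.zeros(1, requires_grad=True)
opt = optim.Adam([p], lr=0.1)

# Step 1: a real gradient populates Adam's running averages.
p.grad = torch.ones(1)
opt.step()
print(p.item())  # parameter has moved away from 0

# Step 2: the gradient is zero, yet the parameter moves again,
# driven purely by the accumulated momentum/variance state.
p.grad = torch.zeros(1)
opt.step()
print(p.item())  # moved further, despite grad == 0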
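
For the detach point, here is a rough sketch of the separated-optimizer pattern the linked gan.py follows. The toy generator/discriminator, sizes, and learning rate below are placeholders just to keep the loop runnable.

import torch
import torch.nn as nn
import torch.optim as optim

# Toy stand-ins for the real networks; only the structure of the loop matters here.
latent_dim, img_dim, lr = 4, 8, 2e-4
generator = nn.Sequential(nn.Linear(latent_dim, img_dim))
discriminator = nn.Sequential(nn.Linear(img_dim, 1), nn.Sigmoid())
adversarial_loss = nn.BCELoss()

optimizer_g = optim.Adam(generator.parameters(), lr=lr, betas=(0.9, 0.999))
optimizer_d = optim.Adam(discriminator.parameters(), lr=lr, betas=(0.9, 0.999))

for step in range(100):
    real_imgs = torch.randn(16, img_dim)  # pretend "real" batch
    valid = torch.ones(16, 1)
    fake = torch.zeros(16, 1)
    z = torch.randn(16, latent_dim)

    # Generator step: the loss backpropagates through the discriminator into
    # the generator, but only the generator's parameters are stepped.
    optimizer_g.zero_grad()
    gen_imgs = generator(z)
    g_loss = adversarial_loss(discriminator(gen_imgs), valid)
    g_loss.backward()
    optimizer_g.step()

    # Discriminator step: detach the generated images so this backward pass
    # does not leave gradients sitting on the generator's parameters, and no
    # retain_graph=True is needed.
    optimizer_d.zero_grad()
    real_loss = adversarial_loss(discriminator(real_imgs), valid)
    fake_loss = adversarial_loss(discriminator(gen_imgs.detach()), fake)
    d_loss = 0.5 * (real_loss + fake_loss)
    d_loss.backward()
    optimizer_d.step()

With this ordering and the detach, each optimizer only ever sees gradients produced by its own loss, which is the main behavioral difference from summing both losses into a single optimizer.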