Hi all,
My original code is CASE 1. I saw the GAN code here (https://github.com/eriklindernoren/PyTorch-GAN/blob/master/implementations/gan/gan.py), and it keeps g_loss and d_loss separate, each with its own optimizer.
I wasn't sure about my implementation, so I tried to change my code to that style.
I think CASE 1 and CASE 2 below are the same, but CASE 1 converges while CASE 2 does not.
Is there any difference between them?
CASE 1.
optimizer = optim.Adam(list(generator.parameters()) + list(discriminator.parameters()), lr=utils.lr, betas=(0.9, 0.999))
...
optimizer.zero_grad()
total_loss = g_loss + d_loss  # one combined loss, one backward pass over both networks
total_loss.backward()
optimizer.step()
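For context, here is a minimal self-contained sketch of the CASE 1 setup; the toy models, data, and BCE loss terms are my own assumptions rather than the original code, just so the snippet runs end to end:

import torch
import torch.nn as nn
import torch.optim as optim

# Toy 1-D GAN (assumed stand-ins for the real models).
latent_dim = 8
generator = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
criterion = nn.BCELoss()

# CASE 1: a single optimizer over the concatenated parameter lists.
optimizer = optim.Adam(list(generator.parameters()) + list(discriminator.parameters()),
                       lr=2e-4, betas=(0.9, 0.999))

for step in range(200):
    real = torch.randn(32, 1) * 0.5 + 2.0          # "real" data: samples from N(2, 0.5^2)
    fake = generator(torch.randn(32, latent_dim))  # generated batch
    ones, zeros = torch.ones(32, 1), torch.zeros(32, 1)

    g_loss = criterion(discriminator(fake), ones)                # G wants fakes scored as real
    d_loss = (criterion(discriminator(real), ones)
              + criterion(discriminator(fake.detach()), zeros))  # D wants real/fake separated

    optimizer.zero_grad()
    total_loss = g_loss + d_loss   # one backward pass updates both networks at once
    total_loss.backward()
    optimizer.step()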
CASE 2.
optimizer_g = optim.Adam(generator.parameters(), lr=utils.lr, betas=(0.9,0.999))
optimizer_d = optim.Adam(discriminator.parameters(), lr=utils.lr, betas=(0.9,0.999))
...
optimizer_g.zero_grad()
g_loss.backward(retain_graph=True)  # keep the graph alive so d_loss can still backprop through it
optimizer_g.step()

optimizer_d.zero_grad()
d_loss.backward()
optimizer_d.step()
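For comparison, the linked gan.py trains the two networks in separate steps, roughly like this (paraphrased from that file; z, real_imgs, valid, and fake are its noise batch, real batch, and target label tensors):

optimizer_G.zero_grad()
gen_imgs = generator(z)
g_loss = adversarial_loss(discriminator(gen_imgs), valid)  # generator step
g_loss.backward()
optimizer_G.step()

optimizer_D.zero_grad()
real_loss = adversarial_loss(discriminator(real_imgs), valid)
fake_loss = adversarial_loss(discriminator(gen_imgs.detach()), fake)  # detach cuts the graph back to G
d_loss = (real_loss + fake_loss) / 2
d_loss.backward()
optimizer_D.step()

Note that because gen_imgs is detached in the discriminator step there, d_loss never reuses the generator's part of the graph, so that code gets away without retain_graph=True.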