Separate optimizer elements

Hi all,
My original code is CASE 1. Then I saw the GAN code here (https://github.com/eriklindernoren/PyTorch-GAN/blob/master/implementations/gan/gan.py), where g_loss and d_loss are optimized separately.
I wasn't sure about my implementation, so I tried to change my code to that style.
I think CASE 1 and CASE 2 below should be the same, but CASE 1 converges while CASE 2 does not.
Are there any differences between them?

CASE 1.
optimizer = optim.Adam(list(generator.parameters()) + list(discriminator.parameters()), lr=utils.lr, betas=(0.9, 0.999))
...
optimizer.zero_grad()
total_loss = g_loss + d_loss
total_loss.backward()
optimizer.step()
CASE 2.
optimizer_g = optim.Adam(generator.parameters(), lr=utils.lr, betas=(0.9,0.999))
optimizer_d = optim.Adam(discriminator.parameters(), lr=utils.lr, betas=(0.9,0.999))
...
optimizer_g.zero_grad()
g_loss.backward(retain_graph=True)
optimizer_g.step()

optimizer_d.zero_grad()
d_loss.backward()
optimizer_d.step()

This is weird.
CASE 2 and CASE 1 should be equivalent.
@apaszke can you comment?

They’re not completely the same. If you use an optimizer with momentum, parameters can be updated even when they are not actually used by the current loss, because the momentum term keeps moving them. It also matters whether you detach the tensors before forwarding them through the generator and discriminator modules.
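
For the momentum point, a minimal sketch: once Adam has accumulated state for a parameter, it keeps updating that parameter even on a step where the gradient it sees is exactly zero.

import torch
import torch.optim as optim

# One scalar parameter and an Adam optimizer.
p = torch.zeros(1, requires_grad=True)
opt = optim.Adam([p], lr=0.1)

# Step 1: a real gradient populates Adam's running averages.
p.grad = torch.ones(1)
opt.step()
print(p.item())  # parameter has moved away from 0

# Step 2: the gradient is zero, yet the parameter moves again,
# driven purely by the accumulated momentum/variance state.
p.grad = torch.zeros(1)
opt.step()
print(p.item())  # moved further, despite grad == 0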
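
For the detach point, here is a rough sketch of the separated-optimizer pattern the linked gan.py follows. The toy generator/discriminator, sizes, and learning rate below are placeholders just to keep the loop runnable.

import torch
import torch.nn as nn
import torch.optim as optim

# Toy stand-ins for the real networks; only the structure of the loop matters here.
latent_dim, img_dim, lr = 4, 8, 2e-4
generator = nn.Sequential(nn.Linear(latent_dim, img_dim))
discriminator = nn.Sequential(nn.Linear(img_dim, 1), nn.Sigmoid())
adversarial_loss = nn.BCELoss()

optimizer_g = optim.Adam(generator.parameters(), lr=lr, betas=(0.9, 0.999))
optimizer_d = optim.Adam(discriminator.parameters(), lr=lr, betas=(0.9, 0.999))

for step in range(100):
    real_imgs = torch.randn(16, img_dim)  # pretend "real" batch
    valid = torch.ones(16, 1)
    fake = torch.zeros(16, 1)
    z = torch.randn(16, latent_dim)

    # Generator step: the loss backpropagates through the discriminator into
    # the generator, but only the generator's parameters are stepped.
    optimizer_g.zero_grad()
    gen_imgs = generator(z)
    g_loss = adversarial_loss(discriminator(gen_imgs), valid)
    g_loss.backward()
    optimizer_g.step()

    # Discriminator step: detach the generated images so this backward pass
    # does not leave gradients sitting on the generator's parameters, and no
    # retain_graph=True is needed.
    optimizer_d.zero_grad()
    real_loss = adversarial_loss(discriminator(real_imgs), valid)
    fake_loss = adversarial_loss(discriminator(gen_imgs.detach()), fake)
    d_loss = 0.5 * (real_loss + fake_loss)
    d_loss.backward()
    optimizer_d.step()

With this ordering and the detach, each optimizer only ever sees gradients produced by its own loss, which is the main behavioral difference from summing both losses into a single optimizer.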