Using a single optimizer to train both generators in CycleGAN

I was looking at CycleGAN’s implementation in PyTorch by the author of the paper. In the code, the author chained the parameters of both generators and passed them to a single Adam optimizer. I don’t understand the intuition behind training both networks with a single optimizer. Shouldn’t we use different optimizers for different networks? Where am I wrong?
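(For context, the pattern in question looks roughly like this. This is a minimal illustrative sketch, not the actual repository code; the module names and hyperparameters are stand-ins.)

import itertools
import torch
import torch.nn as nn

# Stand-ins for the two CycleGAN generators.
G_AB = nn.Linear(3, 3)  # "generator A -> B"
G_BA = nn.Linear(3, 3)  # "generator B -> A"

# The parameters of both generators are chained together and handed to
# one Adam instance, so a single optimizer_G.step() updates both networks.
optimizer_G = torch.optim.Adam(
    itertools.chain(G_AB.parameters(), G_BA.parameters()),
    lr=2e-4,
    betas=(0.5, 0.999),
)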


The code I looked up has two optimizers: optimizer_G and optimizer_D. Can you point to the code that you are referring to?

Yes, the code uses optimizer_G to train Generator AB and Generator BA. They are different models and need different gradients, so why did the author use one optimizer? Perhaps I am understanding it wrong.

loss_G = loss_identity_A + loss_identity_B + loss_GAN_A2B + loss_GAN_B2A + loss_cycle_ABA + loss_cycle_BAB
loss_G.backward()
optimizer_G.step()

Why are all the losses being added when both models have different parameters?

Check out this answer.

The optimizer just updates the parameters it is tracking, based on their gradients. For example, if network GA should be updated but network GB shouldn’t, the single optimizer tracking both GA and GB will only update GA’s parameters and leave GB’s untouched, because parameters that received no gradient are skipped by step().
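A tiny sketch of that behaviour (the two Linear modules below are hypothetical stand-ins, not the CycleGAN generators):

import itertools
import torch
import torch.nn as nn

net_ga = nn.Linear(2, 1)
net_gb = nn.Linear(2, 1)
opt = torch.optim.SGD(itertools.chain(net_ga.parameters(), net_gb.parameters()), lr=0.1)

gb_before = [p.detach().clone() for p in net_gb.parameters()]

# The loss depends only on net_ga, so backward() fills in gradients for net_ga only.
loss = net_ga(torch.randn(4, 2)).sum()
opt.zero_grad()
loss.backward()
opt.step()

# net_gb is untouched: its parameters still have grad == None, so step() skips them.
for p, old in zip(net_gb.parameters(), gb_before):
    assert torch.equal(p, old)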

An issue on the official repo says this just simplifies the code; the effect should be the same as using two separate optimizers.
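To see why, note that summing the losses and calling backward() once produces the same gradients as backpropagating each loss separately, since each generator’s parameters only receive gradient from the terms that involve them. A quick self-contained check (illustrative modules, not the actual CycleGAN networks):

import torch
import torch.nn as nn

g_ab = nn.Linear(2, 2)
g_ba = nn.Linear(2, 2)
x = torch.randn(4, 2)

# One backward pass on the summed loss...
loss_ab = g_ab(x).pow(2).mean()
loss_ba = g_ba(x).pow(2).mean()
(loss_ab + loss_ba).backward()
grads_summed = [p.grad.clone() for p in list(g_ab.parameters()) + list(g_ba.parameters())]

# ...matches separate backward passes on each loss, because a parameter of g_ab
# gets zero gradient from loss_ba and vice versa.
for p in list(g_ab.parameters()) + list(g_ba.parameters()):
    p.grad = None
g_ab(x).pow(2).mean().backward()
g_ba(x).pow(2).mean().backward()
grads_separate = [p.grad.clone() for p in list(g_ab.parameters()) + list(g_ba.parameters())]

assert all(torch.allclose(a, b) for a, b in zip(grads_summed, grads_separate))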