I was looking at CycleGAN’s PyTorch implementation by the author of the paper. In the code, the author chained the parameters of both generators and passed them to a single Adam optimizer. I don’t understand the intuition behind training both networks with a single optimizer. Shouldn’t we use different optimizers for different networks? Where am I wrong?
The code I looked at has two optimizers: optimizer_G and optimizer_D. Can you point to the code that you are referring to?
Yes, the code uses optimizer_G to train both Generator AB and Generator BA. They are different models and need different gradients, so why did the author use one optimizer? Perhaps I am understanding it wrong.
loss_G = loss_identity_A + loss_identity_B + loss_GAN_A2B + loss_GAN_B2A + loss_cycle_ABA + loss_cycle_BAB
loss_G.backward()
optimizer_G.step()
Why are all the losses being added together when the two models have different parameters?
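For context, the setup being discussed looks roughly like the sketch below: the parameters of both generators are chained into one Adam optimizer. This is a hedged reconstruction, not the repo’s exact code; the module definitions and hyperparameters are stand-ins.

```python
import itertools

import torch
import torch.nn as nn

# Hypothetical stand-ins for the two generators (the real repo uses ResNet-based generators)
G_AB = nn.Linear(8, 8)
G_BA = nn.Linear(8, 8)

# Chain both generators' parameters into a single optimizer,
# so one optimizer_G.step() can update both networks
optimizer_G = torch.optim.Adam(
    itertools.chain(G_AB.parameters(), G_BA.parameters()),
    lr=0.0002,
    betas=(0.5, 0.999),
)
```

Summing the losses before `backward()` works because each loss term contributes gradients only to the parameters it actually depends on; the single `step()` then applies each parameter’s own gradient.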
Check out this answer.
What the optimizer does is simply update the weights it is tracking, each according to its own gradient. For example, if network GA should be updated but network GB doesn’t need to be, then this single optimizer, which tracks both GA and GB, will update only GA’s parameters and leave GB’s parameters untouched.
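This behavior can be checked directly with a small sketch (the two linear layers are hypothetical stand-ins for the generators): a single optimizer tracks both networks, but a backward pass on a loss involving only one of them leaves the other unchanged.

```python
import itertools

import torch
import torch.nn as nn

torch.manual_seed(0)

# Two independent toy networks tracked by one optimizer
net_a = nn.Linear(4, 4)
net_b = nn.Linear(4, 4)
opt = torch.optim.Adam(itertools.chain(net_a.parameters(), net_b.parameters()), lr=0.1)

a_before = net_a.weight.detach().clone()
b_before = net_b.weight.detach().clone()

# A loss that depends only on net_a
x = torch.randn(2, 4)
loss = net_a(x).pow(2).mean()

opt.zero_grad()
loss.backward()  # gradients flow only into net_a; net_b's grads stay None
opt.step()       # the optimizer skips parameters whose .grad is None

print(torch.equal(net_b.weight, b_before))      # True: net_b untouched
print(not torch.equal(net_a.weight, a_before))  # True: net_a was updated
```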
An issue on the official repo says this is done to simplify the code; the effect should be the same as using separate optimizers.