- You could do `for p in XXX.parameters(): p.requires_grad_(False)` for the bits you are not training and set them to `True` for the bits you are training (there is a sketch putting this together after the list). Most GAN examples do that; it saves computing and storing unneeded gradients.
- You could have `g_loss1` and `g_loss2` for each discriminator and `g_loss = alpha * g_loss1 + beta * g_loss2` to mix both criteria.
- You likely don’t need `retain_graph=True` in the `g_loss.backward()` if you just detach (after training the generator) or re-generate the fake image. A while ago, most GANs had several discriminator steps per generator step; I don’t know whether that has changed. So it would mostly be re-generating.
- Instead of `retain_graph=True` in the disc1 training, you might just add `disc_one_loss` and `disc_two_loss` and do a single backward.
- You didn’t ask about this, but it is very unlikely that you want `disc_..._loss = real_loss + fake_loss`. If the generator minimizes the criterion, then you probably want it to be `real_loss - fake_loss`, so that the discriminators push up the criterion for fake images.
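
Putting the points above together, here is a rough sketch of what one generator step and one discriminator step could look like. The names (`gen`, `disc_one`, `disc_two`, `criterion`, `opt_g`, `opt_d`, `alpha`, `beta`) and the toy modules are just placeholders for whatever you actually have, and the discriminator losses follow the `real_loss - fake_loss` convention discussed above:

```python
import torch
from torch import nn

# Toy stand-ins; replace with your real generator / discriminators.
gen = nn.Linear(8, 16)
disc_one = nn.Linear(16, 1)
disc_two = nn.Linear(16, 1)
criterion = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(list(disc_one.parameters()) + list(disc_two.parameters()), lr=2e-4)
alpha, beta = 0.5, 0.5


def set_requires_grad(module, flag):
    # Freeze / unfreeze all parameters of a module, so the parts not
    # being trained in the current step don't compute or store gradients.
    for p in module.parameters():
        p.requires_grad_(flag)


def generator_step(z):
    set_requires_grad(gen, True)
    set_requires_grad(disc_one, False)
    set_requires_grad(disc_two, False)
    opt_g.zero_grad()

    fake = gen(z)
    ones = torch.ones(z.size(0), 1)
    # One loss per discriminator, mixed with the weights alpha and beta.
    g_loss1 = criterion(disc_one(fake), ones)
    g_loss2 = criterion(disc_two(fake), ones)
    g_loss = alpha * g_loss1 + beta * g_loss2
    g_loss.backward()  # no retain_graph needed, the graph is not reused
    opt_g.step()


def discriminator_step(real, z):
    set_requires_grad(gen, False)
    set_requires_grad(disc_one, True)
    set_requires_grad(disc_two, True)
    opt_d.zero_grad()

    with torch.no_grad():  # re-generate (or detach) the fake images
        fake = gen(z)

    ones = torch.ones(z.size(0), 1)
    # real_loss - fake_loss: the discriminators push the criterion up
    # for fake images while the generator pushes it down.
    disc_one_loss = criterion(disc_one(real), ones) - criterion(disc_one(fake), ones)
    disc_two_loss = criterion(disc_two(real), ones) - criterion(disc_two(fake), ones)

    # One backward over the sum instead of retain_graph=True.
    (disc_one_loss + disc_two_loss).backward()
    opt_d.step()


# One iteration with random toy data.
z = torch.randn(4, 8)
real = torch.randn(4, 16)
discriminator_step(real, z)
generator_step(z)
```

If you want several discriminator steps per generator step, you would just call `discriminator_step` a few times (with fresh `z` and `real`) before each `generator_step`.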
Best regards
Thomas