The GAN loss is defined as:
Prob_1 = D(real_data)
Prob_2 = D(G(z))
D_loss = -(log(Prob_1) + log(1 - Prob_2))
G_loss = -log(D(G(z)))
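To make the signs concrete, here is a minimal numeric sketch of these two losses in plain Python (the probe probabilities 0.9 and 0.1 are illustrative values, not taken from my actual run):

```python
import math

def d_loss(prob_real, prob_fake):
    # Discriminator loss: minimize -[log D(x) + log(1 - D(G(z)))],
    # i.e. push D(real) toward 1 and D(fake) toward 0.
    return -(math.log(prob_real) + math.log(1.0 - prob_fake))

def g_loss(prob_fake):
    # Non-saturating generator loss: minimize -log D(G(z)),
    # i.e. push D(fake) toward 1.
    return -math.log(prob_fake)

# A confident discriminator (real -> 0.9, fake -> 0.1) has a small loss,
# while the generator's loss is large at the same point.
print(round(d_loss(0.9, 0.1), 3))  # 0.211
print(round(g_loss(0.1), 3))       # 2.303
```

At the theoretical equilibrium D(·) = 0.5, d_loss is 2·log 2 ≈ 1.386 and g_loss is log 2 ≈ 0.693, which is the equilibrium value often quoted for the original GAN loss.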
The losses are plotted here:
As expected, Prob_1 increases and Prob_2 decreases as training progresses, which means D is learning the right thing.
Does this mean that the training of G and D is not balanced, and G cannot keep up with D?
So intuitively, I think it would be better to train G more than D, but according to the tips for training GANs, that would be bad practice. (I have tried other tips from that article, such as different optimizers, activation functions, batch norm, etc.)
Would it be a good idea to train G more than D?
Or maybe there are some tips I should try?
GANs are a highly active research topic, and in general they do not converge easily. In the standard formulation, the generator maximizes
log(D(G(z)))
while the discriminator maximizes
log(D(x)) + log(1 - D(G(z)))
But that is not always the scenario: most of the time the discriminator gets fooled easily by the generator. To avoid this:
- Update the discriminator more often than the generator.
- Use WGAN with Gradient Penalty.
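A minimal sketch of the first tip, updating D more often than G (the default 3:1 ratio here is a hypothetical starting point, and the "D"/"G" labels just stand in for real optimizer steps):

```python
def alternate_updates(total_g_steps, d_steps_per_g=3):
    # Build the update schedule: several discriminator ("D") steps
    # for every generator ("G") step.
    schedule = []
    for _ in range(total_g_steps):
        schedule.extend(["D"] * d_steps_per_g)
        schedule.append("G")
    return schedule

# Two generator steps with a 2:1 ratio:
print(alternate_updates(2, d_steps_per_g=2))
# ['D', 'D', 'G', 'D', 'D', 'G']
```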
In some cases the discriminator gets the upper hand instead. To avoid this:
- Add dropout and leaky ReLU in the discriminator layers.
- Use label smoothing.
- Add noise to both generated and real input before feeding them to the discriminator.
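The last two tips can be sketched like this (the smoothing target 0.9 and sigma = 0.1 are common but hypothetical choices; tune them for your data):

```python
import random

random.seed(0)  # reproducible noise for the example

def smooth_labels(n, real=True):
    # One-sided label smoothing: real targets become 0.9 instead of 1.0,
    # which keeps the discriminator from becoming overconfident.
    return [0.9] * n if real else [0.0] * n

def add_instance_noise(batch, sigma=0.1):
    # Add small Gaussian noise to every input, applied to BOTH real and
    # generated batches before they reach the discriminator.
    return [x + random.gauss(0.0, sigma) for x in batch]

real_targets = smooth_labels(4)  # [0.9, 0.9, 0.9, 0.9]
noisy_batch = add_instance_noise([0.5, -0.2, 1.0, 0.0])
```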
**Note:** Don't expect a GAN to converge; it may or may not. But usually, if your generator and discriminator losses converge together, your GAN is in good shape.