[SOLVED] GAN with 3 networks

I have 2 generators. Both are competing against each other in typical GAN fashion.
The 1st generator acts like an auto-encoder and the 2nd generator outputs a different result.
I want to introduce a 3rd network, i.e. a classifier, that classifies the input and output of the 1st generator into 1 of 2 classes: encoded and non-encoded.
When training, all three losses (the two generator losses and the classifier loss) decrease nicely, but when evaluating, the classifier loss is high (the generator losses are not). Admittedly, the encoded and non-encoded tensors are supposed to end up similar, but I would still expect the classifier to perform roughly as well during evaluation as it does during training.
Is there any reason why 2 networks (2nd generator and classifier) competing against 1 network (1st generator) wouldn’t work?

Details:
The classifier is an inceptionresnetv2 classifier taken from cadene/pretrainedmodels.
The classifier loss is nn.CrossEntropyLoss (which applies LogSoftmax internally).
All three losses are summed together to give a master loss. I then call loss.backward(), which should back-propagate gradients through all three networks.
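Roughly, one training step looks like this (a stripped-down sketch: gen1, gen2, classifier, loader and the two generator objectives are placeholders for my actual code, and a single shared optimizer is used here only to keep the sketch short):

```python
import torch
import torch.nn as nn

opt = torch.optim.Adam(
    list(gen1.parameters()) + list(gen2.parameters()) + list(classifier.parameters()),
    lr=1e-4,
)
ce = nn.CrossEntropyLoss()  # LogSoftmax + NLLLoss under the hood

for x in loader:
    encoded = gen1(x)        # 1st generator (auto-encoder style)
    decoded = gen2(encoded)  # 2nd generator

    # Placeholder objectives - the real ones depend on the setup.
    loss_g1 = gen1_objective(encoded, x)
    loss_g2 = gen2_objective(decoded, x)

    # Classifier sees the generator's input (class 0) and output (class 1).
    logits = classifier(torch.cat([x, encoded], dim=0))
    targets = torch.cat([
        torch.zeros(x.size(0), dtype=torch.long, device=x.device),
        torch.ones(encoded.size(0), dtype=torch.long, device=x.device),
    ])
    loss_cls = ce(logits, targets)

    loss = loss_g1 + loss_g2 + loss_cls  # single "master" loss
    opt.zero_grad()
    loss.backward()                      # gradients reach all three networks
    opt.step()
```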

Think you could include a diagram of this setup? And maybe a reference to a similar project? It’s a bit unclear what the goal is.

I don’t think that you can just call .backward() on a summed loss (though I could be wrong) and have it propagate through all three of these networks in order to attribute the correct gradients to the various parameters. But again, it’s hard to tell without a clearer picture of what you are trying to accomplish.

This is a shameless self-reference, but here I was working with a GAN/autoencoder setup that uses two loss calculations: one for the AE’s reconstruction loss and one for the D(G(z)) GAN loss. You may also look up the EBGAN paper (Appendix C), which uses an autoencoder as the discriminator.

So you can call .backward() on a summed loss.

The purpose of the classifier is to act like a discriminator and improve the performance of the 1st generator. It’s also useful for me to know whether the input of the 2nd generator is encoded or not.

It’s worth saying that the 1st and 2nd generators still train perfectly fine and achieve similar results to when the classifier is not included. What’s weird is that the classifier’s loss decreases while the model is training, i.e. when classifier.train() is active, but not when it is evaluated, i.e. when classifier.eval() is active.

I am pretty sure that this is expected behavior. .eval() changes the behavior of certain module types. Dropout, for instance, becomes the identity function once that method is called on it.
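For instance, here is a quick toy illustration (unrelated to your classifier) of how the same layer behaves differently in the two modes:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # roughly half the entries zeroed, the rest scaled by 1/(1-p)

drop.eval()
print(drop(x))  # identity: the input comes back unchanged

# BatchNorm behaves similarly: batch statistics in train(), running statistics in eval().
bn = nn.BatchNorm1d(8)
bn.train(); _ = bn(torch.randn(4, 8))  # updates running_mean / running_var
bn.eval();  _ = bn(torch.randn(4, 8))  # normalises with the stored running stats
```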

Let me clarify. I don’t expect the loss of the classifier to decrease when evaluating, but I would expect it to be roughly the same as when it is being trained. The loss of the classifier during training is ~ 0.001 using nn.CrossEntropyLoss but is ~ 4.1 when evaluating.

Also, I checked: .eval() doesn’t affect nn.CrossEntropyLoss.

I think this is a shaky expectation. If a model does not reduce the value of the objective function on unseen examples after more training, that is a sign the model is not effective for the data.

But it is unclear what “evaluation” means in the context of generative adversarial networks without seeing the code you’re working with (or a minimal working example). GANs (to my knowledge) typically don’t have training/testing/validation passes in the way that a classifier does.

Though, I suppose StyleGAN uses the Fréchet inception distance (FID) to measure the similarity of generated examples relative to the dataset.

Solved it! When feeding the input and output of the 1st generator into the classifier, I was concatenating them along the batch dimension WITHOUT any shuffling. This is bad. It turns out this has the effect of training well but validating badly. If you randomise the order within the batch, it both trains and validates well (see the sketch below).
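For the record, the fix amounts to something like this (tensor names are placeholders: x is the 1st generator’s input batch, encoded is its output batch):

```python
import torch
import torch.nn.functional as F

cls_in = torch.cat([x, encoded], dim=0)
cls_target = torch.cat([
    torch.zeros(x.size(0), dtype=torch.long, device=x.device),       # class 0: non-encoded
    torch.ones(encoded.size(0), dtype=torch.long, device=x.device),  # class 1: encoded
])

# The fix: shuffle inputs and targets with the SAME random permutation
# before they go into the classifier.
perm = torch.randperm(cls_in.size(0), device=x.device)
cls_in, cls_target = cls_in[perm], cls_target[perm]

loss_cls = F.cross_entropy(classifier(cls_in), cls_target)
```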

Rookie mistake, really. Hope this helps someone in the future. ALWAYS randomise your inputs!