Combining WGAN loss with other loss functions (L1, VGG, etc.)

Hello,
I am rewriting a GAN (cGAN) into a Wasserstein GAN.
My original generator is trained with both an adversarial loss from the discriminator and an L1 loss between the generated fake and the target (I am also experimenting with VGG loss and L2 loss).
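For reference, this is roughly what that combined objective looks like (a minimal runnable sketch with toy stand-in models; the weight is illustrative, pix2pix for example uses 100):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins so the sketch runs; the real models are deeper conv nets.
generator = nn.Conv2d(3, 3, 3, padding=1)
discriminator = nn.Conv2d(3, 1, 3, padding=1)

condition = torch.randn(4, 3, 64, 64)   # input image
target = torch.randn(4, 3, 64, 64)      # ground-truth image
lambda_l1 = 100.0                        # illustrative weight

fake = generator(condition)
pred_fake = discriminator(fake)          # patch-wise scores for the fake

# Non-saturating adversarial loss plus pixel-wise L1 against the target
loss_adv = F.binary_cross_entropy_with_logits(
    pred_fake, torch.ones_like(pred_fake))
loss_G = loss_adv + lambda_l1 * F.l1_loss(fake, target)
loss_G.backward()
```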

My Wasserstein GAN works as expected when using only the adversarial loss, but since it estimates the Wasserstein distance, the critic outputs values that can range from 1e-5 to 1e6 and shift throughout training. Combining this with other loss functions, which generally lie in the range 0-1, feels next to impossible even with scaling factors.

I have therefore currently added a Tanh activation to my critic's output, but I wonder if this is the way to go. The result is no longer the true Wasserstein distance, but the loss is “standardized”.
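Concretely, the change is just bounding the critic's output (a sketch with a hypothetical toy critic):

```python
import torch
import torch.nn as nn

class BoundedCritic(nn.Module):
    """Critic whose scalar output is squashed into (-1, 1) with tanh.
    This no longer estimates the true Wasserstein distance, but the
    adversarial loss stays on a fixed scale."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return torch.tanh(self.head(h))   # bounded critic score

critic = BoundedCritic()
scores = critic(torch.randn(4, 3, 64, 64))   # values in (-1, 1)
```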

If you know of any project using WGAN loss in combination with other loss functions, please let me know.

I found two articles using WGAN with other losses:

Both of these have implementations available on GitHub, but they seem to simply scale the other losses, e.g. the VGG loss, by a huge constant like 1000. When I tried this, the adversarial Wasserstein loss started out far too small to have much impact on what was being learned, and once it surpassed the other losses, those in turn had no impact.
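In code, their combination boils down to something like this (the stand-ins and the factor 1000 are only illustrative of what those repositories do):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins so the sketch runs.
critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
fake = torch.randn(4, 3, 64, 64, requires_grad=True)
target = torch.randn(4, 3, 64, 64)

# Stand-in for a perceptual/VGG loss (really an L2 between VGG features).
perceptual = F.mse_loss(fake, target)

# WGAN generator term (unbounded) plus a heavily scaled perceptual term.
# -critic(fake).mean() drifts over orders of magnitude during training
# while the perceptual term stays roughly O(1), so a fixed factor like
# 1000 only balances the two for part of the run.
loss_G = -critic(fake).mean() + 1000.0 * perceptual
loss_G.backward()
```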

My current solution is to revert to using a normal GAN.


Apparently "it just works"™

The key might be to backprop in different stages. More specifically, the following training scheme seems to work well:

  1. Train D on real images

  2. Train D on generated images

  3. Train G with loss from D

  4. Train G with any other objectives (e.g. L1)

After each stage I called backward() on the corresponding loss, and called optimizer.step() after stages 2 and 4.
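A sketch of that scheme in PyTorch (all names are placeholders, and I am using the plain WGAN critic losses; weight clipping / gradient penalty is omitted):

```python
import torch
import torch.nn.functional as F

def train_step(critic, generator, opt_D, opt_G, real, condition, target,
               lambda_l1=100.0):
    """One iteration of the staged scheme above (names are placeholders)."""
    # Stage 1: critic on real images
    opt_D.zero_grad()
    loss_D_real = -critic(real).mean()
    loss_D_real.backward()

    # Stage 2: critic on generated images, then step the critic
    with torch.no_grad():
        fake = generator(condition)
    loss_D_fake = critic(fake).mean()
    loss_D_fake.backward()
    opt_D.step()                                # step after stages 1 + 2

    # Stage 3: generator with the critic's loss
    opt_G.zero_grad()
    fake = generator(condition)
    loss_G_adv = -critic(fake).mean()
    loss_G_adv.backward(retain_graph=True)      # keep graph for stage 4

    # Stage 4: generator with any other objective (e.g. L1), then step
    loss_G_l1 = lambda_l1 * F.l1_loss(fake, target)
    loss_G_l1.backward()
    opt_G.step()                                # step after stages 3 + 4
```

The two backward() calls in stages 3 and 4 accumulate gradients into the generator before the single opt_G.step(), which also makes it easy to log and scale each objective separately.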
