Meaning of Wasserstein distance

I am training a GAN with the WGAN-GP setup. After I train the critic (let's say 5 times), I estimate the Wasserstein distance between real and fake samples as critic(real) - critic(fake), which gives me a positive real number. After a few epochs, this estimate becomes negative and keeps decreasing. So my question is: what does the positive distance imply, and what does the negative value (which I got after a few epochs) imply? Does it mean that one of G or D got too strong? How do we fix this issue?
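For reference, here is a minimal, framework-free sketch of the estimate described above: the mean critic score on a real batch minus the mean critic score on a fake batch. The names `wasserstein_estimate` and `toy_critic` are made up for illustration, and the critic is assumed to be any callable that returns a scalar score. Since the critic is trained to maximize exactly this quantity, and a constant critic already achieves zero, a clearly negative value usually suggests that the critic is far from its optimum on that batch or that the sign convention is flipped somewhere.

```python
from statistics import fmean


def wasserstein_estimate(critic, real_batch, fake_batch):
    """Mean critic score on real samples minus mean critic score on fake samples."""
    real_score = fmean(critic(x) for x in real_batch)
    fake_score = fmean(critic(x) for x in fake_batch)
    return real_score - fake_score


# Hypothetical toy critic on 1-D samples: a fixed linear function.
def toy_critic(x):
    return 0.5 * x


real = [2.0, 3.0, 4.0]
fake = [0.0, 1.0, 2.0]

# Positive: the critic scores real samples higher than fake ones.
estimate = wasserstein_estimate(toy_critic, real, fake)

# Negative: the same critic with the two batches swapped, i.e. the
# "fake" batch now scores higher than the "real" batch.
flipped = wasserstein_estimate(toy_critic, fake, real)
```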

In a Wasserstein GAN, the discriminator does not really act as a classifier that labels samples as real or fake. Instead, it provides a way to estimate the difference between the distributions of real and fake samples.

It is expected that the discriminator loss decreases over time. But if it keeps on decreasing, I suspect that the weights are not clamped. So you may want to clip the weights of the discriminator to a small range like [-0.01, 0.01] after each update of the discriminator's weights.
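The clipping step suggested here can be sketched as follows. This is only an illustration on a plain list of floats; `clip_weights` and `critic_weights` are hypothetical names, and in a real training loop the same clamp would be applied to every parameter tensor of the discriminator after each optimizer step. Note that this is the recipe from the original WGAN; the WGAN-GP setup mentioned in the question replaces it with a gradient penalty.

```python
def clip_weights(weights, clip_value=0.01):
    """Clamp every weight into the range [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, w)) for w in weights]


# Hypothetical critic weights after an update step.
critic_weights = [0.5, -0.003, -2.0, 0.01]

# Weights outside the range are pulled to the boundary,
# weights already inside the range are left unchanged.
clipped = clip_weights(critic_weights)
```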

Also, this blog post describes the Wasserstein distance very nicely: https://lilianweng.github.io/lil-log/2017/08/20/from-GAN-to-WGAN.html

Hi Vahid,

Thanks for your input. I get what you are saying about estimating the difference between the distributions of real and fake samples. But what does a negative value mean? Yes, it is expected that the discriminator loss decreases over time as we reduce the distance between real and fake, but here I was referring to critic(real) - critic(fake), which is computed after critic training in a given epoch, i.e., the distance between real and fake after the critic has learned something.

Also, are you sure about weight clipping? I am using "Improved Training of Wasserstein GANs" (WGAN-GP), which uses a gradient penalty, unlike the original WGAN, which uses weight clipping. The blog post looks like a detailed explanation, thank you; I will go through it.
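To make the difference concrete, here is a rough sketch of the gradient penalty term: pick a random point on the line between each real/fake pair, measure the norm of the critic's gradient with respect to that point, and penalize its squared deviation from 1. Everything here is an assumption made for illustration: the critic is a plain Python callable on a list of floats, the gradient is approximated by finite differences instead of autograd (in PyTorch one would normally use `torch.autograd.grad`), and `lam=10.0` is the commonly used penalty weight.

```python
import math
import random


def numerical_grad(critic, x, eps=1e-5):
    """Central finite-difference gradient of a scalar critic at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        grad.append((critic(up) - critic(down)) / (2 * eps))
    return grad


def gradient_penalty(critic, real_batch, fake_batch, lam=10.0, rng=random):
    """lam * mean((||grad critic(x_hat)|| - 1)^2) over interpolated points x_hat."""
    penalties = []
    for real, fake in zip(real_batch, fake_batch):
        t = rng.random()  # random mixing coefficient in [0, 1)
        x_hat = [t * r + (1 - t) * f for r, f in zip(real, fake)]
        grad = numerical_grad(critic, x_hat)
        grad_norm = math.sqrt(sum(g * g for g in grad))
        penalties.append((grad_norm - 1.0) ** 2)
    return lam * sum(penalties) / len(penalties)


# Hypothetical linear critic whose gradient is (3, 4) everywhere, so its norm is 5.
def toy_critic(x):
    return 3.0 * x[0] + 4.0 * x[1]


real = [[1.0, 2.0], [0.5, -1.0]]
fake = [[0.0, 0.0], [2.0, 1.0]]

# The gradient norm is 5 at every point, so each sample contributes (5 - 1)^2.
penalty = gradient_penalty(toy_critic, real, fake)
```

In the critic loss this term is added to mean(critic(fake)) - mean(critic(real)), so no weight clipping is needed in this setup.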

Hi, have you figured this out? I think I have run into the same problem. The loss of the discriminator keeps decreasing to very large negative numbers, and the output of the generator is a mess.