Hi Csaba, Jarrel,
thank you for looking at this in detail!
I must admit that the mathematician in me cringes a bit at @botcs's argument.
As @jarrelscy mentions, this is symmetric (it is a distance after all).
What happens mathematically is that the discriminator (the test function in the supremum) will ideally converge to the negative of what you get when you switch the signs between real and fake. The only important thing is to have opposing signs within two pairs: the real and fake terms in the discriminator loss, and the fake term in the discriminator loss versus the fake term in the generator loss. The latter is because we want to maximize the difference between the integrals in the discriminator, but do so by minimizing its negative.
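To spell out the symmetry, here is a short derivation in the usual dual (Kantorovich-Rubinstein) form. The notation ($P_r$ for the real distribution, $P_g$ for the generated one, $f$ for the 1-Lipschitz test function) is my own choice for illustration and is not taken from the notebook.

```latex
\begin{align*}
W(P_r, P_g)
  &= \sup_{\|f\|_L \le 1} \Big( \mathbb{E}_{x \sim P_r}[f(x)] - \mathbb{E}_{x \sim P_g}[f(x)] \Big) \\
  &= \sup_{\|f\|_L \le 1} \Big( \mathbb{E}_{x \sim P_g}[(-f)(x)] - \mathbb{E}_{x \sim P_r}[(-f)(x)] \Big) \\
  &= \sup_{\|g\|_L \le 1} \Big( \mathbb{E}_{x \sim P_g}[g(x)] - \mathbb{E}_{x \sim P_r}[g(x)] \Big)
   = W(P_g, P_r).
\end{align*}
```

The step from the second to the third line uses that $g = -f$ is 1-Lipschitz exactly when $f$ is. So if $f^*$ is optimal for one choice of signs, $-f^*$ is optimal for the other, and the value of the supremum is the same.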
So approximately (if the penalty term were zero because the weight was infinite) the Wasserstein distance is the negative loss of the discriminator, and the loss of the generator lacks only the subtraction of the integral on the real to be the true Wasserstein distance. As this term does not enter the gradient anyway, it is not computed. This is independent of how you pick the signs.
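As a rough numerical illustration of the above, here is a minimal sketch in plain Python. The function names, the `sign` parameter and the sample critic outputs are all hypothetical and made up for this comment; the penalty term is left out, and this is not the code from the notebook.

```python
def mean(values):
    """Sample average, standing in for the integral over a distribution."""
    return sum(values) / len(values)


def critic_loss(d_real, d_fake, sign=1.0):
    # The critic maximizes sign * (E[D(real)] - E[D(fake)]),
    # implemented as minimizing the negative (penalty term omitted).
    return -sign * (mean(d_real) - mean(d_fake))


def generator_loss(d_fake, sign=1.0):
    # The generator minimizes sign * (E[D(real)] - E[D(fake)]).
    # The real term does not depend on the generator, so it is dropped.
    return -sign * mean(d_fake)


# Made-up critic outputs on a real batch and a fake batch.
d_real = [0.9, 1.1, 1.0]
d_fake = [-0.2, 0.1, -0.5]

# Convention A: sign = +1 with critic D.
w_a = -critic_loss(d_real, d_fake, sign=1.0)
w_a_from_gen = generator_loss(d_fake, sign=1.0) + 1.0 * mean(d_real)

# Convention B: sign = -1 with the negated critic -D.
neg_real = [-v for v in d_real]
neg_fake = [-v for v in d_fake]
w_b = -critic_loss(neg_real, neg_fake, sign=-1.0)
w_b_from_gen = generator_loss(neg_fake, sign=-1.0) - 1.0 * mean(neg_real)

print(w_a, w_b, w_a_from_gen, w_b_from_gen)
```

All four printed numbers should agree up to rounding: the estimate of the distance is the negative critic loss under either sign convention, and the generator loss differs from it only by the term on the real data.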
I took the signs from the WGAN code published by Martin Arjovsky on GitHub.
Note that some time after writing the notebook you linked, I arrived at the conclusion that one-sided loss is better.