Is there any implementation of EMD (Earth Mover's Distance) in PyTorch?

What are the probability measures you want to compute the distance between?
Usually you have something like a probability distribution over, say, 10 classes, just as you (conceptually) would for classification, and then use KL divergence / cross entropy against the (peaked at the target) distribution. That can be replaced straightforwardly by the Wasserstein distance, as Frogner et al. do in "Learning with a Wasserstein Loss", and in the entropy-regularized case it can be done with my implementation.
Note that the whole thing is somewhat numerically sensitive.
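
Just to illustrate the regularized case, here is a minimal, self-contained sketch of an entropy-regularized Wasserstein (Sinkhorn) distance between two histograms, done in the log domain to help with the numerical sensitivity. The names (`sinkhorn_distance`, `eps`, `n_iter`) and the toy example are made up for this post and are not the API of any particular package, nor necessarily identical to my implementation:

```python
# Sketch: entropy-regularized OT between histograms a (n,) and b (m,)
# with ground-cost matrix cost (n, m), via log-domain Sinkhorn iterations.
import torch


def sinkhorn_distance(a, b, cost, eps=0.1, n_iter=100):
    log_a = torch.log(a)
    log_b = torch.log(b)
    # Dual potentials f (n,) and g (m,), initialized at zero.
    f = torch.zeros_like(a)
    g = torch.zeros_like(b)
    for _ in range(n_iter):
        # f_i = eps * (log a_i - logsumexp_j (g_j - C_ij) / eps)
        f = eps * (log_a - torch.logsumexp((g[None, :] - cost) / eps, dim=1))
        # g_j = eps * (log b_j - logsumexp_i (f_i - C_ij) / eps)
        g = eps * (log_b - torch.logsumexp((f[:, None] - cost) / eps, dim=0))
    # Transport plan P_ij = exp((f_i + g_j - C_ij) / eps) and its cost <P, C>.
    plan = torch.exp((f[:, None] + g[None, :] - cost) / eps)
    return (plan * cost).sum()


if __name__ == "__main__":
    # Toy example: two histograms over 10 bins with |i - j| as ground cost.
    n = 10
    idx = torch.arange(n, dtype=torch.float32)
    cost = (idx[:, None] - idx[None, :]).abs()
    a = torch.softmax(torch.randn(n), dim=0)  # e.g. a network's output distribution
    b = torch.zeros(n)                        # smoothed one-hot target (avoid log(0))
    b[3] = 1.0 - 1e-3
    b = (b + 1e-4) / (b + 1e-4).sum()
    print(sinkhorn_distance(a, b, cost).item())
```

The target is smoothed a little so that `log_b` stays finite; with a strictly one-hot target the potentials pick up infinities and the iteration can produce NaNs.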

Best regards

Thomas