Hi @tom, it's really cool that you're getting interested in this problem. I just want to help you avoid going down blind alleys.
The paper "Stochastic Optimization for Large-scale Optimal Transport" (https://arxiv.org/abs/1605.08527) is a conference paper, and those are usually a bit of a wild card. I spent a while on the code, audeg/StochasticOT, and it's probably better suited to information retrieval than to actually training a network like @smth's GAN code, or to use as a layer in one. So I think I made a mistake, and implementing it as a layer is perhaps not such a good idea.
I don't want to mislead you - it's probably better to work on something that's been proven useful. If you simply try to reproduce Chiyuan Zhang's (pluskid) Wasserstein.jl layer, in the code at the top of this thread, that would be a safe thing to do. It's something we're reasonably sure works, and if you get it working in PyTorch it would be something you could refer back to and reuse in the future.
Anyway, starting with the easy stuff - at the moment I can't get the entropy-regularised version in
https://github.com/rflamary/POT/blob/master/examples/Demo_1D_OT.ipynb.
and the exact EMD solver used in PyEMD to give roughly the same numbers. What I mean is: (I think) the functions in the POT library return the transportation plan matrix T, and to get the actual Wasserstein/EMD divergence you calculate the inner product
<T,M> = trace(T'M)
where M is the ground metric matrix. But I get different numbers from the two libraries, so I'm not sure which is right. I know you need to tune the regularisation parameter lambda, but that should be easy to do using downhill simplex/Nelder-Mead.
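Just to make the formula above concrete: for matrices, the Frobenius inner product <T,M> = trace(T'M) is the same thing as summing the elementwise product, which is the cheaper way to compute it. A quick NumPy sanity check (the matrices here are random toy values, not real transport plans):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((4, 4))
T /= T.sum()              # toy "transport plan" normalised to total mass 1
M = rng.random((4, 4))    # toy ground-metric matrix

# <T, M> = trace(T' M) = elementwise sum of T * M
cost_trace = np.trace(T.T @ M)
cost_sum = (T * M).sum()
print(cost_trace, cost_sum)
```

The `(T * M).sum()` form is O(n^2) instead of forming the full n x n product just to read off the diagonal, so it's the one to use when checking the two libraries against each other.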
I want to get these test cases to reconcile - that way we've got something to check against when we try to code things up and do more complicated stuff in PyTorch. Otherwise it's too easy to make a mistake without something solid to test against.
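In that spirit, here's a minimal, self-contained sketch of the kind of reconciliation test I mean, in plain NumPy/SciPy rather than the POT or PyEMD APIs (the histograms are made-up toy values): solve the exact EMD as a linear program, run hand-rolled Sinkhorn iterations for the entropy-regularised version, and check that <T,M> from the regularised plan approaches the exact value as the regularisation shrinks.

```python
import numpy as np
from scipy.optimize import linprog

# Two toy histograms on a 1D grid of 5 points (made-up test values).
a = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
b = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
x = np.arange(5, dtype=float)
M = np.abs(x[:, None] - x[None, :])   # ground metric |x_i - x_j|

# Exact EMD as a linear program:
#   minimise <T, M>  subject to  T 1 = a,  T' 1 = b,  T >= 0.
n = len(a)
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # row-sum constraints (T 1 = a)
    A_eq[n + i, i::n] = 1.0            # column-sum constraints (T' 1 = b)
res = linprog(M.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
emd_exact = res.fun

def sinkhorn(a, b, M, reg, n_iter=5000):
    """Entropy-regularised OT plan via Sinkhorn iterations."""
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

for reg in [1.0, 0.1, 0.05]:
    T = sinkhorn(a, b, M, reg)
    cost = (T * M).sum()               # <T, M> = trace(T' M)
    print(f"reg={reg}: sinkhorn cost = {cost:.4f}, exact EMD = {emd_exact:.4f}")
```

The gap between the two numbers should shrink as `reg` decreases, which is the behaviour I'd want a POT-vs-PyEMD comparison to show too - and it's why a naive side-by-side comparison at one fixed lambda gives different numbers without either library being wrong.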