Hi @tom, I just saw one of Marco Cuturi’s recent papers, and if I understood correctly, it gives a method to calculate the optimal transport divergence between distributions using something like SGD, rather than Sinkhorn’s algorithm.
That seems much easier to understand, but I’m not sure how easy it is to implement. Basically, it would involve constructing a layer which itself contains an SGD loop! Pretty funky.
I know that there’s already a learnable quadratic programming layer that’s been implemented (https://github.com/locuslab/qpth), but this seems more general than that.
Anyway, here’s the link: "Stochastic Optimization for Large-scale Optimal Transport"
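
For what it's worth, here is a rough NumPy sketch of how I read the idea, not the paper's exact algorithm. It assumes two discrete distributions, entropic regularization with strength `eps`, and plain averaged SGD on a semi-dual objective over a single potential `v`. The function names (`semidual_sgd`, `semidual_value`, `implied_marginal`) and the step-size schedule are my own choices for illustration, and this is only the forward computation, not a differentiable layer.

```python
import numpy as np

def _chi(v, C, nu, eps):
    # Softmax over targets j of (v_j - C_ij)/eps, weighted by nu_j.
    # Works for one cost row (1-D C) or the full cost matrix (2-D C).
    z = (v - C) / eps + np.log(nu)  # assumes nu has strictly positive entries
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def semidual_sgd(mu, nu, C, eps=1.0, n_iters=20000, lr=1.0, seed=0):
    # Averaged stochastic gradient ascent on the semi-dual potential v.
    # Each step samples one source point i ~ mu and uses the stochastic
    # gradient nu - chi_i(v), so only one row of C is touched per step.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(mu), size=n_iters, p=mu)
    v = np.zeros(len(nu))
    v_avg = np.zeros(len(nu))
    for k, i in enumerate(idx, start=1):
        v = v + (lr / np.sqrt(k)) * (nu - _chi(v, C[i], nu, eps))
        v_avg += (v - v_avg) / k  # running average of the iterates
    return v_avg

def semidual_value(mu, nu, C, v, eps=1.0):
    # Semi-dual objective at v, up to an additive constant that depends
    # on the regularization convention.
    z = (v - C) / eps + np.log(nu)
    m = z.max(axis=1)
    lse = m + np.log(np.exp(z - m[:, None]).sum(axis=1))
    return float(mu @ (v @ nu - eps * lse))

def implied_marginal(mu, nu, C, v, eps=1.0):
    # Target marginal of the transport plan implied by v; it should
    # approach nu as v approaches the optimum.
    return mu @ _chi(v, C, nu, eps)

# Toy problem: three source points and two target points on a line.
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 3.0])
C = (x[:, None] - y[None, :]) ** 2
mu = np.full(3, 1.0 / 3.0)
nu = np.array([0.5, 0.5])

v = semidual_sgd(mu, nu, C, eps=1.0)
print("potential:", v)
print("semi-dual value:", semidual_value(mu, nu, C, v, eps=1.0))
print("implied target marginal:", implied_marginal(mu, nu, C, v, eps=1.0))
```

Wrapping something like this in a layer would still need a way to get gradients through (or around) the loop, which is the part I'm unsure about.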