I have been thinking about methods for image comparison. One idea is to compute the mutual information of two images. This is fairly straightforward to do in numpy, but unusable as a loss function because there is no backward method.
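For context, this is roughly what I mean by the numpy version (a minimal sketch; the function name, bin count, and histogram-based estimator are my own choices):

```python
import numpy as np

def mutual_information(img1, img2, bins=32):
    """Plug-in estimate of mutual information between two equally-sized
    images, computed from a joint histogram of pixel intensities."""
    # Joint histogram over intensity pairs
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    pxy = joint / joint.sum()      # joint probability p(x, y)
    px = pxy.sum(axis=1)           # marginal p(x)
    py = pxy.sum(axis=0)           # marginal p(y)
    px_py = np.outer(px, py)       # product of marginals p(x) p(y)
    nonzero = pxy > 0              # avoid log(0)
    return np.sum(pxy[nonzero] * np.log(pxy[nonzero] / px_py[nonzero]))
```

The `np.histogram2d` call is the part that breaks differentiability: the hard binning has zero gradient almost everywhere, which is why this cannot be dropped in as a loss.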
Another idea was to look at the KL divergence loss function and try to do something similar, but I got lost in the code. KL divergence and mutual information look somewhat alike, with the notable exception of the joint probability distribution in mutual information.
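To make the resemblance concrete: mutual information is exactly a KL divergence, namely the one between the joint distribution and the product of the marginals,

I(X; Y) = D_KL( p(x, y) ‖ p(x) p(y) ) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ],

so the extra difficulty compared to a plain KL loss is estimating the joint p(x, y) in a differentiable way.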
Maybe you can point me to where KL divergence is implemented in the code? Or perhaps someone has written up how to implement KL divergence (or even mutual information) as a loss function, and you could link to their blog post?