How can I use a KL divergence loss instead of an MSE loss for regression? Let’s say a batch of 30 samples comes with 30 ground truth labels. With MSE, the mean squared error between the 30 predictions and these labels would be calculated and backpropagated through the layers. Instead, I want to fit a Gaussian or a GMM to the 30 labels, compute the KL divergence between the true and the predicted distribution, and use that as the loss function of the network. How can I do this with PyTorch tensors?
What is the true distribution in this case? You can try something similar to a VAE (Variational Autoencoder): let the network predict the parameters of the distribution, which for a Gaussian are N means and N standard deviations. Then sample from the output, try to match the sample to the label, and add a KL term between your model output and a predefined Gaussian such as N(0, 1).
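Here is a minimal sketch of both ideas in this thread, not a definitive implementation. The model class `GaussianRegressor`, the helper `batch_gaussian_kl`, the layer sizes, the weight `beta`, and the random toy data are all assumptions made for illustration. It uses `torch.distributions.Normal` with `kl_divergence`, which has a closed form for two Gaussians.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)


class GaussianRegressor(nn.Module):
    """Hypothetical model: predicts one mean and one std per sample."""

    def __init__(self, in_features: int, hidden: int = 32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        self.mean_head = nn.Linear(hidden, 1)
        self.log_std_head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> Normal:
        h = self.body(x)
        mean = self.mean_head(h).squeeze(-1)
        # Predict log-std and exponentiate so the std is always positive.
        std = self.log_std_head(h).squeeze(-1).exp()
        return Normal(mean, std)


def batch_gaussian_kl(pred: torch.Tensor, target: torch.Tensor,
                      eps: float = 1e-6) -> torch.Tensor:
    """KL(labels || predictions) after fitting one Gaussian to each batch.

    Assumed reading of the question: both Gaussians are fitted by
    matching the batch mean and standard deviation.
    """
    p = Normal(target.mean(), target.std() + eps)
    q = Normal(pred.mean(), pred.std() + eps)
    return kl_divergence(p, q)


# Toy batch of 30 samples with 8 features each (random data, illustration only).
x = torch.randn(30, 8)
y = torch.randn(30)

model = GaussianRegressor(in_features=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# VAE-like loss from the reply: sample, match the label, regularize with KL.
pred_dist = model(x)
sample = pred_dist.rsample()  # reparameterized, so gradients reach mean and std
recon_loss = nn.functional.mse_loss(sample, y)
prior = Normal(torch.zeros_like(y), torch.ones_like(y))
kl_loss = kl_divergence(pred_dist, prior).mean()
beta = 0.1  # assumed weight on the KL term
loss = recon_loss + beta * kl_loss

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Batch-level KL from the question, here applied to the predicted means.
batch_kl = batch_gaussian_kl(pred_dist.mean, y)
```

Note that `batch_gaussian_kl` compares only the batch mean and spread, so it does not tie any individual prediction to its own label. If it is used at all, it probably works better as an extra term next to a per-sample loss than as the only loss.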