A loss function to estimate the difference between two distributions?


I know there is a KL divergence loss in PyTorch,

but a limitation of KL divergence is that each distribution must sum to 1.

However, my target distributions are 1-D Gaussian-shaped, and their values do not sum to 1.



I tried to model this as a multi-label sigmoid cross-entropy, i.e.

-(target * log(pred) + (1 - target) * log(1 - pred))

However, I found out this is not a good loss for this problem.

For example, say one of the target probabilities is 0.6. As pred approaches 0.6, the loss does not approach 0; it approaches the entropy of the target instead:

-(0.6 * torch.log(torch.tensor(0.6)) + 0.4 * torch.log(torch.tensor(0.4))) = 0.6730
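To make the floor of this loss explicit, here is a minimal sketch in plain Python (the 0.6 target is just the example value above):

```python
import math

def bce(target, pred):
    # Binary cross-entropy for a single soft (probabilistic) target.
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

t = 0.6
# The loss is minimized exactly at pred == target ...
print(bce(t, 0.5), bce(t, t), bce(t, 0.7))
# ... but the minimum is the entropy of the target, ~0.673, not 0.
```

So BCE does point pred at the right value; it just never reaches 0 for soft targets. Subtracting the target's entropy would shift the minimum to 0 (which effectively turns the per-element loss into a KL divergence).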

I also tried to model this problem with an L1 loss, but the result is pretty bad.

Any good advice for modeling this problem?

Maybe you could use the KL divergence loss on the normalized data (instead of applying it to the original data), so that each distribution sums to 1.
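A minimal sketch of that idea (the tensor values here are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Hypothetical unnormalized, Gaussian-shaped 1-D prediction and target.
pred   = torch.tensor([0.1, 0.5, 0.9, 0.5, 0.1])
target = torch.tensor([0.2, 0.6, 1.0, 0.6, 0.2])

# Normalize both so each sums to 1.
pred_p   = pred / pred.sum()
target_p = target / target.sum()

# F.kl_div expects the input in log-space and the target as probabilities.
loss = F.kl_div(pred_p.log(), target_p, reduction='sum')
```

Note that normalizing discards the overall scale of the prediction, so if the absolute magnitudes matter you would need to penalize the scale separately.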