I know there is a KL divergence loss in PyTorch,
but the limitation of KL divergence is that each distribution has to sum to 1.
However, my target looks like a 1D Gaussian distribution whose values do not sum to 1.
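To make the setup concrete, here is a rough sketch of the kind of target I mean. The bin count, mean, and width are made-up values for illustration, and gaussian_target is a hypothetical helper, not something from PyTorch. It uses plain Python so it runs without any dependencies.

```python
import math

def gaussian_target(num_bins, mu, sigma):
    """Sample an unnormalized 1D Gaussian bump at integer bin positions.

    The peak value is 1, so every entry lies in (0, 1], but the entries
    are not rescaled to sum to 1.
    """
    return [math.exp(-((i - mu) ** 2) / (2 * sigma ** 2)) for i in range(num_bins)]

# Assumed example values: 11 bins, bump centered on bin 5.
target = gaussian_target(num_bins=11, mu=5.0, sigma=1.5)
total = sum(target)

print([round(t, 3) for t in target])
print("sum of target:", round(total, 3))  # larger than 1, so not a normalized distribution
```

Each entry is a valid per-bin probability on its own, which is why a per-element loss looked like a natural fit.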
I tried to model this as a multi-label sigmoid cross entropy, for example:
-(target * log(pred) + (1 - target) * log(1 - pred))
However, I found that this is not a good loss for this problem.
For example, say one of the target probabilities is 0.6. As pred approaches 0.6, the loss does not approach 0:
-(0.6 * torch.log(torch.tensor(0.6)) + 0.4 * torch.log(torch.tensor(0.4))) = 0.6730
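The behaviour I am describing can be checked with a small sketch. It uses plain Python math instead of torch so it runs anywhere, and soft_bce is a hypothetical helper name for the per-element cross entropy, not a PyTorch function.

```python
import math

def soft_bce(target, pred):
    """Binary cross entropy for a single element with a soft target in (0, 1)."""
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

target = 0.6

# Sweep the prediction across the target value.
for pred in (0.2, 0.4, 0.6, 0.8):
    print(f"pred={pred:.1f}  loss={soft_bce(target, pred):.4f}")

# Loss when the prediction matches the target exactly.
loss_at_target = soft_bce(target, target)
print("loss at pred == target:", round(loss_at_target, 4))
```

The loss is smallest when pred equals the target, but the minimum itself stays well above 0, which is the part that bothers me.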
I also tried an L1 loss, but the results were pretty bad.
Any good advice for modeling this problem?