KLDivLoss returns a negative value

I was computing the KL divergence loss and got a negative value, which led me here. Thanks for all the previous answers.

Here is the mathematical reason why KLDivLoss should be non-negative:
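For concreteness, a sketch of the standard argument (Gibbs' inequality), writing q for the target distribution and r = exp(p) for the predicted distribution (these names are mine, not PyTorch's):

$$
\mathrm{KL}(q \,\|\, r) = \sum_i q_i \log\frac{q_i}{r_i} \;\ge\; -\log\Big(\sum_i q_i \,\frac{r_i}{q_i}\Big) \;=\; -\log\Big(\sum_i r_i\Big) \;=\; 0,
$$

where the inequality is Jensen's inequality applied to the convex function $-\log$. The Jensen step is only valid when the weights $q_i$ sum to one, and the final equality needs $\sum_i r_i = 1$; if either condition fails, the sum can dip below zero.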

The cornerstone of the proof is that for KLDivLoss(p, q), sum(q) needs to equal one for the loss to be non-negative. So even if you have p = log_softmax(tensor), you can still get negative values when the target is not a true distribution, i.e. sum(q) != 1.
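A minimal sketch that reproduces both cases with `torch.nn.functional.kl_div` (the input must be log-probabilities, the target plain probabilities; the tensor values here are arbitrary):

```python
import torch
import torch.nn.functional as F

log_p = F.log_softmax(torch.randn(5), dim=0)  # input: log-probabilities
p = log_p.exp()                               # the matching probabilities

# Valid target (sums to 1): KL >= 0; exactly 0 here since target == prediction
print(F.kl_div(log_p, p, reduction='sum'))        # tensor(0.)

# Invalid target (sums to 0.5): the "KL" comes out negative
print(F.kl_div(log_p, 0.5 * p, reduction='sum'))  # tensor(-0.3466) == 0.5 * log(0.5)
```

With target = 0.5 * p, each pointwise term is 0.5 * p * (log(0.5 * p) - log(p)) = 0.5 * p * log(0.5), which sums to 0.5 * log(0.5) ≈ -0.347, negative regardless of the logits.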

Also, see the awesome discussion here: