KLDiv loss reduction


In torch.nn.KLDivLoss, when I set reduction='mean', I receive this warning:

UserWarning: reduction: 'mean' divides the total loss by both the batch size and the support size. 'batchmean' divides only by the batch size, and aligns with the KL div math definition. 'mean' will be changed to behave the same as 'batchmean' in the next major release.

What does "support size" mean in this context?

“Support size” refers to the dimension of the probability vector.

For example, if x and y have shape B x D (batch size = B, support size = D), then

F.kl_div(x, y, reduction='mean') * D is equal to F.kl_div(x, y, reduction='batchmean')
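A quick way to see this relationship is to compare the two reductions directly (the shapes and random inputs below are just illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, D = 4, 10  # batch size and support size

# F.kl_div expects the input as log-probabilities and the target as probabilities
log_p = F.log_softmax(torch.randn(B, D), dim=1)
q = F.softmax(torch.randn(B, D), dim=1)

mean_loss = F.kl_div(log_p, q, reduction='mean')            # sum / (B * D)
batchmean_loss = F.kl_div(log_p, q, reduction='batchmean')  # sum / B

# 'mean' divides by B*D, 'batchmean' only by B, so they differ by a factor of D
assert torch.allclose(mean_loss * D, batchmean_loss)
```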

Thank you for your reply. Is the "mean" reduction in KLDivLoss equivalent to the "mean" reduction in CrossEntropyLoss?
In the documentation for torch.nn.CrossEntropyLoss it is mentioned:

'mean': the weighted mean of the output is taken

In my understanding, batchmean of KLDiv is similar to the mean of CrossEntropyLoss.
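You can verify that CrossEntropyLoss with reduction='mean' (and no class weights) divides only by the batch size, which matches the behavior of batchmean in KLDivLoss (random inputs below are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C = 4, 10  # batch size and number of classes

logits = torch.randn(B, C)
targets = torch.randint(0, C, (B,))

# unweighted 'mean' = per-sample sum divided by the batch size only
mean_loss = F.cross_entropy(logits, targets, reduction='mean')
sum_loss = F.cross_entropy(logits, targets, reduction='sum')
assert torch.allclose(mean_loss, sum_loss / B)
```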

Hi @InnovArul, I also get this warning when computing a KL loss. Do I need to set the reduction to 'none' to avoid the warning, or should I just ignore it? Which is the right way?

In my experience, for most cases, reduction='batchmean' would suffice.
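A minimal sketch of the recommended pattern (the logits here stand in for your model outputs; names are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
student_logits = torch.randn(4, 10)  # hypothetical model outputs
teacher_logits = torch.randn(4, 10)  # hypothetical target distribution source

log_p = F.log_softmax(student_logits, dim=1)  # input must be log-probabilities
q = F.softmax(teacher_logits, dim=1)          # target must be probabilities

# 'batchmean' matches the mathematical KL definition and emits no warning
loss = F.kl_div(log_p, q, reduction='batchmean')
```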