KLDiv loss reduction

Hi,

In torch.nn.KLDivLoss, when I set reduction='mean', I receive this warning:

UserWarning: reduction: 'mean' divides the total loss by both the batch size and the support size. 'batchmean' divides only by the batch size, and aligns with the KL div math definition. 'mean' will be changed to behave the same as 'batchmean' in the next major release.

What does “support size” mean in this context?

“Support size” refers to the dimension of the probability vector, i.e., the number of elements (classes) over which the distribution is defined.

For example, if x, y are of dimension B x D (batch size = B, support size = D),

F.kl_div(x, y, reduction='mean') * D is equal to F.kl_div(x, y, reduction='batchmean').
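
A quick way to check this relationship (a minimal sketch; the shapes B and D below are arbitrary placeholders, and note that the input to F.kl_div must be log-probabilities):

import torch
import torch.nn.functional as F

B, D = 4, 10                                             # batch size and support size (arbitrary)
x = F.log_softmax(torch.randn(B, D), dim=1)              # input: log-probabilities
y = F.softmax(torch.randn(B, D), dim=1)                  # target: probabilities

mean_loss = F.kl_div(x, y, reduction='mean')             # total loss / (B * D)
batchmean_loss = F.kl_div(x, y, reduction='batchmean')   # total loss / B

print(torch.allclose(mean_loss * D, batchmean_loss))     # True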

Thank you for your reply. Is this 'mean' reduction in KLDivLoss equivalent to the 'mean' reduction in CrossEntropyLoss as well?
The documentation for torch.nn.CrossEntropyLoss says:

'mean': the weighted mean of the output is taken

In my understanding, 'batchmean' in KLDivLoss corresponds to 'mean' in CrossEntropyLoss, since CrossEntropyLoss already sums over the classes for each sample and its 'mean' then divides only by the batch size.
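
A small sketch that illustrates the correspondence, assuming hard (one-hot) targets so that the target entropy term vanishes and the two losses coincide numerically (shapes and names here are just placeholders):

import torch
import torch.nn.functional as F

B, C = 4, 10                                    # batch size and number of classes (arbitrary)
logits = torch.randn(B, C)
labels = torch.randint(0, C, (B,))              # hard class labels
one_hot = F.one_hot(labels, C).float()          # same labels as a probability distribution

ce = F.cross_entropy(logits, labels, reduction='mean')          # sum over classes per sample, then / B
kl = F.kl_div(F.log_softmax(logits, dim=1), one_hot,
              reduction='batchmean')                            # sum over everything, then / B

print(torch.allclose(ce, kl))                   # True for one-hot targets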

Hi @InnovArul, I also get this warning when computing the KL loss. Do I need to set the reduction to 'none' to avoid it, or should I just ignore it? Which is the right way?

In my experience, for most cases, reduction='batchmean' would suffice.
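
For reference, a minimal sketch of passing it explicitly (tensor shapes here are arbitrary), which also silences the warning:

import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.KLDivLoss(reduction='batchmean')        # explicit reduction, no warning

log_probs = F.log_softmax(torch.randn(4, 10), dim=1)   # model output as log-probabilities
probs = F.softmax(torch.randn(4, 10), dim=1)           # target distribution
loss = criterion(log_probs, probs)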
