KL divergence loss

I am trying to implement the KL divergence loss, but I always get nan.

p = torch.randn(100, 100)
q = torch.randn(100, 100)
kl_loss = torch.nn.KLDivLoss(size_average=False)(p.log(), q)
# output: nan
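A quick check shows why the first call returns nan even before the loss is computed (a minimal sketch, independent of the snippet above):

```python
import torch

# torch.randn samples from N(0, 1), so roughly half the entries are negative.
# The log of a negative number is nan, and nan propagates through KLDivLoss.
p = torch.randn(100, 100)
print(torch.isnan(p.log()).any())  # tensor(True)
```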
p_soft = F.softmax(p, dim=1)
q_soft = F.softmax(q, dim=1)
kl_loss = torch.nn.KLDivLoss(size_average=False)(p_soft.log(), q_soft)
# output: 96.7017
Do we always have to pass the distributions (p, q) through a softmax first?

According to the docs:

As with NLLLoss, the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).
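Concretely, the quoted contract can be sketched like this (the seed and `reduction="batchmean"` are my choices for the example, not part of the original snippet):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, just for reproducibility
p = torch.randn(100, 100)
q = torch.randn(100, 100)

# Per the docs: input is log-probabilities, target is plain probabilities.
loss = F.kl_div(F.log_softmax(p, dim=1),  # log-probabilities
                F.softmax(q, dim=1),      # probabilities
                reduction="batchmean")
print(loss >= 0)  # tensor(True): KL divergence is non-negative
```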

Your code snippet looks alright. I would recommend using log_softmax instead of softmax().log(), as the former approach is numerically more stable.
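The stability difference is easy to demonstrate with a deterministic example (the logit values are my own, chosen so that the gap underflows in float32):

```python
import torch
import torch.nn.functional as F

# exp(-200) underflows to 0 in float32, which is what breaks softmax().log().
x = torch.tensor([0.0, -200.0])

stable = F.log_softmax(x, dim=0)   # uses the log-sum-exp trick internally
naive = F.softmax(x, dim=0).log()  # softmax underflows first, then log

print(stable)  # both entries finite (approximately [0., -200.])
print(naive)   # the underflowed probability becomes -inf
```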


Hi,
As I understand it, KL divergence should return a numerical value representing the distance between two probability distributions in feature space. However, this is what I got using log_softmax…

>>> k = torch.rand(256)
>>> k1 = k.clone()
>>> F.kl_div(F.log_softmax(k, dim=0), k1, reduction="none").mean()
tensor(2.4483)

On the other hand, when I use a simple log, the answer is zero, which is expected. Can you let me know what I should use when comparing two layers in PyTorch with KL divergence?

The target should be given as probabilities:

k = torch.rand(256)
k1 = k.clone()
F.kl_div(F.log_softmax(k, dim=0), F.softmax(k1, dim=0), reduction="none").mean()
> tensor(6.2333e-10)
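For reference, with the target given as probabilities, F.kl_div reduces to the pointwise formula target * (log target - input), which you can verify by hand (seed and tensor names are my own for the sketch):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed for reproducibility
log_p = F.log_softmax(torch.randn(256), dim=0)  # input: log-probabilities
q = F.softmax(torch.randn(256), dim=0)          # target: probabilities

# F.kl_div computes q * (log q - log p) elementwise before reduction.
loss = F.kl_div(log_p, q, reduction="sum")
manual = (q * (q.log() - log_p)).sum()
print(torch.allclose(loss, manual))  # True
```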

Thank you! I was missing that :sweat_smile: