I am using torch.nn.functional.kl_div() to compute the KL divergence between the outputs of two networks. However, its output does not seem consistent with the definition of KL divergence.

For example, let's assume the normalized pred = torch.Tensor([[0.2, 0.8]]) and target = torch.Tensor([[0.1, 0.9]]).

Then the output of F.kl_div() would be:

F.kl_div(pred, target, reduction='sum', log_target=False) -> -1.0651

or

F.kl_div(pred, target, reduction='sum', log_target=True) -> 0.1354

However, if I calculate the KL divergence according to the definition:

(pred * torch.log(pred/target)).sum() —> 0.0444
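For reference, here is a minimal, self-contained script that reproduces all three numbers above:

```python
import torch
import torch.nn.functional as F

pred = torch.tensor([[0.2, 0.8]])
target = torch.tensor([[0.1, 0.9]])

# F.kl_div with probabilities passed directly as both arguments
a = F.kl_div(pred, target, reduction='sum', log_target=False)

# F.kl_div with the target flagged as being in log-space
b = F.kl_div(pred, target, reduction='sum', log_target=True)

# KL divergence computed manually from the definition
c = (pred * torch.log(pred / target)).sum()

print(a.item())  # -1.0651
print(b.item())  # 0.1354
print(c.item())  # 0.0444
```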

Does anyone know the reason for the difference? (The torch version is 1.8.)