KL divergence different results from tf

Hi,
I have not read the distributions package source code yet, but based on what I know from the C++ source, I prefer using the torch.nn.functional.kl_div function to compute the divergence.

Based on the source code, you should pass log-probabilities as the first argument (the input); the target is given as plain probabilities.
Notice that PyTorch's argument order is reversed relative to the math notation: F.kl_div(b, a) computes KL(a || b), with b given as log-probabilities. So you need the following code to get the same result as TensorFlow.

import torch
import torch.nn.functional as F

preds_torch = torch.tensor(preds)
labels_torch = torch.tensor(labels)
# KL(preds || labels): the first argument takes log-probabilities
out = F.kl_div(labels_torch.log(), preds_torch, reduction='sum')
print(out.item())  # 0.2038460671901703
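As a side note, kl_div also accepts the target as log-probabilities if you set log_target=True; both forms should give the same value. A minimal sketch with made-up example distributions (p and q here are just placeholders, not your actual data):

```python
import torch
import torch.nn.functional as F

p = torch.tensor([0.2, 0.3, 0.5])   # example "preds" distribution
q = torch.tensor([0.1, 0.6, 0.3])   # example "labels" distribution

# Default: target given as probabilities
kl_a = F.kl_div(q.log(), p, reduction='sum')

# Same value with the target given as log-probabilities
kl_b = F.kl_div(q.log(), p.log(), reduction='sum', log_target=True)

print(torch.allclose(kl_a, kl_b))  # True
```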

This is equivalent to computing KL(preds || labels) directly:

out = (preds_torch * (preds_torch / labels_torch).log()).sum()
print(out.item())
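To make the argument-order point concrete: the functional call and the direct sum agree, and swapping the arguments gives a different number because KL divergence is not symmetric. A small self-contained check with example distributions (the values are illustrative only):

```python
import torch
import torch.nn.functional as F

p = torch.tensor([0.2, 0.3, 0.5])
q = torch.tensor([0.1, 0.6, 0.3])

kl_pq = F.kl_div(q.log(), p, reduction='sum')   # KL(p || q)
manual = (p * (p / q).log()).sum()              # same quantity by hand
kl_qp = F.kl_div(p.log(), q, reduction='sum')   # KL(q || p), a different value

print(torch.allclose(kl_pq, manual))  # True
print(torch.allclose(kl_pq, kl_qp))   # False: KL is not symmetric
```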

In the end, I am really not sure about the distributions package yet. I will check it out and let you know if you are interested.
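One thing I can already say about the distributions package: torch.distributions exposes kl_divergence, which takes its arguments in the usual math order, KL(P || Q). A minimal sketch (again with made-up example distributions) showing it matches the functional API:

```python
import torch
from torch.distributions import Categorical, kl_divergence

p = torch.tensor([0.2, 0.3, 0.5])
q = torch.tensor([0.1, 0.6, 0.3])

# kl_divergence(P, Q) computes KL(P || Q) in math order
kl_dist = kl_divergence(Categorical(probs=p), Categorical(probs=q))

# Same value via the functional API (note the swapped argument order)
kl_func = torch.nn.functional.kl_div(q.log(), p, reduction='sum')

print(torch.allclose(kl_dist, kl_func))  # True
```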

Good luck
Nik