I’ve noticed that the PyTorch implementation of KL divergence yields different results from the TensorFlow implementation. The results differ significantly (0.20 vs. 0.14) and I was curious what the reason could be. Any help will be more than appreciated. Below you can find a small example.
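The example is roughly the following (placeholder probabilities stand in for my real inputs, so the exact numbers will differ from the ones above):

```python
import torch
import torch.nn.functional as F
import tensorflow as tf

# placeholder distributions (my real inputs are network outputs)
a = [0.4, 0.4, 0.2]
b = [0.3, 0.3, 0.4]

# PyTorch: kl_div called with the two distributions in the same order as in TF
kl_torch = F.kl_div(torch.tensor(a), torch.tensor(b), reduction='sum')

# TensorFlow: KLDivergence(y_true, y_pred) = sum(y_true * log(y_true / y_pred))
kl_tf = tf.keras.losses.KLDivergence()(a, b)

print(kl_torch.item(), float(kl_tf))  # the two values do not match
```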
Hi,
I have not read the distributions package source code, but from what I know of the C++ source code, I prefer using the torch.nn.functional.kl_div function to compute the divergence.
Based on the source code, you should provide log-probabilities for the input (the first argument); the target is interpreted as plain probabilities.
Notice that PyTorch swaps the arguments: kl_div(b, a), with b given in log space, corresponds to KL(a||b), so you need to use something like the following code to get the same result as TensorFlow.
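Something along these lines (a and b here are just placeholder distributions, not your actual values):

```python
import torch
import torch.nn.functional as F

a = torch.tensor([0.4, 0.4, 0.2])  # P in KL(P||Q)
b = torch.tensor([0.3, 0.3, 0.4])  # Q in KL(P||Q)

# input must be log-probabilities, target plain probabilities,
# and the arguments are swapped relative to the KL(a||b) notation
kl_ab = F.kl_div(b.log(), a, reduction='sum')

# matches the textbook formula sum_i a_i * log(a_i / b_i)
kl_check = (a * (a / b).log()).sum()
print(kl_ab.item(), kl_check.item())
```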
I just found the solution using the distributions package too.
As I mentioned in the previous post, one of the arguments has to be in log space and you have to keep track of which distribution plays which role in KL(a||b); based on that, we end up with something like this:
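Roughly like this (same placeholder distributions as before):

```python
import torch
from torch.distributions import Categorical, kl_divergence

a = torch.tensor([0.4, 0.4, 0.2])
b = torch.tensor([0.3, 0.3, 0.4])

# kl_divergence(p, q) computes KL(p||q), so here the order is the natural one
kl_ab = kl_divergence(Categorical(probs=a), Categorical(probs=b))

print(kl_ab.item())  # same value as F.kl_div(b.log(), a, reduction='sum')
```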
Could you also help me with the differences between tf/pytorch and numpy? It seems to work fine when the input is 2-D, but when the input has more than two dimensions it doesn’t. For example, I’m now trying it with a 4-D array where the distributions are on the last axis. My implementation is roughly as follows.
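(Sketched with random placeholder arrays; kl_div_np is just a stand-in name, but the reduction over the last axis is the point.)

```python
import numpy as np

def kl_div_np(p, q, axis=-1, eps=1e-12):
    # KL(p||q) taken along `axis`, one value per distribution
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=axis)

# placeholder 4-D input, distributions normalized over the last axis
rng = np.random.default_rng(0)
p = rng.random((2, 3, 4, 5))
q = rng.random((2, 3, 4, 5))
p /= p.sum(axis=-1, keepdims=True)
q /= q.sum(axis=-1, keepdims=True)

print(kl_div_np(p, q).shape)  # (2, 3, 4)
```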
@razvanc92 Sorry for the late reply, I was dealing with a bunch of problems.
To be frank, I could not get the same output for randomly generated numbers using either nn.functional.kl_div or the formula itself. Could you post your last question as a separate topic?
And please mention me there too, so I can understand what is really happening there.
Maybe other experienced users could help us too.